Emulation of burst-based adaptive link rates in NetFPGA towards green networking

Abstract: In recent times, the energy consumption of communication media has been increasing drastically. In the literature, energy-saving techniques that enable network devices to enter a sleep state or limit the data rate have been proposed to reduce energy costs. In our earlier work, we proposed an energy-saving technique called burst-based adaptive link rate (BBALR), which showed increased energy savings in simulation. In this paper, we have emulated the hardware implementation of BBALR and compared its performance with that of other prominent energy-saving policies based on dynamic link rate adaption. The energy savings are mapped from the measured sleep time and reference power values. We have used NetFPGA as the testbed, which is a research platform for building real-time network hardware prototypes.


Introduction
The Global e-Sustainability Initiative (GeSI) has been researching ways to cut down the CO2 emissions caused by the ICT sector, towards building a sustainable society. The key points of the GeSI SMARTer 2030 report that are related to our work are summarized in this section. It is estimated that 12.8 Gt of CO2 emissions can be cut down by defining energy-efficient practices that the ICT sector needs to incorporate. Predictions say that the energy savings that could be obtained from the ICT sector will be higher than those of the other existing sectors. The energy consumption of the operation of ICT devices is 9.7 times the energy consumption of their manufacturing (Figure 1). The ratio of operating costs to the footprint keeps increasing consistently on a scale of more than 1.5. The energy consumption of ICT is distributed across end-user devices, data centers, and networks with proportions of 47.2%, 28.8%, and 24%, respectively. By 2030, every end device, such as IoT and mobile devices, will have access to the Internet. Hence, saving the energy costs of the networking plane will remain a research challenge at the device and data center levels. Energy savings would also bring wider benefits to society, such as saving 330 trillion liters of water. Detailed explanations can be found in the GeSI reports [1,2].
We have gone through the surveys of green communications. In recent surveys [3], energy-efficient approaches have been classified into node-level and network-level techniques. For this work, we have confined our study to node-level techniques. Link rate adaption is a prominent technique applied at node level to save energy. Cutting-edge networking paradigms like Green-SDN [4] are undergoing integration of link rate adaption with flow-based routing. In the earlier surveys, techniques like reengineering the hardware design and dynamic link rates were forecast as the future of green networks [5,6]. Link rate adaption was proposed as an energy-saving technique in NICs, switches, and routers in the last decade [5]. It is now being successfully incorporated in emerging networking paradigms.
* Correspondence: shahulhameadh@ssn.edu.in
In our work, we have emulated the real-time hardware implementation of the latest technique named burst-based adaptive link rate (BBALR). We have compared the performance with the results of the other techniques of adaptive link rate (ALR) and burst transmission (BTR).
The system is implemented on a hardware platform known as NetFPGA, which supports rapid prototyping of networking techniques [7,8]. We have experimented with the BBALR technique to observe the energy savings through the transmission of TCP packets between two hardware units. The benefits yielded through this work would help extend this idea to real routers and switches. The organization of the paper is as follows. Section 2 summarizes the techniques of dynamic link rates. Section 3 specifies the system model adopted for this work. Section 4 introduces the NetFPGA platform and discusses the design and the hardware emulation of the system. Section 5 compares the performance of the system with the existing techniques.

Related work
The line card determines the speed of a router and is therefore one of the major components in a router. However, it is also the most power-consuming component since it has packet processors. A router may contain one or more line cards. Dynamic link rates are implemented by imposing sleep on the line cards and activating them only when required. In this section, three techniques that dynamically change the states of line cards are briefly summarized.

Burst transmission techniques
In this section, we will cover the burst transmission techniques studied in [7,8]. This is a method that coalesces the packets and sends them as a burst instead of sending them one by one. While the packets are being collected in the buffer, line cards are put to sleep to save energy. Once the line cards are turned on, the collected packets are serviced as a burst. Energy is consumed only when packets are processed in active mode. In Figure 2, after an active period, sleep time starts at t_s. At regular intervals, a refresh (t_r) happens, during which the condition to switch to the active state is verified. If the constraints do not hold, the line card remains in the sleep state. Otherwise, wake-up (t_w) happens, which changes the state of the line card to active. The condition to switch the line cards to the active state is based on three policies, namely burst-full, timeout, and hybrid. In all these policies, all the line cards behave synchronously, i.e. they are in active mode together or in sleep mode together. Sleep mode is also called low power idle (LPI) mode.

Burst-full Policy
The size of the burst is predefined with a threshold. When the line cards sleep, the received packets are stored in the buffer. For every arriving packet, the size of the queue is verified with the burst size threshold. Once the queue size meets the burst size, the line cards are woken up from sleep and switch their state to the active mode for servicing the burst. Once the packets are forwarded, the line card goes to sleep mode again to save energy.

Timeout policy
The previous policy would lead to unwanted delays if the traffic rate is slow. In this case, the burst will not be full, which would, in turn, postpone the wake-up event. There is a possibility that TCP would go for retransmission as the packets are not delivered to the destination within the stipulated timeout. To handle this scenario, a timeout value is set, which wakes up the line cards from sleep although the burst is not full.

Hybrid Policy
During high traffic, the burst would become full and the packets will get discarded if the timeout policy is imposed. The hybrid policy is a combination of burst-full and timeout policies. A hybrid policy verifies that at least one of the conditions is satisfied. The burst will get serviced either on burst-full or on timeout. Whichever one happens first will wake up the line cards from sleep.
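The three wake-up conditions described above can be contrasted in a few lines of code. The following is only a minimal sketch: the function name `wake_reason`, the policy labels, and the choice to report burst-full when both conditions hold at once are assumptions made for illustration, not part of the cited techniques.

```python
def wake_reason(queue_len, elapsed, burst_size, timeout, policy="hybrid"):
    """Return why the line cards wake up, or None if they keep sleeping.

    queue_len  -- packets currently buffered
    elapsed    -- time spent asleep, in the same unit as `timeout`
    burst_size -- burst-full threshold (packets)
    timeout    -- timeout threshold
    """
    burst_full = queue_len >= burst_size
    timed_out = elapsed >= timeout
    if policy == "burst-full":
        return "burst-full" if burst_full else None
    if policy == "timeout":
        return "timeout" if timed_out else None
    if policy == "hybrid":
        # Whichever condition holds wakes the line cards; if both hold,
        # burst-full is reported first (an arbitrary tie-break).
        if burst_full:
            return "burst-full"
        if timed_out:
            return "timeout"
        return None
    raise ValueError(f"unknown policy: {policy}")
```

With a burst size of 5 and a timeout of 10, a queue of 2 packets after 10 time units wakes the line cards under the hybrid policy but not under the burst-full policy, which is the slow-traffic delay that the timeout is meant to avoid.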

Adaptive link rates
Adaptive link rates were discussed in [9][10][11]. In this policy, the link rates are determined by continuously monitoring the queue size. The number of line cards switched to active mode depends on the dynamic traffic rate. Assume a router has two line cards. A threshold can be fixed to determine the number of line cards required. If the queue size is less than the threshold, one line card can serve; otherwise, both line cards serve. In this policy, at least one line card should be in active mode all the time. The queue size is continuously observed to decide on changes of the link rate. If the traffic rate is inconsistent, the line cards undergo frequent switching between active and sleep states. A sudden increase of the link rate processes the packets faster and drains the queue at the same speed. This can in turn force the system to reduce the link rate because the queue size has dropped. Repeating these operations with a single threshold in a cyclic manner would affect the stability of the system. To avoid this problem, a dual-threshold policy is used. In this policy, the thresholds are maintained as 0 < low threshold < high threshold < maximum queue length. A high link rate is selected when the size of the queue reaches the high threshold. A low link rate is selected only when the size of the queue is reduced to the low threshold. A switch from low to high drains the queue faster, but the stability of the system is preserved because the subsequent change of link rate happens only when the size of the queue falls below the low threshold. Whenever a line card is switched to an active state, an additional amount of energy is consumed, known as the switching cost. With the dual-threshold policy, the switching cost is reduced.
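The stabilizing effect of the dual threshold can be illustrated on a short queue trace. The sketch below is hypothetical: the function names, the trace, and the threshold values are chosen only for illustration, and a single threshold is imitated by placing the two thresholds on adjacent integers.

```python
def next_rate(current_rate, queue_len, low, high):
    """Dual-threshold (hysteresis) choice between a "low" and a "high" link rate."""
    if queue_len >= high:
        return "high"
    if queue_len <= low:
        return "low"
    return current_rate  # between the thresholds: keep the current rate


def count_switches(queue_trace, low, high, start="low"):
    """Count link rate changes over a sequence of observed queue sizes."""
    rate, switches = start, 0
    for queue_len in queue_trace:
        new_rate = next_rate(rate, queue_len, low, high)
        if new_rate != rate:
            switches += 1
        rate = new_rate
    return switches


trace = [3, 6, 4, 6, 4, 6, 2]                      # queue sizes oscillating around 5
single = count_switches(trace, low=4, high=5)      # behaves like one threshold at 5
dual = count_switches(trace, low=2, high=6)        # widely spaced thresholds
print(single, dual)
```

On this trace the near-single threshold changes the rate six times, while the dual threshold changes it twice, which is the reduction in switching cost that the policy aims for.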

Burst-based adaptive link rates
Burst transmission techniques save assured energy by imposing forced sleep to all the line cards until the wakeup constraint becomes true, but all the line cards are switched to active mode although a high link rate is not required. Adaptive link rates have the advantage of turning on only the required number of line cards to an active state, but adaptive link rates are not intended for burst-based transmission that guarantees assured energy savings. Burst-based adaptive link rate is a hybrid technique that collectively ensures the benefit of both [12]. In this technique, all the line cards sleep, as happens in burst transmission. At the time of wake-up, a decision is made to determine the number of line cards that are required for the current burst. Instead of turning on all the line cards, only the required numbers are switched on. In this technique, a hybrid policy is preferred for wake-up, which is mentioned in Section 2.1.3. Burst-based adaptive link rates guarantee an assured sleep with appropriate policies at respective states. An old hybrid technique can also be seen in [13].

System model of BBALR
In this section, the system model of BBALR, whose hardware design is the crux of this work, is presented. The state diagram is depicted in Figure 3. The logical conditions for state transitions are outlined in Table 1. Initially, the system is in sleep state S_0, where all the line cards sleep. During this state, the received packets are stored in the buffer. The listeners that detect the burst-full and timeout conditions run concurrently while the system sleeps. When the current time t reaches the timeout t_max, the system enters state S_1. Alternatively, the current queue size p may reach the burst size p_max, which makes the system enter state S_2. In rare cases, both conditions may become true at the same time, which leads the system to enter state S_3. States S_1, S_2, and S_3 represent active states and S_0 represents the sleep state. Edges a_1, a_2, and a_3 represent wake-up events and b_1, b_2, and b_3 represent switch-off events. The policy adopted in S_0 is low power idle (LPI). Since S_1 represents timeout, the queue size is not the same as the burst size. Hence, the adaptive link rate (ALR) policy is preferred at state S_1. At S_2, the burst-full condition holds and the queue size must be equal to the burst size threshold. Hence, all the line cards are expected to be turned on for a higher link rate (HLR). Since S_3 is the event for which both constraints hold, HLR is preferred because of the burst-full property. When the burst is fully serviced, i.e. p = 0, the system returns to S_0 from any of the other states, which resets the time to t = 0.

Table 1. Logical conditions for state transitions.
Present state | Logical conditions | Action | Policy followed
S_0 | t < t_max and p < p_max | x_0 | LPI
S_0 | t = t_max and p < p_max | a_1 | ALR
S_0 | t < t_max and p = p_max | a_2 | HLR
S_0 | t = t_max and p = p_max | a_3 | HLR
S_1 | t <= t_max and p < p_max
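The state transitions described in this section can be sketched as a small transition function. This is an illustrative reading of the model, not the hardware design itself: the function name `next_state` and the `POLICY` mapping are hypothetical, and the comparisons use >= instead of strict equality as a defensive assumption.

```python
# Policy followed in each state, as described in the text.
POLICY = {"S0": "LPI", "S1": "ALR", "S2": "HLR", "S3": "HLR"}


def next_state(state, t, p, t_max, p_max):
    """Return the next BBALR state.

    t -- time elapsed since the system went to sleep
    p -- current queue size (packets)
    """
    if state == "S0":
        timed_out = t >= t_max
        burst_full = p >= p_max
        if timed_out and burst_full:
            return "S3"   # both constraints hold (edge a_3)
        if burst_full:
            return "S2"   # burst-full wake-up (edge a_2)
        if timed_out:
            return "S1"   # timeout wake-up (edge a_1)
        return "S0"       # keep sleeping
    # Active states: go back to sleep once the burst is fully serviced.
    return "S0" if p == 0 else state
```

For example, `next_state("S0", 10, 3, t_max=10, p_max=8)` yields `"S1"`, for which `POLICY` gives ALR, while a full burst before the timeout yields `"S2"` and HLR.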

Energy consumption model
The value of the link utilization factor (LUF) is measured in the range of 0 to 1. The link rate ratio (LRR) expresses the LUF of another technique as a multiple of the LUF of BBALR. More information about the energy model of BBALR can be found in [12].

System design and hardware implementation
We have identified NetFPGA as a suitable platform for a real-time implementation of the system. Several NetFPGA implementations have been studied from previous works [14][15][16][17][18][19][20][21][22][23][24][25]. The NetFPGA board is physically connected to the motherboard through the PCI Express slot and it can be programmed with hardware logic. The software application from a higher layer can access the hardware through the NetFPGA device driver. In our implementation, we have used the NetFPGA 1G CML board, which has a Xilinx Kintex-7 FPGA, a 32-bit microcontroller, 4.5 MB SRAM, 512 MB DDR3 DRAM, and 4 Ethernet ports, namely NF0, NF1, NF2, and NF3. Figure 4 illustrates the high-level architecture of NetFPGA inspired by online content.
The implementation design (see Figure 5) encompasses modules at node and board levels. The packet generator module is implemented at the application level and creates packets in PCAP format with the specified arrival rate λ, i.e. the packets are generated continuously with a mean interarrival time of 1/λ. The generated packets are stored in the queue. The policy base contains energy-saving ideas that are defined as wake-up policies. Generally, policies are composed of queue and clock parameters. The system parameters are continuously compared with threshold values to verify that the policy constraints are true. In the grammar for a policy S, a constraint is composed as c → p ⋄ τ, where p ∈ P, ⋄ ∈ R, and τ ∈ Γ, i.e. a system parameter p is compared with a threshold value τ through a relation ⋄. A set of constraints is defined as C = {c_1, c_2, c_3, ...}. A policy could be a single constraint or it could be composed of multiple constraints. The wake-up policy for the burst-based adaptive link rate is as follows: Let t be the current time of the system. Let ρ be the current queue size. Let β be the burst-full threshold. Let Υ be the timeout threshold.
Let the burst-full constraint be C_b = (ρ == β). Let the timeout constraint be C_t = (t == Υ). Let the policy of the system be S_sys = C_b ∨ C_t, i.e. wake-up happens when at least one of the constraints is true.
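One possible way to hold such constraints in a software policy base is sketched below. The class `Constraint`, the helper `any_of`, the parameter names "queue" and "clock", and the sample thresholds are all hypothetical and serve only to mirror the form c → p ⋄ τ and the disjunction S_sys = C_b ∨ C_t.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Constraint:
    parameter: str                             # p: name of a system parameter
    relation: Callable[[float, float], bool]   # the relation applied to (p, tau)
    threshold: float                           # tau: threshold value

    def holds(self, system: Dict[str, float]) -> bool:
        """Check the constraint against the current system parameters."""
        return self.relation(system[self.parameter], self.threshold)


def any_of(*constraints: Constraint) -> Callable[[Dict[str, float]], bool]:
    """Compose constraints with logical OR, as in S_sys = C_b OR C_t."""
    return lambda system: any(c.holds(system) for c in constraints)


# Sample thresholds (assumed values): beta = 8 packets, upsilon = 10 time units.
c_b = Constraint("queue", operator.eq, 8)    # burst-full constraint
c_t = Constraint("clock", operator.eq, 10)   # timeout constraint
s_sys = any_of(c_b, c_t)

print(s_sys({"queue": 8, "clock": 3}))    # burst-full holds
print(s_sys({"queue": 2, "clock": 3}))    # neither constraint holds
```

Keeping the relation as a first-class value means that a listener can evaluate any policy in the same way, whichever parameters and thresholds it is built from.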
The listener module verifies whether the dynamic parameters of the system satisfy the constraints of the policy. The NF Trigger module determines the number of ports to be selected for transmitting the packets. The link rate is decided dynamically based on the policy that is imposed. Under the burst-based adaptive link rate policy, the switch from the low rate to the high rate requires the number of packets in the burst to be greater than the threshold, whereas the switch from high to low happens when it is less than the threshold. Once a wake-up happens, the packets are copied from the node memory to the NetFPGA memory. A portion of the NetFPGA memory is reserved for the input arbiter in the reference NIC design. A module called the packet distributor transmits the packets from the input arbiter through the selected NF ports in a cyclic manner.
The NetFPGA ports are directly connected to each other. Data transmission occurs through both the ports if a high link rate is preferred, or else only one port is used for transmission. The policy dynamically mandates the number of ports that should be used. The energy savings obtained through the BBALR policy between the two hardware units can be studied and analyzed for speculating energy savings in any larger topology.
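The port selection and the cyclic distribution of packets can be sketched in software as follows. This is not the hardware logic of the NF Trigger or packet distributor modules; the function names and the assumption that NF0 and NF1 are the two connected ports are made only for illustration.

```python
from itertools import cycle


def select_ports(burst_len, threshold, ports=("nf0", "nf1")):
    """Use all ports when the burst exceeds the threshold, else only the first."""
    return list(ports) if burst_len > threshold else [ports[0]]


def distribute(packets, ports):
    """Assign packets to the selected ports in a cyclic (round-robin) manner."""
    assignment = {port: [] for port in ports}
    for packet, port in zip(packets, cycle(ports)):
        assignment[port].append(packet)
    return assignment


burst = list(range(5))                        # five packet identifiers
ports = select_ports(len(burst), threshold=3)
print(distribute(burst, ports))
```

With five packets and a threshold of 3, both ports are selected and the packets alternate between them; with a burst of three packets or fewer, only one port would be used.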

Results and discussion
In this work, we have done the real-time implementations for burst-based adaptive link rate, burst transmission, and adaptive link rate policies. The comparison of the three is discussed in the following section. We have measured the performance of the implementation via various parameters. Though the objective is to measure energy savings, we have measured QoS parameters as well to demonstrate that the functional objective is not compromised. A couple of parameters measure energy whereas some of them measure QoS. A few parameters even act as a balancing trade-off between energy and QoS. The performance parameters and the context of the usage are given in Table 2.
The packet generator module is common for all the policies. It follows the Poisson process with a mean interarrival time (1/λ), where λ is the mean arrival rate. In our implementation, we have specified 1/λ in microseconds. For high arrival rates, 1/λ is as low as 10 microseconds. For low traffic, 1/λ rises to a maximum value of 10000 microseconds. We have measured the performance of the system for high traffic, medium traffic, and low traffic. The mean interarrival time of high traffic ranges from 10 to 100 microseconds with an increment of 10 microseconds at each measurement. The mean interarrival time of medium traffic ranges from 100 to 1000 microseconds with an increment of 100 microseconds. The mean interarrival time of low traffic ranges between 1000 and 10000 microseconds with an increment of 1000 microseconds at each measurement. In all the graphs (Figures 6 to 14), the x-axes represent the mean interarrival time in microseconds. The burst threshold for the hybrid technique has been set as 150 milliseconds, whereas, for others, it is limited to 30 milliseconds. Similarly, the timeout has been set as 50 milliseconds for the hybrid one and 10 milliseconds for the rest of the techniques.
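A Poisson arrival process of this kind can be generated by drawing exponentially distributed interarrival times. The sketch below only illustrates the idea and is not the packet generator used in the experiments; the function name `arrival_times` and the seed are assumptions, and no PCAP output is produced.

```python
import random


def arrival_times(mean_interarrival_us, duration_us, seed=None):
    """Return Poisson-process arrival timestamps (microseconds) in [0, duration_us)."""
    rng = random.Random(seed)
    rate = 1.0 / mean_interarrival_us      # lambda, arrivals per microsecond
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)         # exponential interarrival time
        if t >= duration_us:
            return times
        times.append(t)


# Mean interarrival times of the three traffic classes, in microseconds.
HIGH = list(range(10, 101, 10))
MEDIUM = list(range(100, 1001, 100))
LOW = list(range(1000, 10001, 1000))

times = arrival_times(100, 1_000_000, seed=1)   # one second of traffic at 1/lambda = 100 us
print(len(times))                               # close to 10000 arrivals on average
```

The three lists reproduce the measurement points described above, so a sweep over `HIGH + MEDIUM + LOW` would cover the whole x-axis range of the graphs.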
The performance of the hybrid policy is compared with other policies. In the graphs, we have used the acronyms BTR for burst transmission, ALR for adaptive link rates, and BBALR for the burst-based adaptive link rate (hybrid) policy. Performance measurements in the graphs are depicted for 60 seconds. Parameters like throughput are measured directly from the input arbiter part of NetFPGA. Every parameter is divided into three columns of high traffic rate (a), medium traffic rate (b), and low traffic rate (c). Because of heavy traffic, the throughput is slightly unstable as the service gets delayed, whereas for medium and low traffic, the curve stabilizes to the exponential distribution. The throughput depends on the arrival rate, which follows the Poisson process. Hence, the interarrival time follows an exponential distribution, which can be observed from the graphs in Figure 6.

Energy
Table 2. Performance parameters and the context of their usage.
Parameter | Description | Context
Wake-up count | Reason for wake-up: the number of times timeout occurred vs. the number of times burst-full occurred. Frequent wake-ups incur more switching costs. | Energy and QoS
Projected power consumption (PPC) | The major parameter discussed in Section 3.1. The reference watts are from Guo et al. [26] and are applied in active and sleep states to get the PPC. It is expressed in watt-hours. |
Stipulated energy savings | The percentage of the duration of sleep in the total implementation period. | Energy
Mean service length | Mean duration of the active period in milliseconds to deliver the packets as a burst. | QoS
Link utilization factor (LUF) | The ratio of usage of the ports to the total availability. It is discussed in Section 3.1. | Energy and QoS
Link rate ratio (LRR) | The ratio of the LUF of an existing technique to the LUF of BBALR. It is discussed in Section 3.1. | Energy and QoS
In Figure 7, the mean number of packets in the queue follows an exponential distribution if the waiting time is fixed. If the service happens due to a timeout event, the waiting times for different traffic rates remain the same. This can be seen for mean interarrival times greater than 400 microseconds. For high traffic rates, the waiting time is reduced due to burst-full wake-ups, which in turn lead to a constant mean number of packets in the queue. This can be observed for mean interarrival times of less than 400 microseconds (Figure 8). Based on the policy and the dynamic parameters of the system, the NF ports are selected for transmission. The selected NF ports are woken from sleep mode to transmit the packets waiting in the queue. For low traffic rates, only one port is used. The dynamic usage of ports has been verified with the Wireshark tool.
The sleeping time is computed from the idle time of the NF ports (Figure 9). As the mean interarrival time of the packets increases, the sleep time increases. Since energy savings are directly proportional to the amount of sleep, energy savings increase with the sleep time. Due to the enforcement of compulsory sleeping time, the mean queue length and waiting time increase up to five times without compromising the throughput. It can be noted that the throughput obtained for BBALR is nearly the same as that for the other techniques (Figure 6).
Switching time (Figure 10) is the time to prepare the ports for transmission; therefore, it can be perceived as the time between idle mode and active mode. For better performance, the total switching time is expected to be low. Through our experiment, we can see that BBALR performs well with a reduced switching count compared to ALR and BTR because of the higher payload per active period. ALR and BTR are measured with a lower mean number of packets in the queue and less waiting time, whereas BBALR undergoes a longer waiting time. On the other hand, because of the higher thresholds, BBALR saves more energy through increased sleep time than the other two techniques in spite of the increased waiting time. Throughput is almost the same for all, as the packet generator is kept independent of the policies. In Figure 11 we have illustrated the total number of services after wake-ups. The BBALR policy is applied in the first row, the ALR policy in the second row, and the BTR policy in the third row. The policies are applied for high, medium, and low traffic rates from left to right. Because of the higher threshold values, which are suitable for energy savings, the number of services is lower for BBALR, whereas it is high for ALR and BTR. It can be observed that burst-full wake-ups dominate for high traffic rates and timeout wake-ups control the low traffic rates. For medium traffic rates, the initial control starts with burst-full, and in the middle of the range (300-400), the wake-up control is transferred to the timeout constraint. This behavior has been consistently observed for all the policies, with a variation in the number of wake-ups.
The performance parameters for different policies are shown in Figure 12, in which the traffic rate varies on a logarithmic scale. As mentioned earlier, throughput is similar for all the policies. For high rates, it is consistent, and it drops slowly as the mean interarrival time increases (Figure 12a). A curve similar to that of the throughput can be seen for the mean number of packets in the queue for BBALR because of the energy-friendly thresholds (see Figure 12b). Other policies show a reduced mean number of packets in the queue. The average waiting time (Figure 12c) is inversely proportional to the number of packets in the queue for the reason given for Figure 7 and Figure 8. The switching time increases slowly and steadily for all the policies, following a similar pattern. BBALR improves performance with reduced switching costs (Figure 12d). The sleeping time, an energy-saving factor, starts with similar values for high arrival rates and gains more scope for energy conservation with medium and low arrival rates (Figure 12e).
NetFPGA implementations of the green reference router have been taken from [26,28]. We have traced the duration for which the NF ports are in active mode and in sleep mode. Power consumption is a function that takes the length of the active period as the parameter. Energy savings is a function that takes the length of the sleeping time as a parameter. From an earlier NetFPGA experiment, the watts consumed for 2 ports, 1 port, and no ports at low and high frequencies are listed in Table 3. By substituting these values, the power consumption over a time duration, in watt-hours, can be projected for the different policies. We have scaled the power-related parameters to a duration of 24 hours. Similarly, the energy savings can be stipulated by substituting the mentioned values for the duration of sleep time. Projected power consumption (PPC) is expressed as the sum of static, dynamic, and switching power costs of the components. From Figure 13 it can be observed that BBALR would consume less power than the other policies. There is a difference of 40 watt-hours between the minimum power consumption of BBALR and that of the other policies. Similarly, BBALR can save energy at double the rate of the other policies for low traffic rates.
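The projection described above can be approximated by weighting reference power values with the measured active and sleep durations and scaling the result to 24 hours. The sketch below is a simplified reading of that procedure under stated assumptions: the function names are hypothetical, the split into active, sleep, and per-switch energy is a simplification of the static, dynamic, and switching costs, and the watt values in the example are placeholders, not the values of Table 3.

```python
def projected_wh(active_s, sleep_s, active_w, sleep_w,
                 switches=0, switch_j=0.0, horizon_h=24.0):
    """Project energy (watt-hours) over `horizon_h` hours from one measured window.

    active_s, sleep_s -- measured active and sleep durations (seconds)
    active_w, sleep_w -- reference power in active and sleep states (watts)
    switches, switch_j -- number of wake-ups and assumed energy per wake-up (joules)
    """
    window_s = active_s + sleep_s
    joules = active_s * active_w + sleep_s * sleep_w + switches * switch_j
    mean_power_w = joules / window_s          # average power over the window
    return mean_power_w * horizon_h           # watt-hours over the horizon


def stipulated_savings(active_s, sleep_s):
    """Percentage of the measured window spent asleep."""
    return 100.0 * sleep_s / (active_s + sleep_s)


# Placeholder numbers: a 60 s window, half of it asleep, 10 W active, 2 W asleep.
print(projected_wh(30, 30, active_w=10.0, sleep_w=2.0))   # 144.0
print(stipulated_savings(30, 30))                          # 50.0
```

In this form, a policy that sleeps longer and wakes up less often lowers all three terms of the sum at once, which matches the trend reported for BBALR.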
The link utilization factor (LUF) is depicted in Figure 14b. Usually, the default link rate consumes more power as it does not follow an energy policy. We have measured LUF as the ratio of the length of the active period to the cumulative length of the active and sleep periods. As LUF depends on incoming traffic, the curve follows an exponential distribution. LUF is high for faster arrival rates and it drops for lower arrival rates. The link rate ratio indicates how many times more power an existing policy consumes than BBALR at the same traffic rate. LRR (existing policy, BBALR) is the ratio of the LUF of an existing policy to the LUF of BBALR. LRR values can be found in Table 4. From the LRR values, it can be observed that the LUFs of all the policies are the same for low traffic rates. There is a slight change for medium traffic. Compared with BBALR, the existing policies have double the utilization, and hence double the energy cost, for higher traffic rates. The results of the simulation [12] are based on mean interarrival times whose units are seconds, whereas the results of this work are based on units of microseconds. Because of this change, PPC, LUF, and LRR vary in magnitude while showing similar behaviors.
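The two ratios defined in this paragraph are straightforward to compute from the traced durations. The following is a minimal sketch with hypothetical function names and made-up durations, intended only to make the definitions concrete.

```python
def luf(active_time, sleep_time):
    """Link utilization factor: active period over active plus sleep period."""
    return active_time / (active_time + sleep_time)


def lrr(luf_existing, luf_bbalr):
    """Link rate ratio: LUF of an existing policy relative to the LUF of BBALR."""
    return luf_existing / luf_bbalr


# Made-up 60 s traces: an existing policy active for 30 s, BBALR active for 15 s.
luf_existing = luf(30, 30)
luf_bbalr = luf(15, 45)
print(luf_existing, luf_bbalr, lrr(luf_existing, luf_bbalr))   # 0.5 0.25 2.0
```

An LRR of 2.0 in this example means the existing policy keeps its ports active twice as long as BBALR over the same window, while an LRR of 1 would indicate equal utilization.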
Another important aspect of observation is the measurement of mean service length (see Figure 14a). It could be seen that BBALR is measured with longer service length than the other policies. Because of burst behavior, the service length is comparatively longer. Once the service is done, the device enters into sleep mode until the constraints of the policy become true, which in turn would benefit uninterrupted longer sleep. The ALR policy undergoes a short nap and experiences frequent switching because of its packet-based transmission policy. Though BTR supports smaller bursts, all the ports are unnecessarily switched to active mode even for low traffic rates, which in turn wastes energy instead of saving it during the active period. Hence, the sleeping quality is maintained in BBALR by skipping unwanted switching costs. Finally, the difference in output between simulation and implementation is discussed. The results of implementation are based on the interarrival times, whose units are measured in microseconds, whereas the simulation results are obtained for interarrival times set in seconds. The measures of energy parameters in the simulation remain constant for fast interarrival times of smaller than 2 seconds due to the constraints of the simulation environment. On the other hand, the QoS and energy measurements change proportionally for the implementation even for faster interarrival times of less than 100 microseconds. The variations in output parameters like PPC for BBALR can be seen in the graphs (Figures 12, 13, and 14). Due to the smaller reference power values, no significant variation could be observed in the PPC of the simulation for high traffic rates [12].
The PPC calculations in the simulation use reference voltage values for two cases, staying at the same link rate and switching between different link rates (Table 5). The PPC calculations for the emulation use the reference power values in Table 3 for the number of active ports instead of the changes in the link rates. As the reference power values used in the implementation are 10 times larger than the values used in the simulation, and the experimental setup of the implementation uses an FPGA, we have preferred using these values in the implementation of our technique. Due to the large reference values used for BBALR, variations in PPC can be observed.

Conclusion
The hardware implementation of energy-saving policies for data transmission in wired networks has been performed in the NetFPGA platform. A suitable reference NIC architecture has been programmed to the board. Policies are defined at a high level from the software layer. The performance parameters are measured with different traffic rates for all the policies including burst-based adaptive link rate, which has energy-friendly thresholds. From the experiments, it is observed that the burst-based adaptive link rate harvests cumulative benefit with a combination of extended sleep and reduced switching costs compared to other policies. The QoS measures are also well maintained in BBALR. The core values that impact the energy measurement through projected power consumption and link utilization factor are measured directly from the hardware. The results of emulation are more insightful than the results of simulation. As future work, the hardware emulation can be extended to a full-length prototype along with its formal verification.