We demonstrate the interaction of two
TCP flows going through a 300 Mbps bottleneck link
(port 4 of the left NPR).
This is the same configuration used in the first example in
Filters, Queues and Bandwidth.
Unlike UDP, however, TCP receivers periodically send ACK packets back to the senders
(e.g., from n2p2 to n1p2 through the 2.1-1.4 link).
Example A demonstrates the simple case where there are no extra
delays between iperf TCP senders and receivers.
Example B demonstrates how to add an artificial delay using
the delay plugin.
The new concepts demonstrated by this example include:
- observing TCP's slow-start and congestion avoidance behavior in the bandwidth, queue length and packet drop charts;
- changing filters to match any protocol so that ACK (and ping) packets follow a known reverse path; and
- installing a delay plugin and controlling it with its read (=delay, =counts) and write (delay=) commands.
This example assumes that you have gone through the Filters, Queues and Bandwidth example, which showed how filters were used to direct UDP traffic to a bottleneck link. As before, we use two shell scripts to start the TCP senders and receivers. You can get these by copying them from the directory /users/onl/export/Examples/ into the directory on the ONL user host where you store your executables and shell scripts. Follow the same procedure for getting the shell scripts described in Filters, Queues and Bandwidth, replacing the script names with those shown below:
| File | Description |
| --- | --- |
| trcvrs-2npr | Start iperf TCP servers (receivers) on hosts n2p2 and n2p3 |
| tsndrs-2npr | Start iperf TCP clients (senders) on hosts n1p2 and n1p3 |
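To fetch the two scripts, the procedure from Filters, Queues and Bandwidth amounts to something like the following (a minimal sketch; we assume your scripts live in a directory such as ~/bin on the ONL user host, so adjust the destination to match your setup):

onlusr> cp /users/onl/export/Examples/trcvrs-2npr ~/bin/   # TCP receiver script
onlusr> cp /users/onl/export/Examples/tsndrs-2npr ~/bin/   # TCP sender script
onlusr> chmod +x ~/bin/trcvrs-2npr ~/bin/tsndrs-2npr       # in case the execute bit was not preserved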
The example has two parts:
- Example A: Packets from the two TCP flows contend for the bottleneck output link in a FIFO manner, with no extra delay between senders and receivers.
- Example B: A delay plugin at port 2.1 delays the ACK packets returning to the sending hosts. The bandwidth displays show the outlines of standard TCP Reno behavior, in which TCP attempts to fill the transmission pipe during slow-start and then repeatedly probes for bandwidth by linearly increasing its send window and backing off when it encounters packet loss.
We use the same dumbbell configuration and monitoring as in Example A except that we modify the filters to match on any protocol (*). Our setup uses the same forward path as before and the reverse path for the ACK packets sent by the receivers to the senders; e.g., ACK packets from both n2p2 and n2p3 will go to port 2.1, then out the link to port 1.4, and finally to the senders. The three charts will look different from those for the UDP flows because of TCP's slow-start and congestion avoidance algorithms.
The main steps in Example A are:
- Copy the configuration file from the UDP example and change the filters at ports 1.2 and 1.3 to match any protocol.
- Start the iperf TCP receivers on n2p2 and n2p3 with the trcvrs-2npr script.
- Start the iperf TCP senders on n1p2 and n1p3 with the tsndrs-2npr script.
- Examine the bandwidth, queue length and packet drop charts.
- Run the flows again for 60 seconds to see whether their behavior changes.
We will start with the configuration file created in the preceding example and modify it by changing the filters at ports 1.2 and 1.3 and adding filters at ports 2.2 and 2.3 for the ACK packets. We purposely change the protocol fields of the filters to match any protocol so that ping packets will take the same path as the TCP packets, allowing us to verify the delay in Example B.
client> cp udp.exp tcp.exp
We will change the filters at ports 1.2 and 1.3 to match on any protocol, and then commit and save the configuration.
The Tables window for port 1.2 will appear with the old settings.
This procedure generally follows the one for the UDP example, except that we use the trcvrs-2npr shell script. The script starts an iperf TCP server process on each of the n2p2 and n2p3 hosts.
remote> ssh onl.arl.wustl.edu    # any SSH client tool will do
onlusr> trcvrs-2npr
Note: The TCP iperf servers will start running and listen for TCP iperf traffic on TCP port 5001.
What's Happening? The trcvrs-2npr script looks like this:

#!/bin/sh
source /users/onl/.topology   # get defs of external interfaces
for host in $n2p2 $n2p3; do   # start TCP receivers
    ssh $host /usr/local/bin/iperf -s -w 16m &
done

This script is almost identical to urcvrs-2npr except that there is no UDP flag (-u) and the window size (-w) has been set to 16 MB instead of 1 MB. The window size must be at least the bandwidth-delay product if you don't want flow control (the receiver buffer size) to restrict your effective transmission rate; 16 MB is much larger than is needed for either Example A or B.
onlusr> source ~onl/.topology     # use .topology.csh if running a c-shell
onlusr> ssh $n2p2 ps -lC iperf    # "ssh $n2p2 pgrep -l iperf" for less output
onlusr> ssh $n2p3 ps -lC iperf
Now that the two TCP receivers are running, start sending traffic from the two senders at n1p2 and n1p3 using the tsndrs-2npr shell script. The procedure is the same as in the UDP example except for the script name.
onlusr> tsndrs-2npr
The bandwidth chart shows that the second sender starts about 1 second after the first one (dashed fuchsia 2.3 line). It also shows that the n1p2 sender reaches a transmission rate of about 300 Mbps almost immediately after it starts (blue 1.2 line), and n2p2 also receives at 300 Mbps (red 2.2 line). On the other hand, receiver n2p3 begins to receive traffic at no more than 15 Mbps (dashed fuchsia 2.3 line); i.e., its first slow-start period ends almost immediately after it starts. n2p3 doesn't get much traffic until about five seconds later, when it finally gets about 200 Mbps while n2p2 is getting about 100 Mbps (red 2.2 line). After that there is some variation in traffic bandwidth as both flows jockey for the link at port 1.4.
The queue length chart shows that queue 64 has packets during the congestion period [1498, 1530] when both flows are active. Furthermore, queue 64 peaks at about 1.5 MB (the queue threshold) at around times 1516 and 1524.
Watch Out!!! Shouldn't the chart show the queue reaching 1.5 MB near the start of the first flow (around time 1500)?
Yes ... if we were able to observe the queue continuously. But the queue length counter is read only once per polling period (once per second), and the queue is near its capacity only for a brief period of time around time 1500.
At around time 1500, the Queue Manager does drop over 1,000 packets, indicating that the queue did overflow during the first slow-start period even though the queue length chart did not show a queue length near 1.5 MB. The chart also shows that small numbers of packets were dropped around times 1508, 1516, and 1524. The last two times correspond to the queue length peaks in the queue length chart.
We run the two TCP flows for 60 seconds (40 seconds longer than before) to see if the behavior of the two flows changes.
The bandwidth chart continues to show the attempt by both flows to acquire more bandwidth although they never do attain equal bandwidth sharing.
The queue length chart looks like the one in our 20-second experiment except that around time 1670, the queue length drops to about half of its maximum.
Only 1 or 2 packets are dropped each time as the TCP flows probe for additional bandwidth. Except for the first packet drop at time 1650, the times of the packet drops correspond to the times in the queue length chart when the queue overflows. The first packet drop is probably due to a slow-start period (note the exponential way that the queue length increases at time 1650).
We delay ACK packets by 20 milliseconds by installing a delay plugin at NPR 2 and adding filters at ports 2.2 and 2.3 to direct packets to the plugin. We also monitor the number of ACK packets being dropped at port 2.1. Here are a couple of things to keep in mind:
- The filters at ports 2.2 and 2.3 match any protocol, so the plugin delays all packets sent by the receivers, including the ping replies we later use to verify the delay.
- The added 20 msec of delay increases the bandwidth-delay product, so queue 64 at port 1.4 must be resized accordingly.
The main steps in Example B are:
- Modify the tsndrs-2npr script to remove the 1-second stagger and run the senders for 60 seconds.
- Install a delay plugin on plugin ME 0 at NPR 2 and set its delay to 20 msec.
- Configure queue 64 at port 1.4 (750,000 bytes) and queue 64 at port 2.1 for the ACK packets.
- Install filters at ports 2.2 and 2.3 to direct packets through the plugin and on to queue 64 at port 2.1.
- Verify the 20 msec delay with ping.
- Start the receivers and senders and examine the bandwidth, queue length and packet drop charts.
- Send the plugin the =counts command to check the delay queue counters.
Removing the stagger will reduce the chance that the late sender immediately finds a full queue when it starts. The script should now look like this, with the "sleep 1" command commented out and the time changed to 60 seconds:
#!/bin/sh
source /users/onl/.topology   # get defs of external interfaces
ssh $n1p2 /usr/local/bin/iperf -c n2p2 -w 16m -t 60 &
### sleep 1
ssh $n1p3 /usr/local/bin/iperf -c n2p3 -w 16m -t 60 &
Here are the mini-steps involved in installing a plugin that will delay all packets (including ACK packets) by 20 msec:
- Install a delay plugin on plugin microengine (ME) 0.
- Verify that we can communicate with the plugin by sending it commands to read its main internal counters.
- Change the default delay of 50 msec to 20 msec by sending it the command "delay= 20".
The Plugins window appears.
A delay plugin entry appears with a default value in the microengine field.
The two main commands for reading the plugin's internal state are =delay and =counts. When the delay plugin is loaded, the delay is set to 50 msec and all of its counters are zeroed. The =counts command reads the three main counters:
- npkts: Total number of packets seen by the plugin
- maxinq: The maximum number of packets in the delay queue
- ndrops: The number of packets dropped by the plugin
A Send Command window appears.
The convention followed by the delay plugin for command names is that commands that read plugin variables begin with the equal character (e.g., =delay, =counts), while those that write plugin variables end with the equal character (e.g., delay=).
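For quick reference, the plugin commands used in this example are listed below; each is entered on its own in the Send Command window (the comments are shown here only for explanation and are not part of the command):

=delay       # read the current delay value (50 msec when the plugin is first loaded)
=counts      # read the npkts, maxinq and ndrops counters
delay= 20    # change the delay to 20 msec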
We change the size of queue 64 at port 1.4 to be equal to 750,000 bytes, the bandwidth-delay product, and we configure queue 64 at port 2.1 for the ACK packets.
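Where does 750,000 bytes come from? Treating the 20 msec plugin delay as the round-trip time seen by the flows (the ONL path itself adds very little delay, as the ping test later in this example confirms), the bandwidth-delay product for the 300 Mbps bottleneck is:

300 Mbps × 20 msec = 300,000,000 bits/sec × 0.020 sec = 6,000,000 bits = 750,000 bytes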
We have left the port rate at its default value of 1 Gbps.
We install filters at ports 2.2 and 2.3 to direct all packets to the delay plugin and then on to reserved queue 64 at port 2.1. The plan is to use queue 64 so that we can verify that no ACK packets are getting dropped at port 2.1. The steps are similar to those used to install filters at ports 1.2 and 1.3 earlier. The primary difference is that you need to direct the packets to the plugin instead of just an output port.
The output plugins entry indicates the plugin ME to which matching packets are sent.
Repeat the above steps after opening up the Add Filter window at port 2.3.
Ping packets from either sender (n1p2 or n1p3) to a receiver (n2p2 or n2p3) should show RTTs of just over 20 msec. Also, the plugin counters should indicate a total of five packets passing through the plugin and a maximum at any time of one packet in the delay queue.
The ping output shows that the RTT between n1p2 and n2p2 is 20.1 msec.
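One way to run this check from the onlusr host is sketched below; we assume five echo requests simply to match the five plugin packets mentioned above, and that the receiver's data-side name (n2p2) resolves on the sender just as it does in the iperf scripts:

onlusr> source /users/onl/.topology    # .topology.csh if using a c-shell
onlusr> ssh $n1p2 ping -c 5 n2p2       # RTTs should be just over 20 msec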
The procedure is similar to what we did for queue 64 at port 1.4 earlier.
The procedure is similar to what we did for port 1.4 earlier.
Use the version of tsndrs-2npr that runs the senders for 60 seconds. Follow the same procedure as in Example A for starting the two TCP senders from the onlusr host.
Both flows use the slow-start algorithm as they attempt to find the available bandwidth starting around time 2,304 (solid black line). At this time, queue 64 overflows, and the Queue Manager at port 1.4 drops about 998 packets. A small number of drops occur at times 2,315, 2,328 and 2,338. No drops occur at port 2.1 (dashed blue line).
The 2.2 and 2.3 charts (solid red and dashed fuchsia lines) indicate that both receivers start receiving traffic around time 2,304 but don't get much traffic until around time 2,313 (9 seconds later), when both start receiving about 150 Mbps of traffic. The 9 seconds that it takes the senders to recover from the large number of packet drops is not unusual, since the initial retransmission timeout is probably 3 seconds.
Around time 2,328, the flow to n2p2 starts to get more bandwidth than the flow to n2p3.
The chart shows that queue 64 at port 1.4 is occasionally reaching its capacity of 750,000 bytes.
There are two features of the chart that may seem odd:
- The curve for 1.4 queue 64 is 0 at the beginning of the two flows (around time 2,304); and
- The curve for 1.4 queue 64 is also 0 around time 2,338, when a small number of packets are dropped but with apparently no effect on the traffic bandwidths at ports 2.2 and 2.3.

With regard to the first oddity, it may be that the non-empty queue during slow-start was too short-lived to be detected in a one-second monitoring period. The second oddity may just be due to the queue draining while TCP is in fast recovery rather than additive increase.
The charts covering the end of the 60-second transmission period show features that are similar to the period following the initial slow-start period.
Packet drops still occur occasionally, but only a few packets are dropped, at three distinct times.
The traffic bandwidths to the two receivers (charts 2.2 and 2.3) continue to converge toward an equal share (150 Mbps) of the 300 Mbps link capacity.
The queue length chart for queue 64 at port 1.4 continues to display a jagged appearance.
We send the plugin the =counts command to verify that the maximum size of the delay queue is the bandwidth-delay product.
The procedure is identical to the one used in Filters, Queues and Bandwidth and is repeated below.
onlusr> source /users/onl/.topology    # .topology.csh if using a c-shell
onlusr> ssh $n2p2 pkill iperf          # kill all iperf on $n2p2
onlusr> ssh $n2p3 pkill iperf          # kill all iperf on $n2p3
Repeat Example B but with three flows instead of two after connecting hosts to ports 1.1 and 2.4. What is the effect of this configuration change?
See if you can modify the configuration in Example B to include the following changes:
- Add a 30 msec delay to packets before they reach queue 64 at port 1.4.
- Have ACK packets return from NPR 2 through a link between ports 2.4 and 1.1.
- Change the monitoring to verify that the ACK packets are really returning over the 2.4-1.1 link.
Revised: Wed, Oct 29, 2008