We demonstrate the interaction of two
TCP flows going through a 300 Mbps bottleneck link
(port 4 of the left NPR).
This is the same configuration used in the first example in
Filters, Queues and Bandwidth.
Unlike UDP, however, TCP receivers periodically send ACK packets back to the senders
(e.g., from n2p2 to n1p2 through the 2.1-1.4 link).
Example A demonstrates the simple case where there are no extra
delays between iperf TCP senders and receivers.
Example B demonstrates how to add an artificial delay using
the delay plugin.
The new concepts demonstrated by this example include:
- observing TCP's slow-start and congestion avoidance behavior in the bandwidth, queue length and packet drop charts;
- changing filters to match any protocol so that ACK (and ping) packets follow a known reverse path; and
- installing a delay plugin and controlling it with its read (=delay, =counts) and write (delay=) commands.
This example assumes that you have gone through the Filters, Queues and Bandwidth example, which showed how filters were used to direct UDP traffic to a bottleneck link. As before, we use two shell scripts to start the TCP senders and receivers. You can get these by copying them from the directory /users/onl/export/Examples/ into the directory on the ONL user host where you store your executables and shell scripts. Follow the same procedure for getting the shell scripts described in Filters, Queues and Bandwidth, replacing the script names with those shown below:
| File | Description |
| --- | --- |
| trcvrs-2npr | Start iperf TCP servers (receivers) on hosts n2p2 and n2p3 |
| tsndrs-2npr | Start iperf TCP clients (senders) on hosts n1p2 and n1p3 |
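To fetch the two scripts, the procedure from Filters, Queues and Bandwidth amounts to something like the following (a minimal sketch; we assume your scripts live in a directory such as ~/bin on the ONL user host, so adjust the destination to match your setup):

onlusr> cp /users/onl/export/Examples/trcvrs-2npr ~/bin/   # TCP receiver script
onlusr> cp /users/onl/export/Examples/tsndrs-2npr ~/bin/   # TCP sender script
onlusr> chmod +x ~/bin/trcvrs-2npr ~/bin/tsndrs-2npr       # in case the execute bit was not preserved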
The example has two parts:
- Example A: Packets from the two TCP flows contend for the bottleneck output link in a FIFO manner, with no extra delay between senders and receivers.
- Example B: A delay plugin at port 2.1 delays the ACK packets returning to the sending hosts. The bandwidth displays show the outlines of standard TCP Reno behavior, in which TCP attempts to fill the transmission pipe during slow-start and then repeatedly probes for bandwidth by linearly increasing its send window and backing off when it encounters packet loss.
We use the same dumbbell configuration and monitoring as in Example A except that we modify the filters to match on any protocol (*). Our setup uses the same forward path as before and the reverse path for the ACK packets sent by the receivers to the senders; e.g., ACK packets from both n2p2 and n2p3 will go to port 2.1, then out the link to port 1.4, and finally to the senders. The three charts will look different from those for the UDP flows because of TCP's slow-start and congestion avoidance algorithms.
The main steps in Example A are:
- Copy the configuration file from the UDP example and change the filters at ports 1.2 and 1.3 to match any protocol.
- Start the iperf TCP receivers on n2p2 and n2p3 with the trcvrs-2npr script.
- Start the iperf TCP senders on n1p2 and n1p3 with the tsndrs-2npr script.
- Examine the bandwidth, queue length and packet drop charts.
- Run the flows again for 60 seconds to see whether their behavior changes.
We will start with the configuration file created in the preceding example and modify it by changing the filters at ports 1.2 and 1.3 and adding filters at ports 2.2 and 2.3 for the ACK packets. We purposely change the protocol fields of the filters to match any protocol so that ping packets will take the same path as the TCP packets, allowing us to verify the delay in Example B.
client> cp udp.exp tcp.exp
We will change the filters at ports 1.2 and 1.3 to match on any protocol, and then commit and save the configuration.
The Tables window for port 1.2 will appear with the old settings.
This procedure generally follows the one for the UDP example, except that we use the trcvrs-2npr shell script. The script starts an iperf TCP server process on each of the n2p2 and n2p3 hosts.
remote> ssh onl.arl.wustl.edu    # any SSH client tool will do
onlusr> trcvrs-2npr
Note: The TCP iperf servers will start running and listen for TCP iperf traffic on TCP port 5001.
What's Happening? The trcvrs-2npr script looks like this:

#!/bin/sh
source /users/onl/.topology   # get defs of external interfaces
for host in $n2p2 $n2p3; do   # start TCP receivers
    ssh $host /usr/local/bin/iperf -s -w 16m &
done

This script is almost identical to urcvrs-2npr except that there is no UDP flag (-u) and the window size (-w) has been set to 16 MB instead of 1 MB. The window size must be at least the bandwidth-delay product if you don't want flow control (the receiver buffer size) to restrict your effective transmission rate; 16 MB is much larger than is needed for either Example A or B.
onlusr> source ~onl/.topology     # use .topology.csh if running a c-shell
onlusr> ssh $n2p2 ps -lC iperf    # "ssh $n2p2 pgrep -l iperf" for less output
onlusr> ssh $n2p3 ps -lC iperf
Now that the two TCP receivers are running, start sending traffic from the two senders at n1p2 and n1p3 using the tsndrs-2npr shell script. The procedure is the same as in the UDP example except for the script name.
onlusr> tsndrs-2npr
The bandwidth chart shows that the second sender starts about 1 second after the first one (dashed fuchsia 2.3 line). It also shows that the n1p2 sender reaches a transmission rate of about 300 Mbps almost immediately after it starts (blue 1.2 line), and n2p2 also receives at 300 Mbps (red 2.2 line). On the other hand, receiver n2p3 begins to receive traffic at no more than 15 Mbps (dashed fuchsia 2.3 line); i.e., its first slow-start period ends almost immediately after it starts. n2p3 doesn't get much traffic until about five seconds later, when it finally gets about 200 Mbps while n2p2 is getting about 100 Mbps (red 2.2 line). After that there is some variation in traffic bandwidth as both flows jockey for the link at port 1.4.
The queue length chart shows that queue 64 has packets during the congestion period [1498, 1530] when both flows are active. Furthermore, queue 64 peaks at about 1.5 MB (the queue threshold) at around times 1516 and 1524.
Watch Out!!! Shouldn't the chart show the queue reaching 1.5 MB near the start of the first flow (around time 1500)?
Yes ... if we were able to observe the queue continuously. But the queue length counter is read only once per polling period (once per second), and the queue is near its capacity only for a brief period of time around time 1500.
At around time 1500, the Queue Manager does drop over 1,000 packets, indicating that the queue did overflow during the first slow-start period even though the queue length chart did not show a queue length near 1.5 MB. The chart also shows that small numbers of packets were dropped around times 1508, 1516, and 1524. The last two times correspond to the queue length peaks in the queue length chart.
We run the two TCP flows for 60 seconds (40 seconds longer than before) to see if the behavior of the two flows changes.
The bandwidth chart continues to show the attempt by both flows to acquire more bandwidth although they never do attain equal bandwidth sharing.
The queue length chart looks like the one in our 20-second experiment except that around time 1670, the queue length drops to about half of its maximum.
Only 1 or 2 packets are dropped each time as the TCP flows probe for additional bandwidth. Except for the first packet drop at time 1650, the times of the packet drops correspond to the times in the queue length chart when the queue overflows. The first packet drop is probably due to a slow-start period (note the exponential way that the queue length increases at time 1650).
We delay ACK packets by 20 milliseconds by installing a delay plugin at NPR 2 and adding filters at ports 2.2 and 2.3 to direct packets to the plugin. We also monitor the number of ACK packets being dropped at port 2.1. Here are a couple of things to keep in mind:
- The filters at ports 2.2 and 2.3 match any protocol, so the plugin delays all packets sent by the receivers, including the ping replies we later use to verify the delay.
- The added 20 msec of delay increases the bandwidth-delay product, so queue 64 at port 1.4 must be resized accordingly.
The main steps in Example B are:
- Modify the tsndrs-2npr script to remove the 1-second stagger and run the senders for 60 seconds.
- Install a delay plugin on plugin ME 0 at NPR 2 and set its delay to 20 msec.
- Configure queue 64 at port 1.4 (750,000 bytes) and queue 64 at port 2.1 for the ACK packets.
- Install filters at ports 2.2 and 2.3 to direct packets through the plugin and on to queue 64 at port 2.1.
- Verify the 20 msec delay with ping.
- Start the receivers and senders and examine the bandwidth, queue length and packet drop charts.
- Send the plugin the =counts command to check the delay queue counters.
Removing the stagger will reduce the chance that the late sender immediately finds a full queue when it starts. The script should now look like this, with the "sleep 1" command commented out and the time changed to 60 seconds:
#!/bin/sh
source /users/onl/.topology   # get defs of external interfaces
ssh $n1p2 /usr/local/bin/iperf -c n2p2 -w 16m -t 60 &
### sleep 1
ssh $n1p3 /usr/local/bin/iperf -c n2p3 -w 16m -t 60 &
Here are the mini-steps involved in installing a plugin that will delay all packets (including ACK packets) by 20 msec:
- Install a delay plugin on plugin microengine (ME) 0.
- Verify that we can communicate with the plugin by sending it commands to read its main internal counters.
- Change the default delay of 50 msec to 20 msec by sending it the command "delay= 20".
The Plugins window appears.
A delay plugin entry appears with a default value in the microengine field.
The two main commands for reading the plugin's internal state are =delay and =counts. When the delay plugin is loaded, the delay is set to 50 msec and all of its counters are zeroed. The =counts command reads the three main counters:
- npkts: Total number of packets seen by the plugin
- maxinq: The maximum number of packets in the delay queue
- ndrops: The number of packets dropped by the plugin
A Send Command window appears.
The convention followed by the delay plugin for command names is that commands that read plugin variables begin with the equal character (e.g., =delay, =counts), while those that write plugin variables end with the equal character (e.g., delay=).
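For quick reference, the plugin commands used in this example are listed below; each is entered on its own in the Send Command window (the comments are shown here only for explanation and are not part of the command):

=delay       # read the current delay value (50 msec when the plugin is first loaded)
=counts      # read the npkts, maxinq and ndrops counters
delay= 20    # change the delay to 20 msec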
We change the size of queue 64 at port 1.4 to be equal to 750,000 bytes, the bandwidth-delay product, and we configure queue 64 at port 2.1 for the ACK packets.
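Where does 750,000 bytes come from? Treating the 20 msec plugin delay as the round-trip time seen by the flows (the ONL path itself adds very little delay, as the ping test later in this example confirms), the bandwidth-delay product for the 300 Mbps bottleneck is:

300 Mbps × 20 msec = 300,000,000 bits/sec × 0.020 sec = 6,000,000 bits = 750,000 bytes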
We have left the port rate at its default value of 1 Gbps.
We install filters at ports 2.2 and 2.3 to direct all packets to the delay plugin and then on to reserved queue 64 at port 2.1. The plan is to use queue 64 so that we can verify that no ACK packets are getting dropped at port 2.1. The steps are similar to those used to install filters at ports 1.2 and 1.3 earlier. The primary difference is that you need to direct the packets to the plugin instead of just an output port.
The output plugins entry indicates the plugin ME to which matching packets are sent.
Repeat the above steps after opening up the Add Filter window at port 2.3.
Ping packets from either sender (n1p2 or n1p3) to a receiver (n2p2 or n2p3) should show RTTs of just over 20 msec. Also, the plugin counters should indicate a total of five packets passing through the plugin and a maximum at any time of one packet in the delay queue.
The ping output shows that the RTT between n1p2 and n2p2 is 20.1 msec.
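One way to run this check from the onlusr host is sketched below; we assume five echo requests simply to match the five plugin packets mentioned above, and that the receiver's data-side name (n2p2) resolves on the sender just as it does in the iperf scripts:

onlusr> source /users/onl/.topology    # .topology.csh if using a c-shell
onlusr> ssh $n1p2 ping -c 5 n2p2       # RTTs should be just over 20 msec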
The procedure is similar to what we did for queue 64 at port 1.4 earlier.
The procedure is similar to what we did for port 1.4 earlier.
Use the version of tsndrs-2npr that runs the senders for 60 seconds. Follow the same procedure as in Example A for starting the two TCP senders from the onlusr host.
Both flows use the slow-start algorithm as they attempt to find the available bandwidth starting around time 2,304 (solid black line). At this time, queue 64 overflows, and the Queue Manager at port 1.4 drops about 998 packets. A small number of drops occur at times 2,315, 2,328 and 2,338. No drops occur at port 2.1 (dashed blue line).
The 2.2 and 2.3 charts (solid red and dashed fuchsia lines) indicate that both receivers start receiving traffic around time 2,304 but don't get much traffic until around time 2,313 (9 seconds later), when both start receiving about 150 Mbps of traffic. The 9 seconds that it takes the senders to recover from the large number of packet drops is not unusual, since the initial retransmission timeout is probably 3 seconds.
Around time 2,328, the flow to n2p2 starts to get more bandwidth than the flow to n2p3.
The chart shows that queue 64 at port 1.4 is occasionally reaching its capacity of 750,000 bytes.
There are two features of the chart that may seem odd:
- The curve for 1.4 queue 64 is 0 at the beginning of the two flows (around time 2,304); and
- The curve for 1.4 queue 64 is also 0 around time 2,338, when a small number of packets are dropped but with apparently no effect on the traffic bandwidths at ports 2.2 and 2.3.

With regard to the first oddity, it may be that the non-empty queue during slow-start was too short-lived to be detected in a one-second monitoring period. The second oddity may just be due to the queue draining while TCP is in fast recovery rather than additive increase.
The charts covering the end of the 60-second transmission period show features that are similar to the period following the initial slow-start period.
Packet drops still occur occasionally, but only a few packets are dropped, at three distinct times.
The traffic bandwidths to the two receivers (charts 2.2 and 2.3) continue to converge toward an equal share (150 Mbps) of the 300 Mbps link capacity.
The queue length chart for queue 64 at port 1.4 continues to display a jagged appearance.
We send the plugin the =counts command to verify that the maximum size of the delay queue is the bandwidth-delay product.
The procedure is identical to the one used in Filters, Queues and Bandwidth and is repeated below.
onlusr> source /users/onl/.topology    # .topology.csh if using a c-shell
onlusr> ssh $n2p2 pkill iperf          # kill all iperf on $n2p2
onlusr> ssh $n2p3 pkill iperf          # kill all iperf on $n2p3
Repeat Example B but with three flows instead of two after connecting hosts to ports 1.1 and 2.4. What is the effect of this configuration change?
See if you can modify the configuration in Example B to include the following changes:
- Add a 30 msec delay to packets before they reach queue 64 at port 1.4.
- Have ACK packets return from NPR 2 through a link between ports 2.4 and 1.1.
- Change the monitoring to verify that the ACK packets are really returning over the 2.4-1.1 link.
Revised: Wed, Oct 29, 2008