ONL Tutorial

ONL Linux 2.4 TCP

Linux 2.4 TCP is NOT Reno (nor Vegas, nor Tahoe) and cannot be made to behave exactly like these other TCP flavors without substantial kernel modifications. There are three primary references:

Linux 2.4 TCP is described in the paper "Congestion Control in Linux TCP" by Sarolahti and Kuznetsov;
Tuning Linux 2.4 TCP is described in the Web page "Enabling High Performance Data Transfers"; and
The TCP man page tcp(7).

However, many of the Linux 2.4 features have little effect on most TCP experiments. This page describes the features that are likely to matter most for bulk transfer experiments in the ONL testbed.

The maximum socket buffer sizes have been set to 40 MB for both reading and writing, and the maximum per-connection TCP buffer sizes to 20 MB for both reading and writing. These values can be verified with:

	cat /proc/sys/net/core/[rw]mem_max	# max socket recv/send buffer sizes
	cat /proc/sys/net/ipv4/tcp_[rw]mem	# TCP recv/send buffers: min, default, max
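The tcp_rmem and tcp_wmem files each hold three values (minimum, default, maximum), so the per-connection maximum is the third field. The following is a minimal sketch of a helper that prints the maxima in bytes and in MiB. The function name show_buffer_maxima and its optional proc_root argument are hypothetical and not part of the ONL tools, and the conversion assumes 1 MiB = 1048576 bytes.

```shell
#!/usr/bin/env bash
# Sketch: print the configured buffer maxima in bytes and MiB.
# show_buffer_maxima and proc_root are hypothetical names; proc_root
# defaults to /proc/sys but can point at a copy of the files.
show_buffer_maxima() {
    local proc_root="${1:-/proc/sys}"
    local name bytes min def max
    for name in rmem_max wmem_max; do
        read -r bytes < "$proc_root/net/core/$name"
        printf '%s: %d bytes (%d MiB)\n' "$name" "$bytes" $(( bytes / 1048576 ))
    done
    for name in tcp_rmem tcp_wmem; do
        # Each file holds three whitespace-separated values: min default max.
        read -r min def max < "$proc_root/net/ipv4/$name"
        printf '%s max: %d bytes (%d MiB)\n' "$name" "$max" $(( max / 1048576 ))
    done
}
```

Calling show_buffer_maxima with no argument reads the live values under /proc/sys.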

The SO_SNDBUF and SO_RCVBUF arguments to setsockopt() are bounded by one-half of the wmem_max and rmem_max values, respectively. The kernel doubles the requested value to allow space for bookkeeping overhead, and you will notice this behavior when you use the -w flag of the iperf command. For example, "-w 4M" will result in a message indicating that the buffer size has actually been set to 8 MB (twice the requested 4 MB).

You can inspect the TCP tuning parameters either by examining the files /proc/sys/net/ipv4/tcp* with the cat or more commands or by using the sysctl command. For example, to display all TCP parameters, try one of these two commands:

	more /proc/sys/net/ipv4/tcp*
	sysctl -a | grep tcp
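If you prefer one "name = value" line per parameter, a small loop over the same files gives that format. This is only a sketch: the function name list_tcp_params and its optional directory argument are hypothetical, and tabs inside multi-valued files are replaced by spaces for readability.

```shell
#!/usr/bin/env bash
# Sketch: print each TCP parameter as "name = value".
# list_tcp_params is a hypothetical helper; dir defaults to the live
# /proc/sys/net/ipv4 directory.
list_tcp_params() {
    local dir="${1:-/proc/sys/net/ipv4}"
    local f
    for f in "$dir"/tcp_*; do
        # Skip anything that is not a readable regular file.
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s = %s\n' "${f##*/}" "$(tr '\t' ' ' < "$f")"
    done
}
```

For example, "list_tcp_params | grep mem" would narrow the output to the buffer-related parameters.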

All standard advanced TCP features are ON by default in the ONL testbed. Try one of these commands:

	cat /proc/sys/net/ipv4/tcp_{timestamps,window_scaling,sack}
	sysctl net.ipv4.tcp_{timestamps,window_scaling,sack}

and you will discover that:

The timestamp option is ON
The window scaling option is ON
The SACK option is ON

The timestamp and SACK options can be turned OFF for the duration of your experiment by entering the following commands:

	sudo /usr/local/bin/net/timestamps-off
	sudo /usr/local/bin/net/sack-off

They can be turned back ON by calling the complementary command (e.g., timestamps-on).
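After toggling an option, it is worth confirming its state before starting a run. The sketch below reads the three /proc files and prints ON or OFF for each. The function name tcp_option_status and its optional directory argument are hypothetical, and the sketch assumes that a nonzero value in the file means the option is enabled.

```shell
#!/usr/bin/env bash
# Sketch: report whether timestamps, window scaling, and SACK are enabled.
# tcp_option_status is a hypothetical helper; dir defaults to the live
# /proc/sys/net/ipv4 directory.
tcp_option_status() {
    local dir="${1:-/proc/sys/net/ipv4}"
    local opt val
    for opt in timestamps window_scaling sack; do
        read -r val < "$dir/tcp_$opt"
        # Assumption: a nonzero value means the option is enabled.
        if [ "$val" -ne 0 ]; then
            printf '%s: ON\n' "$opt"
        else
            printf '%s: OFF\n' "$opt"
        fi
    done
}
```

Running tcp_option_status before and after a command such as sack-off should show the sack line change from ON to OFF.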

Very Long Delay Paths

If you are planning to experiment with a long-delay path, you should look at Yee-Ting Li's work. In short, some low-level buffers may be too small for long, fat pipes (delays of hundreds of milliseconds at Gbps rates). Unfortunately, Linux silently drops packets when these buffers are full, leaving no indication at the end host that packets have been lost. These parameters may be of little use with the delay plugin, however, since it limits traffic to 200 Mbps. The following two commands allow you to increase the sizes of the receive and send buffers from their defaults:

	sudo /usr/local/bin/net/net-max-backlog
	sudo /usr/local/bin/net/ifconfig-txqueuelen
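To judge whether a path counts as long and fat, it can help to estimate its bandwidth-delay product, i.e., the amount of data that must be in flight to keep the path full. The sketch below uses integer shell arithmetic. The function names bdp_bytes and bdp_packets are hypothetical, the 1500-byte packet size is an assumed default, and the 100 ms round-trip time in the usage note is only an illustrative value.

```shell
#!/usr/bin/env bash
# Sketch: bandwidth-delay product of a path.
# bdp_bytes RATE_MBPS RTT_MS            -> bytes in flight to fill the path
# bdp_packets RATE_MBPS RTT_MS [PKT]    -> same, in packets (rounded up)
# Both function names are hypothetical; PKT defaults to an assumed 1500 bytes.
bdp_bytes() {
    local rate_mbps="$1" rtt_ms="$2"
    # Mbps -> bytes per second, then scale by the RTT in seconds.
    echo $(( rate_mbps * 1000000 / 8 * rtt_ms / 1000 ))
}

bdp_packets() {
    local rate_mbps="$1" rtt_ms="$2" pkt_bytes="${3:-1500}"
    local bytes
    bytes=$(bdp_bytes "$rate_mbps" "$rtt_ms")
    # Round up to a whole number of packets.
    echo $(( (bytes + pkt_bytes - 1) / pkt_bytes ))
}
```

At the 200 Mbps limit of the delay plugin and an assumed 100 ms round-trip time, "bdp_bytes 200 100" gives 2500000 bytes, or 1667 packets of 1500 bytes. This is a rough guide to how much data the socket buffers and low-level queues may need to absorb, not a precise sizing rule.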

Conformance to IETF

Readers should consult the Sarolahti and Kuznetsov paper for a detailed discussion of IETF conformance. This section summarizes what we consider to be the important differences for bulk transfer experiments in the ONL testbed. The table below is a reproduction of the conformance table in the Sarolahti and Kuznetsov paper.

Specification	Status
RFC 1323 (Performance Extensions)	Same
RFC 2018 (SACK)	Same
RFC 2140 (TCP Control Block Sharing)	Same
RFC 2581 (Congestion Control)	Differs
RFC 2582 (New Reno)	Differs
RFC 2861 (Cwnd Validation)	Same
RFC 2883 (Duplicate SACK)	Same
RFC 2988 (RTO)	Differs
RFC 3042 (Limited Transmit)	Same
RFC 3168 (ECN)	Differs

Perhaps the most noticeable Linux 2.4 TCP feature is its retentive destination caching, in which the ssthresh and RTO estimation parameters are cached for each destination. This means that once traffic has been sent to some destination D, the initial ssthresh value for the next TCP connection to D will be the same as the one at the end of the preceding connection to D. This can be surprising when you change the packet delay to destination D and expect the initial ssthresh value to be infinite; i.e., you expect the connection to stay in slow start until a packet drop occurs.
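Linux TCP tuning guides commonly suggest flushing the routing cache between runs to discard these cached values. The sketch below wraps that step. The function name flush_route_cache and its optional file argument are hypothetical, writing to the real file requires root privileges, and whether ONL hosts permit this is an assumption that you should check before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: flush the routing cache so that cached per-destination values
# (such as ssthresh) are discarded before the next connection.
# flush_route_cache is a hypothetical helper; flush_file defaults to the
# standard Linux sysctl file and writing to it requires root.
flush_route_cache() {
    local flush_file="${1:-/proc/sys/net/ipv4/route/flush}"
    echo 1 > "$flush_file"
}
```

The same effect is usually obtained with "sysctl -w net.ipv4.route.flush=1" when run as root.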

Below are comments on some of the features listed in the table:

Control Block Sharing (RFC 2140)
New Reno (RFC 2582)
Congestion Control (RFC 2581)
RTO (RFC 2988)
