The ONL NPR Tutorial


Mapping Flows to Queues

The NPR allows you to assign flows to packet scheduling queues that will provide bandwidth guarantees. Earlier we discussed how packet filters can be used to direct packets to specific packet scheduling queues. Queue parameters allow you to define the capacities of these queues and the fraction of the output rate that a queue will be given.

This page briefly discusses the concepts behind the use of packet scheduling queues and how to use the RLI to define the properties of those queues. An example based on some of the concepts discussed on this page is given in NPR Tutorial => Examples => Filters, Queues and Bandwidth. The sections in this page are:

    Packet Queue Concepts
    Datagram Queues
    Configuring a Reserved Flow Queue
    Effect of Multiple Queues
    Effect of Quantum
    Datagram QID Hash Function

Packet Queue Concepts

[[ queues-resize.png Figure ]]

Recall that there are 8,192 FIFO (First-In-First-Out) queues at each of the five output ports. The first 64 (QIDs 0-63) are datagram queues, and the remaining 8,128 queues are reserved flow queues. By default, the NPR will put packets into one of the 64 datagram queues using a stochastic fair queueing algorithm. This algorithm attempts to spread packets evenly over these queues, using portions of the source and destination IP addresses to compute the QID of the datagram queue. One result of the PLC's route table lookup is to determine the datagram queue for a packet.

But you can use a packet filter to direct matching packets to a specific queue, typically one of the reserved queues (QIDs 64-8191). You can also define the queue's capacity and its bandwidth guarantee. Thus, through the use of filters and the configuration of queues, you can give preferential treatment to one flow over other flows.

Three design options affect the service received by packets:

    the queue's threshold (its capacity in bytes),
    the queue's quantum (its scheduling weight), and
    the number of flows mapped to the queue.

We now discuss some of the tradeoffs in configuring each of these options.

Threshold

When a queue becomes full, the NPR drops incoming packets to that queue. Traditionally, network operators don't like to drop packets because an incoming packet might have already consumed network resources on its journey to a router; dropping a packet that must be reliably delivered to a destination just increases the load on the network. On the other hand, a large threshold value means that a long queue can develop, and each arriving packet then experiences a long queueing delay while it waits for the packets ahead of it to leave the queue. It is not unheard of for routers to have enough buffering to cause delays in excess of a few RTTs (round-trip times).
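
To put numbers on the delay side of this tradeoff, here is a small Python sketch (the function is ours, for illustration only) that converts a threshold in bytes into the worst-case delay seen by a packet that joins a full queue drained at a given rate; the 32,768-byte and 1,500,000-byte thresholds and the 150 Mbps rate are values that appear later on this page.

    def max_queueing_delay(threshold_bytes, drain_rate_bps):
        """Time (seconds) to drain a full queue at the given guaranteed rate."""
        return threshold_bytes * 8 / drain_rate_bps

    # Illustrative values: the default 32,768-byte threshold versus a
    # 1,500,000-byte threshold, each drained at a 150 Mbps guaranteed rate.
    print(max_queueing_delay(32_768, 150e6))      # ~0.0017 s (about 1.7 ms)
    print(max_queueing_delay(1_500_000, 150e6))   # 0.08 s (80 ms)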

Recently, there has been some evidence that a small number of buffers for each flow might be a good design choice for TCP flows. The thinking is that a small buffer size for each flow will allow the router to signal congestion to the sender through early packet drops.

Quantum

The NPR uses a WDRR (Weighted Deficit Round Robin) packet scheduling algorithm over the queues at each output port. The quantum is the weight assigned to a queue. Consider a WDRR packet scheduler that manages four queues (not the 8,192 queues in the NPR) with quantum values of W0, W1, W2 and W3. Let W be the sum of the weights; i.e., W = W0+W1+W2+W3. Then, queue k (k=0,1,2,3) is guaranteed to get the fraction Wk/W of the link capacity when all four queues are non-empty. Of course, the guarantee can only be met at a large enough time scale. For example, if W0=W1=1500, W2=W3=3000 and the link capacity is R = 12 Mbps, then W=9000 and queue 2 is guaranteed to get 3000/9000 = 1/3 of the 12 Mbps, or G2=4 Mbps. In fact, the guaranteed rates for the four queues are G0=G1=2 Mbps and G2=G3=4 Mbps.

If some of the queues are empty, the same bandwidth sharing idea still applies, but just over the non-empty queues. For example, suppose all queues are non-empty except queue 2. Then, in the above example, W = W0+W1+W3 = 6000, so queues 0 and 1 are each guaranteed 1500/6000 = 1/4 of the link capacity (3 Mbps) and queue 3 is guaranteed 3000/6000 = 1/2 (6 Mbps).
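
The sharing rule is easy to check numerically. Here is a short Python sketch (the helper name is ours, not part of the NPR or RLI software) that computes each queue's guaranteed rate from its quantum, counting only the non-empty queues:

    def guaranteed_rates(quanta, link_rate_bps, active=None):
        """Guaranteed rate (bps) of each queue under WDRR.

        quanta        -- quantum (weight) value for each queue
        link_rate_bps -- output link capacity in bits per second
        active        -- indices of the non-empty queues (default: all of them)
        """
        if active is None:
            active = range(len(quanta))
        total = sum(quanta[k] for k in active)
        return {k: quanta[k] / total * link_rate_bps for k in active}

    # The four-queue example: W0 = W1 = 1500, W2 = W3 = 3000, R = 12 Mbps.
    print(guaranteed_rates([1500, 1500, 3000, 3000], 12e6))
    # queues 0 and 1 each get 2 Mbps; queues 2 and 3 each get 4 Mbps

    # If queue 2 is empty, the capacity is shared over the non-empty queues.
    print(guaranteed_rates([1500, 1500, 3000, 3000], 12e6, active=[0, 1, 3]))
    # queues 0 and 1 each get 3 Mbps; queue 3 gets 6 Mbps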

CAVEAT: A word of caution is in order at this point. In the first example above, in theory, other weight combinations will give the same guaranteed rates; e.g., W0=W1=1 and W2=W3=2. But you should avoid quantum values less than 1500 because the bandwidth sharing described in the preceding paragraph will not hold. In fact, if you choose quantum values of 256 and 512 for two queues hoping to give twice as much bandwidth to the second queue, you will discover that the two flows get equal bandwidth shares. (This is a bug.) A rule of thumb is to choose quantum values that are at least 1500.

Number of Flows Per Queue

If only one flow is mapped to each queue, the packets in that flow will get the bandwidth guarantee of the queue. Since each queue is served in FIFO order when its turn arrives, packets going to the same queue may still experience queueing delays. If you map more than one flow to the same queue, packets from all of those flows will contend for the bandwidth guaranteed to the queue.


Datagram Queues

When a route table entry or filter entry with qid 0 (automatic qid) is used to direct a packet to a datagram queue, a hash function is used to compute the datagram QID. The intent is to distribute flows evenly across the 64 datagram queues so as to reduce the head-of-the-line blocking that would occur if all flows were mapped to a single queue.

A precise definition of this hash function is given at the end of this page. The hash function produces an integer between 0 and 63 based on bits extracted from the source and destination IP addresses in the IP header. If we apply this hash function to our running 2-NPR dumbbell example, the packets coming from n1p2 and n1p3 will be placed into different datagram queues (somewhat by chance). Specifically, packets from n1p2 and n1p3 will be placed into queues with QIDs 21 and 26 respectively. The effect is that the two flows are placed in different scheduling queues. [[ 2udp-bw-routes-300Mbps.png Figure ]]

Recall that our 2-NPR dumbbell example has a 300 Mbps bottleneck at output port 1.4, and that n1p2 and n1p3 are sending UDP packets to n2p2 and n2p3 respectively.

If NPR 1 (on the left) uses just route tables to direct the packets to output port 1.4 toward NPR 2, both flows should get an equal share of the bottleneck bandwidth, because they go into separate queues with equal quantum values; i.e., 150 Mbps each.

The figure (right) shows that the stochastic fair queueing hash algorithm does give an equal share of the 300 Mbps to the two flows. The 12-second contention period is [1,346, 1,358]. The 2.2 line, which represents the n1p2 traffic forwarded by port 1.4, drops from 200 Mbps to 150 Mbps at time 1,346, while the 2.3 line rises to its 150 Mbps share when the second flow from n1p3 starts. This situation continues until time 1,358, when the first flow stops and the remaining flow (from n1p3) gets its entire 200 Mbps.


Configuring a Reserved Flow Queue

The queue parameters can be set through the Queue Table window that appears when you select Configuration => Queue Table from a port's menu. At the beginning of this section on packet filters, we showed how to direct packets from an input port to a specific queue. The default threshold (the size of the queue) is only 32,768 bytes. So, with filters at ports 1.2 and 1.3 directing both flows to queue 64, queue 64 would start to drop packets almost immediately after the second flow started, since it only has enough buffering for about 21 maximum-sized (1500-byte) packets. We now show how to increase the size of queue 64 at port 1.4 so as to delay the onset of overflow. In particular, we show how to set the threshold to 1,500,000 bytes. We do the following:


[[ 2udp-flows.png Figure ]]

We demonstrate the interaction of two 200-Mbps UDP flows going through a 300 Mbps bottleneck link at port 1.4 (port 4 of the left NPR). Filters at NPR 1 (on the left) direct packets coming from n1p2 and n1p3 to queue 64 at port 1.4. The flow from n1p3 starts 8 seconds after the one from n1p2.
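
A quick calculation shows why the default 32,768-byte threshold overflows almost immediately in this setup: queue 64 receives roughly 400 Mbps while port 1.4 drains it at 300 Mbps, so the backlog grows at about 100 Mbps. The Python sketch below (a simple constant-rate approximation, for illustration only) estimates the time it takes the backlog to reach a given threshold:

    def time_to_overflow(threshold_bytes, arrival_bps, drain_bps):
        """Seconds until the backlog reaches the threshold, assuming constant rates."""
        return threshold_bytes * 8 / (arrival_bps - drain_bps)

    # Two 200-Mbps flows into the 300-Mbps port: the backlog grows at ~100 Mbps.
    print(time_to_overflow(32_768, 400e6, 300e6))     # ~0.0026 s with the default threshold
    print(time_to_overflow(1_500_000, 400e6, 300e6))  # 0.12 s with a 1,500,000-byte threshold
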
[[ 2udp-bw-1q-300Mbps.png Figure ]]

The bandwidth chart shows that both senders are transmitting at about 200 Mbps (1.2 and 1.3) when active. The 2.1 plot shows that port 2.1 is receiving about 300 Mbps (the bottleneck output rate) when both senders are active. The 2.2 and 2.3 plots show the bandwidths transmitted to the two receivers and indicate that the sum of the two flow rates is 300 Mbps and the two flows jockey for their 150 Mbps fair share of the bottleneck bandwidth. This short-lived unequal sharing of the link capacity is possible because the interpacket times for each flow are not constant.


[[ 2udp-bw-2qs-300Mbps.png Figure ]]

Effect of Multiple Queues

If we change the configuration so that the flow from n1p2 to n2p2 is placed into queue 65 at port 1.4 and configure queue 65 to be identical to queue 64, each flow should get 150 Mbps, an equal share of the 300 Mbps capacity of output port 1.4. The recipe for doing this is given in the example in NPR Tutorial => Examples => Filters, Queues and Bandwidth.

The figure (right) shows the bandwidth chart. Compare this bandwidth chart with the one above. The 2.2 and 2.3 plots in the congestion interval [1,155, 1,167] are identical (about 150 Mbps); i.e., both flows now get an equal share of the 300 Mbps link capacity at port 1.4.


[[ bandwidth-2queues-300Mbps-unequal.png Figure ]]

Effect of Quantum

If we change the quantum for queue 65 to 3000 while leaving the quantum for queue 64 at 1500, the bandwidth chart will show that during the congestion period the 2.2 plot is getting 200 Mbps and the 2.3 plot is getting 100 Mbps (i.e., the ratio of the bandwidth to n2p2 to the bandwidth to n2p3 is 3000:1500 or 2:1).
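
Applying the guaranteed_rates sketch from the Quantum section above to these quantum values reproduces the 2:1 split:

    # Quanta 1500 (queue 64) and 3000 (queue 65) sharing the 300 Mbps port 1.4:
    print(guaranteed_rates([1500, 3000], 300e6))
    # the 1500-quantum queue gets 100 Mbps; the 3000-quantum queue gets 200 Mbps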


Datagram QID Hash Function

We now give a precise description of the datagram QID hash function. The datagram queue for a packet is chosen by the following algorithm:

    Extract bits sa[9:8] and sa[6:5] from the 32-bit source IP address sa[31:0], and bits da[6:5] from the 32-bit destination IP address da[31:0].
    Concatenate these three 2-bit fields to form the 6-bit value H = sa[9:8] sa[6:5] da[6:5].
    Set QID = H, an integer between 0 and 63.

For example, suppose that the source and destination IP addresses are 192.168.1.48 and 192.168.2.48 respectively; i.e., the n1p2 and n2p2 interfaces. The computation goes as follows:
Extract Address Components:
	sa[]	= c0 a8 01 30 (hex)	= sa[31:24] sa[23:16] sa[15:8] sa[7:0]
	sa[9:8]	= 1 (hex) = 01 (binary)		since	sa[11:8] = 01 (hex)
	sa[6:5]	= 01 (binary)			since	sa[7:0]  = 30 (hex)
	da[6:5]	= 01 (binary)			since	da[] = c0 a8 02 30 (hex)

Compute H:
	H	= 01 01 01 (binary)
		= 16 + 4 + 1 = 21 (decimal)
Compute QID:
	QID	= 21
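
For reference, the following Python sketch (the function name is ours) reproduces this computation:

    import ipaddress

    def datagram_qid(src_ip, dst_ip):
        """Return the datagram QID (0-63): the 6-bit value sa[9:8] sa[6:5] da[6:5]."""
        sa = int(ipaddress.IPv4Address(src_ip))
        da = int(ipaddress.IPv4Address(dst_ip))
        return (((sa >> 8) & 0x3) << 4) | (((sa >> 5) & 0x3) << 2) | ((da >> 5) & 0x3)

    print(datagram_qid("192.168.1.48", "192.168.2.48"))   # 21, as computed above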

