The ONL NPR Tutorial


Tour Of The Delay Plugin


The delay plugin shows how you can queue a packet for a fixed period of time (its delay) and then forward the packet when its delay has expired. The features found in the plugin that are different from what is found in the mycount plugin are:

These features make the plugin much more complex than the mycount plugin. The first version of the delay plugin was created by starting with the mycount plugin and incrementally adding and testing each feature. An understanding of how these features are implemented should help plugin developers write other comparable plugins (e.g., queue management, packet scheduling).

The basic idea behind the delay plugin is that the handle_pkt_user() thread enqueues an arriving packet onto the delay queue, and the callback() thread dequeues any packet in the delay queue whose forwarding time has arrived. In order to support this paradigm, the following major changes were made to the mycount plugin code:

 

Timer Concepts

IXP timestamps are 64 bits wide. They are read in handle_pkt_user() to record the arrival time of a packet and in callback() to check whether it is time to forward the first packet in the delay queue. Reading a timestamp involves reading two 32-bit microengine CSRs (Control and Status Registers) that together form a 64-bit counter that increments once every 16 clock cycles. Since an IXP runs at 1.4 GHz, one tick (16 clock cycles) is 11.42857 nsec. A typical code snippet for atomically reading the timestamp is:

	union tm_tag {
	    long long rm;
	    struct {
	    	unsigned long	hi;
	    	unsigned long	lo;
	    } tm2;
	};
	union tm_tag	y;

	y.tm2.lo = local_csr_read( local_csr_timestamp_low );	// must be first
	y.tm2.hi = local_csr_read( local_csr_timestamp_high );

Note that the order of the two calls is important because the reading of the local_csr_timestamp_low CSR latches the other CSR so that when you finally do read local_csr_timestamp_high, it contains a value that is consistent with the other CSR; i.e., the two statements act as an atomic read of the timestamp.
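As a worked example of the tick arithmetic, the following sketch converts a delay in milliseconds to timestamp ticks and tests whether a forwarding time has been reached. It assumes the 1.4 GHz clock and 16-cycle tick stated above. The names IXP_CLOCK_HZ, CYCLES_PER_TICK, TICKS_PER_SEC, msec_to_ticks() and delay_expired() are illustrative only and do not come from the plugin source.

```c
/* Assumed constants taken from the text: 1.4 GHz clock, 16 cycles per tick. */
#define IXP_CLOCK_HZ     1400000000ULL
#define CYCLES_PER_TICK  16ULL
#define TICKS_PER_SEC    (IXP_CLOCK_HZ / CYCLES_PER_TICK)   /* 87,500,000 */

/* Hypothetical helper: convert a delay in milliseconds to timestamp ticks. */
static unsigned long long msec_to_ticks(unsigned long long msec)
{
    return msec * (TICKS_PER_SEC / 1000ULL);   /* 87,500 ticks per msec */
}

/* Hypothetical helper: nonzero once the current time has reached the
 * time at which a queued meta-packet should leave. Both arguments are
 * 64-bit timestamps expressed in ticks. */
static int delay_expired(unsigned long long now, unsigned long long leave_time)
{
    return now >= leave_time;
}
```

For example, a 50 msec delay corresponds to msec_to_ticks(50), which is 4,375,000 ticks; a packet that arrives at tick t would then be due once the timestamp reaches t + 4,375,000.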

 

The Delay Queue

Each item in the delay queue represents one meta-packet. The queue is a standard forward-linked list in which each item contains the time the meta-packet should be forwarded and a copy of the meta-packet fields:

	struct delay_item_tag {
	    union tm_tag time;			// time to leave
	    unsigned int buf_handle;		// meta-packet
	    unsigned int out_port;		// .
	    unsigned int qid;			// .
	    unsigned int l3_pkt_len;		// .
	    struct delay_item_tag *next;
	};

We describe the process involved in developing the interface functions to give you some insight into how other similar functions should be developed.

Also recall that plugins have been allocated a 5 MB region of SRAM for their own use. We describe the initialization of the delay queue so that you can understand how that SRAM region is used and the changes to the Makefile required to support the loading of all five plugin MEs with the delay plugin.

The delay queue interface functions are:

Take Note: There is nothing unusual about these functions except that the code recognizes that we must allocate space for the queue items from the predesignated 5 MB plugin SRAM region. But we first developed most of the code on a general-purpose machine and then made the necessary code modifications to accommodate the IXP before compiling and testing the plugin in the IXP environment. This approach makes development quite fast because you can use normal debugging tools in the general-purpose environment and only have to resort to primitive debug messages to debug the IXP-specific parts of the delay queue code. We highly recommend this approach whenever developing complicated code that is not IXP-specific. For example, we recommend this approach when developing a queue management (e.g., RED) or packet scheduling plugin.

1	int
2	delayq_init( __declspec(shared, sram) struct delayq_tag *qptr ) {
3	    int		i;
4	    int		K = MAX_QUEUE_SZ-1;
5	    struct delay_item_tag *item_ptr;
6	
7	    if ( pluginId == 0)		item_ptr = (struct delay_item_tag *) 0xC0100000;
8	    else if ( pluginId == 1)	item_ptr = (struct delay_item_tag *) 0xC0200000;
9	    else if ( pluginId == 2)	item_ptr = (struct delay_item_tag *) 0xC0300000;
10	    else if ( pluginId == 3)	item_ptr = (struct delay_item_tag *) 0xC0400000;
11	    else if ( pluginId == 4)	item_ptr = (struct delay_item_tag *) 0xC0500000;
12	    else	return -1;
13	
14	    qptr->free_hd = item_ptr;	// queue descriptor
15	    qptr->hd = qptr->tl = 0;
16	    qptr->ninq = 0;
17	
18	    (item_ptr+K)->next = 0;
19	    for (i=0; i<K; i++) {
20		item_ptr->next = item_ptr+1;
21		++item_ptr;
22	    }
23	
24	    return 0;
25	}

The NPR uses most of SRAM (e.g., for buffer descriptors) but allows the plugin user to use 5 MB of it starting at memory location 0xC0100000. The delay plugin divides that region into five 1 MB regions, and the plugin ME whose pluginId is k uses the region that starts at 0xC0100000 + k × 0x00100000. Lines 7-12 implement this decision by initializing item_ptr to point to the proper 1 MB SRAM region. (Note that the values of item_ptr are separated by 0x00100000, or 2^20, bytes.) The rest of delayq_init() uses item_ptr to initialize the queue descriptor and the freelist.

The rest of the delayq_init() code is straightforward. Lines 14-16 initialize the queue descriptor so that the freelist pointer points to the beginning of the 1 MB SRAM region (line 14), the head and tail pointers are 0 (line 15), and the number in the queue is 0 (line 16). Lines 18-22 create the freelist by setting the next pointer of each item to point to the following free item structure; the next pointer of the last item is set to 0 (line 18).
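Because the freelist construction is ordinary C, it can be exercised on a general-purpose machine before being moved to the IXP. The sketch below shows one way to do that, assuming the item storage is supplied by the caller (e.g., a static array) rather than taken from the fixed SRAM addresses. The names host_item, host_queue, host_delayq_init() and host_freelist_len(), as well as the trimmed-down item structure, are hypothetical and are not part of the plugin source.

```c
#include <stddef.h>

/* Trimmed-down, hypothetical stand-ins for delay_item_tag and delayq_tag. */
struct host_item {
    unsigned long long time;        /* time to leave, in ticks */
    unsigned int       buf_handle;  /* one meta-packet field, for illustration */
    struct host_item  *next;
};

struct host_queue {
    unsigned long     ninq;     /* # in delay queue */
    struct host_item *hd;       /* head ptr */
    struct host_item *tl;       /* tail ptr */
    struct host_item *free_hd;  /* free list */
};

/* Chain nitems items of a caller-supplied array into a freelist.
 * Returns 0 on success, -1 on bad arguments. */
static int host_delayq_init(struct host_queue *q, struct host_item *items,
                            size_t nitems)
{
    size_t i;

    if (q == NULL || items == NULL || nitems == 0)
        return -1;

    q->free_hd = items;         /* freelist starts at the first item */
    q->hd = q->tl = NULL;       /* delay queue starts out empty */
    q->ninq = 0;

    for (i = 0; i + 1 < nitems; i++)
        items[i].next = &items[i + 1];
    items[nitems - 1].next = NULL;   /* last item terminates the list */
    return 0;
}

/* Count the items on the freelist; useful as a sanity check after init. */
static size_t host_freelist_len(const struct host_queue *q)
{
    size_t n = 0;
    const struct host_item *p;

    for (p = q->free_hd; p != NULL; p = p->next)
        n++;
    return n;
}
```

Passing the storage in as a parameter keeps the list logic independent of where the items live, so only the address selection would need to change when the code is moved to the plugin's SRAM region.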

Two other lines are worth discussing: lines 2 and 4. First, the easy one. The name MAX_QUEUE_SZ appears in line 4. This is the maximum number of items (and therefore meta-packets) that can be queued. It is defined to be 35,000. The implication is that for maximum-sized packets (1,500 bytes), the plugin can support a bandwidth-delay product (BDP) of about 420 Mb (35,000 packets × 1,500 bytes × 8 bits). A 420 Mb BDP means that you can have at most a 420 msec delay at 1 Gbps, which is not unreasonable.

Second, in line 2 qptr contains the address of the queue descriptor. The queue descriptor contains the freelist pointer free_hd, the head and tail pointers hd and tl, and the population counter ninq. Note that qptr has been declared to be in SRAM and is shared among threads in the ME. The queue descriptor is actually defined and statically allocated near the beginning of the source code file and then later, its address is passed into delayq_init():

	struct delay_item_tag {
	    union tm_tag	time;		// time to leave
	    unsigned int	buf_handle;	// meta-packet
	    unsigned int	out_port;	// .
	    unsigned int	qid;		// .
	    unsigned int	l3_pkt_len;	// .
	    struct delay_item_tag *next;
	};
	struct delayq_tag {
	    unsigned long		ninq;	// # in delay queue
	    struct delay_item_tag	*hd;	// head ptr
	    struct delay_item_tag	*tl;	// tail ptr
	    struct delay_item_tag	*free_hd;	// free list
	};
	
	__declspec(shared, sram) struct delayq_tag	delayq;
	...
	void plugin_init_user()
	{
	    ...
	    if ( delayq_init( &delayq ) != 0 )	errno = BAD_DELAYQ_INIT;
	    ...
	}

The reason that delayq is declared shared is that both the handle_pkt() thread (via handle_pkt_user()) and the callback() thread update the delay queue. Also, because the queue is shared, we must protect the updating of the queue with a lock. This is shown in the abbreviated delayq_pop() code snippet below:

1	#define UNLOCKED 0
2	#define LOCKED   1
3	__declspec(shared gp_reg) unsigned int	delayq_lock;
4	...
5	int
6	delayq_pop( __declspec(shared, sram) struct delayq_tag *qptr ) {
7	    struct delay_item_tag	*item;
8	
9	    while( delayq_lock == LOCKED )	ctx_swap();
10	    delayq_lock = LOCKED;
11
12	    ... Pop front item from queue and return to freelist ...
13	
14	    delayq_lock = UNLOCKED;
15	    return 0;
16	}

Line 9 yields the CPU (it context switches to the next thread by calling ctx_swap()) if some other thread has already acquired the lock delayq_lock. Otherwise, line 10 acquires the lock. After updating the delay queue, line 14 releases the lock. The code fragment denoted by line 12 is ordinary, straightforward code for removing the item from the queue and returning the space to the freelist.
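The locking pattern and the pop step can also be modeled on a general-purpose machine. The following is a minimal sketch under stated assumptions, not the plugin's actual code: the sim_ names, the trimmed-down structures, the -1 return value for an empty queue, and the stub sim_ctx_swap() (which stands in for the microengine's ctx_swap()) are all introduced here for illustration. The sketch also assumes that threads are switched only at explicit yield points, since a test-then-set on an ordinary variable is not safe under preemptive scheduling.

```c
#include <stddef.h>

#define UNLOCKED 0
#define LOCKED   1

/* Trimmed-down, hypothetical stand-ins for the plugin's structures. */
struct sim_item {
    unsigned int     buf_handle;   /* stand-in for the meta-packet fields */
    struct sim_item *next;
};

struct sim_queue {
    unsigned long    ninq;      /* # in delay queue */
    struct sim_item *hd;        /* head ptr */
    struct sim_item *tl;        /* tail ptr */
    struct sim_item *free_hd;   /* free list */
};

static unsigned int sim_lock = UNLOCKED;

/* Stub: on the microengine this would yield to the next thread.
 * In a single-threaded host test the lock is never held on entry,
 * so this stub is never actually reached. */
static void sim_ctx_swap(void) { }

/* Remove the front item and return its storage to the freelist.
 * Returns 0 on success, -1 if the queue is empty (an assumption). */
static int sim_delayq_pop(struct sim_queue *q)
{
    struct sim_item *item;
    int rc = -1;

    while (sim_lock == LOCKED)      /* wait for the lock, yielding each time */
        sim_ctx_swap();
    sim_lock = LOCKED;              /* acquire */

    item = q->hd;
    if (item != NULL) {
        q->hd = item->next;         /* unlink the front item */
        if (q->hd == NULL)
            q->tl = NULL;           /* queue became empty */
        q->ninq--;
        item->next = q->free_hd;    /* push the item onto the freelist */
        q->free_hd = item;
        rc = 0;
    }

    sim_lock = UNLOCKED;            /* release on every path */
    return rc;
}
```

Using a single exit point keeps the release of the lock on every path, including the empty-queue case.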

The fact that the delay queue descriptor delayq is declared to be in SRAM is an issue if we want to have more than one ME run the delay plugin. This issue is addressed in the section Makefile Settings.

 

The plugin_init_user() Function

The plugin_init_user() function looks identical to the one for the mycount plugin except that it now must initialize the delay queue (line 7) and the delay queue lock (line 6):

1	void plugin_init_user()
2	{
3	    if( ctx() == 0 )
4	    {
5		npkts = 0;		// #pkts seen by plugin
6		delayq_lock = UNLOCKED;
7		if ( delayq_init( &delayq ) != 0 )	errno = BAD_DELAYQ_INIT;
8	    }

Line 7 shows that if delayq_init() returns an error, errno is set so that the user can query for the latest error. We don't do much more with errors because there is little else the plugin can do about them.

 

The handle_pkt() And handle_pkt_user() Functions

There are two changes that need to be made to the mycount plugin to handle a meta-packet arriving at the delay plugin: