NPR Tutorial: Writing A Plugin
This page presents some useful plugin code fragments. Along the way, you will be introduced to several plugins that do queue management, packet scheduling, and traffic shaping.
You may need to chain together multiple plugins. For example, you might want to do some packet processing in one plugin and then pass the packet to the delay plugin. Most of the simple plugins (e.g., mycount) send meta-packets to the Queue Manager (QM). We have also seen two alternatives to sending to the QM: dropping a packet (nstats) and sending a meta-packet to the MUX block with a non-zero plugin tag so that it can be sent to the PLC (Parse, Lookup and Classify) block for reclassification (TOStag). Because the nstats plugin drops all meta-packets it receives, it sends each meta-packet to the Freelist Manager, which then frees up the space associated with the meta-packet (e.g., the DRAM packet buffer and the SRAM packet descriptor). The TOStag plugin sets the plugin tag field in each meta-packet to one more than the TOS field in the IP packet header before forwarding it to the MUX block. Another useful destination is another plugin microengine.
void handle_pkt() {
    dl_source_packet(dlFromBlock);
    handle_pkt_user( );
    dl_sink_packet(dlNextBlock);
}
As shown above, handle_pkt() in the TOStag plugin gets a packet from the input ring indicated by dlFromBlock, processes the packet with handle_pkt_user(), and then sends the meta-packet to the block indicated by dlNextBlock. Recall that the code at the end of handle_pkt_user() did the processing necessary to send the meta-packet to the MUX block. You can take a similar approach when sending a meta-packet to another plugin.
Forwarding a meta-packet to a plugin microengine involves two simple steps:

1. Set dlNextBlock to identify the input ring of the target plugin ME. This variable is initialized to direct meta-packets to the Queue Manager (QM) input ring in plugin_init(), which is called by every thread as part of thread initialization.
2. Build the outgoing meta-packet in the format the destination block expects. For example, the output port field for a meta-packet destined for the Queue Manager is the upper 3 bits of the 16-bit Queue Manager QID field, but the output port field for a meta-packet destined for another plugin occupies bits 3-5 of the uc_mc_bits field.

This makes forwarding of an incoming meta-packet to another plugin trivial. (We will see this code in the helper_send_from_queue() function later.)
If the plugin will always send meta-packets to the same microengine, the implementation is simpler and faster (much like that used in the TOStag plugin). If not, dlNextBlock will need to be determined through the control message interface. In this latter case, all of the thread-specific values of dlNextBlock will need to be updated to a consistent value.
This section describes one family of plugins where dlNextBlock never changes after it is set during initialization in plugin_init_user(). The next section describes the case when dlNextBlock can be changed dynamically through the control message interface.
By convention, plugins with names ending in ++ (e.g., shaper++, delay++, erd++) have been written so that if they are loaded into plugin ME 4, they will forward all meta-packets to the QM; otherwise, they will forward to ME k+1 if they are loaded into ME k. Consider, for example, the following configuration in which PLC sends meta-packets to a traffic shaper plugin. From there, the meta-packets go to a delay plugin, then the Early Random Drop plugin, and then the Queue Manager (if not dropped by erd++).
ME | Plugin | Description | From | To |
---|---|---|---|---|
2 | shaper++ | Traffic shaper | PLC | ME 3 |
3 | delay++ | Delay | ME 2 | ME 4 |
4 | erd++ | Early Random Drop | ME 3 | QM |
Three pieces of code support this plugin chaining paradigm:
void plugin_init_user() {
1    if(ctx() == 0) {
         ... initialization for thread 0 ...
     }
2
3    if( pluginId == 0 )      dlNextBlock = PACKET_IN_RING_1;
4    else if( pluginId == 1 ) dlNextBlock = PACKET_IN_RING_2;
5    else if( pluginId == 2 ) dlNextBlock = PACKET_IN_RING_3;
6    else if( pluginId == 3 ) dlNextBlock = PACKET_IN_RING_4;
7    else                     dlNextBlock = QM;
}
The plugin_init_user() code in lines 3-7 for the erd++ plugin (shown above) initializes dlNextBlock based on which ME the plugin is running on. This initialization code appears in all chained plugins (i.e., the ones with names ending in ++). Note that lines 3-7 are executed by all thread contexts. The declaration of the variable dlNextBlock is in the global declaration area (outside the scope of any function):
__declspec(gp_reg) int dlNextBlock;    // declared in global area
The erd++ plugin probabilistically drops a packet if its destination queue is one of the managed queues and that queue's length exceeds a threshold. If it determines that the packet should be dropped (code not shown), it sets the variable droppkt to 1; otherwise, it sets droppkt to 0.
void handle_pkt_user( ) {
1    int droppkt;
2
3    ... Set droppkt to 1 if the packet should be dropped ...
4
5    if( droppkt ) {
6
7        onl_api_plugin_cntr_inc(pluginId, DROP_COUNT);
8        ++ndrops;
9        if ( helper_set_meta_default( DROP ) != 0 ) {
10           helper_set_errno( BAD_NXTBLK );
11       }
12   } else {
13       if ( helper_set_meta_default( dlNextBlock ) != 0 ) {
14           helper_set_errno( BAD_NXTBLK );
15       }
16   }
}
Lines 5-16 set the fields in the outgoing meta-packet based on the value of droppkt. If the packet should be dropped, line 9 calls helper_set_meta_default() with the argument DROP to initialize the fields so that the meta-packet will be sent to the Freelist Manager. Otherwise, line 13 calls helper_set_meta_default() with the argument dlNextBlock to initialize the fields so that the meta-packet will be sent to the next plugin or the Queue Manager in the plugin chain.
Finally, we come to helper_set_meta_default(), the function that actually puts the meta-packet into the input ring of the next packet processing block. Below is its control structure:
static __forceinline int helper_set_meta_default( __declspec(gp_reg) int nextBlk ) {
1    __declspec(gp_reg) int out_port;
2
3    dlNextBlock = nextBlk;
4
5    if( nextBlk == QM ) {
6        ... insert meta-packet into Queue Manager's input ring
7    } else if( nextBlk == DROP ) {
8        ... insert meta-packet into Freelist Manager's input ring
9    } else if( nextBlk == MUX ) {
10       ... insert meta-packet into MUX's input ring
11   } else if( (nextBlk == PACKET_IN_RING_0) ||
12              (nextBlk == PACKET_IN_RING_1) ||
13              (nextBlk == PACKET_IN_RING_2) ||
14              (nextBlk == PACKET_IN_RING_3) ||
15              (nextBlk == PACKET_IN_RING_4) ) {
16       ... insert meta-packet into plugin ME's input ring
17   } else if( nextBlk == DO_NOTHING ) {
18       // do nothing
19   } else {                  // all other options
20       return -1;            // error
21   }
22   return 0;
}
The control structure (and processing) is this complicated because the format of the meta-packet depends on the IXP block. For example, the Queue Manager accepts a 12-byte (3-word) meta-packet, but a plugin accepts a 24-byte (6-word) meta-packet. This difference is obvious when we look at the expansions of lines 6 and 16.
Line 6 is the code for sending the meta-packet to the Queue Manager.
     ...
5    if( nextBlk == QM ) {
6.1      __declspec(gp_reg) int out_port;
6.2      out_port = (ring_in.uc_mc_bits >> 3) & 0x7;
6.3      onl_api_update_ring_out_to_qm(
6.4          ring_in.buf_handle_lo24,
6.5          out_port,
6.6          (((out_port+1) << 13) | ring_in.qid),
6.7          ring_in.l3_pkt_len);
7    } else if( nextBlk == DROP ) {
     ...
The code above constructs the 3-word outgoing meta-packet for the QM from the incoming 6-word meta-packet. Four fields are passed to onl_api_update_ring_out_to_qm():

- (Line 6.4) The 24-bit buffer handle is copied from the plugin's input ring buffer.
- (Line 6.2) The 3-bit output port number is extracted from bits 3-5 of the uc_mc_bits field in the plugin's input ring buffer.
- (Line 6.6) The raw (or internal) QID is a 16-bit quantity consisting of the 3-bit internal output port number followed by the 13-bit visible (external) QID. For example, QID 64 at port 1 in the RLI is encoded as the 16-bit quantity 0x4040 (hexadecimal), which is 16,448 (decimal) (= 2 * 8192 + 64); i.e., there are 8,192 QIDs numbered 0 through 8,191, and the internal port number is one more than the visible port number. (A macro that performs this encoding is sketched after this list.)
- (Line 6.7) The 16-bit datagram length is copied from the plugin's input ring buffer.
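Since this encoding recurs in plugin code, it is convenient to wrap it in a macro. The priq plugin described later uses a FormRawQid() macro; its actual source is not shown on this page, so the definition below is inferred from line 6.6 above rather than taken from the plugin code.

// Inferred from line 6.6: internal port = external port + 1, placed
// in the 3 bits above the 13-bit external QID.
#define FormRawQid( port, qid )   ( (((port) + 1) << 13) | (qid) )

// Example: FormRawQid(1, 64) = (2 << 13) | 64 = 16448 = 0x4040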
Line 16 is the code for sending the meta-packet to another plugin. Because a meta-packet sent to another plugin is 6 words instead of 3, we use a different function for sending to the next plugin in the plugin chain.
      ...
11    } else if( (nextBlk == PACKET_IN_RING_0) ||
12               (nextBlk == PACKET_IN_RING_1) ||
13               (nextBlk == PACKET_IN_RING_2) ||
14               (nextBlk == PACKET_IN_RING_3) ||
15               (nextBlk == PACKET_IN_RING_4) ) {
16.1      onl_api_update_ring_out_to_plugin(
16.2          ring_in.buf_handle_lo24,
16.3          (ring_in.uc_mc_bits >> 3) & 0x7,
16.5          ring_in.in_port,
16.6          ring_in.plugin_tag,
16.7          ring_in.stats_index,
16.8          0,
16.9          ring_in.qid,
16.10         ring_in.nh_eth_daddr_hi32,
16.11         ring_in.nh_eth_daddr_lo16,
16.12         ring_in.eth_type,
16.13         ring_in.uc_mc_bits,
16.14         ring_in.l3_pkt_len);
17    } else if( nextBlk == DO_NOTHING ) {
      ...
The function onl_api_update_ring_out_to_plugin() fills in the 12 fields in the 6-word output ring buffer by copying them from the input ring buffer.
The concept of a plugin chain or pipeline is a useful packet processing paradigm. This section has shown that it is fairly easy to provide a standard approach to implementation. Although the end of this section discussed the details behind the implementation, the user interface is straightforward, and the functionality can be included by copying the code fragments cited above.
The previous section described the plugin chain concept. Its implementation used dlNextBlock as a write-once variable. It is not much harder to use dlNextBlock in a more dynamic way, where its value can change many times. There is one technical difficulty that must be addressed:
Each thread has its own dlNextBlock variable that indicates the next IXP processing block. In some applications, the different instances need not be kept consistent. But in applications that require consistency, all instances of dlNextBlock need to be updated whenever one instance changes. This consistency can be maintained by storing the value in a shared variable. In the setNxtBlk plugin example below, the official value of the next processing block is stored in the shared variable sharedNextBlock.
The setNxtBlk plugin is a simple example of how to dynamically set the value of dlNextBlock. For example, it is possible to load the setNxtBlk plugin onto microengine 0 and the mycount plugin onto microengine 4 and configure setNxtBlk to send meta-packets to the mycount plugin on ME 4.
By default, the setNxtBlk plugin sends meta-packets to the Queue Manager. The user can change this behavior by sending it the "next=" control message. For example, to send meta-packets to plugin microengine 4, the user would enter the control message "next= PLUGIN4" (note the space character after the = character). The setNxtBlk plugin recognizes the following values for the next block:
Plugin Command | Next Block | Internal Constant |
---|---|---|
"next= PLUGIN0" | Plugin ME 0 | PACKET_IN_RING0 |
"next= PLUGIN1" | Plugin ME 1 | PACKET_IN_RING1 |
"next= PLUGIN2" | Plugin ME 2 | PACKET_IN_RING2 |
"next= PLUGIN3" | Plugin ME 3 | PACKET_IN_RING3 |
"next= PLUGIN4" | Plugin ME 4 | PACKET_IN_RING4 |
"MUX" | MUX | MUX |
"DROP" | Drop Packet | DROP |
"QM" | Queue Manager | QM |
(otherwise) | Queue Manager | QM |
Two functions in the setNxtBlk plugin contain the necessary code to allow dlNextBlock to be changed: handle_pkt_user() and handle_msg(). We discuss only the part of handle_msg() which is unique to this plugin.
void handle_msg() {
1    __declspec(local_mem) char inmsgstr[28];    // inbound
2    __declspec(local_mem) char outmsgstr[28];   // outbound
3    __declspec(sram) char sram_inmsgstr[28];
4    __declspec(sram) char sram_outmsgstr[28];
5    ...
6    char SET_next[8] = "next=";
7    ...
8    char BAD_OP_msg[8] = "BAD OP";
9    char NEED_ARG_msg[12] = "NEED ARG";
10   ...
11   if( ... ) {
12       ...
13   } else if( strncmp_sram(sram_inmsgstr, SET_next, 5) == 0 ) {
14       __declspec(sram) char *valptr;
15       valptr = helper_nxt_token( sram_inmsgstr, 28 );
16       if( valptr == 0 ) {
17           memcpy_lmem_sram( outmsgstr, NEED_ARG_msg, 12 );
18       } else {
19           sharedNextBlock = str2dlNextBlock( valptr );
20           dlNextBlock = sharedNextBlock;
21           memcpy_lmem_sram( outmsgstr, valptr, 21 );
22       }
23   }
}
The user sets the value of dlNextBlock by entering a control message. For example, to send meta-packets to plugin microengine 4, the user would enter the control message "next= PLUGIN4" (note the space character after the = character). The function str2dlNextBlock() (line 19) translates the external string value ("PLUGIN4" in this example) entered by the user into the corresponding internal constant (PACKET_IN_RING_4, which has the value 4), and its return value is used to set the shared variable sharedNextBlock. Note that although line 20 sets dlNextBlock to the new value sharedNextBlock, handle_msg() runs in the control message handling thread, which is different from the packet handling thread(s). Thus, it is necessary to save this new value in the shared variable sharedNextBlock so that the value of dlNextBlock in the packet handling thread(s) can be updated to this new value.
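The source of str2dlNextBlock() is not shown on this page. Below is a minimal sketch of what it might look like, assuming that the strncmp_sram() helper shown above also accepts a string literal as its second argument; unrecognized values default to QM, matching the table above.

static int str2dlNextBlock( __declspec(sram) char *s )
{
    if( strncmp_sram( s, "PLUGIN0", 7 ) == 0 ) return PACKET_IN_RING_0;
    if( strncmp_sram( s, "PLUGIN1", 7 ) == 0 ) return PACKET_IN_RING_1;
    if( strncmp_sram( s, "PLUGIN2", 7 ) == 0 ) return PACKET_IN_RING_2;
    if( strncmp_sram( s, "PLUGIN3", 7 ) == 0 ) return PACKET_IN_RING_3;
    if( strncmp_sram( s, "PLUGIN4", 7 ) == 0 ) return PACKET_IN_RING_4;
    if( strncmp_sram( s, "MUX",     3 ) == 0 ) return MUX;
    if( strncmp_sram( s, "DROP",    4 ) == 0 ) return DROP;
    if( strncmp_sram( s, "QM",      2 ) == 0 ) return QM;
    return QM;    // default: Queue Manager
}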
1    volatile __declspec(shared gp_reg) int sharedNextBlock;
2    ...
3    void handle_pkt_user() {
4        ++npkts;
5        onl_api_plugin_cntr_inc(pluginId, 0);   // Incr global plugin cntr 0
6
7        dlNextBlock = sharedNextBlock;
8        helper_set_meta_default( dlNextBlock );
9        if( dlNextBlock == MUX ) helper_inc_meta_mux_tag( );
10   }
Line 7 is where the packet handling thread(s) update the value of dlNextBlock. Otherwise, handle_pkt_user() is almost identical to the one in the TOStag plugin. It uses the helper_set_meta_default() function (line 8) to set the outgoing meta-packet fields based on the value of dlNextBlock and the meta-packet fields in the input ring buffer. Line 9 increments the plugin tag field if the meta-packet is going to the MUX block, allowing the user to install a filter that matches this plugin tag value.
Queueing Packets Inside A Plugin
We saw earlier that the delay plugin used a queue internal to the plugin to store meta-packets until their delay time had expired. The tutorial page Tour_Of_The_Delay_Plugin describes the functions that implement a FIFO queue. This section discusses how other plugins have used that code base to implement their own versions of FIFO queues:
Plugin | Description | Queue Usage | Special Queue Feature(s) |
---|---|---|---|
delay++ | Delay packets | Packets delayed by fixed amount | Support plugin chaining |
shaper++ | Shape traffic | Packets delayed to conform to traffic descriptor | Delay varies to meet traffic descriptor |
priq | Priority queueing | Hold medium and low-priority packets | Two queues with one common free list |
The queueing code used in these plugins was copied from the original delay plugin and customized to varying degrees to fit each plugin's special requirements. After reviewing the delay plugin, we describe the changes to the queue management code required for each of these plugins. You will see that the delay++ plugin required only a change in the format of the queue item structure, but the priq plugin required rewriting the entire set of free space management routines.
Recall that the delay queue was a list of items where each item consisted of time (the time when the meta-packet should be forwarded to the Queue Manager), four fields from the incoming meta-packet (buf_handle, out_port, qid, l3_pkt_len) and next (the address of the next item on the list):
1    struct delay_item_tag {
2        union tm_tag time;              // time to leave
3        unsigned int buf_handle;        // meta-packet
4        unsigned int out_port;          // .
5        unsigned int qid;               // .
6        unsigned int l3_pkt_len;        // .
7        struct delay_item_tag *next;
8    };
9    struct delayq_tag {
10       unsigned long ninq;             // # in delay queue
11       struct delay_item_tag *hd;      // head ptr
12       struct delay_item_tag *tl;      // tail ptr
13       struct delay_item_tag *free_hd; // free list
14   };
15
16   #define MAX_QUEUE_SZ 35000          // max #items in queue
17   __declspec(shared, sram) struct delayq_tag delayq;   // queue descriptor
Access to the queue and the free space is provided by the queue descriptor delayq (line 17), which is the structure defined in lines 9-14. Lines 1-8 define the structure of an item on the queue, which is 28 bytes (each integer and pointer is 4 bytes and a time is 8 bytes). Since each plugin has access to its own 1 MB of SRAM, we can have over 37,000 items in the queue. Line 16 defines the number of items on the initial free list to be 35,000, which is comfortably below what 1 MB can hold.
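To check the capacity arithmetic: 1 MB = 1,048,576 bytes, and 1,048,576 / 28 bytes per item ≈ 37,449 items, so the 35,000-item free list leaves some headroom.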
The delay++ plugin is just the chained plugin version of the delay plugin. Because the plugin may need to forward meta-packets to another plugin rather than just the Queue Manager, it must store the entire incoming meta-packet and not just the fields needed by the Queue Manager.
1    // sizeof(struct item_tag) = 36 ==> 29,127 items in 1 MB
2    struct item_tag {
3        union tm_tag tdepart;       // time for pkt to leave
4        plugin_out_data metapkt;
5        struct item_tag *next;
6    };
7
8    struct queue_tag {
9        unsigned long npkts;        // #pkts in queue
10       unsigned long nbytes;       // #bytes in queue
11       unsigned long maxinq;       // max #pkts in queue
12       unsigned long ndrops;       // #overflows from queue
13       unsigned long nerrs;        // #errors other than drops
14       struct item_tag *hd;        // head ptr
15       struct item_tag *tl;        // tail ptr
16       struct item_tag *free_hd;   // free list
17   };
18
19   #define MAX_QUEUE_SZ 29000
20   __declspec(shared, sram) struct queue_tag queue;    // queue descriptor
Lines 2-6 define the structure of an item used in the delay++ plugin. Since an entire meta-packet (line 4) is six words (24 bytes), an item is now 36 bytes instead of 28 bytes. The effect of this change is a reduction in the maximum number of items to the 29,000 shown in line 19. The additions to the queue descriptor in lines 9-13 are extensions to the statistics collected by this version of the queue management routines. Of course, there are additional lines of code that initialize and maintain these variables. No other major changes were made to the queueing code.
The shaper++ plugin shapes traffic using a token bucket; that is, the output of a traffic shaper with burst size B and rate R has a long-term average rate of R with an initial burst of up to B bytes after a sufficient idle period. It is similar to the delay++ plugin except that the callback() function adds tokens at rate R and forwards the first meta-packet in the queue when there are enough tokens in the token bucket. Thus, it delays meta-packets by a variable amount instead of the fixed amount used in the delay plugin.
1    // sizeof(struct item_tag) = 32 ==> 32,768 items in 1 MB
2    struct item_tag {
3        plugin_out_data metapkt;
4        unsigned int iplen;
5        struct item_tag *next;
6    };
7    ...
8    #define MAX_QUEUE_SZ 32000
The only change is that an item contains the length of the IP datagram (iplen) instead of the departure time, and therefore an item is four bytes smaller.
The priq plugin implements priority queueing that handles three traffic priorities: high, medium and low. To do this, it maintains two internal queues: one for medium priority packets and one for low priority packets. All high-priority packets get sent immediately to queue 64 at the output port. Medium and low priority packets get forwarded in priority order and only if queue 64 is empty. It uses the same queueing structure as the delay plugin, except that the item no longer carries a departure time, there are two queue descriptors, and the two queues share a single free list:
1    // sizeof(struct item_tag) = 20 ==> 52,428 items in 1 MB
2    struct item_tag {
3        unsigned int buf_handle;    // meta-packet
4        unsigned int out_port;      // .
5        unsigned int qid;           // .
6        unsigned int l3_pkt_len;    // .
7        struct item_tag *next;
8    };
9
10   #define N 2
11   #define MAX_QUEUE_SZ 40000
12   __declspec(shared sram) struct queue_tag queue[N];   // descriptor
13   __declspec(sram) struct item_tag * __declspec(shared sram) free_hd;
14       // free list pointer to SRAM that resides in sram and is shared
There are now two queue descriptors (line 12) instead of one: one for medium priority packets and one for low priority packets. Furthermore, the free list pointer has been separated out from the queue descriptor (line 13). This required rewriting the queue management routines (not shown) queue_init(), queue_enq(), queue_pop(), queue_alloc() and queue_free().
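The rewritten free space routines are not shown, but the free list manipulation is simple enough to sketch. The code below is a hypothetical illustration, not the actual priq source; in particular, the real routines must also serialize access to free_hd among the packet handling and callback threads, which is omitted here.

static __declspec(sram) struct item_tag *queue_alloc( )
{
    __declspec(sram) struct item_tag *item = free_hd;
    if( item != 0 ) free_hd = item->next;   // pop the head of the shared free list
    return item;                            // 0 ==> out of free space
}

static void queue_free( __declspec(sram) struct item_tag *item )
{
    item->next = free_hd;                   // push the item back onto the free list
    free_hd = item;
}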
The declaration of the free list pointer in line 13 looks strange. Here is where the heterogeneous memory hierarchy rears its head! There is a big semantic difference between the declaration in line 13 and these two declarations:
(A) __declspec(shared sram) struct item_tag * free_hd;

(B) __declspec(shared sram) struct item_tag * __declspec(sram) free_hd;
We want to say that there is only ONE copy of the pointer which is shared among all of the thread contexts. So, the rightmost memory modifier should have been written as __declspec(shared sram) because the pointer itself is shared. Furthermore, since the shared modifier in the leftmost part of line (B) is irrelevant, we can omit it. The result of these changes is line 13.
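One way to keep the declaration straight is to read it in two halves, as annotated below:

__declspec(sram)             // the items the pointer points TO live in SRAM...
    struct item_tag *
__declspec(shared sram)      // ...while the pointer ITSELF is a single copy,
    free_hd;                 //    in SRAM, shared by all thread contexts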
Below is the handle_pkt_user() function, which enqueues packets on the proper internal queue. The three flow priorities are GOLD_FLOW (high priority), SILVER_FLOW (medium priority) and BRONZE_FLOW (low priority). The priq plugin forwards all packets to queue RESQ (=64) at port 4 in priority order. The callback() thread forwards medium and low priority packets, while the handle_pkt() thread immediately forwards high priority packets.
void handle_pkt_user( ) {
1    ... Compute qid and out_port ...
2    ... Update counters ...
3    if( qid == GOLD_FLOW ) {               // high priority
4        helper_set_meta_default( QM );
5        helper_set_meta_qid( out_port, RESQ );
6    } else {
7        rawqid = FormRawQid( out_port, RESQ );
8        if( qid == SILVER_FLOW ) {         // medium priority
9            ninq = queue_enq( &queue[SILVERQ],
10                             ring_in.buf_handle_lo24,
11                             out_port,
12                             rawqid,
13                             ring_in.l3_pkt_len );
14       } else {                           // low priority
15           ninq = queue_enq( &queue[BRONZEQ],
16                             ring_in.buf_handle_lo24,
17                             out_port,
18                             rawqid,
19                             ring_in.l3_pkt_len );
20       }
21       if( ninq == -1 ) {                 // out of free space
22           helper_set_out_to_DROP( );
23       } else {                           // OK
24           helper_set_out_to_DO_NOTHING( );
25       }
26   }
}
The callback() thread forwards the highest priority packet from the medium and low priority queues when queue 64 at the output port is empty; in either case it then sleeps for about 10 usec before checking again.
void callback() {
1    __declspec( gp_reg ) onl_api_qparams qparams;
2
3    onl_api_getQueueParams( 41024, &qparams );   // 5*8192+64
4
5    if( qparams.length == 0 ) {                  // empty high-priority queue
6        if( queue[SILVERQ].npkts > 0 ) {
7            helper_send_from_queue_to_QM( &queue[SILVERQ] );
8        } else if( queue[BRONZEQ].npkts > 0 ) {
9            helper_send_from_queue_to_QM( &queue[BRONZEQ] );
10       }
11   }
12   sleep( SLEEP_CYCLES );
}
The erd++ plugin implements an early random drop algorithm which probabilistically drops packets once a queue gets above a given threshold. The global variables used in the algorithm are shown below:
__declspec(shared gp_reg) unsigned int qlen[N];     // queue length (bytes)
__declspec(shared gp_reg) unsigned int qthresh[N];  // ERD threshold; i.e.,
                                                    // when to start dropping (bytes)
__declspec(shared gp_reg) unsigned int dropmask[N]; // (2^K - 1)
The callback() thread reads the length of one of the target queues (64-67) of a specified port and stores the values in the shared array qlen[N]. It continuously cycles through these four queues so that each queue length has been read once after about 40 usec have elapsed. The indices of the three arrays above correspond to queues 64-67 respectively.
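The erd++ callback() itself is not shown on this page, but its cycling behavior can be sketched with the onl_api_getQueueParams() call seen earlier in the priq plugin. In this hypothetical sketch, the shared index nextq and the configured external port outport are assumed names (the actual erd++ variables differ), and qparams.length is assumed to report the queue length in bytes:

__declspec(shared gp_reg) unsigned int nextq;    // assumed: next queue to sample (0-3)
__declspec(shared gp_reg) unsigned int outport;  // assumed: configured external port

void callback() {
    __declspec(gp_reg) onl_api_qparams qparams;

    // raw QID = (internal port << 13) | external QID, as described earlier
    onl_api_getQueueParams( ((outport+1) << 13) | (64 + nextq), &qparams );
    qlen[nextq] = qparams.length;     // publish the length to the shared array
    nextq = (nextq + 1) & 0x3;        // cycle through queues 64-67

    sleep( SLEEP_CYCLES );            // ~10 usec per queue ==> ~40 usec per cycle
}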
The array qthresh[N] contains the drop thresholds for the four queues, and its default values are computed by plugin_init_user(). Currently, each default drop threshold is set to one-fourth of the queue's threshold as configured in the RLI, but the thresholds can be changed through the control message interface.
The array dropmask[N] is used to select the bits from random integers which are used to decide whether to drop packets from overpopulated queues. Currently, the default value is 0x3f, but it can be changed through the control message interface. It is used in the drop() function below, which returns 1 if a packet should be dropped and 0 otherwise.
static __forceinline int drop( __declspec(gp_reg) unsigned int qlen,
                               __declspec(gp_reg) unsigned int qthresh,
                               __declspec(gp_reg) unsigned int dropmask ) {
1    int randint;
2    __declspec(gp_reg) int dropit;
3
4    if( qlen < qthresh ) return 0;
5
6    randint = rand();
7    randint = randint & dropmask;   // rightmost K bits
8
9    if( randint == 0 ) dropit = 1;
10   else               dropit = 0;
11
12   return dropit;
}
At the heart of the drop() function is the library function rand() in line 6, which returns a pseudo-random unsigned integer between 0 and RAND_MAX = 32767 (15 random bits). The current default value of all dropmask[] entries is 0x3f, which when used in line 7 with the bitwise AND operator selects the rightmost six bits of the random integer and stores the result in the variable randint. Lines 9 and 10 determine the return value of the drop() function: if the value of randint is 0 it returns 1 (i.e., drop the packet); otherwise it returns 0. The long-run effect is to drop packets with probability 1/64 whenever the queue is overpopulated (in general, a mask of 2^K - 1 gives a drop probability of 1/2^K).
The approach taken by drop() assumes that the denominator of the drop probability is an integer power of 2. Below is an approach for returning a 1 with probability m/n where m and n are arbitrary integers with m no more than n:
1    randint = rand();
2    randint = randint % n;
3    if( randint < m ) dropit = 1;
4    else              dropit = 0;
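As a sanity check, the m/n scheme can be exercised off-line with ordinary C. The small host program below is an illustration, not plugin code; it estimates the empirical drop rate and compares it to m/n.

#include <stdio.h>
#include <stdlib.h>

int main( void )
{
    const int m = 3, n = 7;                  /* target drop probability m/n */
    const int trials = 1000000;
    int i, randint, dropit, drops = 0;

    srand( 4321 );
    for( i = 0; i < trials; i++ ) {
        randint = rand();
        randint = randint % n;
        if( randint < m ) dropit = 1;
        else              dropit = 0;
        drops += dropit;
    }
    printf( "measured %f, expected %f\n",
            (double)drops / trials, (double)m / n );
    return 0;
}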
Below is the handle_pkt_user() function that calls the drop() function and then either drops the meta-packet or forwards it to the next block.
void handle_pkt_user( ) {
1    __declspec(gp_reg) unsigned int qid;
2    int droppkt;
3
4    qid = ring_in.qid & 0x1fff;    // external qid
5    if( qid == 64 ) {
6        droppkt = drop( qlen[0], qthresh[0], dropmask[0] );
7    } else if( qid == 65 ) {
8        droppkt = drop( qlen[1], qthresh[1], dropmask[1] );
9    } else if( qid == 66 ) {
10       droppkt = drop( qlen[2], qthresh[2], dropmask[2] );
11   } else if( qid == 67 ) {
12       droppkt = drop( qlen[3], qthresh[3], dropmask[3] );
13   } else {
14       droppkt = 0;               // forward pkt untouched
15   }
16
17   if( droppkt ) {
18       if ( helper_set_meta_default( DROP ) != 0 ) ... error ...
19   } else {
20       if ( helper_set_meta_default( dlNextBlock ) != 0 ) ... error ...
21   }
}
If you look at the sequence of random integers produced by rand(), you might be surprised to see that the identical sequence is repeated each time you reload the plugin. That is because the numbers are produced by an algorithm; the integers are not really random but pseudo-random, i.e., they pass some statistical tests indicating that they appear random. If you want to generate a different sequence, you can initialize the pseudo-random number generator with a seed value using the srand() library function:
int seed = 4321;
...
srand( seed );
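If you want a different sequence on each load without choosing a seed by hand, one option is to seed from the free-running timestamp counter. This is a suggestion using the local_csr_read() call that appears later in the shaper++ callback, not code from the erd++ plugin:

srand( local_csr_read( local_csr_timestamp_low ) );   // seed from the timestamp CSR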
The erd++ plugin allows the user to set the seed through the "seed=" control message. Those interested in its implementation can look at the handle_msg() function in the erd++.c source code.
The shaper++ plugin shapes traffic using a token bucket that maintains a constant output rate of rate_Kbps Kbps when sufficiently backlogged but will allow a burst of at most bucketsz bytes if it has been idle long enough. It ensures that the following bound is maintained while a queue is backlogged (this is the standard token bucket bound, restated here in the plugin's variables): the number of bytes forwarded in any interval of T milliseconds is at most bucketsz + (rate_Kbps * T) / 8.
The handle_pkt_user() function just queues incoming packets, and the callback() thread maintains the bound above. The callback() thread adds tokens to the variable token_cnt at a rate of rate_Kbps Kbps; specifically (line 9 of the code below), token_cnt grows by (tdiff_nsec * rate_Kbps) / 100 tokens over an interval of tdiff_nsec nanoseconds, capped at TOKENS_PER_BYTE * bucketsz (lines 10-11).
It forwards the next meta-packet when token_cnt is at least the token equivalent of the IP datagram length and then decrements token_cnt by that amount. Initially, token_cnt is full (bucketsz bytes' worth of tokens), so an arriving meta-packet will be forwarded immediately, since bucketsz is usually chosen to be at least as large as a maximum-sized packet.
The intellectual center of the shaper++ plugin is the callback() thread. It adds tokens to the token counter, dequeues and forwards packets as long as there are enough tokens, and decreases the token counter according to the packets that it forwards. A single token was chosen to be equal to 0.0001 bits so that rates between 1 Kbps and 1 Gbps could be accurately supported when the sleep interval was 10 usec. This selection means that there are 80,000 tokens per byte.
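As a consistency check on these constants, consider line 9 of the code below with rate_Kbps = 1000 and a 10 usec (10,000 ns) callback interval: each callback adds (10,000 * 1000) / 100 = 100,000 tokens, which at 80,000 tokens per byte is 1.25 bytes per 10 usec, i.e., 125,000 bytes/sec = 1,000,000 bits/sec = 1000 Kbps, exactly the configured rate.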
__declspec(shared gp_reg) union tm_tag told;   // last callback time
...
#define TOKENS_PER_BYTE 80000                  // 1 token = 0.0001 bits

void callback() {
1    __declspec(gp_reg) unsigned int pktlen_tokens;
2    union tm_tag tnow;
3    long long tdiff_nsec;
4    int rc;

5    // update token counter
6    tnow.tm2.lo = local_csr_read( local_csr_timestamp_low );
7    tnow.tm2.hi = local_csr_read( local_csr_timestamp_high );
8    tdiff_nsec = diff_nsec( tnow.tm, told.tm );
9    token_cnt = token_cnt + (tdiff_nsec*rate_Kbps)/100;
10   if( token_cnt > TOKENS_PER_BYTE*bucketsz )
11       { token_cnt = TOKENS_PER_BYTE*bucketsz; }
12   told.tm = tnow.tm;

13   // forward packets as long as there are enough tokens
14   while( queue.npkts > 0 ) {
15       pktlen_tokens = TOKENS_PER_BYTE*queue.hd->iplen;
16       if( token_cnt >= pktlen_tokens ) {   // fwd first pkt
17           rc = helper_send_from_queue( &queue, dlNextBlock );
18           token_cnt -= pktlen_tokens;
19           if( rc != 0 ) ... error ...
20       } else break;
21   }
22
23   sleep( SLEEP_CYCLES );
}
Two useful functions for processing control messages are helper_count_words() and helper_tokenize(). As their names imply, they have the following functionality:

- helper_count_words() returns the number of whitespace-separated words in a message string.
- helper_tokenize() returns a pointer to the next word in a message string and terminates that word with a NUL byte so that it can be handled as an ordinary C string.
The code fragment below, from the handle_msg() function of the shaper++ code, demonstrates how these two functions are used to process a control message such as "params= 1000 3000", which sets the token bucket average rate to 1000 Kbps and the bucket size to 3000 bytes. The basic idea is to use helper_count_words() to find out whether an operation has enough arguments, and to use helper_tokenize() to find the next word and terminate it with a NUL byte so that conversion functions such as helper_atou_sram() can be applied to the word.
void handle_msg() {
     ... other declarations ...
     __declspec(local_mem) char outmsgstr[28];
     __declspec(sram) char sram_inmsgstr[28];
     char SET_params[8] = "params=";

     ... other operations ...
1    } else if( strncmp_sram(sram_inmsgstr, SET_params, 7) == 0 ) {
2        char *cmnd_word;       // points to input command field
3        char *rate_word;       // points to input rate(Kbps) field
4        char *bucketsz_word;   // points to input bucketsz(bytes) field
5        unsigned int nwords;
6
7        nwords = helper_count_words( sram_inmsgstr );
8        if( nwords != 3 ) {
9            memcpy_lmem_sram( outmsgstr, NEED_ARG_msg, 12 );
10       } else {
11           cmnd_word = helper_tokenize( sram_inmsgstr );   // get command
12           rate_word = helper_tokenize( cmnd_word+strlen(cmnd_word)+1 );
13           bucketsz_word = helper_tokenize( rate_word+strlen(rate_word)+1 );
14
15           rate_Kbps = helper_atou_sram( rate_word );
16           bucketsz  = helper_atou_sram( bucketsz_word );
17           helper_sram_outmsg_2ul( rate_Kbps, bucketsz, outmsgstr );
18       }
19   } else ... other operations ...
}
We now walk through the above code for the control message "params= 1000 3000", which attempts to set the long-term average rate to 1,000 Kbps and the bucket size to 3,000 bytes. helper_count_words() in line 7 returns 3, so the else branch in lines 10-18 executes. The three calls to helper_tokenize() in lines 11-13 return pointers to the NUL-terminated words "params=", "1000" and "3000", respectively. Lines 15 and 16 convert the two argument words to the unsigned integers 1000 and 3000 and store them in rate_Kbps and bucketsz, and line 17 builds the reply message echoing the two new values.
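For readers curious about the logic inside these helpers, below are hypothetical plain-C sketches; the real ONL versions operate on SRAM strings and differ in detail.

static unsigned int helper_count_words( char *s )
{
    unsigned int n = 0;
    while( *s ) {
        while( *s == ' ' ) s++;                 /* skip separators */
        if( *s ) {                              /* found a word */
            n++;
            while( *s && *s != ' ' ) s++;       /* consume the word */
        }
    }
    return n;
}

static char *helper_tokenize( char *s )
{
    char *word;
    while( *s == ' ' ) s++;                     /* skip leading separators */
    if( *s == '\0' ) return 0;                  /* no word found */
    word = s;
    while( *s && *s != ' ' ) s++;               /* find the end of the word */
    if( *s ) *s = '\0';                         /* NUL-terminate the word */
    return word;
}

Note that helper_tokenize() writes the NUL byte in place, which is why line 12 of the handle_msg() fragment resumes scanning at cmnd_word+strlen(cmnd_word)+1, i.e., just past the terminated word.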
Revised: Fri Apr 3, 2009