The ONL NPR Tutorial

Handling Errors

So far, when an error occurred in a simple plugin, we either ignored it or sent a debug message to the Xscale Control Processor. Ignoring the error makes it hard to locate the root cause of misbehavior, and sending debug messages to the Xscale has the disadvantage that a high message volume can lock up the NPR. A third approach is to log an error code (perhaps along with auxiliary data) that the user can retrieve with a control message.

 

Error Logging With The errno[] Array

We used the error logging approach in the priq plugin, which implements a form of strict priority queueing with three flow priorities (high, medium and low), both to handle errors and to locate a difficult bug.

// error codes
#define	BAD_QUEUE_INIT_ERR	1	// bad queue_init()
#define	BAD_ENQ_ERR		2	// bad queue_enq()
#define	BAD_POP_EMPTY_ERR	3	// bad queue_pop() - empty queue
#define	BAD_POP_FREE_ERR	4	// bad queue_pop() - free() failed
#define	BAD_NXTBLK		5	// bad nextBlk
#define	BAD_OUT_PORT		6	// bad output port in meta-pkt
#define	BAD_QID			7	// bad QID in meta-pkt
#define	BAD_HANDLE_A		98	// bad buffer handle in meta-pkt
#define	BAD_HANDLE_B		99	// bad buffer handle in meta-pkt = 0

__declspec(shared gp_reg) unsigned int nerrs;		// #errors
volatile __declspec(shared sram) unsigned int errno[5];	// 1st 5 errors

static __forceinline void
helper_set_errno( __declspec(local_mem) unsigned int n ) {
    if( nerrs < 5 )	errno[nerrs] = n;
    ++nerrs;
    onl_api_plugin_cntr_inc(pluginId, 0);	// external error counter
}

The code fragment above defines nine error codes. The function helper_set_errno() stores the first five error codes in the array errno[5]; nerrs keeps counting after the array is full. The limit of five was an arbitrary choice, but a small limit is usually enough because we are most interested in the earliest errors. In addition, we use Plugin Counter 0 as an external error counter, which we chart to see when errors occur.
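The first-five behavior can be tried off-target. What follows is a minimal host-side sketch in portable C, not NPR microengine code. The names err_log, err_count, log_error(), logged_entries() and dropped_errors() are hypothetical stand-ins for errno[], nerrs and helper_set_errno(), and the Plugin Counter update is left out.

```c
#include <assert.h>

#define MAX_LOGGED 5	/* same arbitrary limit as errno[5] */

static unsigned int err_log[MAX_LOGGED];	/* stand-in for errno[] */
static unsigned int err_count;			/* stand-in for nerrs */

/* Same logic as helper_set_errno(), minus the plugin counter update:
 * keep the first MAX_LOGGED codes, but count every error. */
static void log_error(unsigned int code) {
    assert(code != 0);		/* assumption: 0 is never a valid error code */
    if (err_count < MAX_LOGGED)
        err_log[err_count] = code;
    ++err_count;
}

/* Number of valid entries currently held in err_log[]. */
static unsigned int logged_entries(void) {
    return err_count < MAX_LOGGED ? err_count : MAX_LOGGED;
}

/* Number of errors that were counted but not stored. */
static unsigned int dropped_errors(void) {
    return err_count - logged_entries();
}
```

Because the counter keeps running after the array fills, comparing the total count with the number of stored entries tells the user whether later errors were dropped from the log.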

void plugin_init_user()
{
    if(ctx() == 0)
    {
	reset_counters( );

	queue_lock = UNLOCKED;
	if( queue_init_free() != 0 ) {
	    helper_set_errno( BAD_QUEUE_INIT_ERR );
	}
	queue_init_desc( &queue[SILVERQ], SILVERQ );
	queue_init_desc( &queue[BRONZEQ], BRONZEQ );
    }
}

The code fragment above, taken from the priq plugin, shows a typical simple use of helper_set_errno().

 

Run-Time Checking

The priq plugin also introduced run-time checking with auxiliary data logging. During development, the plugin would run fine for thousands of packets and then mysteriously stop forwarding packets. When this lockup occurred, a queue length chart showed millions of packets, a sign that the Queue Manager was being confused by corrupted meta-packets.

void
helper_check_meta( plugin_out_data my_ring_out ) {
    if( my_ring_out.plugin_qm_data_out.out_port != 4 ) {// check output port
    	helper_set_errno( BAD_OUT_PORT );
    }
    if( my_ring_out.plugin_qm_data_out.qid != 41024 ) {	// check QID
    	helper_set_errno( BAD_QID );
    }
							// check buffer handle
    if( (my_ring_out.plugin_qm_data_out.buf_handle_lo24 & 0x7) != 0 ) {
	helper_set_xdata( my_ring_out.plugin_qm_data_out.buf_handle_lo24 );
    	helper_set_errno( BAD_HANDLE_A );
    	++nerrsA;
    }
    if( my_ring_out.plugin_qm_data_out.buf_handle_lo24 == 0 ) {
	helper_set_xdata( my_ring_out.plugin_qm_data_out.buf_handle_lo24 );
    	helper_set_errno( BAD_HANDLE_B );
    	++nerrsB;
    }
}

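To check for sane meta-packet fields, the priq plugin calls the helper_check_meta() function before forwarding a meta-packet to the Queue Manager. An error is indicated if any of the following is true: the output port is not 4 (BAD_OUT_PORT); the QID is not 41024 (BAD_QID); any of the low three bits of the buffer handle is set (BAD_HANDLE_A); or the buffer handle is zero (BAD_HANDLE_B).

In the code above, helper_set_errno() is used to record the error. If an invalid buffer handle is detected, the buffer handle is also recorded in the array xdata[NX].
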
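To see which checks a given meta-packet trips, the same tests can be exercised off-target. This is a host-side sketch in portable C under stated assumptions: struct meta_fields, check_meta() and the CHK_* flags are hypothetical names, and the expected port and QID are the constants hard-coded in helper_check_meta().

```c
#include <assert.h>

#define EXPECTED_OUT_PORT	4u	/* value tested in helper_check_meta() */
#define EXPECTED_QID		41024u	/* value tested in helper_check_meta() */

/* Hypothetical flags, one per check, so a caller can see every failure. */
enum {
    CHK_BAD_OUT_PORT = 1u << 0,
    CHK_BAD_QID      = 1u << 1,
    CHK_BAD_HANDLE_A = 1u << 2,	/* low three bits of handle not all zero */
    CHK_BAD_HANDLE_B = 1u << 3	/* handle is zero */
};

/* Hypothetical stand-in for the meta-packet fields that are checked. */
struct meta_fields {
    unsigned int out_port;
    unsigned int qid;
    unsigned int buf_handle_lo24;
};

/* Return a bit mask of failed checks; 0 means the fields look sane. */
static unsigned int check_meta(struct meta_fields m) {
    unsigned int failed = 0;
    if (m.out_port != EXPECTED_OUT_PORT)	failed |= CHK_BAD_OUT_PORT;
    if (m.qid != EXPECTED_QID)			failed |= CHK_BAD_QID;
    if ((m.buf_handle_lo24 & 0x7u) != 0)	failed |= CHK_BAD_HANDLE_A;
    if (m.buf_handle_lo24 == 0)			failed |= CHK_BAD_HANDLE_B;
    return failed;
}
```

Note that a zero buffer handle passes the low-bits test and is caught only by the second handle check, which is why the two cases have separate error codes.
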
static __forceinline void
helper_send_from_queue_to_QM( __declspec(shared,sram) struct queue_tag *qptr ) {
    plugin_out_data	my_ring_out;	// ring data to next block
    int		rc;

    my_ring_out.plugin_qm_data_out.out_port		= qptr->hd->out_port;
    my_ring_out.plugin_qm_data_out.qid			= qptr->hd->qid;
    my_ring_out.plugin_qm_data_out.l3_pkt_len		= qptr->hd->l3_pkt_len;
    my_ring_out.plugin_qm_data_out.buf_handle_lo24	= qptr->hd->buf_handle;

    rc = queue_pop( qptr );
    if( rc == -1 ) {
    	helper_set_errno( BAD_POP_EMPTY_ERR );
    } else if( rc == -2 ) {
    	helper_set_errno( BAD_POP_FREE_ERR );
    }

    if( debug_on )	helper_check_meta( my_ring_out );

    scr_ring_put_buffer_3word( PLUGIN_TO_QM_RING, my_ring_out.i, 0 );
}

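The helper_send_from_queue_to_QM() function above calls helper_check_meta() before it sends the meta-packet to the Queue Manager by calling scr_ring_put_buffer_3word(). In order, the function copies the output port, QID, L3 packet length and buffer handle from the head of the queue into my_ring_out; pops the head of the queue with queue_pop(), logging BAD_POP_EMPTY_ERR or BAD_POP_FREE_ERR if the pop fails; calls helper_check_meta() if debug_on is set; and puts the meta-packet on the PLUGIN_TO_QM_RING ring.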

#define NX	32
volatile __declspec(shared sram) unsigned int	xdata[NX];	// extra data
volatile __declspec(shared sram) unsigned int	nxdata;		// #xdata valid
volatile __declspec(shared sram) unsigned int	nxget;		// next to send

static __forceinline void
helper_set_xdata( unsigned int x ) {	// record auxiliary data
    if( nxdata < NX ) {
	xdata[nxdata] = x;
	++nxdata;
    }
}

The xdata[] array is handled much like errno[], with two differences. First, we record up to 32 values instead of five. Second, only two of the possible 32 auxiliary values can be returned by one =xdata control message request. The variable nxget indicates the next xdata[] element to be sent, so the user can get all 32 values by issuing 16 =xdata requests.
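The two-values-per-request paging can be sketched on a host machine as follows. This is an illustration in portable C, not the priq plugin's control message handler: get_xdata_pair() is a hypothetical function, and returning 0 once nxget reaches nxdata is an assumption about what happens after the last value has been sent.

```c
#include <assert.h>

#define NX 32

static unsigned int xdata[NX];	/* extra data */
static unsigned int nxdata;	/* #xdata valid */
static unsigned int nxget;	/* next to send */

/* Same logic as helper_set_xdata(): keep the first NX values. */
static void set_xdata(unsigned int x) {
    if (nxdata < NX) {
        xdata[nxdata] = x;
        ++nxdata;
    }
}

/* Hypothetical body of one =xdata request: copy up to two unsent values
 * into out[] and advance nxget. Returns the number of values copied
 * (assumed to be 0 once everything has been sent). */
static unsigned int get_xdata_pair(unsigned int out[2]) {
    unsigned int n = 0;
    while (n < 2 && nxget < nxdata) {
        out[n] = xdata[nxget];
        ++n;
        ++nxget;
    }
    return n;
}
```

With a full array, 16 calls return two values each, in the order they were recorded, which matches the 16 =xdata requests described above.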

The xdata[NX] array is also used to record other auxiliary data. For example, this approach was also used to locate a pointer bug associated with management of the free list.


 Revised:  Fri, Feb 13, 2009 

  
  
