ONL NPR Tutorial

Basic Plugin Testing

A Development Strategy
A Testing Strategy

Test 1 (Load): Will the plugin load?
Test 2 (Hello): Will the plugin respond to a simple control message?
Test 3 (A Few Pings): Can you send a few ping packets through the plugin?
Test 4 (Continuous Pings): Can you send ping packets continuously through the plugin?
Test 5 (UDP/TCP Traffic): Can you send UDP or TCP packets through the plugin?

Even though you may have successfully compiled your plugin, there still may be bugs in the code that will cause the plugin to misbehave. Typical signs that there is something wrong with your plugin include:

The plugin doesn't see any traffic.
Traffic doesn't appear to leave the NPR where a plugin is installed.
The plugin will not respond to control messages.

We gave an example of testing the mycount plugin in the page NPR Tutorial => Writing A Plugin => Quick Start. But that plugin was very simple and known to have no bugs. You are likely to encounter unexpected bugs when writing your first plugin that significantly departs from any working plugin.

A Development Strategy

You cannot approach the writing, testing, and debugging of a new plugin casually. Doing so usually means little progress even after many hours of debugging, for several reasons:

There is no debugger for the IXP 2800.

You will not be able to set breakpoints and examine memory. Instead, you will have to insert calls to onl_api_debug_message() whenever you want to display state information.
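Because debug messages are the only window into a running plugin, it can help to route every trace through one small wrapper that can be switched off. The following is a host-side sketch only: debug_sink() is a hypothetical stand-in for onl_api_debug_message() (whose real signature is not shown on this page), and trace_value() is an illustrative name, not part of the ONL API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for onl_api_debug_message(). On the NPR you would
   call the real routine here; this sink just remembers the last message. */
char last_debug_msg[64];
int  debug_msg_count = 0;

void debug_sink(const char *msg)
{
    snprintf(last_debug_msg, sizeof last_debug_msg, "%s", msg);
    debug_msg_count++;
}

int debug_on = 1;   /* would be toggled by a control message such as "d" */

/* Emit "label=value" only when debugging is enabled. */
void trace_value(const char *label, unsigned int value)
{
    char buf[64];

    assert(label != NULL);
    if (!debug_on)
        return;
    snprintf(buf, sizeof buf, "%s=%u", label, value);
    debug_sink(buf);
}
```

Calling trace_value("npkts", 3) with debug_on set produces the message "npkts=3"; with debug_on cleared, nothing is sent, which keeps the per-packet cost of disabled tracing to one test.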

There is no operating system.

Bad operations can lead to an unresponsive plugin. You will have to reload the plugin or perhaps restart the entire experiment.

The memory architecture can't be ignored.

Some of the functions in the ONL API already use part of the 640 words of local memory in each ME. SRAM relieves this limitation somewhat, but you still need to avoid extravagant memory usage. Variables declared without the __declspec() modifier are allocated from a region of SRAM whose size is set by a linker flag (this limit can be modified to some degree); the standard Makefile allocates each plugin ME only about 200,000 bytes for these compiler-generated SRAM variables. A plugin such as the delay plugin may need more SRAM (e.g., for an internal queue). For that purpose, the standard Makefile also gives each plugin ME 1 MB of SRAM for its own use, but the user code must manage this space.
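One simple way to manage a fixed region yourself is a bump allocator that hands out word-aligned blocks and refuses requests that do not fit. The sketch below runs on a general-purpose machine and tracks offsets only; REGION_BYTES comes from the 1 MB figure above, the 4-byte word size is an assumption, and region_alloc() is a hypothetical name, not an ONL routine.

```c
#include <assert.h>
#include <stddef.h>

#define REGION_BYTES (1024u * 1024u)   /* the 1 MB per-ME figure quoted above */
#define ALLOC_FAILED ((size_t)-1)

size_t region_next = 0;   /* offset of the first free byte in the region */

/* Reserve nbytes, rounded up to a 4-byte word (assumed word size).
   Returns the block's offset within the region, or ALLOC_FAILED if the
   request does not fit in the space that is left. */
size_t region_alloc(size_t nbytes)
{
    size_t rounded = (nbytes + 3u) & ~(size_t)3u;
    size_t offset;

    assert(region_next <= REGION_BYTES);
    if (rounded < nbytes || rounded > REGION_BYTES - region_next)
        return ALLOC_FAILED;
    offset = region_next;
    region_next += rounded;
    return offset;
}
```

Nothing is ever freed in this scheme, which suits a plugin that sizes its queues once at initialization.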

The IXP architecture is untraditional.

There are a number of pitfalls that can trip up a beginner or at the very least be annoying. For example:
An assignment statement causes an implicit context switch if it needs to access SRAM or DRAM. That is, the machine code generated for the statement includes a context-switch instruction, and the statement does not complete until the thread regains control. Because other threads run in the meantime, this can lead to a race condition.
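The hazard is easiest to see with a read-modify-write on a shared counter. The sketch below is a deterministic simulation on a general-purpose machine, not IXP code: each simulated thread's increment is split into the read half and the write half, with the context switch falling in between. All names are hypothetical.

```c
#include <assert.h>

unsigned int shared_count = 0;   /* imagine this variable lives in SRAM */

/* One thread's private copy while it executes
   "shared_count = shared_count + 1". */
struct increment {
    unsigned int tmp;
};

/* Read half: the memory read is where the thread would be switched out. */
void begin_increment(struct increment *t)
{
    t->tmp = shared_count;
}

/* Write half: runs when the thread gets control again. */
void finish_increment(struct increment *t)
{
    shared_count = t->tmp + 1;
}

/* Interleave two threads the way an unlucky schedule would. */
unsigned int run_unlucky_schedule(void)
{
    struct increment a, b;

    shared_count = 0;
    begin_increment(&a);    /* A reads 0 and is switched out       */
    begin_increment(&b);    /* B reads 0 as well                   */
    finish_increment(&b);   /* B writes 1                          */
    finish_increment(&a);   /* A also writes 1: one update is lost */
    return shared_count;
}
```

Two increments were executed, yet the counter ends at 1. Confining updates of a shared variable to a single thread is one way to avoid this kind of lost update.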

These factors lead to the following approach to code development and testing:

Start With A Good Base: Start with the source code of an existing plugin that is similar to the one you plan to develop.
This reduces the amount of new code that needs to be written and allows you to begin with a majority of the code in solid, working order.
For Example: When writing the delay plugin, the most complicated related plugin we had was mycount, which counted and forwarded packets without delay. To extend the mycount plugin, we had to add these features:
- queue management routines for delayed packets
- additional variables to track queue statistics
- a mechanism to delay packets by a fixed number of milliseconds
Plan For Incremental Steps: Sketch out a sequence of features that you plan to implement and test so that the development can proceed in reasonable steps. Also, sketch out the tests you plan to do so as to verify that the plugin is behaving properly.
Incremental steps will facilitate forward progress. The important point is to make the step small enough so that the plugin behavior can be verified from debug output.
For Example: When developing the delay plugin, the first plugin version involved a few small changes to the handle_pkt_user() and handle_msg() routines:
- handle_pkt_user() enqueued a new meta-packet and then immediately dequeued and forwarded the meta-packet.
- handle_msg() was extended to recognize operations for returning the value of some queue management variables.
This would be a small step if the queue management routines were already debugged. During the testing of the very first plugin version, you are just trying to see if you can get one or a few packets through the plugin and whether key plugin variables have correct values.
Do Off-Line Testing First: If there will be a large amount of new code, try to test that code on a general-purpose machine first.
You will be able to use a debugger and a standard operating system on a general-purpose machine. Obviously, you cannot test anything having to do with the IXP's memory hierarchy.
For Example: Before doing our first delay plugin test described above, we debugged the parts of the queue management routines that were unrelated to the memory hierarchy. The file ~onl/npr/plugins/delay/list.c is a version of the off-line test routine we wrote to test the queue management functions.
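To give a flavor of what off-line testing can cover, here is a minimal sketch of a fixed-size FIFO that tracks the same kind of statistics the delay plugin reports (maxinq and ndrops). It is not the contents of list.c; the structure, the function names, and the tiny capacity are all assumptions chosen to make overflow easy to exercise on a general-purpose machine.

```c
#include <assert.h>

#define QCAP 4   /* deliberately tiny so that overflow is easy to test */

struct pkt_queue {
    unsigned int slot[QCAP];  /* packet handles (opaque to the queue) */
    unsigned int head;        /* index of the oldest entry            */
    unsigned int count;       /* entries currently queued             */
    unsigned int maxinq;      /* high-water mark of count             */
    unsigned int ndrops;      /* enqueue attempts rejected when full  */
};

void q_init(struct pkt_queue *q)
{
    q->head = q->count = q->maxinq = q->ndrops = 0;
}

/* Returns 1 on success, 0 if the queue was full (the drop is counted). */
int q_enqueue(struct pkt_queue *q, unsigned int handle)
{
    if (q->count == QCAP) {
        q->ndrops++;
        return 0;
    }
    q->slot[(q->head + q->count) % QCAP] = handle;
    q->count++;
    if (q->count > q->maxinq)
        q->maxinq = q->count;
    return 1;
}

/* Returns 1 and stores the oldest handle, or 0 if the queue is empty. */
int q_dequeue(struct pkt_queue *q, unsigned int *handle)
{
    assert(handle != 0);
    if (q->count == 0)
        return 0;
    *handle = q->slot[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return 1;
}
```

A host-side test can then fill the queue, force one drop, and check FIFO order across the wrap-around point, all under a debugger if needed.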
Start With Low-Bandwidth Traffic: Sending debug messages will slow down a plugin so much that you cannot expect to handle more than about 80 packets per second if only one debug message is generated per packet. During the beginning of a test cycle, you will have to limit the traffic to ping and low-bandwidth UDP packets.
The testing sequence described later takes this approach.
Look For Verification Opportunities: Be on the lookout for ways to verify proper plugin behavior. This includes judicious use of onl_api_debug_message() and control messages.
The page Basic Debug Messages describes some functions for logging debug messages to a file. Control messages handled by handle_msg() can also be used for debugging. An alternative technique that can be used for higher-bandwidth traffic is described in the page Handling Errors. Also, the value of a key variable can be easily charted if the value is stored in one of the five public ME counters.
For Example: The delay plugin responds to the command =counts which returns the values of npkts (number of packets), maxinq (maximum number of queued packets), and ndrops (number of packets dropped). The values of npkts and ndrops can be compared to the packet statistics from ping and UDP iperf. The value of maxinq can be compared to the expected bandwidth-delay product.
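The bandwidth-delay comparison can be made concrete: packets arriving at a steady rate and held for a fixed delay occupy about rate times delay bits of queue, and dividing by the packet size gives the number of packets to expect in maxinq. The helper below is a sketch with a hypothetical name; it assumes constant-rate traffic and a single packet size.

```c
#include <assert.h>

/* Expected number of queued packets for traffic arriving at rate_bps
   (bits per second) that is held for delay_ms milliseconds, when every
   packet carries pkt_bytes bytes. */
double expected_queue_packets(double rate_bps, double delay_ms, double pkt_bytes)
{
    assert(pkt_bytes > 0.0);
    return (rate_bps * delay_ms / 1000.0) / (pkt_bytes * 8.0);
}
```

For instance, 940,800 bps of 1470-byte packets (80 packets per second) held for an illustrative 50 ms should keep about 4 packets queued. A reported maxinq far from the estimate suggests that the queue logic or the delay timing deserves a closer look.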
Reuse Existing Code: Little is gained from writing code from scratch if there is existing code that can be modified for your use.
A good source for existing code is the code for standard plugins. See the .c and .h files in subdirectories of ~onl/npr/plugins/. Another good source is the plugin framework source code, which is in the directory ~onl/npr/pluginFramework/. Occasionally, users have found some useful code in the subdirectories of ~onl/npr/onl_router/, which contains code used by the plugin framework code.

A Testing Strategy

Our first objective should be to see if low-bandwidth traffic can flow through the plugin and the plugin continues to respond to control messages. Then, we can use incrementally more difficult tests as each step seems to succeed:

Test 1 (Load): Will the plugin load?
Test 2 (Hello): Will the plugin respond to a simple control message?
Test 3 (A Few Pings): Can you send a few ping packets through the plugin?
Test 4 (Continuous Pings): Can you send ping packets continuously through the plugin?
Test 5 (UDP/TCP Traffic): Can you send UDP or TCP packets through the plugin?

The test sequence will need to be modified if the plugin doesn't handle one or more of the protocols suggested in the test sequence.

The first four tests were demonstrated for the mycount plugin in the Quick Start page. The sections below expand on that mycount example. You can follow these steps using your own plugin.

Test 1 (Load)

Do the following:

Add your mycount plugin to ME 0 using the RLI (Configuration => Plugin Table // Edit => Add Plugin).
Commit (File => Commit).
Define a log file (Configuration => PluginDebugging).

If the RLI gives no indications of an error, the plugin likely loaded properly and it may have initialized properly. If you are worried about some variable initialization, you can use onl_api_debug_message() to output a message from within the plugin_init() routine or the plugin_init_user() routine.

Test 2 (Hello)

You can still send control messages to the plugin even if you have not created a filter to direct packets to the plugin. The mycount plugin should respond to three commands:

g (get value of pkt_count),
z (zero pkt_count and Plugin Counter 0), and
d (toggle debug_on).

Successful return values from each of these commands indicate that the plugin executed its initialization properly, the hardware threads are context switching, and the handle_msg() routine is responding. This would be a good sign that something terrible has not happened yet.

Sometimes users have an explicit "hello" or "version" command where the plugin responds with a version number. This is also a good idea when you have a short debug cycle and you want to make sure that you are executing the correct version of the plugin. Although you should see a consistent view of your files on all machines, sometimes the Network File System (NFS) may be slow in updating the version of the plugin on the NPR control processor.
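A command dispatcher with a version reply can be prototyped on a general-purpose machine before it goes into the plugin. In the sketch below, mock_handle_cmd() is a host-side stand-in, not the real handle_msg() (whose signature differs and is not shown here); the "v" command, PLUGIN_VERSION, and the reply formats are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PLUGIN_VERSION 7u   /* hypothetical: bump this on every rebuild */

unsigned int pkt_count = 0;
int debug_on = 0;

/* Host-side stand-in for a control-message handler.
   Writes a short ASCII reply and returns 0, or -1 for an unknown command. */
int mock_handle_cmd(const char *cmd, char *reply, size_t replylen)
{
    assert(cmd != NULL && reply != NULL && replylen > 0);

    if (strcmp(cmd, "g") == 0) {            /* get packet count  */
        snprintf(reply, replylen, "%u", pkt_count);
    } else if (strcmp(cmd, "z") == 0) {     /* zero packet count */
        pkt_count = 0;
        snprintf(reply, replylen, "0");
    } else if (strcmp(cmd, "d") == 0) {     /* toggle debugging  */
        debug_on = !debug_on;
        snprintf(reply, replylen, "%d", debug_on);
    } else if (strcmp(cmd, "v") == 0) {     /* report version    */
        snprintf(reply, replylen, "v%u", PLUGIN_VERSION);
    } else {
        snprintf(reply, replylen, "?");
        return -1;
    }
    return 0;
}
```

Sending "v" right after loading and comparing the reply with the number you just compiled in is a quick check that the control processor picked up the new build.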

There are three possible bad outcomes:

The plugin responds with incorrect values.

This is the easy case since you can insert additional debug code to locate the problem.

The plugin is immediately unresponsive.

This normally doesn't occur unless you use pointers or you make a bad function call. Although the problem(s) could be anywhere, it is likely in the new code that you wrote. You should insert debug code to see what code has executed. Begin by inserting code in the user initialization routine plugin_init_user() that shows that your variables have been initialized properly and that the handle_msg() routine has been called.
Note: It is possible that the problem is in the handle_msg() code itself. A common problem is to try to create a string using a message buffer that is too small. This may happen when several variables have very large values (because they were not initialized) and the concatenated ASCII values exceed the 28-byte message body limit. Hopefully, the message routines would check for this, but this isn't always the case.

The plugin responds with correct values but then becomes unresponsive.

This is similar to the previous case but you may be able to locate the problem area more quickly. The good news is that some of the code seems to work (at least once).
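The message-buffer pitfall noted above (a reply whose ASCII text exceeds the 28-byte message body limit) can be guarded against by checking the formatted length before the reply is sent. The sketch below runs on a general-purpose machine; format_counts() is a hypothetical helper, and it assumes that the terminating NUL counts toward the 28 bytes.

```c
#include <assert.h>
#include <stdio.h>

#define MSG_BODY_MAX 28   /* message body limit quoted above */

/* Format three counters into body[]. Returns 1 if the text fit, or 0 if
   it was too long and was replaced by a short error marker instead. */
int format_counts(char body[MSG_BODY_MAX],
                  unsigned int npkts, unsigned int maxinq, unsigned int ndrops)
{
    int n;

    assert(body != NULL);
    n = snprintf(body, MSG_BODY_MAX, "%u %u %u", npkts, maxinq, ndrops);
    if (n < 0 || n >= MSG_BODY_MAX) {
        snprintf(body, MSG_BODY_MAX, "ERR:too long");
        return 0;
    }
    return 1;
}
```

Three uninitialized 32-bit counters can each print as ten digits, which is 32 characters with separators, so the check fires in exactly the situation described in the note.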

Test 3 (A Few Pings)

Now, direct packets to the plugin with a filter, and see if you can send a few ping packets through the plugin (e.g., "ping -c 3 n2p3"). Then, follow up with some control messages to see if the plugin is still responsive. If this test is successful, it indicates that the plugin can handle low-intensity traffic and works at some rudimentary level.

There are four main bad outcomes:

No ping packets are dropped and the plugin remains responsive to control messages, but the debug log file indicates some incorrect values.

This is the easy case since the incorrect values may pinpoint the problem code where you can insert additional debug code.

No ping packets are dropped but the plugin becomes unresponsive to control messages.

Hopefully, the debug log file will give some indication of where the problem is located. Quite often, a plugin becomes unresponsive to control messages because an assignment has run off the end of an array. Check the debug and control message calls themselves.

One or more ping replies are not received by the sender.

The fundamental question is "How far did the first missing packet get?" It is possible that the packet got to the destination, but that the ICMP echo reply packet never made it back to the sender. Here are some things to consider:

Did the packet get to the input port of the NPR with the plugin?

You can chart the packet counter at the input port (Port X => Monitoring => RXPKT) which increments for every packet seen coming from port X by the NPR's RX block.

Did the packet get to the plugin?

You can chart Plugin Counter 0 (Monitoring => Plugin Counter) which increments for every packet seen by handle_pkt_user().

Did the packet get out of the plugin's NPR?

You can chart the packet counter at the output port (Port X => Monitoring => TXPKT) which increments for every packet seen going out of port X by the NPR's TX block.

Did the packet get to the next hop?

The RXPKT and TXPKT counters can be monitored along the entire packet path to see where the lost packet was last seen. If the next hop is a host, you can run netstat -i eth1 before and after sending traffic to see how many packets made it to the destination host.

The plugin is immediately unresponsive; i.e., no debug messages, no control messages, and no ping packets are received.

This is like the "immediately unresponsive" case in Test 2, and you will need to insert debug code to isolate the problem.

By default, the ping command sends one ICMP Echo Request per second. Each request is 64 bytes of ICMP (56 bytes of payload plus an 8-byte ICMP header) carried behind a 20-byte IP header, for an 84-byte IP packet. If you get the standard response, then ICMP request/reply packets made it from the sender to the destination and back. This alone doesn't mean that your plugin saw the ping packets, but the counter checks above will tell you. You can also use a control message to query the plugin's own counters: "g" returns pkt_count for the mycount plugin, and =counts returns npkts, maxinq and ndrops for the delay plugin.
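The hop-by-hop counter checks described in this test amount to finding the first counter along the path that saw fewer packets than were sent. The helper below is a sketch of that bookkeeping for a general-purpose machine; first_shortfall() is a hypothetical name, and the approach assumes that no other traffic is incrementing the same counters during the test.

```c
#include <assert.h>
#include <stddef.h>

/* delta[i] is how much the i-th counter along the path increased during
   the test (for example RXPKT at the input port, Plugin Counter 0, TXPKT
   at the output port, then the next hop's RXPKT). Returns the index of
   the first counter that saw fewer than `sent` packets, or n if none
   fell short. */
size_t first_shortfall(const unsigned long *delta, size_t n, unsigned long sent)
{
    size_t i;

    assert(delta != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (delta[i] < sent)
            return i;
    }
    return n;
}
```

With three pings sent and counter increases of 3, 3, 2, 2, the result is index 2: the packet reached the plugin but was not counted by the TX block at the output port.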

Test 4 (Continuous Pings)

Now, send a continuous sequence of ping packets by omitting the "-c 3" flag. The plugin should be able to handle this traffic load even if you are outputting debug messages. Then, repeat the verification steps in the previous test.

Test 5 (UDP/TCP Traffic)

Now, you are ready to send UDP packets but remember that you will lock up the plugin if you send more than a few packets per second and the plugin is sending debug messages. So, you may want to send the UDP packets at a low rate. For example, the iperf command:

    iperf -c n2p3 -u -b 12k

will send UDP packets to n2p3 at 12 Kbps or about one 1470-byte packet every second. You might be able to send at 80 times this if you only generate one debug message per data packet (a little less than 1 Mbps for maximum-sized packets). If this works, then turn off all debugging output from the plugin and try increasing the iperf sending rate. The page Handling Errors shows how to record some debug information while sending at high traffic rates.
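The rate arithmetic above is simple enough to script when choosing an iperf -b value. The helpers below are a sketch with hypothetical names; they count payload bytes only and ignore protocol headers, as the estimate in the text does.

```c
#include <assert.h>

/* Offered load in bits per second for `pps` packets per second, each
   carrying `payload_bytes` bytes (headers are ignored in this estimate). */
unsigned long offered_bps(unsigned long pps, unsigned long payload_bytes)
{
    return pps * payload_bytes * 8ul;
}

/* Whole packets per second that fit in `rate_bps` at that packet size. */
unsigned long packets_per_second(unsigned long rate_bps, unsigned long payload_bytes)
{
    assert(payload_bytes > 0);
    return rate_bps / (payload_bytes * 8ul);
}
```

One 1470-byte packet per second is 11,760 bps, which is why -b 12k yields about one packet per second, and 80 such packets per second is 940,800 bps.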

Even if your plugin is meant for TCP packets, it may be worthwhile to test a version of the plugin with UDP traffic first, even if that means changing the plugin code. With iperf, UDP traffic arrives at a nearly constant rate, whereas TCP traffic is bursty. In fact, during the slow start phase, there may be pairs of back-to-back packets spaced by approximately the transmission delay of a single packet. Furthermore, the average input rate may reach twice the bottleneck capacity. This bursty behavior is more stressful than UDP iperf traffic. One way to limit the burstiness is to set the bottleneck capacity to some low rate, but note that the smallest NPR port rate is 0.683 Mbps, which translates to about 57 packets per second for maximum-sized (1500-byte) packets.

Revised: Fri, Jan 30, 2009
