This FAQ is a compilation of the most frequently asked questions. It is NOT a tutorial. You should still use the tutorial pages to find explanations of features and concepts. You may want to use your browser's Edit => Find to search for help, or use the index below. The index has been divided into these sections to make it easier to find help:
| Tunnels and Connectivity | Problems with the RLI tunnel, connectivity, and ssh. |
| The Remote Laboratory Interface | Problems with using the basic RLI features but excluding filters, queues, and plugins. |
| Filters, Queues and Bandwidth | Problems with filters and queues. |
| Router Plugins | Problems with using and writing router plugins. |
| The SYN Demo | Problems running the SYN demonstration in Tutorial => Examples => The SYN Demo |
| Unix Commands | Problems with Unix commands such as source, ping, netstat, and iperf. |
Selecting a link from this table will take you to the Questions section. If you find a potentially helpful question, select the Q-label and that link will take you to the question/answer in the Questions and Answers section.
Warning: Permanently added the RSA host key for IP address '10.0.1.3' to the list of known hosts.
What does it mean and what's wrong?
onlusr> source /users/onl/.topology.csh
I get an unexpected end of file.
$n1p2> ssh -g -L 8080:n1p3:80 n1p3
I get "Connect to host n1p3 port 22: Network is unreachable." What could be the problem?
onlusr> source /users/onl/.topology.csh
I get an unexpected end of file.
When I enter http://localhost:8080/~YourUserName/syndemo in the browser, I get "Unable to connect" and "Firefox can't establish a connection to the server at localhost:8080."
When I ran (cd .www-docs/syndemo/Images; touchme &), the script did not run. The following error message was shown in the command-line window:
-bash: cd: .www-docs/syndemo/Images: No such file or directory
-bash: touchme: command not found
[1]+ Exit 127 touchme
When I ran sudo /usr/local/bin/sec/synster, the following message appeared in the window:
"Sorry, user mndd is not allowed to execute '/usr/local/bin/sec/synster' as root on onl41.arl.wustl.edu."
where onl41.arl.wustl.edu is the external address of n1p1a (where the attacker resides).
Warning: Permanently added the RSA host key for IP address '10.0.1.3' to the list of known hosts.
What does it mean and what's wrong?
A1: It looks like you did not build your RLI tunnel. See the "Getting Started" link in the sidebar of the ONL web page. The least troublesome way to build the RLI tunnel is to run the ssh command from the command line:
ssh -L 7070:onlsrv:7070 onl.arl.wustl.edu
If you are using a graphical tool like PuTTY or SSH Client, you will have to follow precisely the steps given in the Getting Started sidebar. The precise steps for building the RLI tunnel are given at the RLI SSH Tunneling link on that page. If you are taking a course that is using ONL, someone should be assigned to help you with this if you have problems.
Warning: Permanently added the RSA host key for IP address '10.0.1.3' to the list of known hosts.
What does it mean and what's wrong?
A2: The short answer is that there is nothing wrong.
10.0.1.3 is the IP address of the eth0 interface of the host onlusr; i.e., the ONL user host. You can see this by entering:
onlusr> /sbin/ifconfig
while logged into onlusr; note the inet addr field in the eth0 entry. You didn't say so, but my guess is that you got this message when you tried to build an SSH tunnel from one ONL host to another. More specifically, you must have entered the ssh command FROM onlusr to some other onl host. Whenever you SSH to a remote host X from host Y, the IP address of host Y (the FROM host) is looked up in the file ~/.ssh/known_hosts (a plaintext file with RSA keys) at the remote host X. If it is there, then you are connected to that host. If not, then SSH will add the hostname to the file after authentication. In your case, I noticed that your ~mndd/.ssh/known_hosts contains an entry for 10.0.1.3 as its first entry ... which makes sense. This is why ...
Your onl home directory is NFS mounted on every onl host, which also means that the file ~mndd/.ssh/known_hosts is accessible on every onl host. Suppose that you are on onlusr, and you enter something like:
onlusr> ssh onl31
All of your onl hosts (given to you through File => Commit) are set up to accept the ssh connection without asking for a password. But the ssh server (daemon) running on onl31 will still do some authentication. One thing it does is look at the file ~mndd/.ssh/known_hosts on onl31 to see if the IP address of onlusr (10.0.1.3) is a host that you have allowed to log in to onl31 before. The first time you do this, there is nothing in the known_hosts file. Since you are allowed to log in to your onl hosts from other onl hosts, ssh adds the host to the known_hosts file. Enter the command "man ssh" and scroll down to the section "Server authentication" for more details.
A3:
+ You can not log into an ONL host unless you have an ONL account. This means that you must have either registered for an account through the ONL Web page or you received a predefined login name as part of a course/tutorial (and an email).
+ You can only ssh into onl.arl.wustl.edu from outside of the testbed. Once the ssh succeeds, you will end up on the host acting as the user host (currently onlusr).
+ You can only ssh into other hosts after they have been committed to you; i.e., wait for the experiment commit to finish first.
A4: All requests from the RLI to the testbed go through the ONL Proxy Daemon. This type of error usually means that the connection between the RLI and that Proxy Daemon either was lost or never established. Here are some possibilities:
- The Daemon died.
Possible, but unlikely, since we have been using the system intensely in the last 1.5 weeks.
>> If so, just try again. I used the system this morning with no problems.
- Your SSH tunnel was incorrectly created.
Possible.
>> Try again, but do this. I am told that every Mac has the OpenSSH command. So, build the tunnel through the command line by entering:
ssh -L 7070:onlsrv:7070 onl.arl.wustl.edu
Leave the window open, and try committing a simple experiment:
- Start up the RLI and try to commit one cluster ...
- Make a reservation using the RLI: File => Make Reservation
- Add a cluster: Topology => Add Cluster
- Ask for resources: File => Commit
++ If the tunnel is BAD, File => Make Reservation will fail.
++ If the message is Unable to connect: couldn't get I/O for 127.0.0.1, then the tunnel was never built or you left off the -L flag or something like that.
++ If the message is IO Exception, the request is getting out of your machine, but the RLI didn't get a response. Typically, that means the request didn't get to the Daemon. And right now, I would say the cause is something in your ssh command line.
++ It is possible that this part succeeds and then you later get the IO Exception error message. This would mean that your tunnel is OK but something happened (see below).
- The SSH tunnel is OK but something at your end is causing the problem.
Possible. There are many possibilities:
- Your shell has autologout set, meaning that after so many seconds it will automatically log you out and terminate the session. If so, when you lose the connection, it will display "auto-logout". Some shells have an environment variable TMOUT for this. "echo $TMOUT" will tell you if it is set to anything.
- Your SSH has a timeout feature. This is NOT typical on the client side. There is a server-side setting to kill idle connections, but our server doesn't do that.
- Your departmental/organizational firewall or NAT box may have a timeout feature that will disconnect you if it sees no traffic for, say, 10 minutes. Users from some small universities have had this problem. You have to talk to your network administrator about this. I AM GUESSING THAT THIS IS YOUR PROBLEM, BUT THAT IS JUST A GUESS.
A1: No. The reservation should cover only those parts where you actually need to commit (bind) actual resources. You can do that either through an advanced reservation (see sidebar) or through the dialogue box the RLI pops up when you commit. If the testbed is very busy, it is best to make an advanced reservation.
A2: Yes, the RLI changes every once in a while. And it does complain if the version is old enough. We usually announce new versions to those using ONL as part of a course. The procedure for getting the RLI.jar file is the same as it has always been. You have two options:
1) Use HTTP: Click the "Get RLI.jar" link in the "Getting Started" page to download it from the Web. [[ If the resulting file is not the one above, then perhaps you need to flush your browser cache ... this should not be necessary unless you have a long lived www connection ]]
2) Use scp: The HTTP version is really obtained from onlusr.arl.wustl.edu:~onl/export/RLI.jar. So, you can SSH into onlusr and copy it from /users/onl/export/RLI.jar.
A3: Normally, this should not happen. But occasionally, an NSP or host can fail to initialize properly. If the NSP initialization fails, then close the experiment (File => Close) and try again. In rare cases when there are catastrophic hardware problems, all NSPs can end up in the repair state, leaving no available NSPs. This situation cannot be resolved until the staff fixes the underlying problem. If a single host or link fails, you can continue to use the NSP if you don't need that particular part of the setup. An email about the failure is sent to our staff, but the NSP is not placed in the repair state.
A4: The reservation is not considered to be in use until you commit. Do not ignore the message: any reservation left unused for the first 30 minutes of its period will be canceled. Some advice:
1) Make the beginning time of the reservation for when you think you will commit; and
2) Do a File => Commit even if you are not done with the network topology.
After the first commit, we assume that you have arrived for your reservation and we will not bother you anymore until near the end of the reservation period when you will get a warning message. But the RLI will pop up a dialogue box that asks if you want to extend your reservation period. If it is possible, the reservation will be extended. Even if the reservation is not extended, you can continue to work as long as no one else makes a reservation that will require your NSP.
A5: Nothing. Email is automatically sent to our staff, and someone will look into the problem. But since reservations are now overbooked, resources may remain insufficient until we look at the NSP, fix the problem, and put it back into service. Sometimes the problem can be quickly resolved, but it depends on the nature of the problem.
A1: The bandwidth in most cases is measured inside the switch fabric where IP packets are encapsulated inside ATM cells. These cells consist of a 5-byte header and a 48-byte payload leading to a 10% overhead. You can remove this overhead from the bandwidth charts by clicking on the label (e.g., OPP BW 3) to get a dialogue box that gives some details about the measurement point. Remove the check mark in the include ATM header check box. See Tutorial => The Remote Laboratory Interface => Features of Monitoring Panels and Tutorial => The Remote Laboratory Interface => Monitoring Concepts.
A2: Egress output rates are controlled by a token bucket regulator that has a granularity of around 54.1 Kbps; i.e., all egress rates are integer multiples of 54.1 Kbps.
A3: That menu item is actually selecting a monitoring point that is inside the switch fabric leading into output Port 3, not what is going out of the link attached to Port 3. If you really want to see the bandwidth going out of Port 3 and you are sending fixed length packets, you could monitor Port 3 => FPX Counters => Egress Packets and multiply the packet count by the length of the packet using the Formula feature of monitoring charts. See Tutorial => The Remote Laboratory Interface => Monitoring Concepts.
A4: The egress link rate is controlled by an FPX token bucket regulator. The current implementation has this behavior. We are looking into changing it to conform more to what you would expect where the interarrival times are the same for packets of the same size. See Tutorial => Filters, Queues and Bandwidth => NSP Architecture => Link Rate.
A1: The plugins have to be written in C, not C++. All variable declarations in C have to be at the beginning of a block. They can't appear randomly throughout the code as they can in C++.
A2: You should:
- Delete the plugin instance
- Unload the plugin
- Create an instance of the plugin again
If I know that I will be outputting debug messages, I just change the first MSRDEBUG call in the handle_packet routine. But I will sometimes change the handle_message routine in a way that lets me tell whether I have the new plugin instance loaded. For example, if I keep message type 0 as a Hello message, I could return the version number as part of the reply message.
A3: Plugins are in the kernel, and the kernel doesn't have floating point. You will have to do it in integer and perhaps use approximations. For example, 0.01 is 1/100. It is a pain when going to smaller fractions. That's why if you look at something like Van Jacobson's RTT estimation calculation it involves powers of 2 so that it can be done using the shift operator ... i.e., x/8 is x>>3.
A4: Plugins are kernel code, and the kernel doesn't have these functions. You will have to code them yourself. But pow() and ceil() are trivial. log() is not trivial. But I suggest you approximate log(x). Your application is probably using a limited range of x. So, use small table of log values and use linear interpolation. Or, use the first few terms of a Taylor series scaled to be integer. What a pain. I would just do a very crude approximation using linear interpolation. You can't be expected to write a real kernel version of log(x) for a 2 week project.
A5: This is kernel programming ... there is no such thing as stdlib.
I describe a workaround. Yes, it uses rand() which doesn't generate very good random numbers, but who cares right now.
Here is what you do:
- Copy rand.c from the directory ~onl/stdPlugins/dropdelay-610/ to yours:
cp ~onl/stdPlugins/dropdelay-610/rand.c .
- Change your Makefile so that it lists rand.c:
SRCS=$(KMOD).c rand.c
- Compile as before. Make sure that you check for undefined symbols as in any standard Makefile (look at the one in the dropdelay-610/ directory).
A6: You need to do a global replace of "pdelay" with "lab4" in both lab4.c and lab4.h. If you use vim, do something like:
... make a backup copy of lab4.c ...
vim lab4.c
:g/pdelay/s//lab4/g
:wq
A7: It is not an error because the kernel frees the buffer after it forwards it in the delay plugin. If we were really good, we would have defined a msr_drop_pkt function which would encapsulate the freeing of the buffer and you wouldn't even know it was happening when you called it. But we didn't. So, stats.c shows the explicit dropping because the kernel has no idea that the buffer needs to be freed.
A8: You are saying that there seems to be this background traffic of 6.8 Mbps, right? What you need to do is turn off Distributed RP-Queueing (DQ). RLI.jar should turn DQ OFF by default. Do this:
NSP => Queueing (i.e., click the center of the NSP) ... Make sure that DQ is NOT checked ...
The DQ algorithm automatically computes ingress-side VOQ rates, but in doing so generates around 6 Mbps of control traffic. So, turn it off. Then the default VOQ rates will be static (default = 600 Mbps) and there will be no extra control traffic.
A9: Use assign_udpCksums((iphdr_t *) iph) where iph is a pointer to the IP header. assign_udpCksums is defined in /users/onl/wu_arl/msr/rp/plugins/include/ipnet.h and is included by #include <plugins/include/ipnet.h>. But remember that all of the remaining fields in the IP and UDP headers must already have their final values; i.e., you don't want to compute the checksum and then decide to change one of the header fields.
onlusr> source /users/onl/.topology.csh
I get an unexpected end of file.
A1: This looks like you are trying to source a c-shell script when you are actually running the bash shell. Yes, when I enter:
ls -al ~mndd
I see files in your home directory like .bashrc. And when I enter:
ypcat passwd | grep mndd
I see:
sec:x:5261:5005:max nobody:/users/mndd:/bin/bash
which indicates (last field) that your shell is bash and not csh. So, you need to do this:
onlusr> source /users/onl/.topology
i.e., source the file .topology, NOT .topology.csh.
A2: Right. It looks like the default command search PATH for most users does not contain the current directory ("."). That means that if touchme is in the current directory, you will need to enter ./touchme in order for your shell to find the script. Also, since it is a script, check that it has execute permissions.
A3:
+ Are you pinging from the correct host; i.e., usually not onlusr?
+ Are you pinging to the correct host?
+ Have you installed routes in both the forward and reverse directions? (The brute force method: 'Topology => Generate default routes' will generate default routes on all ports) (Note: This should not be necessary if you are using a predefined configuration file unless your instructor says that you need to define routes.)
A1: There can be many reasons, but it looks like you (and the rest of the students in your class) need to make your home directories world searchable.
I will change all of your home directory permissions to 711 (rwx --x --x) so you don't have to do this.
REPEAT: You do not have to do anything because I have already changed the permissions on JUST home directories. But if you want to know how to do it on your own and test it easily, read on. Do this:
chmod 711 ~UserName
where UserName is your username. This makes your home directory searchable by everyone including Apache. I tested this idea by changing the permissions on the home directory and then using a test tunnel that goes only over the control network directly to the host $n1p3 by doing this (replace UserName with your username and $n1p3 with the external interface name of the host on port 3):
- Make sure the .www-docs directory is world readable/searchable:
chmod 755 ~/.www-docs
- Put a test web page in the .www-docs subdirectory (~mndd/index.html just displays a "Hello" message):
cp ~mndd/index.html ~/.www-docs/
- Build the test tunnel (this tunnel goes from your client to onlusr to $n1p3 over the control network; it doesn't use the NSP router):
ssh -L 8080:$n1p3:80 onl.arl.wustl.edu
- Enter http://localhost:8080/~UserName in your browser.
You should see the message "Hello from ONL".
A2: It looks like you did not build your RLI tunnel. See the "Getting Started" link in the sidebar of the ONL web page.
$n1p2> ssh -g -L 8080:n1p3:80 n1p3
I get "Connect to host n1p3 port 22: Network is unreachable." What could be the problem?
A3: I assume that you issued the following command from $n1p2 (where $n1p2 is either onl32, onl38, onl26, or onl11, depending on which NSP was assigned to you during the commit):
$n1p2> ssh -g -L 8080:n1p3:80 n1p3
This means that you want to create a tunnel from port 8080 on $n1p2 to port 80 on n1p3, and that the terminal session will log onto the host n1p3 through the NSP. Port 80 is the usual port on which Apache listens for requests. The error "Connect to host n1p3 port 22: Network is unreachable" typically means that there is a hardware problem along the physical path to n1p3, or that there is no route to n1p3, or that there is a problem along the return path from n1p3.
I suspect that the real problem is that you have no route established from n1p2 to n1p3; i.e., the route table at port 2 is not configured properly. So, now the question is exactly what is the problem?
- Step 1 (Find out if there is connectivity from n1p2 to n1p3)
Try to ping n1p3 from the host $n1p2 (you should replace $n1p2 with the correct interface name):
onlusr> ssh $n1p2
$n1p2> ping n1p3
If you get a normal response, the route is ok. If not, there is no route to n1p3. Assuming that you get no response ...
- Step 2 (Look at the route table at port 2 of the NSP)
In the RLI, select on the NSP icon:
Port 2 => Route Table
You should see default route entries:
prefix/mask      next hop
192.168.1.16/28  0
192.168.1.32/28  1
192.168.1.64/28  2
192.168.1.80/28  3   <== The important one
... etc ...
The 4th entry is the important one. If the route table is empty, you should generate the default routes for all ports (if you are using the "syndemo.exp" config file, the default routes should already have been created ... let me know if you are using the file). Create default routes for all ports by going to the RLI main menu and selecting:
Topology => Generate Default Routes
Now, your ping command in Step 1 should succeed. So, repeat Step 1. If successful, try making the tunnel again.
- Step 3 (Try creating the gateway tunnel again)
$n1p2> ssh -g -L 8080:n1p3:80 n1p3
A4: Assuming that your ONL login name is mndd, I see that you are missing the file /users/mndd/.www-docs/syndemo/Images/touchme.
onlusr> source /users/onl/.topology.csh
I get an unexpected end of file.
A5: This looks like you are trying to source a c-shell script when you are actually running the bash shell. Yes, when I enter:
ls -al ~mndd
I see files in your home directory like .bashrc. And when I enter:
ypcat passwd | grep mndd
I see:
sec:x:5261:5005:max nobody:/users/mndd:/bin/bash
which indicates (last field) that your shell is bash and not csh. So, you need to do this:
onlusr> source /users/onl/.topology
i.e., source the file .topology, NOT .topology.csh.
When I enter http://localhost:8080/~YourUserName/syndemo in the browser, I get "Unable to connect" and "Firefox can't establish a connection to the server at localhost:8080."
A6: This last problem indicates that you have not properly set up the tunnel at your client host so that traffic to port 8080 will go to the relay node. In the instructions for Approach 1, it says that you need to build the A-B-C tunnels like this:
client> ssh -L 5050:$n1p1a:5050 -L 3552:$n1cp:3552 -L 8080:$n1p2:8080 onl.arl.wustl.edu
where you need to replace $n1p1a, $n1cp, and $n1p2 with the appropriate ONL host names, which you can get by clicking on the n1p1a icon, the NSP icon, and the n1p2 icon (e.g., onl35). These instructions are about 3/4 of the way down the SYN Demo Web page.
When I ran (cd .www-docs/syndemo/Images; touchme &), the script did not run. The following error message was shown in the command-line window:
-bash: cd: .www-docs/syndemo/Images: No such file or directory
-bash: touchme: command not found
[1]+ Exit 127 touchme
A7: I am guessing that you ran it from some directory other than your home directory.
I looked in your home directory ~mndd and you have the directories:
.www-docs/
.www-docs/syndemo/
.www-docs/syndemo/Images/
So, that looks ok. The command you ran needs to be run from your home directory, or you need to change it so that the path .www-docs/syndemo/Images is correct. I just su'd to your account and tried the command, and it worked as expected. So, try this on your $n1p3 host:
cd
(cd .www-docs/syndemo/Images; ./touchme &)
The first command changes to your home directory. The second one executes the touchme command. If you want to kill the touchme command:
ps -l
kill -9 X
where X is the PID of the touchme script.
When I ran sudo /usr/local/bin/sec/synster, the following message appeared in the window:
"Sorry, user mndd is not allowed to execute '/usr/local/bin/sec/synster' as root on onl41.arl.wustl.edu."
where onl41.arl.wustl.edu is the external address of n1p1a (where the attacker resides).
A8: I pushed out a new sudo file to resolve this second problem. I tried it with your account on onl41, and it worked. So, try it now. The problem should go away.
A9: Right. It looks like the default command search PATH for most users does not contain the current directory ("."). That means that if touchme is in the current directory, you will need to enter ./touchme in order for your shell to find the script. Also, since it is a script, check that it has execute permissions.
A10: [[ THIS IS THE IMPORTANT PARAGRAPH ]]
It looks like you told the RLI that your plugin directory was /users/mndd/plugins. If so, it assumes that the directory will contain sub-directories with names of the form XXX-NNN, where XXX is a plugin name and NNN is a plugin number; e.g., syn_demo-54203. Inside this subdirectory ~mndd/plugins/syn_demo-54203/ should be the code for that plugin. I just looked at your directory /users/mndd/plugins, and it looks really messed up. But it now does have the subdirectory syn_demo-54203. So, you should no longer be getting that message ... even though the rest of the directory is messed up.
But if I were you I would just use the standard syn_demo plugin stuff supplied by default. Only if that plugin worked would I attempt to build my own version of the plugin. I think the config file in ~onl/export/Examples/syndemo/syndemo.exp uses the default plugin located at ~onl/stdPlugins/syn_demo-54203.
But if you really do want to create your own version of the plugin, and you are still having problems, I would recreate your plugin directory to conform to what I described above.
A11: It is very hard for me to debug this without knowing more information.
It sounds like you are saying that the browser is cycling through a sequence of images (5?) and that you can control the image transmission by clicking the Stop/Start button in the Web Client. Right?
Q1: Does the Bandwidth monitor show the Attacker traffic?
I suspect yes. If not, that would be really strange.
Q2: Did you deviate from the instructions in any way? If so, how?
Q3: What does the 'netstat' command tell you about traffic coming into the $n1p3 host? For example, do this WHILE IMAGE TRAFFIC IS GOING TO YOUR BROWSER:
onlusr> ssh $n1p3
$n1p3> netstat -i
... You should see 3 lines (eth0, eth1, lo) ...
... Record the RX-OK and TX-OK numbers ...
... Wait about 10 sec ...
$n1p3> netstat -i    # again
... Record the RX-OK and TX-OK numbers again ...
The difference between the second and first RX-OK numbers indicates how much traffic came into the host over that interface in 10 sec. The difference between the second and first TX-OK numbers indicates how much traffic went out of the host over that interface in 10 sec. The interface with the smallest numbers is probably the one going into the NSP. The other interface is connected to the control network. Unfortunately, the control network is sometimes on eth0 and sometimes eth1, so I can't tell in advance which interface it would be ... unless I knew the external host name (e.g., onl31, onl37, onl25, or onl10). In these 4 cases, I think eth0 is attached to an NSP and eth1 is attached to the control network.
The command "ls -l ~/.www-docs/syndemo/Images/*.jpg" indicates the sizes of the images range from 83735 bytes (evening.jpg) to 108161 bytes (nudy.jpg); i.e., about 100 KB per image. So, we expect that if image requests are sent every 3 sec, that in 10 seconds, you will see a difference in outgoing traffic (image) of about 300,000 bytes and a small amount of incoming traffic (HTTP request).
If I had to guess, the image traffic must be going over the control network instead of the private network (NSP). And that would only happen if Tunnel D were not built properly. This is only a guess. But if you did this:
$n1p2> ssh -g -L 8080:$n1p3:80 $n1p3
instead of this CORRECT WAY:
$n1p2> ssh -g -L 8080:n1p3:80 n1p3
when building Tunnel D, what you are seeing would happen. This last tunnel forces the traffic coming into port 8080 of the relay node to go to port 80 of n1p3 over the interface to the NSP. Another possibility is that you built Tunnel D ok, but for some strange reason the route going from n1p2 to n1p3 was not built properly and somehow the relay host $n1p2 found a path over the control network to $n1p3 (unlikely, but possible).
Of course, if the above fails, we could verify all of this ... painfully ... using tcpdump on the right interfaces along the traffic path and deciphering the output.
Warning: Permanently added the RSA host key for IP address '10.0.1.3' to the list of known hosts.
What does it mean and what's wrong?
A12: The short answer is that there is nothing wrong.
10.0.1.3 is the IP address of the eth0 interface of onlusr. You can see this by entering:
onlusr> /sbin/ifconfig
while logged into onlusr; note the inet addr field in the eth0 entry. You didn't say so, but my guess is that you got this message when you tried to build an SSH tunnel from one ONL host to another. More specifically, you must have entered the ssh command FROM onlusr to some other onl host. Whenever you SSH to a remote host X from host Y, the IP address of host Y (the FROM host) is looked up in the file ~/.ssh/known_hosts (a plaintext file with RSA keys) at the remote host X. If it is there, then you are connected to that host. If not, then SSH will add the hostname to the file after authentication. In your case, I noticed that your ~mndd/.ssh/known_hosts contains an entry for 10.0.1.3 as its first entry ... which makes sense. This is why ...
Your onl home directory is NFS mounted on every onl host, which also means that the file ~mndd/.ssh/known_hosts is accessible on every onl host. Suppose that you are on onlusr, and you enter something like:
onlusr> ssh onl31
All of your onl hosts (given to you through File => Commit) are set up to accept the ssh connection without asking for a password. But the ssh server (daemon) running on onl31 will still do some authentication. One thing it does is look at the file ~mndd/.ssh/known_hosts on onl31 to see if the IP address of onlusr (10.0.1.3) is a host that you have allowed to log in to onl31 before. The first time you do this, there is nothing in the known_hosts file. Since you are allowed to log in to your onl hosts from other onl hosts, ssh adds the host to the known_hosts file. Enter the command "man ssh" and scroll down to the section "Server authentication" for more details.
A13: runTcpMon.pl produces the file tcp.data in your home directory. Your file contains all 0s. That script is looking at the file /proc/net/tcp. Enter the command "cat /proc/net/tcp" on any of the onl hosts and you will see that the file contains the state of TCP connections to that host. The script is just looking for connections to port 80 (HTTP) (or hex 0050) that are in state 03 (partial connection). I am guessing (and this is only a guess) that the runTcpMon.pl script is running on the wrong host. It should be running on your $n1p3 host.
A14: The fact that you got traffic for 30 seconds is good because it shows that things are working. The fact that it stops after 30 seconds will be difficult to debug. Furthermore, the demo is fragile. So, if anything goes wrong, it doesn't recover well. Some comments:
- I would first try it again and see if this behavior repeats.
- If the touchme script is still running, the problem is probably NOT due to the browser thinking that it already has up-to-date images. You can check that the modification dates of the images are in fact being updated:
cd ~/.www-docs/syndemo/Images
ls -l *.jpg
... wait 10 seconds ...
ls -l *.jpg
... repeat occasionally to see that the modification dates are changing ...
- I don't know if the RTT between you and us would be an issue. It must be about 150-200 milliseconds. It shouldn't be an issue since the traffic volume is low.
- I don't know if network security at your end would be a problem. Again, I don't think so since you are able to begin an experiment.
- I am guessing that either the Web server or the plugin got overwhelmed. I would try turning the attacker off when the images stop and see if that allows outstanding requests to drain. Then turn it back on. And repeat this process. But if you are having this problem when the attacker has never been turned on, that would be a different problem.
A15: I just tested this on your Web page and it worked. So, there is no problem with your files in ~mndd/syndemo/.
The fact that you got "Page not found" probably indicates that you are communicating with some HTTP server, but I suspect the wrong one. Probably some tunnel was not built properly. But it is hard to tell. You will have to find out where the traffic is going by working backwards from the HTTP server $n1p3.
For example, if you monitor the ingress and egress bandwidth at port 3 of your NSP, you should see some traffic going out of and coming into that port when you hit the carriage return on the URL. If your http request got to $n1p3 but the page was not found by Apache on $n1p3, you should see two spikes in the plot, one for egress (the request) and then one for ingress (the error message).
If you don't see traffic at port 3, then go to port 2 and repeat the monitoring process.
If you don't see traffic at port 2, then look at the relay node's ($n1p2) network interfaces using "netstat -i" before you enter the URL and after.
My only other suggestion is to talk to your fellow student Mohammad Firas (login toshiba). He seems to have gotten most of the demo working.
Revised: Thu, Feb 1, 2007