monitor host for slow ping times

When there is intermittent network latency to a host, it’s important to monitor it for a pattern. Using ping can help narrow down what is causing the latency. VMware load, bandwidth limitations, employee work patterns, backups, and many other sources could be the cause.

while true; do j=`ping <slowhost> -i1 -c1 2>&1 | grep icmp_req | awk '{print $7}' | cut -d = -f2 | cut -d . -f1`; if [ "${j:-0}" -gt 30 ]; then date | tr '\n' ' ';  echo $j; fi; sleep 1s; done;

This does a ping every second, and if the round-trip time is over a threshold (30ms in this case) it is considered unacceptable and logged with the date. The ${j:-0} default keeps the comparison from erroring out on the seconds when ping gets no reply at all.
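Since the pattern may only show up at odd hours, it can help to leave this running and append the output to a file for later correlation against backup schedules and the like. A minimal variation (the log path is just an example):

while true; do j=`ping <slowhost> -i1 -c1 2>&1 | grep icmp_req | awk '{print $7}' | cut -d = -f2 | cut -d . -f1`; if [ "${j:-0}" -gt 30 ]; then date | tr '\n' ' ';  echo $j; fi; sleep 1s; done >> /var/tmp/slowping.log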

monitor host for connectivity

Sometimes you want to be notified if a host goes up or down. Usually Nagios is perfect for this, but in this case I had an internet circuit, and all I cared about was knowing when the ISP deactivated it. Use ping in a loop making one request every second; if ping doesn’t get a response, send a text message (to a Verizon number) and stop the loop.

while true; do if ! ping -nc 1 -W 1 5.6.7.8 | grep -q icmp; then echo "circuit is down" | mail <10-digit phone number no spaces>@vtext.com; break; fi; sleep 1s; done;

I also use the converse of this method when I want to know when a new circuit comes up.
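A minimal sketch of that converse, flipping the test so it alerts and exits on the first reply instead:

while true; do if ping -nc 1 -W 1 5.6.7.8 | grep -q icmp; then echo "circuit is up" | mail <10-digit phone number no spaces>@vtext.com; break; fi; sleep 1s; done;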

test bandwidth throughput using iperf

Testing bandwidth between 2 hosts can be helpful to determine the maximum transfer rate a path can actually sustain. The iperf utility is a great option for this. Set up one side as a listening server and the other side as a client. In this scenario I want to test an OpenVPN tunnel between a Raspberry Pi and a Vyatta server. The layout looks like this:

Laptop --(wireless)--> RPi --(wireless)--> Linksys router -> Enterasys router -> Cable Modem -> ISP -> INTERNET -> Vyatta -> remote host

Start iperf in server mode on the remote host listening for UDP datagrams:

# iperf -s -u
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size:   110 KByte (default)
------------------------------------------------------------

Start up the client. The -c flag tells it what remote host to connect to with iperf, the -u makes it use UDP, the -b tells it what bandwidth to try to achieve, and the -t sets how many seconds to run for:

# iperf -c 10.211.0.141 -u -b 1000K -t30

As the test runs, the server and client will output the rates. Here’s the server side output:

[  3] local 10.211.0.141 port 5001 connected with 10.211.0.6 port 40816
[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams
[  3]  0.0-30.0 sec  3.57 MBytes    998 Kbits/sec  14.738 ms    4/ 2552 (0.16%)

So we were able to push a full 1Mbps and only lost 4 datagrams; that’s pretty good. I would say this can comfortably push 1Mbps. (Side note: top reported the openvpn process on the RPi at 9% CPU utilization.) Let’s try for 2Mbps:

# iperf -c 10.211.0.141 -u -b 2000K -t30

And the output:

[  4] local 10.211.0.141 port 5001 connected with 10.211.0.6 port 56707
[  4]  0.0-30.0 sec  6.21 MBytes  1.74 Mbits/sec  7.346 ms  670/ 5103 (13%)

We were able to achieve 1.74 Mbps, but we lost 13% of the traffic. That is unacceptable! I would run again at 1500K and narrow in on exactly where the throughput break point is. To find the bottleneck you can run iperf from every (Linux) host in the chain to pinpoint the limitation. I suspect it is either the upload speed of my home internet connection or the poor wireless signal on the client side of the RPi.
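One way to narrow in is to step the offered rate and watch where the loss starts to climb; a rough sketch against the same server as above (the rate steps here are arbitrary):

# for bw in 1200K 1400K 1600K 1800K; do echo "=== $bw ==="; iperf -c 10.211.0.141 -u -b $bw -t 30; done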

change SSH listen address

If you have servers with internal and external interfaces, you may want to disable ssh on the external side. In this case we just get the internal IP address and tell sshd to only listen on that address:

sed -i "s/#ListenAddress 0.0.0.0/ListenAddress `grep address /etc/network/interfaces | grep 10.229 | awk '{print $2}'`/" /etc/ssh/sshd_config

Do it to many hosts:

for i in `seq 313 364`; do ssh ftp$i "sed -i \"s/#ListenAddress 0.0.0.0/ListenAddress \`grep address /etc/network/interfaces | grep 10.229 | awk '{print \$2}'\`/\" /etc/ssh/sshd_config"; done;

And restart SSH:

for i in `seq 313 364`; do ssh ftp$i "service ssh restart"; done;
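Before trusting it, it’s worth spot-checking that the substitution landed and that sshd still parses its config; sshd -t exits non-zero on a bad config. Something like:

for i in `seq 313 364`; do ssh ftp$i "sshd -t && grep ^ListenAddress /etc/ssh/sshd_config"; done;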

remove IPv6 address from interface

If you don’t want to have an IPv6 address on an interface, you can quickly remove it:

# ip addr show dev eth0 
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:c2:fc:59 brd ff:ff:ff:ff:ff:ff
    inet 10.210.0.141/16 brd 10.210.255.255 scope global eth0
    inet6 fe80::20c:29ff:fec2:fc59/64 scope link 
       valid_lft forever preferred_lft forever

Only get the address:

# ip addr show dev eth0 | grep inet6 | awk '{print $2}'
fe80::20c:29ff:fec2:fc59/64

Remove it:

# ip -6 addr del `ip addr show dev eth0 | grep inet6 | awk '{print $2}'` dev eth0

Or do it to many hosts:

for i in `seq 201 272`; do ssh www$i "ip -6 addr del \`ip addr show dev eth0 | grep inet6 | awk '{print \$2}'\` dev eth0"; done;
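Note that the kernel will regenerate the link-local address the next time the interface bounces. If you want IPv6 to stay off the interface, disable it with sysctl (shown here for eth0; add the setting to /etc/sysctl.conf to make it permanent):

# sysctl -w net.ipv6.conf.eth0.disable_ipv6=1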

use command line to add pool and virtual server to f5 BigIP load balancer

Using the bigpipe cli command (or its alias “b“) to add pools and virtual servers can save you hundreds of clicks. This is a very old version of BIG-IP:

# uname -r
BIG-IP 4.5.14

This creates a pool named myserverpool and adds a single member to it:

# b pool myserverpool {member 172.16.11.201:80}

To add many servers just use a while loop:

# i=202; while [ "$i" -lt "237" ]; do b pool myserverpool add \{ member 172.16.11.$i:80 \}; i=$(($i+1)); done;

Repeat for https:

# b pool myserverpool_ssl {member 172.16.11.201:443}
# i=202; while [ "$i" -lt "237" ]; do b pool myserverpool_ssl add \{ member 172.16.11.$i:443 \}; i=$(($i+1)); done;

Now add health checks:

# i=201; while [ "$i" -lt "237" ]; do b node 172.16.11.$i:80 monitor use http; i=$(($i+1)); done;
# i=201; while [ "$i" -lt "237" ]; do b node 172.16.11.$i:443 monitor use https; i=$(($i+1)); done;

And finally create the virtual servers, pointing traffic to the corresponding pools:

# b virtual 5.6.7.8:80 use pool myserverpool
# b virtual 5.6.7.8:443 use pool myserverpool_ssl
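On a version this old I’d double-check the syntax against the local docs, but something along these lines should echo the config back so you can confirm the members and virtuals took:

# b pool myserverpool show
# b virtual show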

troubleshoot UDP connectivity

When troubleshooting network connectivity issues the best method is to begin with the lower layers of the OSI model and work your way up until you eliminate the problem. It’s similar to the way some people troubleshoot helpdesk issues, where the first question is always “is it plugged in?”

Suppose you have 2 hosts, and you are trying to do an snmp query from one to the other. It doesn’t work.

# snmpwalk -v2c -c public 10.210.0.142 
Timeout: No Response from 10.210.0.142

You try pinging one from the other and you get replies. At this point you know that everything is working through layer 3, but something is not right with the transport layer. If you were troubleshooting a TCP-based application, you could just use ol’ telnet to test if the port will open up; to see if a webserver is running, you could try to telnet to port 80. UDP is a connectionless protocol and can’t be tested like TCP. Here is where you can use nmap to see if the UDP port in question is open. In this case we know that by default snmpd listens on UDP 161.
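As an aside, that TCP check is just a one-liner (this host and port 80 as an example):

# telnet 10.210.0.142 80

Back to our UDP problem: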

# nmap -sU -v -p161 10.210.0.142

Starting Nmap 5.00 ( http://nmap.org ) at 2013-04-24 11:27 MST
NSE: Loaded 0 scripts for scanning.
Initiating ARP Ping Scan at 11:27
Scanning 10.210.0.142 [1 port]
Completed ARP Ping Scan at 11:27, 0.03s elapsed (1 total hosts)
Initiating UDP Scan at 11:27
Scanning example.fordodone.com (10.210.0.142) [1 port]
Completed UDP Scan at 11:27, 0.07s elapsed (1 total ports)
Host example.fordodone.com (10.210.0.142) is up (0.00100s latency).
Interesting ports on example.fordodone.com (10.210.0.142):
PORT    STATE  SERVICE
161/udp closed snmp
MAC Address: 00:0C:29:A2:BE:4D (VMware)

Read data files from: /usr/share/nmap
Nmap done: 1 IP address (1 host up) scanned in 0.43 seconds
           Raw packets sent: 2 (70B) | Rcvd: 2 (98B)

Well, there you go. The port is closed. No wonder our queries were timing out. Remove the -p161 part to just scan for all open UDP ports. Also, if your host doesn’t allow icmp, you can use -P0 to skip host discovery and assume it’s up. Depending on whether ports come back open, filtered, or closed, you can start to diagnose what’s going on, e.g. snmpd crashed, a firewall is blocking it, etc.
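Here, closed usually means nothing is listening on the port at all, so the next stop is the target host itself. Assuming it’s a Linux box with netstat available, confirm whether snmpd is actually bound to UDP 161:

# netstat -unlp | grep 161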