monitor Apache memory usage

When checking a web server’s memory usage, it is important to look at both VSZ (virtual memory size) and RSS (resident set size).

This one-liner prints the total and average VSZ and RSS (in KiB, as ps reports them) along with the number of Apache processes, every 5 seconds:

# while true; do ps auxfww | grep apache | grep -v -e cronolog -e grep | awk '{ vsum+=$5; rsum+=$6 } END { print "VSZ:", vsum, "(", vsum/NR, ") RSS:", rsum, "(", rsum/NR, ") Procs:", NR }'; sleep 5; done;
VSZ: 9896272 ( 341251 ) RSS: 1716216 ( 59179.9 ) Procs: 29
VSZ: 9547608 ( 340986 ) RSS: 1650100 ( 58932.1 ) Procs: 28
VSZ: 9546328 ( 340940 ) RSS: 1649044 ( 58894.4 ) Procs: 28
VSZ: 9861976 ( 340068 ) RSS: 1687968 ( 58205.8 ) Procs: 29
VSZ: 9868632 ( 340298 ) RSS: 1694496 ( 58430.9 ) Procs: 29
VSZ: 9853272 ( 339768 ) RSS: 1679112 ( 57900.4 ) Procs: 29
VSZ: 9853272 ( 339768 ) RSS: 1679264 ( 57905.7 ) Procs: 29
^C
#

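So there are about 29 Apache processes running on this server right now. They average roughly 340 MB of VSZ and 59 MB of RSS each. A total RSS of around 1.7 GB looks fine on a machine with 8 GB of physical memory, and actual usage is likely lower, because RSS counts shared pages (libraries, copy-on-write memory) once in every process that maps them.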
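If the grep chain catches unrelated processes, one option is to select Apache by command name and let a small function do the arithmetic. This is a sketch: mem_summary is a hypothetical helper, it assumes procps ps on Linux (where -C selects by command name and a trailing = suppresses the column header), and the process name apache2 is a Debian/Ubuntu assumption (Red Hat style systems use httpd).

```shell
# mem_summary (hypothetical helper): reads "VSZ RSS" pairs in KiB on stdin and
# prints totals, per-process averages and the process count, like the one-liner.
mem_summary() {
  awk '{ vsum += $1; rsum += $2 }
       END {
         if (NR == 0) { print "no matching processes"; exit 1 }
         printf "VSZ: %d (%.0f) RSS: %d (%.0f) Procs: %d\n", vsum, vsum / NR, rsum, rsum / NR, NR
       }'
}

# Usage (process name apache2 is an assumption; use httpd on Red Hat style systems):
#   while true; do ps -C apache2 -o vsz=,rss= | mem_summary; sleep 5; done
```

Guarding NR == 0 avoids a division by zero when no process matches.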

NetApp show disk firmware progress

During a disk firmware upgrade, you may want to know how far along it is. This one-liner counts how many disks report the old and the new firmware version:

# ssh toaster "sysconfig -a" | awk '/NA01/ { j++ } /NA06/ { i++ } END { print "NA01: " (j+0) " NA06: " (i+0) }'
NA01: 133 NA06: 91

So 91 of the 224 disks already report NA06 and 133 still report NA01: the upgrade is roughly 40% done.
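To get the percentage directly and reuse the check for other firmware revisions, the counting can be wrapped in a function. A sketch, assuming the revision strings appear literally on the disk lines of the sysconfig -a output as in the example; fw_progress is a hypothetical name, and NA01/NA06 are just the revisions from this upgrade.

```shell
# fw_progress (hypothetical helper): counts lines on stdin containing the old and
# new revision strings and prints both counts plus the share already upgraded.
fw_progress() {
  awk -v oldrev="$1" -v newrev="$2" '
    index($0, oldrev) { o++ }
    index($0, newrev) { n++ }
    END {
      total = o + n
      pct = total ? 100 * n / total : 0
      printf "%s: %d %s: %d (%.0f%% done)\n", oldrev, o, newrev, n, pct
    }'
}

# Usage: ssh toaster "sysconfig -a" | fw_progress NA01 NA06
```

index() does a plain substring match, so revision strings containing regex metacharacters need no escaping.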

NetApp decode acp domain option

What does this option actually set? The acp.domain option is a somewhat convoluted decimal encoding of the network portion of the IP addresses used for ACP (Alternate Control Path).

toaster*> options acp
acp.domain 65193
acp.enabled on
acp.netmask 65535
acp.port e0f

Take 65193 and convert it to binary: 1111111010101001. Split it into octets (two here): 11111110 10101001. Convert each octet back to decimal: 254 169. Then reverse the order: 169 254. That is the ACP network. Put another way, the value stores the first two octets low byte first: 254 × 256 + 169 = 65193. The netmask uses the same encoding and is easier to read: 65535 is 11111111 11111111, or 255.255. So in this case the ACP network is 169.254/16.

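The same steps as a quick one-liner (splitting after the 8th digit assumes the binary form is exactly 16 digits long, which holds for 65193):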

# for i in `echo "obase=2;65193" |bc | awk 'BEGIN{FS=""} {for(i=1;i<33;i++){printf $i; if(i==8)printf " ";}printf "\n"}'`; do echo "ibase=2;$i" |bc; done|tac | paste - - | sed 's/\t/./'
169.254
#
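Shell arithmetic can do the same decoding without the round trip through binary, and it also handles values whose high octet is below 128, where the binary string is shorter than 16 digits. A sketch, assuming the two-octet, low-byte-first layout seen in this example holds for other values; acp_decode is a hypothetical name.

```shell
# acp_decode (hypothetical helper): prints the low byte, a dot, then the high byte,
# i.e. the first two octets of the network in normal dotted order.
acp_decode() {
  echo "$(( $1 & 255 )).$(( ($1 >> 8) & 255 ))"
}

acp_decode 65193   # prints 169.254 (the acp.domain value)
acp_decode 65535   # prints 255.255 (the acp.netmask value, i.e. a /16)
```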

awk average multiple columns

If you have output lined up in columns, awk can average each column. Here’s some sample output from a NetApp “toaster> stats show -p flexscale-access”:

# cat sample.txt
    73   5480      0   1040  84     0     0      0     0      0     0      0       541
    73   6038     39   1119  84     0     0      0     0      0     0      0       475
    73   5018     19    859  85     0     0      0     0      0     0      0       348
    73   5960     20   1480  80   120     0    320     0      0     0      0       427
    73   6098      0   1019  85     0     0      0     0      0     0      0       486
    73   5220      0   1220  81     0     0      0     0      0     0      0       288
    73   5758     79   1319  81    59    39    319     0      0     0      0       500
    73   4419      0   2039  68     0     0      0     0      0     0      0       279
    73   5400      0    840  86     0     0      0     0      0     0      0       382
    73   5238      0   1299  80     0     0      0     0      0     0      0       389
    73   5449      0   1696  76    59     0    199     0      0     0      0       340
    73   5478      0   1419  79     0     0      0     0      0     0      0       414
    73   5020     20   1000  83     0     0      0     0      0     0      0       405
    73   4359      0   1059  80     0     0      0     0      0     0      0       295
    73   5838     39   1139  83     0    19      0     0      0     0      0       494
    73   6100     40   1720  78     0     0      0     0      0     0      0       480
    73   5398     19   1239  81     0     0      0     0      0     0      0       398
    73   5089     79   1097  82     0     0      0     0      0     0      0       459
    73   6178     19   1159  84     0    39    159     0      0     0      0       487
    73   4999      0   1239  80     0     0      0     0      0     0      0       345
    73   4820      0    880  84     0     0      0     0      0     0      0       339
    73   5467      0   1177  82     0     0      0     0      0     0      0       413
    73   4700     60   1480  76     0     0      0     0      0     0      0       337
#

And the column averages:

# awk '{for (i=1;i<=NF;i++) a[i]+=$i; if (NF>n) n=NF} END {for (i=1;i<=n;i++) printf "%.0f%s", a[i]/NR, (i<n ? "\t" : "\n")}' sample.txt
73      5371    19      1241    81      10      4       43      0       0       0       0       405
#

Here awk loops over every field in each row and adds its value to the array a, keyed by field number. In the END block it divides each column total by the number of rows (NR) and prints the result rounded to a whole number (%.0f). Columns are separated by tabs, and the line ends with a newline.

You could extend it to print totals as well as averages, echo the original data, or print a header naming each column.
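As a sketch of the totals idea, the hypothetical col_stats function below prints one row of column totals and one row of column averages, each labelled in the first column. It tracks the widest row seen instead of relying on NF in the END block, so a short last line does not cut columns off.

```shell
# col_stats (hypothetical helper): per-column totals and averages for
# whitespace-separated numbers on stdin, printed as two tab-separated rows.
col_stats() {
  awk '
    { for (i = 1; i <= NF; i++) sum[i] += $i; if (NF > n) n = NF }
    END {
      if (NR == 0) exit 1
      printf "total"
      for (i = 1; i <= n; i++) printf "\t%.0f", sum[i]
      printf "\navg"
      for (i = 1; i <= n; i++) printf "\t%.0f", sum[i] / NR
      printf "\n"
    }'
}

# Usage: col_stats < sample.txt
```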

get current client IP addresses from web farm

To see which client IPs hold the most connections to your web farm, ssh to each server, list its established Apache connections on port 80, and count the client addresses. Sorting the counts puts the busiest clients at the bottom.

# for i in `seq 401 436`; do ssh www$i "netstat -natp | grep EST | grep apa | grep ':80 ' | awk '{print \$5}' | cut -d : -f1"; done | sort | uniq -c | sort -nk1 | tail
      3 10.0.0.1
      3 10.0.0.10
      3 10.245.34.2
      4 10.29.45.89
      5 10.111.111.111
      5 10.239.234.234
      5 10.1.1.1
      5 10.2.2.2
      6 10.3.3.3
     10 10.100.100.100
#

The first column is the number of established connections and the second is the client IP; tail keeps the ten busiest clients.
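The filtering can also be done in a single awk pass, which matches the local port exactly (so :8080 is not counted) and strips the client port from the right, so IPv6 addresses survive. A sketch, assuming Linux netstat -nat column order (local address in field 4, foreign address in field 5, state in field 6); top_clients is a hypothetical name, and unlike the one-liner it does not filter by program name.

```shell
# top_clients (hypothetical helper): counts ESTABLISHED connections per client IP
# for local port 80 in `netstat -nat` output on stdin; prints the N busiest (default 10).
top_clients() {
  awk '$6 == "ESTABLISHED" && $4 ~ /:80$/ {
         ip = $5
         sub(/:[0-9]+$/, "", ip)   # drop the client port, keep the address
         n[ip]++
       }
       END { for (ip in n) print n[ip], ip }' | sort -n | tail -n "${1:-10}"
}

# Usage: for i in $(seq 401 436); do ssh "www$i" netstat -nat; done | top_clients
```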

monitor host for slow ping times

When a host has intermittent network latency, it helps to monitor it over time and look for a pattern. Logging slow pings can help narrow down the cause: VMware load, bandwidth limits, employee work patterns, backups, and many other things could be behind it.

while true; do j=$(ping -c1 -W1 <slowhost> 2>&1 | grep -o 'time=[0-9.]*' | cut -d = -f2 | cut -d . -f1); if [ -z "$j" ] || [ "$j" -gt 30 ]; then echo "$(date) ${j:-timeout}"; fi; sleep 1; done

This pings the host once a second and, whenever a reply takes longer than the threshold (30 ms here), prints the date and the round-trip time in whole milliseconds.
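The parsing can also live in a separate filter that reads ping output on stdin, which keeps the loop itself trivial. A sketch, assuming Linux iputils ping, whose reply lines contain time=<ms> ms; slow_replies is a hypothetical name. Lost packets produce no time= line, so this version ignores them.

```shell
# slow_replies (hypothetical helper): prints a timestamp and the round-trip time
# for every ping reply on stdin that is slower than $1 ms (default 30).
slow_replies() {
  limit=${1:-30}
  while IFS= read -r line; do
    case $line in
      *time=*)
        t=${line##*time=}   # e.g. "45.2 ms"
        t=${t%% *}          # e.g. "45.2"
        ms=${t%%.*}         # whole milliseconds, e.g. "45"
        if [ "$ms" -gt "$limit" ]; then
          echo "$(date '+%F %T') $t ms"
        fi
        ;;
    esac
  done
}

# Usage: while true; do ping -c1 -W1 <slowhost>; sleep 1; done | slow_replies 30
```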

change SSH listen address

If you have servers with internal and external interfaces, you may want sshd to listen only on the internal one. Here we look up the internal IP address (the one on the 10.229 network) in /etc/network/interfaces and tell sshd to listen only on that address:

sed -i "s/#ListenAddress 0.0.0.0/ListenAddress `grep address /etc/network/interfaces | grep 10.229 | awk '{print $2}'`/" /etc/ssh/sshd_config

To do the same on many hosts:

for i in `seq 313 364`; do ssh ftp$i "sed -i \"s/#ListenAddress 0.0.0.0/ListenAddress \`grep address /etc/network/interfaces | grep 10.229 | awk '{print \$2}'\`/\" /etc/ssh/sshd_config"; done;

And restart SSH:

for i in `seq 313 364`; do ssh ftp$i "service ssh restart"; done;
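The sed edit silently does nothing when sshd_config has no “#ListenAddress 0.0.0.0” line, so a host can be restarted with its old settings without any warning. A sketch of a hypothetical set_listen_address function that checks for the line first and fails loudly; like the original command, it assumes GNU sed (sed -i without a backup suffix).

```shell
# set_listen_address (hypothetical helper): replaces the commented default
# ListenAddress line in the given file; returns 1 if that line is missing.
set_listen_address() {
  addr=$1 file=$2
  if ! grep -q '^#ListenAddress 0\.0\.0\.0' "$file"; then
    echo "no '#ListenAddress 0.0.0.0' line in $file" >&2
    return 1
  fi
  sed -i "s/^#ListenAddress 0\.0\.0\.0/ListenAddress $addr/" "$file"
}

# Usage on one host; sshd -t checks the config before the restart:
#   set_listen_address "$(grep address /etc/network/interfaces | grep 10.229 | awk '{print $2}')" /etc/ssh/sshd_config \
#     && sshd -t && service ssh restart
```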

drop messages in mailqueue from single sender

Drop all messages from the sender ‘nagios’:

# for i in `mailq | tail -n +2 | awk 'BEGIN { RS = "" } { if ($7 == "nagios") print $1 }' | tr -d '*!'`; do postsuper -d "$i"; done;
postsuper: B60BF9FB69: removed
postsuper: Deleted: 1 message
postsuper: C3B429FB6F: removed
postsuper: Deleted: 1 message
postsuper: 0306C9FB87: removed
postsuper: Deleted: 1 message
postsuper: E3BC79FB7E: removed
postsuper: Deleted: 1 message
postsuper: B32EA9FB65: removed
(many more lines)

This reads the mail queue and starts at line 2 to skip the header. awk runs in paragraph mode (RS = ""), so each blank-line-separated queue entry is one record; if the sender (field 7) is ‘nagios’, it prints field 1, the queue ID. postsuper -d then deletes the message identified by its ID.
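postsuper can also read queue IDs from standard input when given - as the queue ID, which avoids starting one postsuper per message. A sketch with a hypothetical queue_ids_from helper, which also strips the * (active queue) and ! (on hold) markers that mailq appends to some queue IDs. As in the one-liner, the sender must match field 7 exactly, so a full address such as nagios@host will not match “nagios”.

```shell
# queue_ids_from (hypothetical helper): prints the queue IDs of messages whose
# sender equals $1, reading Postfix mailq output on stdin.
queue_ids_from() {
  tail -n +2 | awk -v sender="$1" '
    BEGIN { RS = "" }            # paragraph mode: one queue entry per record
    $7 == sender {
      id = $1
      sub(/[*!]$/, "", id)       # drop the active (*) or on-hold (!) marker
      print id
    }'
}

# Usage: mailq | queue_ids_from nagios | postsuper -d -
```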