nagios | ForDoDone

# while true; do date | tr '\n' ' ' && /usr/local/nagios/libexec/check_smtp -H somemailhost -p 25 ; sleep 5s; done;
Fri Aug 7 11:15:35 MST 2015 SMTP OK - 0.108 sec. response time|time=0.108085s;;;0.000000
Fri Aug 7 11:15:41 MST 2015 SMTP OK - 0.111 sec. response time|time=0.111096s;;;0.000000
Fri Aug 7 11:15:46 MST 2015 SMTP OK - 0.110 sec. response time|time=0.110013s;;;0.000000

During massive outages (which thankfully are rare), I like to keep my Nagios monitoring machines online and working, because they give me a view of the servers with remaining problems and of processes that didn’t come back online correctly. However, I stop our MTA (Postfix) on those servers, because I don’t want to receive texts and emails complaining about all the servers that are still down. Once the problem is resolved, I could just start Postfix back up, but let’s take a look at the mail queue first:

# mailq 2>&1 | tail -1 | cut -d " " -f 5-
428 Requests.

Hmm… seems a bit high. If we start postfix again, guess how many text messages are going to wind up on my phone? Let’s drop all of the messages in the queue:

# postsuper -d ALL
postsuper: Deleted: 428 messages

Now we can start postfix without excessive messages being sent.
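
If you would rather not wipe the queue unconditionally, the same steps can be wrapped in a small guard. This is only a sketch: the helper names `queue_count` and `over_threshold` and the threshold of 50 are made up for illustration, and it assumes the summary line is the last line of the `mailq` output, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: read `mailq` output on stdin and print the number of
# queued messages. Assumes the last line is the Postfix summary
# ("-- N Kbytes in M Requests."); anything else, such as
# "Mail queue is empty", is reported as 0.
queue_count() {
    awk '{ last = $0 }
         END {
             n = split(last, f, " ")
             if (n >= 5 && f[5] ~ /^[0-9]+$/) print f[5]; else print 0
         }'
}

# Hypothetical guard: succeeds only when count ($1) exceeds threshold ($2).
over_threshold() {
    [ "$1" -gt "$2" ]
}

# Usage (as root, with Postfix still stopped); 50 is an arbitrary threshold:
#   count=$(mailq 2>&1 | queue_count)
#   if over_threshold "$count" 50; then postsuper -d ALL; fi
#   postfix start
```

Keeping the parsing in its own function means it can be tried against saved `mailq` output before it is trusted to decide whether anything gets deleted.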

Alternatively, if the main MX relays go down for a period of time, you will see the mail queue fill up with undelivered mail. After you bring the MXs back online, the mail may not be sent to them immediately: your MTA probably uses an increasing retry interval, which could lead to a delay of an hour or longer. To attempt to relay all the mail in the queue right away, run:

# postqueue -f

It will try to reconnect immediately to the MX relay, and deliver all mail if it can.
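
To see whether the flush is actually draining the queue, the timestamped loop from the check_smtp post can be pointed at `mailq` instead. A rough sketch follows; `watch_queue` is a hypothetical helper name, and the sample count and interval in the usage line are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: print a timestamped queue summary a fixed number of times.
#   $1 = number of samples
#   $2 = seconds to wait between samples
#   remaining arguments = command that prints the queue (defaults to mailq)
watch_queue() {
    samples=$1
    interval=$2
    shift 2
    [ "$#" -gt 0 ] || set -- mailq
    i=0
    while [ "$i" -lt "$samples" ]; do
        # Put the date and the summary on one line, as in the check_smtp loop.
        printf '%s ' "$(date)"
        # Only the last line of the queue listing (the summary) is shown.
        "$@" 2>&1 | tail -n 1
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}

# Usage: flush, then sample the queue every 5 seconds for a minute.
#   postqueue -f
#   watch_queue 12 5
```

If the request count in the summary line keeps falling between samples, the relays are accepting mail again; if it stays flat, the MXs are probably still unreachable.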

Tales from the Command Line…

Tag Archives: nagios

nagios aggressive cli smtp monitoring

drop messages from mail queue