# while true; do date | tr '\n' ' ' && /usr/local/nagios/libexec/check_smtp -H somemailhost -p 25 ; sleep 5s; done;
Fri Aug 7 11:15:35 MST 2015 SMTP OK - 0.108 sec. response time|time=0.108085s;;;0.000000
Fri Aug 7 11:15:41 MST 2015 SMTP OK - 0.111 sec. response time|time=0.111096s;;;0.000000
Fri Aug 7 11:15:46 MST 2015 SMTP OK - 0.110 sec. response time|time=0.110013s;;;0.000000

During massive outages (which thankfully happen rarely), I like to keep my Nagios monitoring machines online and working, so that I have a view of the servers with remaining problems, or processes that didn’t come back online correctly. However, I stop our MTA (postfix) on those servers, because I don’t want to receive texts and emails complaining about all the servers that are still down. Once the problem is resolved, I could just start postfix again, but let’s take a look at the mail queue first:

# mailq 2>&1 | tail -1 | cut -d " " -f 5-
428 Requests.

Hmm… seems a bit high. If we start postfix again, guess how many text messages are going to wind up on my phone? Let’s drop all of the messages in the queue:

# postsuper -d ALL
postsuper: Deleted: 428 messages

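If dropping everything is too blunt (say, other mail you care about is sitting in the same queue), a more selective variant is possible. The following is only a sketch, adapted from the idiom in the postsuper man page: it assumes the usual `mailq` listing format, and the function name `queue_ids_from_sender` and the address `nagios@example.com` are made up for illustration.

```shell
# Hypothetical helper: read `mailq` output on stdin and print the queue IDs
# of messages whose envelope sender matches $1.
queue_ids_from_sender() {
    tail -n +2 |                 # drop the column header line
    grep -v '^ *(' |             # drop "(reason for deferral)" lines
    awk -v sender="$1" '
        BEGIN { RS = "" }        # paragraph mode: one queue entry per record
        $7 == sender { print $1 }
    ' |
    tr -d '*!'                   # strip the active (*) and hold (!) markers
}
```

Something like `mailq | queue_ids_from_sender nagios@example.com | postsuper -d -` would then delete only those messages, since `postsuper -d -` reads queue IDs from standard input. It is worth running the first two stages on their own and eyeballing the IDs before piping them into `postsuper`.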
Now we can start postfix without excessive messages being sent.

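The look-before-you-start step above could be wrapped in a small helper so it is harder to forget in the heat of an outage. This is a minimal sketch, assuming the `mailq` summary line looks like the one shown earlier (`-- N Kbytes in M Requests.`); the name `queue_count` is my own invention, not part of postfix.

```shell
# Hypothetical helper: read `mailq` output on stdin and print how many
# messages are queued (0 when the queue is empty).
queue_count() {
    awk '
        { last = $0 }            # remember the final line of the listing
        END {
            # A non-empty listing ends with "-- N Kbytes in M Request(s)."
            if (last ~ /Requests?\.$/) {
                n = split(last, f, " ")
                print f[n - 1]
            } else {
                print 0
            }
        }
    '
}
```

With that in place, `mailq 2>&1 | queue_count` prints just the number, which makes it easy to compare against whatever threshold you consider "too many texts" before starting postfix again.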
Alternatively, if the main MX relays go down for a period of time, you will see the mail queue fill up with undelivered mail. After you bring the MXs back online, the mail may not be sent to them immediately: your MTA probably uses an increasing retry interval, which could lead to a delay of an hour or longer. Do this to attempt to relay all the mail in the queue right away:

# postqueue -f

Postfix will try to reconnect to the MX relay immediately and deliver all of the queued mail if it can.