Well, I continued my work with Nagios these last few days since I got back from vacation. Before I go into the details, here’s my purpose: I’m setting up Nagios to monitor some of our production machines. The colo facility already provides some monitoring, but it’s a little too basic for my tastes. One way or another, I’m going to find out if our production servers go down, but I’d rather find out ahead of time if it’s preventable. I’ve been using MRTG to graph a number of system stats, but that won’t alert me when something goes wrong.
The servers I’m monitoring are Dell PowerEdge machines, and I happen to have MIBs for their CPU temps, case temps, and case fan RPMs. I want to measure all of it (along with IIS, SQL, disk usage, etc.) so that if something starts to go wrong, I’m notified in time to head it off. I wrote some new Nagios check scripts based on my last scripting attempt: one for fan speed and one for CPU temp. I also ended up writing a simple SMTP notification script because I couldn’t get the built-in notifications to email me.
First is fan speed: check_fanspeed.pl. It’s simple enough. If the fan speed falls below the warning RPM level, send up a flag; if it falls below the critical level, send up the flares. I personally prefer 3000 and 1000 RPM for warning and critical, respectively.
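The fan check boils down to a couple of threshold comparisons. Here’s a minimal Python sketch of that logic, not the actual check_fanspeed.pl: it assumes the RPM reading is already in hand (passed on the command line instead of being queried over SNMP from the Dell MIB), and it uses the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The names check_fanspeed and run are hypothetical.

```python
import sys

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_fanspeed(rpm, warn=3000, crit=1000):
    """Classify a fan reading. Lower RPM is worse, so compare with '<'."""
    if rpm < crit:
        return CRITICAL, f"FAN CRITICAL - {rpm} RPM (below {crit})"
    if rpm < warn:
        return WARNING, f"FAN WARNING - {rpm} RPM (below {warn})"
    return OK, f"FAN OK - {rpm} RPM"


def run(argv):
    """Hypothetical CLI: check_fanspeed.py RPM [WARN] [CRIT].

    The RPM value comes from the command line in this sketch; the SNMP
    query against the Dell MIB is intentionally left out.
    """
    try:
        rpm = int(argv[1])
        warn = int(argv[2]) if len(argv) > 2 else 3000
        crit = int(argv[3]) if len(argv) > 3 else 1000
    except (IndexError, ValueError):
        print("FAN UNKNOWN - usage: check_fanspeed.py RPM [WARN] [CRIT]")
        return UNKNOWN
    code, message = check_fanspeed(rpm, warn, crit)
    print(message)  # Nagios shows this line as the service status text
    return code
```

Nagios goes by the exit code, so a real plugin would end with sys.exit(run(sys.argv)); that line is left out here so the functions can be imported and tested on their own.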
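Second is CPU temperature: check_cpu_temp.pl. Same idea again, except the alarm goes off when the temperature rises above the thresholds. I prefer 115/125 degrees for warning/critical. Note that these _ARE_ in Fahrenheit. That’s what I prefer, but it’s simple to change to Celsius: on the temperature conversion line, just remove the * 1.8 + 32 (you still need the /10).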
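Here’s a similar sketch for the temperature check, again in Python rather than the original Perl. It assumes the raw sensor value is in tenths of a degree Celsius, which is what the /10 followed by * 1.8 + 32 implies; the function names are hypothetical. Note that the comparison flips: for temperature, higher is worse.

```python
# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL = 0, 1, 2


def raw_to_temp(raw, fahrenheit=True):
    """Convert a raw reading, assumed to be tenths of a degree Celsius."""
    celsius = raw / 10  # the /10 is needed in either unit
    # Dropping the "* 1.8 + 32" step leaves the value in Celsius.
    return celsius * 1.8 + 32 if fahrenheit else celsius


def check_cpu_temp(raw, warn=115, crit=125, fahrenheit=True):
    """Classify a CPU temperature. Higher is worse, so compare with '>'."""
    temp = raw_to_temp(raw, fahrenheit)
    unit = "F" if fahrenheit else "C"
    if temp > crit:
        return CRITICAL, f"CPU TEMP CRITICAL - {temp:.1f} {unit} (above {crit})"
    if temp > warn:
        return WARNING, f"CPU TEMP WARNING - {temp:.1f} {unit} (above {warn})"
    return OK, f"CPU TEMP OK - {temp:.1f} {unit}"
```

For example, a raw reading of 500 is 50.0 °C, or 122 °F, which lands between 115 and 125 and comes back as a warning. If you switch to Celsius, pass Celsius thresholds as well (115/125 °F is roughly 46/52 °C).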
Last is notify.pl. It’s another very simple script, but it does the job. To use it, you have to add the command definitions somewhere (probably misccommands.cfg). They are included below:
define command {
    command_name    notify-by-perl-host
    command_line    /usr/local/nagios/libexec/notify.pl $CONTACTEMAIL$ "$HOSTNAME$ Alert!" "$HOSTALIAS$ is $HOSTSTATE$||$LONGDATETIME$"
}
define command {
    command_name    notify-by-perl-serv
    command_line    /usr/local/nagios/libexec/notify.pl $CONTACTEMAIL$ "$HOSTNAME$ $SERVICEDESC$ Alert!" "$SERVICEDESC$ on $HOSTALIAS$ is $SERVICESTATE$||$LONGDATETIME$"
}
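For comparison, here’s roughly what a notify script like this could look like in Python, using the standard smtplib and email modules. This is a sketch, not notify.pl itself: the SMTP host and sender address are hypothetical placeholders, and treating "||" in the body as a line break is an assumption, since I haven’t described what notify.pl does with it. The argument order (email, subject, body) matches the command_line definitions above.

```python
import smtplib
import sys
from email.message import EmailMessage

# Hypothetical placeholders: adjust for your own mail setup.
SMTP_HOST = "localhost"
FROM_ADDR = "nagios@localhost"


def build_message(to_addr, subject, body, sender=FROM_ADDR):
    """Build the alert email.

    Assumption: '||' in the body marks a line break.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body.replace("||", "\n"))
    return msg


def send_alert(to_addr, subject, body, host=SMTP_HOST):
    """Hand the message to an SMTP server (plain SMTP, no auth or TLS)."""
    msg = build_message(to_addr, subject, body)
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)


def run(argv):
    """Hypothetical CLI: notify.py EMAIL SUBJECT BODY."""
    if len(argv) != 4:
        print("usage: notify.py EMAIL SUBJECT BODY", file=sys.stderr)
        return 1
    send_alert(argv[1], argv[2], argv[3])
    return 0
```

As with the check sketches, a standalone script would finish with sys.exit(run(sys.argv)).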