Nagios Monitoring

Configure Nagios to monitor the entire cluster.

[Figure: cluster topology diagram]

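Nagios checks are defined by scripts, which makes it very flexible: a check can be written in any language to monitor whatever you need, as long as it accepts arguments and returns an exit code.
Nagios has four states, identified by the exit code:
0: normal, shown as OK
1: warning, shown as WARNING
2: critical, shown as CRITICAL
3: unknown error, shown as UNKNOWN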
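
The exit-code convention above can be sketched as a small shell function. This is only an illustration under stated assumptions: check_value and its three arguments are hypothetical names, not part of Nagios, and a real plugin would finish with exit instead of return.

```shell
#!/bin/bash
# Hypothetical plugin skeleton: map a numeric reading onto the four Nagios states.
# check_value VALUE WARN CRIT  (all three names are illustrative)
check_value() {
    local value=${1-} warn=${2-} crit=${3-}
    local n
    # Anything that is not a non-negative integer is reported as UNKNOWN (3).
    for n in "$value" "$warn" "$crit"; do
        case "$n" in
            ''|*[!0-9]*) echo "UNKNOWN - usage: check_value VALUE WARN CRIT"; return 3 ;;
        esac
    done
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"
        return 1
    fi
    echo "OK - value is $value"
    return 0
}
```

Nagios shows the first line of output next to the service and uses the exit status to pick the state, so the two should always agree.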

15-1 Installing Nagios

15-1-1 Installing Nagios on the server

# Upload the Nagios packages from the firewall to the Nagios server on the internal network
[root@nagios ~]# mkdir nagios
[root@firewall ~]# scp nagios-* 192.168.1.100:/root/nagios/
nagios-3.5.1.tar.gz                                                          100% 1722KB   1.7MB/s   00:00
nagios-plugins-2.1.1.tar.gz                                                  100% 2615KB   2.6MB/s   00:00
# Install dependencies
[root@nagios ~]# yum install gcc glibc glibc-common php gd gd-devel libpng libmng libjpeg zlib httpd -y
# Create the user and add the extra group (-a appends instead of replacing existing supplementary groups)
[root@nagios ~]# useradd nagios
[root@nagios ~]# groupadd nagcmd
[root@nagios ~]# usermod -a -G nagcmd nagios
[root@nagios ~]# usermod -a -G nagcmd apache
# Start the build

    [root@nagios ~]# cd nagios/
    [root@nagios nagios]# tar xf nagios-3.5.1.tar.gz 
    [root@nagios nagios]# cd nagios 
    [root@nagios nagios]# ./configure --with-command-group=nagcmd
                     .......
                     General Options:
 -------------------------
        Nagios executable:  nagios
        Nagios user/group:  nagios,nagios
       Command user/group:  nagios,nagcmd
            Embedded Perl:  no
             Event Broker:  yes
        Install ${prefix}:  /usr/local/nagios
                Lock file:  ${prefix}/var/nagios.lock
   Check result directory:  ${prefix}/var/spool/checkresults
           Init directory:  /etc/rc.d/init.d
  Apache conf.d directory:  /etc/httpd/conf.d
             Mail program:  /bin/mail
                  Host OS:  linux-gnu

 Web Interface Options:
 ------------------------
                 HTML URL:  http://localhost/nagios/
                  CGI URL:  http://localhost/nagios/cgi-bin/
 Traceroute (used by WAP):  /bin/traceroute


Review the options above for accuracy.  If they look okay,
type 'make all' to compile the main program and CGIs.
# When the summary above appears, configure has succeeded; run make all as prompted
[root@nagios nagios]# make all
                      .......

*** Support Notes *******************************************

If you have questions about configuring or running Nagios,
please make sure that you:

     - Look at the sample config files
     - Read the documentation on the Nagios Library at:
           Nagios Library

before you post a question to one of the mailing lists.
Also make sure to include pertinent information that could
help others help you.  This might include:

     - What version of Nagios you are using
     - What version of the plugins you are using
     - Relevant snippets from your config files
     - Relevant error messages from the Nagios log file

For more information on obtaining support for Nagios, visit:

       Home

*************************************************************

Enjoy.

# Once the notes above appear, Nagios can be installed
[root@nagios nagios]# make install && make install-init && make install-commandmode && make install-config && make install-webconf
# To check MySQL status we need the MySQL client and headers; once they are installed, extract the plugins
[root@nagios ~]# yum -y install mysql mysql-devel
[root@nagios nagios]# tar xf nagios-plugins-2.1.1.tar.gz -C /usr/local/src/
[root@nagios nagios]# cd !$
cd /usr/local/src/
[root@nagios src]# cd nagios-plugins-2.1.1/
[root@nagios nagios-plugins-2.1.1]# ./configure --with-nagios-user=nagios --with-nagios-group=nagcmd
                     .........
config.status: creating po/POTFILES
config.status: creating po/Makefile
            --with-apt-get-command:
              --with-ping6-command: /bin/ping6 -n -U -w %d -c %d %s
               --with-ping-command: /bin/ping -n -U -w %d -c %d %s
                       --with-ipv6: yes
                      --with-mysql: /usr/bin/mysql_config
                    --with-openssl: yes
                     --with-gnutls: no
               --enable-extra-opts: yes
                       --with-perl: /usr/bin/perl
             --enable-perl-modules: no
                     --with-cgiurl: /nagios/cgi-bin
               --with-trusted-path: /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
                   --enable-libtap: no
[root@nagios nagios-plugins-2.1.1]# make && make install
[root@nagios nagios-plugins-2.1.1]# ls /usr/local/nagios/libexec/
check_apt       check_file_age      check_load         check_ntp       check_sensors  check_uptime
check_breeze    check_flexlm        check_log          check_ntp_peer  check_simap    check_users
check_by_ssh    check_ftp           check_mailq        check_ntp_time  check_smtp     check_wave
check_clamd     check_http          check_mrtg         check_nwstat    check_spop     negate
check_cluster   check_icmp          check_mrtgtraf     check_oracle    check_ssh      urlize
check_dhcp      check_ide_smart     check_mysql        check_overcr    check_ssmtp    utils.pm
check_dig       check_ifoperstatus  check_mysql_query  check_ping      check_swap     utils.sh
check_disk      check_ifstatus      check_nagios       check_pop       check_tcp
check_disk_smb  check_imap          check_nntp         check_procs     check_time
check_dns       check_ircd          check_nntps        check_real      check_udp
check_dummy     check_jabber        check_nt           check_rpc       check_ups
# Installation is complete; set up password login (-c creates the password file on first use)
[root@nagios nagios-plugins-2.1.1]# htpasswd -c /usr/local/nagios/etc/htpasswd.users mafei0728
....
# The physical machine cannot reach the internal network directly. To keep the lab simple, Nagios is exposed through a virtual IP, and the DedeCMS site is moved to port 8080 (not covered here) to avoid a conflict
[root@nagios nagios-plugins-2.1.1]# ifconfig eth2:1 192.168.1.105
# On the firewall
[root@firewall /]# iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to 192.168.1.105
# From the physical machine, open the external address and enter the username and password:

[Screenshots: Nagios web login]

Server configuration is complete.

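15-1-2 Installing Nagios on the clients

Because this assignment only uses techniques covered in class, the clients are deployed over ssh. A script would work as well, and a one-step install script is included with the packages. Each client needs nagios-plugins-2.1.1.tar.gz and nrpe-2.15.tar.gz.

# On the firewall, copy nagios-plugins-2.1.1.tar.gz and nrpe-2.15.tar.gz to every client (not covered here). The same steps run on every client; the firewall is shown as the example
# Resolve dependencies
[root@firewall ~]# yum install -y openssl openssl-devel gcc glibc glibc-common php gd gd-devel libpng libmng libjpeg zlib
# Create the user and group
[root@firewall ~]# useradd -s /sbin/nologin nagios
[root@firewall ~]# groupadd nagcmd
[root@firewall ~]# usermod -a -G nagcmd nagios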
[root@firewall ~]# id nagios
uid=500(nagios) gid=500(nagios) groups=500(nagios),501(nagcmd)
# Install xinetd (not needed on the Nagios server)
[root@firewall ~]# yum install xinetd -y
# Extract the packages
[root@firewall ~]# tar xf nagios-plugins-2.1.1.tar.gz -C /usr/local/src/
[root@firewall ~]# tar xf nrpe-2.15.tar.gz -C /usr/local/src/
[root@firewall ~]# cd !$
cd /usr/local/src/
# Build in three steps: configure, make, make install
[root@firewall src]# cd nagios-plugins-2.1.1/
[root@firewall nagios-plugins-2.1.1]#  ./configure && make && make install
[root@firewall nagios-plugins-2.1.1]# cd ../nrpe-2.15
[root@firewall nrpe-2.15]#  ./configure && make && make install
# The Nagios server needs only the steps up to this point; the rest applies to clients
# Install the configuration files
[root@firewall ~]# cd /usr/local/src/nrpe-2.15
[root@firewall nrpe-2.15]# make install-daemon-config && make install-xinetd
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/etc
/usr/bin/install -c -m 644 -o nagios -g nagios sample-config/nrpe.cfg /usr/local/nagios/etc
/usr/bin/install -c -m 644 sample-config/nrpe.xinetd /etc/xinetd.d/nrpe 
# Edit the configuration file
[root@firewall ~]# vim /etc/xinetd.d/nrpe 
# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
        flags           = REUSE
        socket_type     = stream
        port            = 5666
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/local/nagios/bin/nrpe
        server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
        log_on_failure  += USERID
        disable         = no
        # Add the Nagios server address (xinetd expects a space-separated list)
        only_from       = 127.0.0.1 192.168.1.100
}
wq!
# Copy the configuration file to every client
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.50:/etc/xinetd.d/
nrpe                                                                                                                                                                                                          100%  476     0.5KB/s   00:00    
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.51:/etc/xinetd.d/
nrpe                                                                                                                                                                                                          100%  476     0.5KB/s   00:00    
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.52:/etc/xinetd.d/
nrpe                                                                                                                                                                                                          100%  476     0.5KB/s   00:00    
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.53:/etc/xinetd.d/
nrpe                                                                                                                                                                                                          100%  476     0.5KB/s   00:00    
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.201:/etc/xinetd.d/
nrpe                                                                                                                                                                                                          100%  476     0.5KB/s   00:00    
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.202:/etc/xinetd.d/
nrpe                                                                                                                                                                                                          100%  476     0.5KB/s   00:00    
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.203:/etc/xinetd.d/
nrpe                                                                                                                                                                                                          100%  476     0.5KB/s   00:00    
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.204:/etc/xinetd.d/
nrpe                                                                                                                                                                                                          100%  476     0.5KB/s   00:00    
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.205:/etc/xinetd.d/
nrpe                                                                                                                                                                                                          100%  476     0.5KB/s   00:00    
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.140:/etc/xinetd.d/
nrpe
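
The ten scp commands above can be collapsed into a loop. This is a minimal sketch, not the author's one-step install script: CLIENTS, push_nrpe_config and DRY_RUN are hypothetical names, and the addresses are simply the ones used in the transcript.

```shell
#!/bin/bash
# Illustrative helper; the address list is copied from the scp commands above.
CLIENTS="192.168.1.50 192.168.1.51 192.168.1.52 192.168.1.53 192.168.1.201 192.168.1.202 192.168.1.203 192.168.1.204 192.168.1.205 192.168.1.140"

push_nrpe_config() {
    local host
    for host in $CLIENTS; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: print the command instead of executing it.
            echo "scp /etc/xinetd.d/nrpe ${host}:/etc/xinetd.d/"
        else
            # Keep going if one host fails, but report it on stderr.
            scp /etc/xinetd.d/nrpe "${host}:/etc/xinetd.d/" || echo "copy to ${host} failed" >&2
        fi
    done
}
```

Running DRY_RUN=1 push_nrpe_config first prints the commands so the host list can be checked before anything is copied.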
# Register the port and start the service
[root@firewall ~]# echo "nrpe 5666/tcp # NRPE" >> /etc/services
[root@dns nrpe-2.15]# /etc/init.d/xinetd start
Starting xinetd:                                           [  OK  ]
[root@dns nrpe-2.15]# netstat -anput |grep 5666
tcp        0      0 :::5666                     :::*                        LISTEN      29620/xinetd        
[root@dns nrpe-2.15]# chkconfig xinetd on
# Final step on the Nagios server
[root@nagios ~]# cd /usr/local/src/nrpe-2.15/
[root@nagios nrpe-2.15]# make install-plugin && make install-daemon
cd ./src/ && make install-plugin
make[1]: Entering directory `/usr/local/src/nrpe-2.15/src'
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/libexec
/usr/bin/install -c -m 775 -o nagios -g nagios check_nrpe /usr/local/nagios/libexec
make[1]: Leaving directory `/usr/local/src/nrpe-2.15/src'
cd ./src/ && make install-daemon
make[1]: Entering directory `/usr/local/src/nrpe-2.15/src'
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/bin
/usr/bin/install -c -m 775 -o nagios -g nagios nrpe /usr/local/nagios/bin
make[1]: Leaving directory `/usr/local/src/nrpe-2.15/src'                     
[root@nagios nrpe-2.15]# ls /usr/local/nagios/libexec/check_nrpe
/usr/local/nagios/libexec/check_nrpe
# Define the command used for remote checks by adding the following block
[root@nagios objects]# vim commands.cfg
# 'check_nrpe' command definition
        define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
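
To see what the command_line template turns into at run time, the macro substitution can be imitated in shell. This is a sketch only: expand_check_nrpe is a hypothetical helper, and it assumes $USER1$ points at /usr/local/nagios/libexec, the plugin directory used in this install.

```shell
#!/bin/bash
# Hypothetical illustration of macro expansion; Nagios does this internally.
expand_check_nrpe() {
    local host_address=$1 arg1=$2
    local user1=/usr/local/nagios/libexec   # assumed value of $USER1$
    local line='$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$'
    # Replace each macro with its value; single quotes keep the $ signs literal.
    line=${line//'$USER1$'/$user1}
    line=${line//'$HOSTADDRESS$'/$host_address}
    line=${line//'$ARG1$'/$arg1}
    echo "$line"
}
```

A service whose check_command is check_nrpe!check_load supplies check_load as $ARG1$, and the host definition supplies $HOSTADDRESS$.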
# Installation complete

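15-2 Configuring cluster monitoring

  • Common checks for every host
    • CPU load, memory, ping, SSH, disk space, total processes
  • Role-specific checks depend on what each server does

  • Follow the three steps strictly

    • Define the host, define the service, define the command
    # Add the host and service configuration files to the main Nagios configuration
    [root@nagios ~]# vim /usr/local/nagios/etc/nagios.cfg
    # Add the following three lines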
    #Third stage cluster project evaluation monitoring
    cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
    cfg_file=/usr/local/nagios/etc/objects/service.cfg
    cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg
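
If the cfg_file lines are added by a script rather than by hand, it helps to make the step repeatable. A minimal sketch, assuming a hypothetical helper named add_cfg_file:

```shell
#!/bin/bash
# add_cfg_file CONFIG PATH  (hypothetical helper, not a Nagios tool)
# Appends "cfg_file=PATH" to CONFIG unless that exact line is already present.
add_cfg_file() {
    local cfg=$1 path=$2
    grep -qxF "cfg_file=${path}" "$cfg" || echo "cfg_file=${path}" >> "$cfg"
}
```

For example, add_cfg_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/objects/hosts.cfg can be run any number of times without creating duplicate entries.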
    

15-2-1 Host definition file for the cluster

# Define the hosts
[root@nagios ~]# vim /usr/local/nagios/etc/objects/hosts.cfg
#firewall.mafei0728.cn
define host{
use             linux-server
hostgroups      generic_server
host_name       firewall
address         192.168.1.254
}
#dns.mafei0728.cn
define host{
use             linux-server
hostgroups      generic_server
host_name       dns
address         192.168.1.140
}
#lvs1.mafei0728.cn
define host{
use             linux-server
host_name       lvs1
hostgroups      lvs_keepalived,generic_server
address         192.168.1.201
}
#lvs2.mafei0728.cn
define host{
use             linux-server
host_name       lvs2
hostgroups      lvs_keepalived,generic_server
address         192.168.1.202
}
#nfs.mafei0728.cn
define host{
use             linux-server
hostgroups      generic_server
host_name       nfs
address         192.168.1.205
}
#apache1.mafei0728.cn
define host{
use             linux-server
host_name       apache1
hostgroups      web_server,generic_server
address         192.168.1.203
}
#apache2.mafei0728.cn
define host{
use             linux-server
host_name       apache2
hostgroups      web_server,generic_server
address         192.168.1.204
}
#atlas.mafei0728.cn
define host{
use             linux-server
hostgroups      generic_server
host_name       atlas
address         192.168.1.53
}
#master.mafei0728.cn
define host{
use             linux-server
host_name       master
hostgroups      mysql_server,generic_server
address         192.168.1.50
}
#slave1.mafei0728.cn
define host{
use             linux-server
host_name       slave1
hostgroups      mysql_server,generic_server
address         192.168.1.51
}
#slave2.mafei0728.cn
define host{
use             linux-server
host_name       slave2
hostgroups      mysql_server,generic_server
address         192.168.1.52
}
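
The host blocks above all follow one pattern, so they can also be generated. The function below is a hedged sketch: gen_host is a hypothetical name, and it only prints a block in the same layout as the hand-written ones.

```shell
#!/bin/bash
# gen_host NAME ADDRESS GROUPS  (hypothetical generator)
# Prints one host definition that inherits from the linux-server template.
gen_host() {
    local name=$1 address=$2 groups=$3
    cat <<EOF
define host{
use             linux-server
host_name       ${name}
hostgroups      ${groups}
address         ${address}
}
EOF
}
```

For instance, gen_host slave2 192.168.1.52 mysql_server,generic_server prints a block equivalent to the last definition above, and its output could be appended to hosts.cfg.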

15-2-2 Configuring the host groups

# These definitions go in /usr/local/nagios/etc/objects/hostgroups.cfg
# keepalived + LVS group
define hostgroup{
        hostgroup_name  lvs_keepalived
        members         lvs1,lvs2
}
# Web server group
define hostgroup{
        hostgroup_name  web_server
        members         apache1,apache2
}
# Database group
define hostgroup{
        hostgroup_name  mysql_server
        members         master,slave1,slave2
}
# General monitoring group
define hostgroup{
        hostgroup_name  generic_server
        members   firewall,dns,lvs1,lvs2,master,slave1,slave2,atlas,apache1,apache2,nfs
}

15-3 Configuring client-side monitoring (a lot of it)

15-3-1 Common checks (all clients)

# Edit the configuration file (common to every client)
[root@firewall ~]# vim /usr/local/nagios/etc/nrpe.cfg 
                           ......
# The following examples use hardcoded command arguments...
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
# The VM has only one partition, so the whole disk is monitored
command[check_sda]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200 
                           .......
# Copy the file to every machine and restart the service

[root@firewall ~]# service xinetd restart
Stopping xinetd:                                           [  OK  ]
Starting xinetd:                                           [  OK  ]

15-3-2 Monitoring keepalived and LVS

15-3-2-1 Monitoring keepalived

# Idea: a running keepalived has 3 processes; alert if the count is anything other than 3
[root@lvs1 ~]# vim /usr/local/nagios/etc/nrpe.cfg 
# The following examples use hardcoded command arguments...
                     ......
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200 
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 40% -c 20%
# Judge keepalived by its process count
command[check_keepalived]=/usr/local/nagios/libexec/check_procs -c 3:3 -C keepalived
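
The 3:3 threshold means "critical unless the count is between 3 and 3". A rough shell sketch of that range test follows; check_count_range is a hypothetical name and the output text is illustrative, not the exact wording check_procs prints.

```shell
#!/bin/bash
# check_count_range COUNT MIN:MAX  (hypothetical helper)
# Returns 0 when COUNT lies inside the inclusive range, 2 (CRITICAL) otherwise.
check_count_range() {
    local count=$1 range=$2
    local min=${range%%:*} max=${range##*:}
    if [ "$count" -ge "$min" ] && [ "$count" -le "$max" ]; then
        echo "OK - $count processes (expected $range)"
        return 0
    fi
    echo "CRITICAL - $count processes (expected $range)"
    return 2
}
```

With the range 3:3, both a missing process (count 2) and an unexpected extra one (count 4) raise the alert.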

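15-3-2-2 Monitoring the number of LVS connections

# This only needs to be installed on lvs1
# Install the web service
[root@lvs1 ~]# yum install httpd php -y
# Install the graphing tool
[root@lvs1 ~]# yum install rrdtool -y
# Upload the lvs-rrd package and extract it into the httpd document root
[root@lvs1 ~]# tar xf lvs-rrd-v0.7.tar.gz  -C /var/www/html
# Edit the configuration file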
[root@lvs1 ~]# cd /var/www/html/lvs/
[root@lvs1 lvs]# ls
Changelog  graph-lvs.sh  graphs  index.php  lvs-rrd.php  lvs.rrd.update  README  rrd
[root@lvs1 lvs]# vim lvs.rrd.update 
# User set variables.
# Change these to match your system config.
RRDTOOL="/usr/bin/rrdtool"
IPVSADM="/sbin/ipvsadm"
WORKDIR="/var/www/html/lvs/rrd"
wq!
# Edit the graphing script
[root@lvs1 lvs]# vim graph-lvs.sh 
#!/bin/bash
# WORKDIR must match the directory used in the update script.
WORKDIR="/var/www/html/lvs/rrd"
RRDTOOL="/usr/bin/rrdtool"
# Where to put the graphs. 
GRAPHS="/var/www/html/lvs/graphs"
WEBPATH="/lvs/graphs"
wq!
# Edit the PHP file
[root@lvs1 lvs]# vim lvs-rrd.php 
<?php
header("Cache-Control: max-age=300, must-revalidate");
system("/var/www/html/lvs/graph-lvs.sh -H");
?>
wq!
# Collect data on a schedule
[root@lvs1 lvs]# crontab -e
*/20 * * * * /usr/sbin/ntpdate dns.mafei0728.cn >/dev/null 2>&1
* * * * * sh /var/www/html/lvs/lvs.rrd.update >/dev/null 2>&1
wq!
# Start the service and enable it at boot
[root@lvs1 lvs]# service httpd start
Starting httpd: 
[root@lvs1 lvs]# chkconfig httpd on
# Configure on the Nagios monitoring host
[root@nagios objects]# vim hosts.cfg 
#lvs1.mafei0728.cn
define host{
use             linux-server
host_name       lvs1
alias           lvs1
hostgroups      lvs_keepalived,generic_server
address         192.168.1.201
notes_url       http://192.168.1.201/lvs
}

Configuration is complete; check the result.

[Screenshot: keepalived check]

[Screenshot: LVS connection monitoring]

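15-3-3 Monitoring NFS

# For NFS monitoring, a plugin can be downloaded from the official site. From the firewall, upload the package to the Nagios and NFS servers (not covered here)
# The installation is shown on the Nagios server; do the same on the NFS server (installing it on the Nagios server is optional)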
[root@nagios ~]# tar -xf monitoringplug-0.16.tar.gz -C /usr/local/src/
[root@nagios ~]# cd !$
cd /usr/local/src/
[root@nagios src]# cd monitoringplug-0.16/
[root@nagios monitoringplug-0.16]# ./configure --prefix=/usr/local/nagiosextend
[root@nagios monitoringplug-0.16]# make && make install
[root@nagios monitoringplug-0.16]# cd /usr/local/nagiosextend/lib/nagios/plugins/
[root@nagios plugins]# ls
check_bonding  check_enforce     check_mem        check_mysql       check_nrped     check_sebool  notify_mail
check_dhcp     check_file        check_memcached  check_mysql_rows  check_redis     check_sockets  notify_sms
check_dummy    check_gsm_signal  check_multipath  check_nfs         check_rpc_ping  check_timeout  notify_stdout
[root@nagios plugins]# cp check_nfs /usr/local/nagios/libexec/
# Configure on the NFS server
[root@nfs libexec]# vim ../etc/nrpe.cfg 
# The following examples use hardcoded command arguments...
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200 
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 40% -c 20%
command[check_nfs]=/usr/local/nagios/libexec/check_nfs -H 127.0.0.1 -w 10s -c 3s
# Configure on the Nagios server
# Test that the check works
[root@nagios libexec]# ./check_nrpe -H 192.168.1.205 -c check_nfs
OK - mountd export by udp:mountdv3, tcp:mountdv3
# Add the following to service.cfg
[root@nagios objects]# vim service.cfg 
# Check NFS
define service{
        use                             local-service
        host_name                       nfs
        service_description             nfs
        check_command                   check_nrpe!check_nfs
}
# Verify the configuration
[root@nagios objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 3.5.1
.....
Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check
# Restart the services
[root@nagios objects]# service httpd restart && service nagios restart
Stopping httpd:                                            [  OK  ]
Starting httpd:                                            [  OK  ]
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
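
The verify-then-restart pair is repeated after every change in this section, so it can be wrapped once. This is a sketch under assumptions: reload_nagios, NAGIOS_VERIFY and NAGIOS_RESTART are hypothetical names whose defaults are the commands used above.

```shell
#!/bin/bash
# Defaults mirror the commands used in this section; override them for testing.
NAGIOS_VERIFY=${NAGIOS_VERIFY:-"/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg"}
NAGIOS_RESTART=${NAGIOS_RESTART:-"service nagios restart"}

reload_nagios() {
    # Restart only when the pre-flight check exits with status 0.
    if $NAGIOS_VERIFY >/dev/null 2>&1; then
        $NAGIOS_RESTART
    else
        echo "configuration check failed; nagios was not restarted" >&2
        return 1
    fi
}
```

Refusing to restart on a failed check keeps the running daemon alive with its last good configuration.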
# Check the result

[Screenshot: NFS check result]

# Stop the service and watch the status change
[root@nfs libexec]# service nfs stop
Shutting down NFS daemon:                                  [  OK  ]
Shutting down NFS mountd:                                  [  OK  ]
Shutting down NFS quotas:                                  [  OK  ]
Shutting down NFS services:                                [  OK  ]
Shutting down RPC idmapd:                                  [  OK  ]

[Screenshot: NFS check after the service is stopped]

NFS configuration is complete.

15-3-4 Monitoring the web servers

# Monitoring the web servers is very simple: only the Nagios server needs definitions
[root@nagios objects]# vim commands.cfg
# 'check_http' command definition
define command{
        command_name    check_http
        command_line    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
        }
[root@nagios objects]# vim service.cfg 
# 'check_http' command definition
# Disable notifications for this service by default, as not all users may have HTTP enabled.
define service{
        use                             local-service
        hostgroup_name                  web_server
        service_description             HTTP
        # The web port was changed to 8080 earlier
        check_command                   check_http!-p 8080
        notifications_enabled           0
}
# Check the result

[Screenshot: HTTP check result]

Configuration is complete.

15-3-4 Monitoring MHA and Atlas

15-3-4-1 Monitoring Atlas

# To monitor Atlas we only need to watch two ports, 1234 and 2345
# On the Nagios server, add the following lines
[root@nagios objects]# vim service.cfg
# Check Atlas
define service{
        use                             local-service
        host_name                       atlas
        service_description             atlas_listen
        check_command                   check_tcp!1234
}
define service{
        use                             local-service
        host_name                       atlas
        service_description             atlas_manager
        check_command                   check_tcp!2345
}
# Verify, then restart
[root@nagios objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check
[root@nagios objects]# service httpd restart && service nagios restart
Stopping httpd:                                            [  OK  ]
Starting httpd:                                            [  OK  ]
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.

[Screenshot: Atlas check result]

15-3-4-2 Monitoring MHA

# Custom script
[root@alats libexec]# vim ma.sh 

#!/bin/bash
# Take the last octet of the current master's IP from the MHA status line
_status=$(masterha_check_status --conf=/etc/masterha/app1.cnf | awk '{print $5}' | awk -F. '{print $4}')
if [[ $_status -eq 50 ]]
then
echo "ha is running,the master is 192.168.1.50"
exit 0
elif [[ $_status -eq 51 ]]
then
echo "ha is running,the master is 192.168.1.51"
exit 0
# An empty value means no master was reported
elif [[ -z $_status ]]
then
echo "ha is not running"
exit 2
else
echo "unknown error!!"
exit 3
fi
wq!
# Edit the configuration file
[root@alats libexec]# vim ../etc/nrpe.cfg 
# The following examples use hardcoded command arguments...
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200 
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 40% -c 20%
command[check_mha]=/usr/local/nagios/libexec/ma.sh
# Define the service on the Nagios server
# Check MHA
[root@nagios objects]# vim service.cfg 
define service{
        use                             local-service
        host_name                       atlas
        service_description             mha
        check_command                   check_nrpe!check_mha
}
# Verify and restart
[root@nagios objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[root@nagios objects]# service httpd restart && service nagios restart
# Check the status

[Screenshot: MHA check result]

15-3-5 Monitoring MySQL master/slave

15-3-5-1 MySQL connection status

# Monitor the MySQL status
# Command definition
[root@nagios objects]# vim commands.cfg
# 'check_mysql'command definition
        define command{
        command_name    check_mysql
        command_line    $USER1$/check_mysql -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -P $ARG3$ -d $ARG4$
        }
# Service definition (in service.cfg)
# Check the MySQL connection
define service{
        use                             local-service
        hostgroup_name                  mysql_server
        service_description             check_mysql
        check_command                   check_mysql!mafei0728!mafei0728!3306!mafei0728
}
# Verify and restart
[root@nagios objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors:   0
[root@nagios objects]# service httpd restart && service nagios restart
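# The result is shown later, together with the replication check

15-3-5-2 MySQL master/slave replication

# This one needs a custom script
[root@nagios libexec]# vim check_mysql_slave_status.sh
#!/bin/bash
# bash is required for the array and [[ ]]; $1 is the slave's address
# Collect the Slave_IO_Running and Slave_SQL_Running values
slave_status=($(mysql -umafei0728 -pmafei0728 -h "$1" -e "show slave status\G" | grep "Running:" | awk '{print $2}'))
if [[ ${slave_status[0]} = Yes ]] && [[ ${slave_status[1]} = Yes ]]
     then
     echo "OK slave is running" 
     exit 0
else
     echo "CRITICAL slave is not running" 
     exit 2
fi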
wq!
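
The parsing step can be tried without a database by feeding saved output on stdin. A minimal sketch, assuming a hypothetical helper named slave_threads_ok and the Slave_IO_Running / Slave_SQL_Running field names that "show slave status\G" prints:

```shell
#!/bin/bash
# slave_threads_ok: reads "show slave status\G" text on stdin (hypothetical helper).
# Succeeds only when both replication threads report Yes.
slave_threads_ok() {
    awk '
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

Matching the two field names exactly, rather than grepping for "Running:", keeps the result independent of the order or number of other status lines.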
# Add the command (its name must match the check_command used by the service)
[root@nagios objects]# vim commands.cfg
# 'check_mysql_slave_status' command definition
        define command{
        command_name    check_mysql_slave_status
        command_line    $USER1$/check_mysql_slave_status.sh $HOSTADDRESS$
        }
# Add the service
[root@nagios objects]# vim service.cfg 
# Check MySQL master/slave replication
define service{
        use                             local-service
        host_name                       slave1,slave2
        service_description             check_mysql_slave_status
        check_command                   check_mysql_slave_status
}
# Verify and restart
[root@nagios objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check
[root@nagios objects]# service httpd restart && service nagios restart
Stopping httpd:                                            [  OK  ]
Starting httpd:                                            [  OK  ]
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
# Check the result

[Screenshot: MySQL connection and replication checks]

15-3-6 Monitoring the NTP and DNS servers

# These are very simple to monitor; checking the service port is enough
# Define the commands
[root@nagios objects]# vim commands.cfg
# 'check_ntp' command definition
define command{
        command_name    check_ntp
        command_line    $USER1$/check_ntp -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$
        }
# 'check_dns' command definition
define command{
        command_name    check_dns
        command_line    $USER1$/check_dns -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$
        }
# Define the services
[root@nagios objects]# vim service.cfg 
# Monitor the DNS server
define service{
        use                             local-service
        host_name                       dns
        service_description             check_dns
        check_command                   check_dns!1!3
}
# Monitor NTP
define service{
        use                             local-service
        host_name                       dns
        service_description             check_ntp
        check_command                   check_ntp!1!3
}
# Verify and restart
[root@nagios objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Checking misc settings...
Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check
[root@nagios libexec]# service httpd restart && service nagios restart
Stopping httpd:                                            [  OK  ]
Starting httpd:                                            [  OK  ]
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.
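# Check the result; the local DNS server is a little slow

[Screenshot: DNS and NTP check results]

Configuration is complete. All the command and service definitions are listed below, together with the common checks (disk, CPU load, memory, users, total processes).

#service.cfg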
# Define a service to "ping" the local machine
define service{
        use                             local-service
        hostgroup_name                  generic_server
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
}
# < 10% free space on partition.
define service{
        use                             local-service
        hostgroup_name                  generic_server
        service_description             Root Partition
        check_command                   check_local_disk!20%!10%!/
}
# if > 50 users.
define service{
        use                             local-service
        hostgroup_name                  generic_server
        service_description             Current Users
        check_command                   check_local_users!20!50
}
# > 400 processes.

define service{
        use                             local-service
        hostgroup_name                  generic_server
        service_description             Total Processes
        check_command                   check_local_procs!250!400!RSZDT
}
# Define a service to check the load on the local machine

define service{
        use                             local-service
        hostgroup_name                  generic_server
        service_description             Current Load
        check_command                   check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
# Define a service to check the swap usage the local machine. 
# Critical if less than 10% of swap is free, warning if less than 20% is free
define service{
        use                             local-service
        hostgroup_name                  generic_server
        service_description             Swap Usage
        check_command                   check_local_swap!20!10
}
# 'check_http' command definition
# Disable notifications for this service by default, as not all users may have HTTP enabled.

define service{
        use                             local-service
        hostgroup_name                  web_server
        service_description             HTTP
        check_command                   check_http!-p 8080
        notifications_enabled           0
}
###############################################################################
# Check the keepalived servers
define service{
        use                             local-service
        hostgroup_name                  lvs_keepalived
        service_description             keepalived
        check_command                   check_nrpe!check_keepalived
}
# Check NFS
define service{
        use                             local-service
        host_name                       nfs
        service_description             nfs
        check_command                   check_nrpe!check_nfs
}
# Check Atlas
define service{
        use                             local-service
        host_name                       atlas
        service_description             atlas_listen
        check_command                   check_tcp!1234
}
define service{
        use                             local-service
        host_name                       atlas
        service_description             atlas_manager
        check_command                   check_tcp!2345
}
# Check MHA
define service{
        use                             local-service
        host_name                       atlas
        service_description             mha
        check_command                   check_nrpe!check_mha
}
# Check the MySQL connection
define service{
        use                             local-service
        hostgroup_name                  mysql_server
        service_description             check_mysql
        check_command                   check_mysql!mafei0728!mafei0728!3306!mafei0728
}
# Check MySQL master/slave replication
define service{
        use                             local-service
        host_name                       slave1,slave2
        service_description             check_mysql_slave_status
        check_command                   check_mysql_slave_status
}
# Monitor the DNS server
define service{
        use                             local-service
        host_name                       dns
        service_description             check_dns
        check_command                   check_dns!1!3
}
# Monitor NTP
define service{
        use                             local-service
        host_name                       dns
        service_description             check_ntp
        check_command                   check_ntp!1!3
}
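
In the service definitions above, each `check_command` value uses `!` to separate the command name from its arguments. Nagios substitutes those arguments into `$ARG1$`, `$ARG2$`, and so on in the `command_line` of the command with the same name. The sketch below imitates that substitution in plain POSIX shell, only to make the mapping visible. Nagios does this internally; `expand_check_command` and `replace_all` are hypothetical helper names, `192.0.2.10` is a placeholder address, and `$USER1$` is assumed to be `/usr/local/nagios/libexec` as in the stock `resource.cfg`.

```shell
#!/bin/sh
# Illustration only: mimic how Nagios fills macros into a command_line.

# replace_all STRING NEEDLE REPLACEMENT
# Literal (non-regex) replacement of every NEEDLE in STRING.
replace_all() {
    rest=$1
    out=
    while :; do
        case $rest in
            *"$2"*)
                out=$out${rest%%"$2"*}$3   # text before the first match + replacement
                rest=${rest#*"$2"}         # continue after the first match
                ;;
            *) break ;;
        esac
    done
    printf '%s' "$out$rest"
}

# expand_check_command CHECK_COMMAND COMMAND_LINE HOSTADDRESS
expand_check_command() {
    # Assumed value of $USER1$ (stock resource.cfg with this install prefix).
    line=$(replace_all "$2" '$USER1$' /usr/local/nagios/libexec)
    line=$(replace_all "$line" '$HOSTADDRESS$' "$3")

    # Split "name!arg1!arg2" on "!" into positional parameters.
    old_ifs=$IFS
    IFS='!'
    set -f
    set -- $1
    set +f
    IFS=$old_ifs
    shift                                  # drop the command name

    # Fill $ARG1$..$ARG4$; missing arguments become empty strings.
    n=1
    while [ "$n" -le 4 ]; do
        line=$(replace_all "$line" "\$ARG$n\$" "${1-}")
        [ "$#" -gt 0 ] && shift
        n=$((n + 1))
    done
    printf '%s\n' "$line"
}

expand_check_command 'check_ntp!1!3' \
    '$USER1$/check_ntp -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$' 192.0.2.10
# prints: /usr/local/nagios/libexec/check_ntp -H 192.0.2.10 -w 1 -c 3
```

Running the expanded line by hand on the Nagios server is a quick way to debug a service that shows an unexpected state in the web interface.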
# commands.cfg
# 'check_local_disk' command definition
define command{
        command_name    check_local_disk
        command_line    $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
        }


# 'check_local_load' command definition
define command{
        command_name    check_local_load
        command_line    $USER1$/check_load -w $ARG1$ -c $ARG2$
        }


# 'check_local_procs' command definition
define command{
        command_name    check_local_procs
        command_line    $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
        }


# 'check_local_users' command definition
define command{
        command_name    check_local_users
        command_line    $USER1$/check_users -w $ARG1$ -c $ARG2$
        }


# 'check_local_swap' command definition
define command{
    command_name    check_local_swap
    command_line    $USER1$/check_swap -w $ARG1$ -c $ARG2$
    }


# 'check_local_mrtgtraf' command definition
define command{
    command_name    check_local_mrtgtraf
    command_line    $USER1$/check_mrtgtraf -F $ARG1$ -a $ARG2$ -w $ARG3$ -c $ARG4$ -e $ARG5$
    }


################################################################################
# NOTE:  The following 'check_...' commands are used to monitor services on
#        both local and remote hosts.
################################################################################

# 'check_ftp' command definition
define command{
        command_name    check_ftp
        command_line    $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$
        }


# 'check_hpjd' command definition
define command{
        command_name    check_hpjd
        command_line    $USER1$/check_hpjd -H $HOSTADDRESS$ $ARG1$
        }


# 'check_snmp' command definition
define command{
        command_name    check_snmp
        command_line    $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
        }


# 'check_http' command definition
define command{
        command_name    check_http
        command_line    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
        }


# 'check_ssh' command definition
define command{
    command_name    check_ssh
    command_line    $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
    }


# 'check_dhcp' command definition
define command{
    command_name    check_dhcp
    command_line    $USER1$/check_dhcp $ARG1$
    }


# 'check_ping' command definition
define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
        }


# 'check_pop' command definition
define command{
        command_name    check_pop
        command_line    $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$
        }


# 'check_imap' command definition
define command{
        command_name    check_imap
        command_line    $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$
        }


# 'check_smtp' command definition
define command{
        command_name    check_smtp
        command_line    $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$
        }


# 'check_tcp' command definition
define command{
    command_name    check_tcp
    command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
    }


# 'check_udp' command definition
define command{
    command_name    check_udp
    command_line    $USER1$/check_udp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
    }


# 'check_nt' command definition
define command{
    command_name    check_nt
    command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
    }
# 'check_ntp' command definition
define command{
    command_name    check_ntp
    command_line    $USER1$/check_ntp -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$
    }
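# 'check_dns' command definition
# NOTE: check_dns treats -H as the name to resolve; add "-s $HOSTADDRESS$" to send the query to this DNS server.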
define command{
    command_name    check_dns
    command_line    $USER1$/check_dns -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$
    }
#
# SAMPLE PERFORMANCE DATA COMMANDS
#
# These are sample performance data commands that can be used to send performance
# data output to two text files (one for hosts, another for services).  If you
# plan on simply writing performance data out to a file, consider using the 
# host_perfdata_file and service_perfdata_file options in the main config file.
#
################################################################################


# 'process-host-perfdata' command definition
define command{
    command_name    process-host-perfdata
    command_line    /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /usr/local/nagios/var/host-perfdata.out
    }


# 'process-service-perfdata' command definition
define command{
    command_name    process-service-perfdata
    command_line    /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios/var/service-perfdata.out
    }
# 'check_nrpe' command definition
        define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
# 'check_mysql' command definition
        define command{
        command_name    check_mysql
        command_line    $USER1$/check_mysql -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -P $ARG3$ -d $ARG4$
        }
# 'check_mysql_slave_status' command definition
        define command{
        command_name    check_mysql_slave_status
        command_line    $USER1$/check_mysql_slave_status.sh $HOSTADDRESS$
        }
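
The `check_mysql_slave_status` command above calls a custom script, and any such plugin only has to print one line and exit with one of the four Nagios status codes (0 OK, 1 warning, 2 critical, 3 unknown). The sketch below shows the decision step such a script might contain. It is not the author's `check_mysql_slave_status.sh`: the function name `slave_state` is hypothetical, and it assumes the two values passed in are the `Slave_IO_Running` and `Slave_SQL_Running` fields taken from `SHOW SLAVE STATUS`.

```shell
#!/bin/sh
# slave_state IO_RUNNING SQL_RUNNING
# Map the two replication thread states to a Nagios status line and exit code.
slave_state() {
    io=$1
    sql=$2
    if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
        echo "OK - slave IO and SQL threads are running"
        return 0        # Nagios: OK
    elif [ -z "$io" ] || [ -z "$sql" ]; then
        echo "UNKNOWN - could not read slave status"
        return 3        # Nagios: UNKNOWN
    else
        echo "CRITICAL - Slave_IO_Running=$io Slave_SQL_Running=$sql"
        return 2        # Nagios: CRITICAL
    fi
}

slave_state Yes Yes
# prints: OK - slave IO and SQL threads are running
```

A real plugin would read the two values from the MySQL server first and then end with `exit $?` after calling the function, so that Nagios sees the return code.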

15-3-7 Configure SMS alerts

# Check the default mail tool; here we install sendmail to do the sending
[root@nagios ~]# yum install sendmail -y
[root@nagios ~]# service postfix stop
Shutting down postfix:                                     [  OK  ]
[root@nagios ~]# chkconfig postfix off
[root@nagios ~]# service sendmail restart
Shutting down sm-client:                                   [  OK  ]
Shutting down sendmail:                                    [FAILED]
Starting sendmail:                                         [  OK  ]
Starting sm-client:                                        [  OK  ]
[root@nagios ~]# alternatives --config mta

There are 2 programs which provide 'mta'.

  Selection    Command
-----------------------------------------------
   1           /usr/sbin/sendmail.postfix
*+ 2           /usr/sbin/sendmail.sendmail

Enter to keep the current selection[+], or type selection number: 2
# Test that email and SMS delivery both work
[root@nagios ~]# echo "hello,this is a test message" >11.11
[root@nagios ~]# mail -s 'hello mafei0728' 17689215817@wo.cn <11.11

mark

mark

# Edit the configuration file
define contact{
        contact_name                    nagiosadmin             ; Short name of user
        use                             generic-contact         ; Inherit default values from generic-contact template (defined above)
        alias                           Nagios Admin            ; Full name of user

        email                           1768921xxxxx@wo.cn       ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
        }                               (my phone mailbox address is not exposed here)

:wq!
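
Pointing the contact's `email` at a carrier mailbox that forwards to SMS is all that is needed, because Nagios alerts are ordinary mail. The sketch below rebuilds the subject line that the stock `notify-service-by-email` command passes to `/bin/mail -s`. It assumes that stock command is still present in `commands.cfg` (it is not shown in the listing above); `service_alert_subject` is a hypothetical helper name, and the arguments stand in for the `$NOTIFICATIONTYPE$`, `$HOSTALIAS$`, `$SERVICEDESC$` and `$SERVICESTATE$` macros.

```shell
#!/bin/sh
# service_alert_subject TYPE HOSTALIAS SERVICEDESC STATE
# Build the subject used by the stock notify-service-by-email command.
service_alert_subject() {
    printf '** %s Service Alert: %s/%s is %s **\n' "$1" "$2" "$3" "$4"
}

service_alert_subject PROBLEM dns check_ntp CRITICAL
# prints: ** PROBLEM Service Alert: dns/check_ntp is CRITICAL **
```

Because an SMS usually shows only the first part of a message, the subject line is what matters most when alerts go through a mail-to-SMS gateway.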
# Verify and restart: the usual three steps
[root@nagios ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Checking misc settings...
Total Warnings: 0
Total Errors:   0
[root@nagios ~]# service httpd restart && service nagios restart
Stopping httpd:                                            [  OK  ]
Starting httpd:                                            [  OK  ]
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
[root@nagios ~]# 
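
The verify-then-restart routine above can be wrapped so that a broken configuration never reaches the restart step. This is a minimal sketch, assuming the `/usr/local/nagios` install prefix used in this article and the `httpd` and `nagios` init scripts shown above; the function names `verify_config`, `restart_services` and `safe_restart` are hypothetical.

```shell
#!/bin/sh
# Paths assume the /usr/local/nagios prefix; override via environment if needed.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Step 1: run the configuration check (nagios -v exits non-zero on errors).
verify_config() {
    "$NAGIOS_BIN" -v "$NAGIOS_CFG"
}

# Steps 2 and 3: restart the web server, then Nagios.
restart_services() {
    service httpd restart && service nagios restart
}

# Restart only when the configuration check passes.
safe_restart() {
    if verify_config; then
        restart_services
    else
        echo "nagios.cfg failed verification; not restarting" >&2
        return 1
    fi
}
```

After sourcing the file, calling `safe_restart` as root performs all three steps, or stops after the first one if the check reports errors.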
# Test that alerting works by stopping the ntp service
[root@dns ~]# service ntpd stop
Shutting down ntpd:                                        [  OK  ]
# Once the SMS arrives, start the service again
[root@dns ~]# service ntpd restart
Shutting down ntpd:                                        [FAILED]
Starting ntpd:                                             [  OK  ]

Screenshot of the NTP server SMS alert and recovery

mark

Screenshot of the NTP server email alert

mark

Nagios alerting is now configured, and with that the project of monitoring the whole cluster with Nagios is complete.

Monitoring is fully configured; here is what the result looks like.

mark

mark

mark
