Monitoring and Restarting the memcached Process with a Shell Script

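The web servers use memcached, but for some unknown reason memcached keeps dying (roughly every 20 to 50 minutes), which causes errors on some pages of the site. I enabled logging, but the logs showed nothing useful. My initial guess is that it is related to an earlier libevent upgrade. Since this is a production server, a script will paper over the problem for now.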

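#!/bin/sh
# Look up memcached by exact command name, so that this script
# (memcached.sh) and the helper commands are not matched by mistake.
pid=`ps -eo pid,comm | awk '$2 == "memcached" {print $1}'`
if [ -z "$pid" ]
then
    # Start memcached as a daemon, then reload nginx.
    /usr/local/memcached/bin/memcached -d -u www
    /usr/local/nginx/sbin/nginx -s reload
fi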

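The script above mainly revisits two things: awk and the conditional expression in if. The different quote characters (single quotes, double quotes, and backticks) are a nuisance as well. It is only a basic, fairly crude script, but it does what I need: it first checks whether a memcached process exists and, if not, starts memcached and reloads nginx.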

Finally, add it to the system crontab so that the check runs every 5 minutes:

*/5 * * * * /root/memcached.sh

Done!

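A Detailed Guide to Using the Memcached Key-Value Store

Introduction

Memcached is a free, open-source, high-performance distributed object caching system that improves web application performance by reducing reads from the database. Memcached is built on a hashmap of key/value pairs. Its daemon is written in C, but clients can be written in any language and talk to the daemon over the memcached protocol. When a server stops or crashes, all key/value pairs stored on that server are lost.

The Memcached server itself provides no distributed functionality: Memcached instances do not communicate with each other to share information. To distribute data, run several Memcached instances and let the client choose among them with a distribution algorithm.

Memcached has two important concepts:

  • slab: to prevent memory fragmentation, the Memcached server divides its data space into a series of slabs in advance. As an analogy, take a room of 100 cubic meters: to use the space sensibly, you fill it with 30 boxes of 1 cubic meter, 20 boxes of 1.25 cubic meters, 15 boxes of 1.5 cubic meters, and so on. These boxes are the slabs.

  • LRU: the least recently used algorithm. When the chunks of one slab are full and a new value has to be stored there, Memcached does not place the new data in a larger slab that still has free space; instead it uses LRU to evict old data from the same slab and stores the new data in its place.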
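The LRU idea can be illustrated with a minimal Python sketch. This is not how Memcached implements it internally; LRUCache and its capacity parameter are hypothetical names used only to show the eviction rule for one full slab.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache: evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")      # "a" is now more recent than "b"
cache.set("c", 3)   # the cache is full, so "b" is evicted
print(cache.get("b"), cache.get("a"), cache.get("c"))  # None 1 3
```

Note that the new value does not spill over into some other, larger container: the oldest entry of the same container makes room for it, which mirrors the behaviour described for a full slab.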

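Deployment

Memcached runs on most Linux and BSD-like systems. There is no official support for installing Memcached on Windows.

On Debian / Ubuntu:

apt-get install memcached

On Red Hat / Fedora / CentOS:

yum install memcached

Run memcached -h to view the help text, which also confirms that the installation succeeded.
If you run into errors, consult the FAQ on the official site.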

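Usage

Server side

A typical way to start a Memcached instance is shown below.
It starts memcached as a daemon listening on TCP port 11211. The -u option names the user that Memcached runs as (only root can use this option):

memcached -u root -p 11211 -d -vvv

Other common options:

  • -m: the amount of memory allocated to Memcached, in megabytes; the default is 64 MB.
  • -l: the IP address Memcached listens on. By default it listens on all addresses, so it is reachable from internal and external networks alike and stays reachable even if the machine's IP changes, which is a security risk; set it to 127.0.0.1 to allow access from the local machine only.
  • -c: the maximum number of concurrent connections; the default is 1024.
  • -f: the slab size growth factor; the default is 1.25. For example, if slab class 10 has a size of 752, slab class 11 has a size of about 752 * 1.25.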
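The growth-factor arithmetic can be tried out with a short Python sketch. It assumes a smallest chunk size of 96 bytes and rounding of every size up to a multiple of 8 bytes; both values are assumptions for illustration (the real sizes depend on the memcached version and startup options), and slab_sizes is a hypothetical helper, not part of memcached.

```python
def slab_sizes(first=96, factor=1.25, count=12, align=8):
    """Return the chunk sizes of the first `count` slab classes."""
    sizes = []
    size = first
    for _ in range(count):
        if size % align:
            size += align - size % align   # round up to the alignment
        sizes.append(size)
        size = int(size * factor)          # next class grows by the factor
    return sizes


sizes = slab_sizes()
print(sizes)
print(sizes[9], sizes[10])   # slab class 10 and slab class 11
```

Under these assumptions class 10 comes out as 752 and class 11 as 944: 752 * 1.25 is 940, which is then rounded up to the next multiple of 8.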

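Client side

Communication between a Memcached client and the server is simple: it uses a text-based protocol rather than a binary one, so you can interact with the server over telnet:

telnet [host] [port]

Press Ctrl + ] and then Enter to have your input echoed back.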

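Storage commands

set
Stores data. If the key already exists, set overwrites the existing data for that key, so it also serves as an update. For details, see the Runoob (菜鸟教程) tutorial on the Memcached set command.

add
Stores the data only if the key does not already exist. For details, see the Runoob (菜鸟教程) tutorial on the Memcached add command.

replace
Replaces the data only if the key already exists. For details, see the Runoob (菜鸟教程) tutorial on the Memcached replace command.

append
Appends data to the end of an existing item's value. For details, see the Runoob (菜鸟教程) tutorial on the Memcached append command.

prepend
Adds data to the beginning of an existing item's value. For details, see the Runoob (菜鸟教程) tutorial on the Memcached prepend command.

cas
Performs a "check and set" operation. The value is written only if no other client has modified the key's value since this client last fetched it. The check is done through the cas_token parameter, a unique 64-bit value that Memcached assigns to an existing item. For details, see the Runoob (菜鸟教程) tutorial on the Memcached cas command.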
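All of the storage commands share one request layout in the text protocol: a header line with the command, key, flags, expiry time and data length, followed by the data itself. The Python sketch below builds those bytes without contacting a server. build_storage_command is a hypothetical helper written for this illustration, and the flags and exptime defaults of 0 are assumptions.

```python
def build_storage_command(command, key, value, flags=0, exptime=0, cas_unique=None):
    """Return the bytes a client sends for set/add/replace/append/prepend/cas."""
    data = value.encode("utf-8")
    header = f"{command} {key} {flags} {exptime} {len(data)}"
    if command == "cas":
        if cas_unique is None:
            raise ValueError("cas requires the token returned by gets")
        header += f" {cas_unique}"
    # The header line and the data block are each terminated by \r\n.
    return header.encode("ascii") + b"\r\n" + data + b"\r\n"


print(build_storage_command("set", "greeting", "hello"))
print(build_storage_command("cas", "greeting", "world", cas_unique=42))
```

The same two lines can be typed by hand in a telnet session: first the header line, then the value on the next line.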

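Retrieval commands

get
Fetches the value of an item by its key. For details, see the Runoob (菜鸟教程) tutorial on the Memcached get command.

gets
Fetches the value together with its CAS token. For details, see the Runoob (菜鸟教程) tutorial on the Memcached gets command.

delete
Deletes an existing item. For details, see the Runoob (菜鸟教程) tutorial on the Memcached delete command.

incr/decr
Increments or decrements the value of an existing key. For details, see the Runoob (菜鸟教程) tutorial on the Memcached incr/decr commands.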
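How gets and cas work together can be modelled in a few lines of Python. TinyStore is a hypothetical in-memory stand-in for a server, written only for this illustration; it mimics the STORED, EXISTS and NOT_FOUND replies of the text protocol but is not a memcached client.

```python
class TinyStore:
    """In-memory stand-in for a memcached server (illustration only)."""

    def __init__(self):
        self._data = {}         # key -> (value, cas_token)
        self._next_token = 1

    def set(self, key, value):
        self._data[key] = (value, self._next_token)
        self._next_token += 1   # every write gets a fresh token
        return "STORED"

    def gets(self, key):
        return self._data.get(key)   # (value, cas_token) or None

    def cas(self, key, value, token):
        if key not in self._data:
            return "NOT_FOUND"
        if self._data[key][1] != token:
            return "EXISTS"     # another write happened since our gets
        return self.set(key, value)


store = TinyStore()
store.set("counter", "1")
value, token = store.gets("counter")
store.set("counter", "5")                 # another client updates the key
print(store.cas("counter", "2", token))   # stale token: EXISTS
value, token = store.gets("counter")
print(store.cas("counter", "6", token))   # fresh token: STORED
```

A client that receives EXISTS normally repeats the gets, recomputes its new value and retries the cas.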

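Statistics commands

stats
Shows all memcached statistics. For details, see the Runoob (菜鸟教程) tutorial on the Memcached stats command.

stats items
Shows the number of items in each slab, how long they have been stored, and other information. For details, see the Runoob (菜鸟教程) tutorial on the Memcached stats items command.

stats slabs
Shows information about each slab, including chunk size, chunk count and usage. For details, see the Runoob (菜鸟教程) tutorial on the Memcached stats slabs command.

stats sizes
Shows the sizes and counts of all items. The output has two columns: the first is the item size and the second is the number of items. For details, see the Runoob (菜鸟教程) tutorial on the Memcached stats sizes command.

flush_all
Clears all cached data. For details, see the Runoob (菜鸟教程) tutorial on the Memcached flush_all command.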

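Distribution algorithms

Modulo algorithm

Keys are spread according to the remainder by the number of server nodes: a hash function turns the key into an integer, and that integer modulo the number of nodes selects the server. The computation is simple and the keys spread well. The drawback shows when a machine goes down: requests that should land on that machine can no longer be served correctly, so the failed server has to be removed from the algorithm, and then about (N-1) / N of the cached data has to be remapped. If one machine is added, about N / (N+1) of the cached data has to be remapped. For the system this is usually an unacceptable amount of churn, because it means that a large part of the cache becomes invalid or has to be moved.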
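The remapping fractions quoted above can be checked with a small simulation. This is a sketch under assumptions: md5 is used only as a convenient stable hash, the key names are made up, and remapped_fraction is a hypothetical helper.

```python
import hashlib


def key_hash(key):
    # md5 is used here only to obtain a stable, well-spread integer.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def remapped_fraction(keys, old_count, new_count):
    """Fraction of keys whose server index changes with the node count."""
    moved = sum(
        1 for k in keys
        if key_hash(k) % old_count != key_hash(k) % new_count
    )
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10000)]
print(remapped_fraction(keys, 4, 5))   # adding a node: roughly 4/5
print(remapped_fraction(keys, 4, 3))   # removing a node: roughly 3/4
```

With N = 4, adding a node should remap about N / (N+1) = 4/5 of the keys, and removing one about (N-1) / N = 3/4.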

[This passage is excerpted from the blog of 大脸猫]

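Consistent hashing

The hash space is pictured as a closed ring whose points represent the values 0 to 2^32 - 1. Each memcached node is hashed to one point on the ring. When a key is stored, the key is hashed as well, which also yields a point on the ring, and the key is stored on the first node found in the clockwise direction.

In the original figure, the nodes are unevenly distributed and the key in question is stored on node C.

Now a new node D is added. In the original figure only the stretch between node A and node D is affected: that part of the data no longer maps to node B but to the new node D. Removing a node works the same way and only affects the next node clockwise.

Advantage: nodes can be added and removed dynamically, and when a server goes down only the next node clockwise is affected.
Disadvantage: when the hashed values of the servers are close together, the servers are unevenly spread over the ring, so the keys and the load are unevenly distributed as well. If a server that owns a large share of the ring goes down, the hit rate drops noticeably.

Adding virtual nodes to consistent hashing

Virtual nodes address the uneven load caused by uneven distribution in consistent hashing. Each real node corresponds to several virtual nodes; when a key maps to a virtual node, it is treated as mapped to the real node behind that virtual node.

Advantage: each physical node corresponds to many virtual nodes on the ring (for example 200 to 300), and a key hashed to a virtual node is stored on the corresponding physical node, which balances the load effectively.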
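A ring with virtual nodes can be sketched in Python as follows. This is an illustrative sketch rather than the implementation of any particular memcached client: HashRing, ring_hash, the "node#i" naming of virtual nodes and the use of md5 are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_hash(value):
    """Map a string to a point on the ring, 0 .. 2**32 - 1."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=200):
        self.vnodes = vnodes     # virtual nodes per physical node
        self._points = []        # sorted ring positions
        self._owner = {}         # ring position -> physical node
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            if point in self._owner:
                continue         # rare collision: keep the existing owner
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        # First virtual node clockwise from the key, wrapping around at
        # the end. Assumes the ring holds at least one node.
        i = bisect.bisect_left(self._points, ring_hash(key))
        return self._owner[self._points[i % len(self._points)]]


ring = HashRing(["A", "B", "C"])
keys = [f"user-{i}" for i in range(5000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("D")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved) / len(keys))       # roughly a quarter of the keys move
print({after[k] for k in moved})    # every moved key now lives on D
```

Only keys that fall just before one of D's virtual nodes change owner, and they all move to D. The keys that stay put keep their cached values, in contrast to the modulo scheme.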

[This passage is excerpted from the article "memcached学习 – 分布式算法" by 鱼我所欲也]

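Common problems at work

Cache avalanche

A cache avalanche usually starts when one cache node fails and the cache hit rate of the other nodes drops. The data missing from the cache is then fetched from the database, and within a short time the database server collapses.

The DB is restarted and is soon overwhelmed again, but each time a little more data ends up in the cache. Only after the DB has been restarted several times and the cache has been rebuilt does the DB run stably. Alternatively, the avalanche is caused by the cache expiring periodically: if it expires every 6 hours, there is a request peak every 6 hours, which in severe cases can even crash the DB.

The bottomless-pit phenomenon (multiget-hole)

This problem was raised by engineers at Facebook. Around 2010, Facebook already had 3000 memcached nodes caching several thousand GB of content.

They found that as the connection frequency to memcached rose, efficiency dropped, so they added memcached nodes. After adding them, the problem caused by connection frequency was still there and had not improved. They called this the "bottomless-pit phenomenon".

Analysis

Take a user as an example, with N keys such as user-133-age, user-133-name, user-133-height, and so on. As servers are added, the data of user 133 is scattered over more nodes, so for the same profile page and the same data, the more nodes there are, the more nodes have to be contacted.

The number of memcached connections therefore does not go down as nodes are added, and that is where the problem comes from.

Solution to multiget-hole

Distribute a group of keys by their common prefix. For the 3 keys user-133-age, user-133-name and user-133-height, the distribution algorithm should compute the node from 'user-133' rather than from user-133-age/name/height.

That way all 3 keys for the user's profile land on the same node, and loading the profile page only requires connecting to 1 node.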
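Choosing the node from the shared prefix can be sketched as below. The node names are hypothetical, and the simple modulo placement and md5 hash are assumptions chosen to keep the example short; the same prefix rule applies to any distribution algorithm.

```python
import hashlib


def node_for(key, nodes):
    """Pick a node from the shared prefix, e.g. 'user-133' for 'user-133-age'."""
    prefix = key.rsplit("-", 1)[0]   # drop the last field (age, name, ...)
    digest = hashlib.md5(prefix.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


nodes = ["cache-a", "cache-b", "cache-c", "cache-d"]   # hypothetical node names
keys = ["user-133-age", "user-133-name", "user-133-height"]
print({node_for(k, nodes) for k in keys})   # a set containing a single node
```

Because the three keys hash to the same prefix, one connection is enough to fetch all of them.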

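Permanent data being evicted

Some people report this online as "memcached data loss": items that were set to never expire inexplicably disappear.

Causes:

  • Many chunks in a slab may already have expired, but if they have not been fetched with get since expiring, the system does not know that they have expired.
  • Permanent data that has not been fetched for a long time is inactive, so when a new item is added the permanent data is evicted.
  • Of course, if the non-permanent data is fetched with get, it is marked as expired, and the permanent data is then not evicted.

Solution: store permanent and non-permanent data separately.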

Recovering Data After a mysql (mariadb) Startup Error

I. mysql (mariadb) fails to start

(Note: below, mysql == mariadb.)

II. Check the mysql log:

vim /var/log/mariadb/mariadb.log
InnoDB: End of page dump

160226 11:00:21  InnoDB: Page checksum 913642282 (32bit_calc: 472052024), prior-to-4.0.14-form checksum 2048873750
InnoDB: stored checksum 913642282, prior-to-4.0.14-form stored checksum 1622372148
InnoDB: Page lsn 0 142354744, low 4 bytes of lsn at page end 142348560
InnoDB: Page number (if stored to page already) 589,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0
InnoDB: Page may be an update undo log page
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 589.
InnoDB: You may have to recover from a backup.
InnoDB: It is also possible that your operating
InnoDB: system has corrupted its own file cache
InnoDB: and rebooting your computer removes the
InnoDB: error.
InnoDB: If the corrupt page is an index page
InnoDB: you can also try to fix the corruption
InnoDB: by dumping, dropping, and reimporting
InnoDB: the corrupt table. You can use CHECK
InnoDB: TABLE to scan your table for corruption.
InnoDB: See also  http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
160226 11:00:21 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see  http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 5.5.44-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466713 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
InnoDB: End of page dump
160226 11:00:30  InnoDB: Page checksum 913642282 (32bit_calc: 472052024), prior-to-4.0.14-form checksum 2048873750
InnoDB: stored checksum 913642282, prior-to-4.0.14-form stored checksum 1622372148
InnoDB: Page lsn 0 142354744, low 4 bytes of lsn at page end 142348560
InnoDB: Page number (if stored to page already) 589,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0
InnoDB: Page may be an update undo log page
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 589.
InnoDB: You may have to recover from a backup.
InnoDB: It is also possible that your operating
InnoDB: system has corrupted its own file cache
InnoDB: and rebooting your computer removes the
InnoDB: error.
InnoDB: End of page dump
160226 11:00:30  InnoDB: Page checksum 913642282 (32bit_calc: 472052024), prior-to-4.0.14-form checksum 2048873750
InnoDB: stored checksum 913642282, prior-to-4.0.14-form stored checksum 1622372148
InnoDB: Page lsn 0 142354744, low 4 bytes of lsn at page end 142348560
InnoDB: Page number (if stored to page already) 589,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0
InnoDB: Page may be an update undo log page
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 589.
InnoDB: You may have to recover from a backup.
InnoDB: It is also possible that your operating
InnoDB: system has corrupted its own file cache
InnoDB: and rebooting your computer removes the
InnoDB: error.
InnoDB: If the corrupt page is an index page
InnoDB: you can also try to fix the corruption
InnoDB: by dumping, dropping, and reimporting
InnoDB: the corrupt table. You can use CHECK
InnoDB: TABLE to scan your table for corruption.
InnoDB: See also  http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
InnoDB: Ending processing because of a corrupt database page.
160226 11:00:30  InnoDB: Assertion failure in thread 140329989404736 in file buf0buf.c line 4032
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to  http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB:  http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
160226 11:00:30 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
160226 11:00:28  InnoDB: Page dump in ascii and hex (16384 bytes):
max_threads=153
thread_count=0 
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466713 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x0 
Attempting backtrace. You can use the following information to find out
160226 11:00:19  InnoDB: Page dump in ascii and hex (16384 bytes):
InnoDB: End of page dump
160226 11:00:21  InnoDB: Page checksum 913642282 (32bit_calc: 472052024), prior-to-4.0.14-form checksum 2048873750
InnoDB: stored checksum 913642282, prior-to-4.0.14-form stored checksum 1622372148
InnoDB: Page lsn 0 142354744, low 4 bytes of lsn at page end 142348560
InnoDB: Page number (if stored to page already) 589,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0
InnoDB: Page may be an update undo log page
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 589.
InnoDB: You may have to recover from a backup.
InnoDB: It is also possible that your operating
InnoDB: system has corrupted its own file cache
InnoDB: and rebooting your computer removes the
InnoDB: error.
InnoDB: If the corrupt page is an index page
InnoDB: you can also try to fix the corruption
InnoDB: by dumping, dropping, and reimporting
InnoDB: the corrupt table. You can use CHECK
InnoDB: TABLE to scan your table for corruption.
InnoDB: See also  http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
InnoDB: Ending processing because of a corrupt database page.
160226 11:00:21  InnoDB: Assertion failure in thread 139871429470272 in file buf0buf.c line 4032
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to  http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
160226 11:00:21 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see  http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 5.5.44-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466713 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
InnoDB: End of page dump
160226 11:00:30  InnoDB: Page checksum 913642282 (32bit_calc: 472052024), prior-to-4.0.14-form checksum 2048873750
InnoDB: stored checksum 913642282, prior-to-4.0.14-form stored checksum 1622372148
InnoDB: Page lsn 0 142354744, low 4 bytes of lsn at page end 142348560
InnoDB: Page number (if stored to page already) 589,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0
InnoDB: Page may be an update undo log page
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 589.
InnoDB: You may have to recover from a backup.
InnoDB: It is also possible that your operating
InnoDB: system has corrupted its own file cache
InnoDB: and rebooting your computer removes the
InnoDB: error.
InnoDB: If the corrupt page is an index page
InnoDB: you can also try to fix the corruption
InnoDB: by dumping, dropping, and reimporting
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see  http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 5.5.44-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466713 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48000
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x7fa11fb574ed]
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x7fa11f76d385]
/lib64/libpthread.so.0(+0xf100)[0x7fa11ee9d100]
/lib64/libc.so.6(gsignal+0x37)[0x7fa11d6515f7]
/lib64/libc.so.6(abort+0x148)[0x7fa11d652ce8]
/usr/libexec/mysqld(+0x6971a2)[0x7fa11f9651a2]
/usr/libexec/mysqld(+0x6a8b17)[0x7fa11f976b17]
/usr/libexec/mysqld(+0x6919ee)[0x7fa11f95f9ee]
/usr/libexec/mysqld(+0x66313a)[0x7fa11f93113a]
/usr/libexec/mysqld(+0x655f93)[0x7fa11f923f93]
/usr/libexec/mysqld(+0x656dfc)[0x7fa11f924dfc]
/usr/libexec/mysqld(+0x65954e)[0x7fa11f92754e]
/usr/libexec/mysqld(+0x64290e)[0x7fa11f91090e]
/usr/libexec/mysqld(+0x5fbb9c)[0x7fa11f8c9b9c]
/usr/libexec/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x48)[0x7fa11f76f408]
/usr/libexec/mysqld(+0x37bff5)[0x7fa11f649ff5]
/usr/libexec/mysqld(_Z11plugin_initPiPPci+0x551)[0x7fa11f64fa61]
/usr/libexec/mysqld(+0x2ee4ba)[0x7fa11f5bc4ba]
/usr/libexec/mysqld(_Z11mysqld_mainiPPc+0x546)[0x7fa11f5bf5d6]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa11d63db15]
/usr/libexec/mysqld(+0x2e869d)[0x7fa11f5b669d]
The manual page at  http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
160226 11:00:30 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended

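III. Recover the data with the officially recommended method:

1. Start mysql in recovery mode (http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html)

vim /etc/my.cnf

Add this setting:

innodb_force_recovery = 1

Start with the value 1. If mysql still does not start with 1, raise the value step by step to 2, 3, 4 and so on, until mysql starts.

2. Restart mysql in recovery mode

systemctl restart mariadb

The restart succeeded.
Test the database connection: mysql -uroot -p123456

It works.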

3. Back up all databases and tables:

mysqldump -uroot -p123456 --all-databases  > all_mysql_backup.sql

4. Clear the mysql data (be sure to stop the mysql service first):

systemctl stop mariadb
cp -r  /var/lib/mysql/ /var/lib/mysql.bak
rm -rf /var/lib/mysql/*

Restart the mysql service, this time in normal mode:

vim /etc/my.cnf

Comment out the setting:

#innodb_force_recovery = 1

Then restart:

systemctl restart mariadb

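5. Set the database root password back to the previous one, 123456:

mysqladmin -u root password 123456

6. Restore the data from the SQL file backed up earlier:

mysql -uroot -p123456 -e "source /root/all_mysql_backup.sql"

Finally, check that the restored data is present.

Experiment complete!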

Creating Users and Managing Grants in mysql (mariadb)

Create only a new user, newuser

Method 1:

MariaDB [(none)]> create user newuser@localhost identified by '123456';
Query OK, 0 rows affected (0.22 sec)

MariaDB [(none)]> select user from mysql.user;
+---------+
| user    |
+---------+
| aa      |
| root    |
| root    |
|         |
| aa      |
| bb      |
| lcz     |
| my      |
| mytest  |
| newuser |
| nome    |
| root    |
|         |
| root    |
+---------+
14 rows in set (0.00 sec)

MariaDB [(none)]> 

Method 2:

MariaDB [(none)]> insert into mysql.user(user,host,password) values('ggo','localhost',password('1234'));
Query OK, 1 row affected, 4 warnings (0.24 sec)

MariaDB [(none)]> flush privileges;
Query OK, 0 rows affected (0.25 sec)

Result

[root@localhost ~]# mysql -uggo -p
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 5.5.52-MariaDB MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> 

Create a user my and grant it all privileges

MariaDB [(none)]> grant all privileges on *.* to my@localhost identified by '123456';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> select user from mysql.user;
+--------+
| user   |
+--------+
| aa     |
| root   |
| root   |
|        |
| aa     |
| bb     |
| lcz    |
| my     |
| mytest |
| nome   |
| root   |
|        |
| root   |
+--------+
13 rows in set (0.14 sec)

MariaDB [(none)]>

View a user's privileges

MariaDB [(none)]> show grants for my@localhost;
+--------------------------------------------------------------------------------------------------------------------+
| Grants for my@localhost                                                                                            |
+--------------------------------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'my'@'localhost' IDENTIFIED BY PASSWORD '*6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9' |
+--------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

MariaDB [(none)]> 

Grant only specific privileges

grant insert,update,delete,select on *.* to mytest@localhost;

Building a Multi-Master MariaDB Galera Cluster on CentOS 7.3

1. Environment

CentOS Linux release 7.3
MariaDB 10.1.25

2. Install MariaDB

Configure the yum repository for MariaDB 10.1

[root@centos7-compute1 ~]# cat /etc/yum.repos.d/MariaDB.repo
[root@centos7-compute2 ~]# cat /etc/yum.repos.d/MariaDB.repo
[root@centos7-compute3 ~]# cat /etc/yum.repos.d/MariaDB.repo
MariaDB.repo:
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.1/centos7-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1

3. Install with yum

[root@centos7-compute1 ~]# sudo yum install MariaDB-server MariaDB-client galera
[root@centos7-compute2 ~]# sudo yum install MariaDB-server MariaDB-client galera
[root@centos7-compute3 ~]# sudo yum install MariaDB-server MariaDB-client galera

4. Security configuration

[root@centos7-compute1 ~]# /usr/bin/mysql_secure_installation
[root@centos7-compute2 ~]# /usr/bin/mysql_secure_installation
[root@centos7-compute3 ~]# /usr/bin/mysql_secure_installation
On all three nodes (compute1, compute2, compute3), start MariaDB and grant privileges:
grant all privileges on *.* to root@"%" identified by "123456";
flush privileges;
Then stop the database on all nodes
[root@centos7-compute1 ~]# systemctl stop mariadb
[root@centos7-compute2 ~]# systemctl stop mariadb
[root@centos7-compute3 ~]# systemctl stop mariadb
Note: at this point SELinux and the firewall must be disabled on all nodes so that they do not interfere with cluster communication
systemctl stop firewalld
systemctl disable firewalld

5. Configure MariaDB Galera Cluster

Edit the /etc/my.cnf.d/server.cnf file on all three nodes
Configuration for compute1:

[mysqld]

[galera]
wsrep_provider = /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name='my_wsrep_cluster'
wsrep_cluster_address = "gcomm://192.168.140.197,192.168.140.141,192.168.140.192"
wsrep_node_name = centos7-compute1
wsrep_node_address=192.168.140.197
wsrep_on=ON
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_slave_threads=1
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=1024M
wsrep_sst_method=rsync
wsrep_sst_auth=root:123456

[embedded]

[mariadb]

[mariadb-10.1]

Configuration for compute2:

[mysqld]

[galera]
wsrep_provider = /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name='my_wsrep_cluster'
wsrep_cluster_address = "gcomm://192.168.140.197,192.168.140.141,192.168.140.192"
wsrep_node_name = centos7-compute2
wsrep_node_address=192.168.140.141
wsrep_on=ON
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_slave_threads=1
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=1024M
wsrep_sst_method=rsync
wsrep_sst_auth=root:123456

[embedded]

[mariadb]

[mariadb-10.1]

Configuration for compute3:

[mysqld]

[galera]
wsrep_provider = /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name='my_wsrep_cluster'
wsrep_cluster_address = "gcomm://192.168.140.197,192.168.140.141,192.168.140.192"
wsrep_node_name = centos7-compute3
wsrep_node_address=192.168.140.192
wsrep_on=ON
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_slave_threads=1
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=1024M
wsrep_sst_method=rsync
wsrep_sst_auth=root:123456

[embedded]

[mariadb]

[mariadb-10.1]

6. Start the cluster

Bootstrap the cluster
Run this on compute1 only:

[root@centos7-compute1 ~]# /usr/sbin/mysqld --wsrep-new-cluster --user=root &

Check the cluster status

MariaDB [(none)]> show  status like "wsrep_cluster_size";  
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 1     |
+--------------------+-------+


MariaDB [(none)]> show  status like "wsrep%";  
+------------------------------+------------------------------------------------+  
| Variable_name                | Value                                          |  
+------------------------------+------------------------------------------------+  
| wsrep_apply_oooe             | 0.000000                                       |  
| wsrep_apply_oool             | 0.000000                                       |  
| wsrep_apply_window           | 0.000000                                       |  
| wsrep_causal_reads           | 0                                              |  
| wsrep_cert_deps_distance     | 0.000000                                       |  
| wsrep_cert_index_size        | 0                                              |  
| wsrep_cert_interval          | 0.000000                                       |  
| wsrep_cluster_conf_id        | 1                                              |  
| wsrep_cluster_size           | 1                                              |  
| wsrep_cluster_state_uuid     | 1e434901-71b5-11e7-b190-7bb2f4bbed7a           |  
| wsrep_cluster_status         | Primary                                        |  
| wsrep_commit_oooe            | 0.000000                                       |  
| wsrep_commit_oool            | 0.000000                                       |  
| wsrep_commit_window          | 0.000000                                       |  
| wsrep_connected              | ON                                             |  
| wsrep_desync_count           | 0                                              |  
| wsrep_evs_delayed            |                                                |  
| wsrep_evs_evict_list         |                                                |  
| wsrep_evs_repl_latency       | 5.592e-06/1.25208e-05/2.5685e-05/8.62896e-06/5 |  
| wsrep_evs_state              | OPERATIONAL                                    |  
| wsrep_flow_control_paused    | 0.000000                                       |  
| wsrep_flow_control_paused_ns | 0                                              |  
| wsrep_flow_control_recv      | 0                                              |  
| wsrep_flow_control_sent      | 0                                              |  
| wsrep_gcomm_uuid             | 35623b8e-71ad-11e7-af9f-52f25b42ebcf           |  
| wsrep_incoming_addresses     | 192.168.140.197:3306                           |  
| wsrep_last_committed         | 0                                              |  
| wsrep_local_bf_aborts        | 0                                              |  
| wsrep_local_cached_downto    | 18446744073709551615                           |  
| wsrep_local_cert_failures    | 0                                              |  
| wsrep_local_commits          | 0                                              |  
| wsrep_local_index            | 0                                              |  
| wsrep_local_recv_queue       | 0                                              |  
| wsrep_local_recv_queue_avg   | 0.500000                                       |  
| wsrep_local_recv_queue_max   | 2                                              |  
| wsrep_local_recv_queue_min   | 0                                              |  
| wsrep_local_replays          | 0                                              |  
| wsrep_local_send_queue       | 0                                              |  
| wsrep_local_send_queue_avg   | 0.000000                                       |  
| wsrep_local_send_queue_max   | 1                                              |  
| wsrep_local_send_queue_min   | 0                                              |  
| wsrep_local_state            | 4                                              |  
| wsrep_local_state_comment    | Synced                                         |  
| wsrep_local_state_uuid       | 1e434901-71b5-11e7-b190-7bb2f4bbed7a           |  
| wsrep_protocol_version       | 7                                              |  
| wsrep_provider_name          | Galera                                         |  
| wsrep_provider_vendor        | Codership Oy <info@codership.com>              |
| wsrep_provider_version       | 25.3.20(r3703)                                 |  
| wsrep_ready                  | ON                                             |  
| wsrep_received               | 2                                              |  
| wsrep_received_bytes         | 155                                            |  
| wsrep_repl_data_bytes        | 0                                              |  
| wsrep_repl_keys              | 0                                              |  
| wsrep_repl_keys_bytes        | 0                                              |  
| wsrep_repl_other_bytes       | 0                                              |  
| wsrep_replicated             | 0                                              |  
| wsrep_replicated_bytes       | 0                                              |  
| wsrep_thread_count           | 2                                              |  
+------------------------------+------------------------------------------------+

Add the other nodes to the cluster:

[root@centos7-compute2 ~]# systemctl start mariadb
[root@centos7-compute3 ~]# systemctl start mariadb

Check the cluster status:

MariaDB [(none)]> show status like "wsrep_cluster_size";  
+--------------------+-------+  
| Variable_name      | Value |  
+--------------------+-------+  
| wsrep_cluster_size | 3     |  
+--------------------+-------+  
1 row in set (0.00 sec)  

7. Test data replication

On compute1, create a database and a table and insert some rows, then look at the data on compute2 and compute3

MariaDB [(none)]> create database galera;
MariaDB [(none)]> use galera;
MariaDB [galera]> create table t (id int primary key);
MariaDB [galera]> insert into t values(1);
MariaDB [galera]> insert into t values(2);
MariaDB [galera]> insert into t values(3);

Check the database on compute2 and compute3

MariaDB [galera]> show tables;  
+------------------+  
| Tables_in_galera |  
+------------------+  
| t                |  
+------------------+  
1 row in set (0.00 sec)  
MariaDB [galera]> select * from t;  
+----+  
| id |  
+----+  
|  1 |  
|  2 |  
|  3 |  
+----+

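Testing shows that dropping databases and inserting, deleting, querying and updating table rows are all replicated in real time.

8. Failure test

[root@centos7-compute3 ~]# systemctl stop mariadb

Rows inserted on compute1 and compute2 in the meantime are synchronized automatically once compute3 is started again.

While compute3 is stopped, the cluster has two nodes.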

MariaDB [galera]> SHOW STATUS LIKE 'wsrep_cluster_size';  
+--------------------+-------+  
| Variable_name      | Value |  
+--------------------+-------+  
| wsrep_cluster_size | 2     |  
+--------------------+-------+  

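Incremental Database Backups with the mariadb Binary Log

What is an incremental backup? Put simply, a log records the operations performed on the database each day, and restoring only requires replaying those logged operations against the database. This avoids taking a full backup every day; one full backup per week is enough.
First we need to edit the mariadb configuration file. I installed with yum, so the configuration file is /etc/my.cnf, with the following content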

[mysqld]
log-bin=mysql-bin                   # this is the only line that needs to be added
#binlog_format=row
#skip-grant
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

#
# include all files from the config directory
#
!includedir /etc/my.cnf.d

Log in to mariadb and run some operations

[root@localhost mysql]# mysql -uroot -p
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 5.5.52-MariaDB MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> use bp
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [bp]> show tables;
+--------------+
| Tables_in_bp |
+--------------+
| mytest       |
| test         |
+--------------+
2 rows in set (0.00 sec)

MariaDB [bp]> create table bptest(id int ,name varchar(20));
Query OK, 0 rows affected (0.01 sec)

MariaDB [bp]> insert into bptest values(1,'a');
Query OK, 1 row affected (0.00 sec)

MariaDB [bp]> insert into bptest values(2,'b');
Query OK, 1 row affected (0.01 sec)

MariaDB [bp]> select * from bptest;
+------+------+
| id   | name |
+------+------+
|    1 | a    |
|    2 | b    |
+------+------+
2 rows in set (0.01 sec)

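MariaDB [bp]> flush logs;                       # I am still not entirely clear on this; I simply treat it as the start of a new log (flush logs closes the current binary log file and opens a new one)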
Query OK, 0 rows affected (0.01 sec)

MariaDB [bp]> insert into bptest values(3,'c');
Query OK, 1 row affected (0.01 sec)

MariaDB [bp]> insert into bptest values(4,'d');
Query OK, 1 row affected (0.01 sec)

MariaDB [bp]> flush logs;                       # end of this log; the log file can be found in /var/lib/mysql
Query OK, 0 rows affected (0.02 sec)

MariaDB [bp]> delete from bptest where id =3;
Query OK, 1 row affected (0.01 sec)

MariaDB [bp]> delete from bptest where id=1;
Query OK, 1 row affected (0.00 sec)

MariaDB [bp]> flush logs;
Query OK, 0 rows affected (0.02 sec)

MariaDB [bp]> truncate table bptest;            # to make the effect more obvious, empty the table completely
Query OK, 0 rows affected (0.13 sec)

MariaDB [bp]> select * from bptest;
Empty set (0.00 sec)

In the /var/lib/mysql directory we can now see the files mysql-bin.000001 and mysql-bin.000002
Next, let's look at the contents of a log file

[root@localhost mysql]# mysqlbinlog mysql-bin.000001 
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#170725  2:04:19 server id 1  end_log_pos 245   Start: binlog v 4, server v 5.5.52-MariaDB created 170725  2:04:19
BINLOG '
kwl3WQ8BAAAA8QAAAPUAAAAAAAQANS41LjUyLU1hcmlhREIAbG9nAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAA2QAEGggAAAAICAgCAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAKUTwPA==
'/*!*/;
# at 245
#170725  2:04:51 server id 1  end_log_pos 311   Query   thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1500973491/*!*/;
SET @@session.pseudo_thread_id=4/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN /*!*/;
# at 311
#170725  2:04:51 server id 1  end_log_pos 404   Query   thread_id=4 exec_time=0 error_code=0
use `bp`/*!*/;
SET TIMESTAMP=1500973491/*!*/;
insert into bptest values(3,'c') /*!*/;
# at 404
#170725  2:04:51 server id 1  end_log_pos 431   Xid = 47
COMMIT/*!*/;
# at 431
#170725  2:04:56 server id 1  end_log_pos 497   Query   thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1500973496/*!*/;
BEGIN /*!*/;
# at 497
#170725  2:04:56 server id 1  end_log_pos 590   Query   thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1500973496/*!*/;
insert into bptest values(4,'d') /*!*/;
# at 590
#170725  2:04:56 server id 1  end_log_pos 617   Xid = 48
COMMIT/*!*/;
# at 617
#170725  2:05:00 server id 1  end_log_pos 660   Rotate to mysql-bin.000002  pos: 4
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@localhost mysql]#

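This log file contains the SQL statements, and they are exactly the ones issued between the two flush logs commands in the MariaDB session.
Now let's restore from it.
We will replay mysql-bin.000001: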

[root@localhost mysql]# mysqlbinlog mysql-bin.000001|mysql -uroot -p
Enter password: 
[root@localhost mysql]# 

The command finished without errors. Let's go back into the database and check whether the restore succeeded.

MariaDB [bp]> select * from bptest;  # before the restore
Empty set (0.00 sec)

MariaDB [bp]> select * from bptest;  # after the restore
+------+------+
| id   | name |
+------+------+
|    3 | c    |
|    4 | d    |
+------+------+
2 rows in set (0.00 sec)

MariaDB [bp]>
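
Before piping a binlog file into mysql, it can help to see which statements it would replay. The sketch below scans mysqlbinlog's text output for the "# at N" position markers and the statements that follow them. It is only an illustration: it assumes a statement-format log like the listing above, list_statements and SAMPLE are hypothetical names, and the regular expressions cover only the simple statements used in this walkthrough.

```python
import re

# Shortened copy of the mysqlbinlog output shown above.
SAMPLE = """\
# at 311
#170725  2:04:51 server id 1  end_log_pos 404   Query   thread_id=4 exec_time=0 error_code=0
use `bp`/*!*/;
SET TIMESTAMP=1500973491/*!*/;
insert into bptest values(3,'c') /*!*/;
# at 404
#170725  2:04:51 server id 1  end_log_pos 431   Xid = 47
COMMIT/*!*/;
# at 497
#170725  2:04:56 server id 1  end_log_pos 590   Query   thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1500973496/*!*/;
insert into bptest values(4,'d') /*!*/;
"""

POSITION = re.compile(r"^# at (\d+)$")
STATEMENT = re.compile(r"^(insert|update|delete|truncate|create|drop|alter)\b", re.IGNORECASE)


def list_statements(text):
    """Return (start_position, statement) pairs found in mysqlbinlog text output."""
    found = []
    position = None
    for raw in text.splitlines():
        line = raw.strip()
        marker = POSITION.match(line)
        if marker:
            # Remember the byte offset of the event that follows.
            position = int(marker.group(1))
        elif STATEMENT.match(line):
            # Drop the /*!*/; delimiter that mysqlbinlog appends to each statement.
            found.append((position, line.replace("/*!*/;", "").strip()))
    return found


if __name__ == "__main__":
    for position, statement in list_statements(SAMPLE):
        print(position, statement)
```

The positions found this way can be passed to mysqlbinlog's --start-position and --stop-position options when only part of a file should be replayed.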

Installing MariaDB with YUM and recovering a forgotten root password

1. Add the repository (as a .repo file under /etc/yum.repos.d/)

Official repository

[mariadb] 
name = MariaDB 
baseurl = http://yum.mariadb.org/10.1/centos7-amd64 
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB 
gpgcheck=1

Mirror in China

[mariadb]
name = MariaDB
baseurl = https://mirrors.tuna.tsinghua.edu.cn/mariadb/yum/10.1/centos7-amd64
gpgkey = https://mirrors.tuna.tsinghua.edu.cn/mariadb/yum//RPM-GPG-KEY-MariaDB
gpgcheck = 1
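
Before running yum, it can be worth checking that the stanza parses and carries the keys this setup relies on. The following is a minimal sketch using Python 3's configparser; check_repo and the REQUIRED_KEYS list are assumptions made for illustration and are not part of yum itself.

```python
import configparser

# The mirror stanza from above, copied as-is.
REPO_TEXT = """\
[mariadb]
name = MariaDB
baseurl = https://mirrors.tuna.tsinghua.edu.cn/mariadb/yum/10.1/centos7-amd64
gpgkey = https://mirrors.tuna.tsinghua.edu.cn/mariadb/yum//RPM-GPG-KEY-MariaDB
gpgcheck = 1
"""

# Assumed minimum set of keys for this walkthrough.
REQUIRED_KEYS = ("name", "baseurl", "gpgkey", "gpgcheck")


def check_repo(text):
    """Return a dict mapping each section name to the required keys it lacks."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        section: [key for key in REQUIRED_KEYS if key not in parser[section]]
        for section in parser.sections()
    }


if __name__ == "__main__":
    print(check_repo(REPO_TEXT))
```

An empty list for a section means none of the assumed keys is missing.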

Handling yum-complete-transaction errors

$ yum install yum-utils
$ yum clean all
$ /usr/sbin/yum-complete-transaction --cleanup-only

Installation

$ yum install MariaDB-server MariaDB-client MariaDB-devel

2. What to do when the MariaDB root password has been forgotten

Edit /usr/lib/systemd/system/mariadb.service and add the following in the [Service] section

# In the [Service] section, set ExecStart as follows:
ExecStart=/usr/bin/mysqld_safe --basedir=/usr --skip-grant-tables --skip-networking

Run systemctl daemon-reload so the change takes effect immediately

$ systemctl daemon-reload

Restart the MariaDB service

$ systemctl restart mariadb.service

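Done. You can now log in as root without a password and set a new one; afterwards, remove the two extra options from ExecStart, run daemon-reload, and restart the service again.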
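
Because --skip-grant-tables and --skip-networking are only meant to be in place while the password is being reset, the ExecStart edit has to be undone afterwards. The sketch below shows one way to add and remove the two flags on a unit file's text. The function name set_recovery_mode and the UNIT sample are hypothetical, and the code only transforms text; writing the result back to mariadb.service is left to the reader.

```python
RECOVERY_FLAGS = ("--skip-grant-tables", "--skip-networking")

# Hypothetical minimal unit text, used only to demonstrate the function.
UNIT = "[Service]\nExecStart=/usr/bin/mysqld_safe --basedir=/usr\n"


def set_recovery_mode(unit_text, enabled):
    """Add or remove the recovery flags on the ExecStart= line of a unit file."""
    lines = []
    for line in unit_text.splitlines():
        if line.startswith("ExecStart="):
            # Strip any existing recovery flags so the operation is idempotent.
            words = [word for word in line.split() if word not in RECOVERY_FLAGS]
            if enabled:
                words.extend(RECOVERY_FLAGS)
            line = " ".join(words)
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(set_recovery_mode(UNIT, True))
```

Calling it with enabled=False on the edited text gives back the original ExecStart line, which is the state the unit should be in once the new password works.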

Fixing a KVM virtual machine that fails to boot because of an error in fstab

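A customer recently reported that their virtual machine was throwing errors during boot. While going over it with them I learned that /etc/fstab had been edited before the reboot, which was the likely cause, hence this repair procedure.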

Boot Linux from a live CD. I used Kali here, but any distribution disc with a live mode will do.

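Once it has booted, running mount /dev/vdb2 /mnt fails with an "unknown filesystem" error. This is because an LVM2 physical volume cannot be mounted directly; the steps below are needed before it can be mounted.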

1. Make sure lvm2 is installed

2. Make sure fdisk -lu detects all the physical volumes

3. Run pvscan to scan all disks for physical volumes, to confirm that your LVM2 disk is detected.

4. Run vgscan to scan for volume groups

5. Activate all available volume groups (vgchange -ay); in this case the output showed 3 logical volumes activated

6. Run lvscan to scan all disks for logical volumes. You should now see that the logical volumes are active.

7. Mount the logical volume you need to edit on /mnt

mount /dev/cl/root /mnt

8. Edit the fstab file

vi /mnt/etc/fstab

Delete the two offending lines

9. Reboot the server

reboot
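
Since a single bad fstab line was enough to stop the guest from booting, a quick sanity check of /mnt/etc/fstab before rebooting may save a second trip through the live CD. The sketch below flags lines that do not have the shape of an fstab entry (four to six whitespace-separated fields, with numeric dump and pass fields). check_fstab and the SAMPLE devices are hypothetical, and the check is deliberately shallow: a well-formed entry that names a device which no longer exists will still pass.

```python
# Hypothetical fstab content: one good line and two malformed ones.
SAMPLE = """\
/dev/mapper/cl-root  /        xfs   defaults  0 0
/dev/vdb1            /data    ext4
/dev/vdc1            /backup  ext4  defaults  0 x
"""


def check_fstab(text):
    """Return (line_number, reason) pairs for lines that do not look like fstab entries."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are fine
        fields = line.split()
        if not 4 <= len(fields) <= 6:
            problems.append((number, "expected 4 to 6 fields, found %d" % len(fields)))
            continue
        # The optional 5th and 6th fields (dump, fsck pass) must be integers.
        if not all(field.isdigit() for field in fields[4:]):
            problems.append((number, "dump/pass fields must be numeric"))
    return problems


if __name__ == "__main__":
    for number, reason in check_fstab(SAMPLE):
        print("line %d: %s" % (number, reason))
```

In the rescue environment the same function could be fed the contents of /mnt/etc/fstab instead of SAMPLE.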

KVM bridged networking and snapshot management

First confirm that the system supports hardware virtualization

egrep '(vmx|svm)' --color=always /proc/cpuinfo
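
The egrep looks for the vmx flag (Intel VT-x) or the svm flag (AMD-V) in the CPU flags. The same check can be sketched in Python as below; virtualization_flags is a hypothetical helper that works on the text of /proc/cpuinfo, so it can be tried on a sample string without touching the real file.

```python
# Hypothetical, heavily shortened /proc/cpuinfo excerpt.
SAMPLE_CPUINFO = "processor\t: 0\nflags\t\t: fpu vme de pse vmx sse sse2\n"


def virtualization_flags(cpuinfo_text):
    """Return the set of hardware virtualization flags (vmx, svm) found in cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Everything after the colon is a space-separated list of CPU flags.
            flags = line.partition(":")[2].split()
            found.update(flag for flag in flags if flag in ("vmx", "svm"))
    return found


if __name__ == "__main__":
    print(sorted(virtualization_flags(SAMPLE_CPUINFO)))
```

An empty set means neither flag is present, in which case KVM cannot use hardware acceleration on that host (or virtualization is disabled in the firmware).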

Install the basic components

yum install -y qemu-kvm bridge-utils

Make the qemu-kvm command available on the PATH

ln -s /usr/libexec/qemu-kvm /sbin/

Load the kvm module

modprobe kvm

Create a directory to hold the guest's files, then create the disk image in it

qemu-img create -f qcow2 -o preallocation=metadata /PATH/FILENAME.img 20G

Stop the NetworkManager service and create a bridge interface named br0

cp /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-br0

Edit ifcfg-eth0 so that it joins the bridge (at this point eth0 effectively acts as a switch)

Edit ifcfg-br0 to define the bridge itself

Once it has been created, inspect it with

brctl show

Write a network start-up script

vim /root/qemu-ifup

Create and boot the guest with qemu-kvm (to install Windows, change if=virtio to if=ide)

qemu-kvm -cpu host -smp 1 -m 1G -name linux -drive  file=linux.img,media=disk,format=qcow2,if=virtio -drive file=/isofile/CentOS-7-x86_64-Minimal-1503-01.iso,media=cdrom -boot order=dc,once=d -net nic,macaddr=00:00:00:00:00:01 -net tap,script=/root/qemu-ifup -vnc 192.168.3.125:1
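
The command line above is long enough that the one part that changes between a Linux and a Windows guest (the disk interface) is easy to miss. The sketch below assembles the same arguments from parameters. qemu_args is a hypothetical helper, its defaults are simply the values used in this article, and it only builds the list; nothing is executed.

```python
def qemu_args(disk, name, disk_if="virtio", iso=None, mac="00:00:00:00:00:01",
              ifup="/root/qemu-ifup", vnc="192.168.3.125:1"):
    """Build a qemu-kvm argument list matching the command line above."""
    args = ["qemu-kvm", "-cpu", "host", "-smp", "1", "-m", "1G", "-name", name,
            "-drive", "file=%s,media=disk,format=qcow2,if=%s" % (disk, disk_if)]
    if iso:
        # Installation run: attach the ISO and boot from it once.
        args += ["-drive", "file=%s,media=cdrom" % iso, "-boot", "order=dc,once=d"]
    args += ["-net", "nic,macaddr=%s" % mac, "-net", "tap,script=%s" % ifup, "-vnc", vnc]
    return args


if __name__ == "__main__":
    iso_path = "/isofile/CentOS-7-x86_64-Minimal-1503-01.iso"
    print(" ".join(qemu_args("linux.img", "linux", iso=iso_path)))
    # A Windows guest would use the IDE interface instead of virtio.
    print(" ".join(qemu_args("windows.img", "windows", disk_if="ide")))
```

The resulting list could be handed to subprocess.call once the arguments look right.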

In another tty, check that the port is open

On another host that has a graphical desktop, install a VNC client

yum install -y tigervnc

vncviewer 192.168.3.125:5901
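
The -vnc 192.168.3.125:1 option names VNC display 1, and by convention display N listens on TCP port 5900 + N. That is why the viewer connects to 5901, and it is the port to look for when checking from another tty. A tiny sketch of the mapping, with hypothetical helper names:

```python
VNC_BASE_PORT = 5900


def vnc_port(display):
    """Return the TCP port that a VNC display number listens on."""
    return VNC_BASE_PORT + display


def split_vnc_option(option):
    """Split a '-vnc host:display' value into (host, tcp_port)."""
    host, _, display = option.rpartition(":")
    return host, vnc_port(int(display))


if __name__ == "__main__":
    print(split_vnc_option("192.168.3.125:1"))
```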

Once you have confirmed that the installer has finished, end the qemu-kvm process

Use the command below to boot from the img disk (-daemonize runs it in the background, detached from the tty)

qemu-kvm -cpu host -smp 1 -m 1G -name centos7 -drive file=/kvm/linux.img,media=disk,format=qcow2,if=virtio -net nic,macaddr=00:00:00:00:00:01 -net tap,script=/root/qemu-ifup -vnc 192.168.3.125:1 -daemonize

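Using snapshots:

Create a snapshot

qemu-img snapshot -c SNAPSHOT_NAME /PATH/FILENAME.img

List the snapshots that exist

qemu-img snapshot -l /PATH/FILENAME.img

Restore a snapshot

qemu-img snapshot -a SNAPSHOT_ID /PATH/FILENAME.img

Delete a snapshot

qemu-img snapshot -d SNAPSHOT_ID /PATH/FILENAME.img

Check the image (for problems such as "Image is corrupt; cannot be opened read/write")

qemu-img check -r all /PATH/FILENAME.img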
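
The four snapshot operations differ only in the option letter and in whether a snapshot name or id is needed, which the sketch below makes explicit. snapshot_args and the action names are hypothetical; the function only builds the argument list for qemu-img and runs nothing. It assumes the image is not in use by a running guest while qemu-img modifies it.

```python
# Maps a readable action name (hypothetical) to the qemu-img snapshot option.
ACTIONS = {"create": "-c", "apply": "-a", "delete": "-d"}


def snapshot_args(action, image, snapshot=None):
    """Build the qemu-img command line for one snapshot operation."""
    if action == "list":
        return ["qemu-img", "snapshot", "-l", image]
    if action not in ACTIONS:
        raise ValueError("unknown action: %s" % action)
    if not snapshot:
        raise ValueError("%s needs a snapshot name or id" % action)
    return ["qemu-img", "snapshot", ACTIONS[action], snapshot, image]


if __name__ == "__main__":
    print(" ".join(snapshot_args("create", "/kvm/linux.img", "before-upgrade")))
    print(" ".join(snapshot_args("list", "/kvm/linux.img")))
```

Passing the returned list to subprocess.check_call would execute the operation and raise an exception if qemu-img reports an error.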

Experiment steps:

Configuring keepalived to send an alert e-mail on master/backup failover

Mail script:

keepalived_notify.py

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import smtplib
from email.mime.text import MIMEText
from email.header import Header
import sys, time, subprocess



# Third-party SMTP service
mail_host="smtp.exmail.qq.com"  # SMTP server
mail_user="xxx"    # user name
mail_pass="xxx"   # password


sender = '[email protected]'    # mail sender
receivers = ['[email protected]', '[email protected]']  # recipients; can be your QQ mailbox or any other mailbox

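# Read the local hostname; universal_newlines makes the output text on both Python 2 and 3
p = subprocess.Popen('hostname', shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
hostname = p.stdout.readline().strip()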

message_to = ''
for i in receivers:
    message_to += i + ';'

def print_help():
    note = '''python script.py role ip vip
    '''
    print(note)
    exit(1)

time_stamp = time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))

if len(sys.argv) != 4:
    print_help()
elif sys.argv[1] == 'master':
    message_content = '%s server: %s(%s) change to Master, vIP: %s' %(time_stamp, sys.argv[2], hostname, sys.argv[3])
    subject = '%s change to Master -- keepalived notify' %(sys.argv[2])
elif sys.argv[1] == 'backup':
    message_content = '%s server: %s(%s) change to Backup, vIP: %s' %(time_stamp, sys.argv[2], hostname, sys.argv[3])
    subject = '%s change to Backup -- keepalived notify' %(sys.argv[2])
else:
    print_help()

message = MIMEText(message_content, 'plain', 'utf-8')
message['From'] = Header(sender, 'utf-8')
message['To'] =  Header(message_to, 'utf-8')

message['Subject'] = Header(subject, 'utf-8')

try:
    smtpObj = smtplib.SMTP()
    smtpObj.connect(mail_host, 25)    # 25 is the SMTP port
    smtpObj.login(mail_user,mail_pass)
    smtpObj.sendmail(sender, receivers, message.as_string())
    print("Mail sent successfully")
except smtplib.SMTPException as e:
    print("Error: unable to send mail")
    print(e)

Usage:

python script.py{script name} role{master|backup} ip{IP of this keepalived server} vip{virtual IP}
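
The three arguments map directly onto the subject and body of the alert. As a sketch of that mapping, the message-building part of the script can be written as a pure function that is easy to check without an SMTP server. build_notification is a hypothetical name, and the hostname is passed in rather than looked up.

```python
import time


def build_notification(role, ip, vip, hostname, now=None):
    """Return (subject, body) for a keepalived state change, or raise ValueError."""
    labels = {"master": "Master", "backup": "Backup"}
    if role not in labels:
        raise ValueError("role must be 'master' or 'backup'")
    # now=None means "use the current time", as in the original script.
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    body = "%s server: %s(%s) change to %s, vIP: %s" % (stamp, ip, hostname, labels[role], vip)
    subject = "%s change to %s -- keepalived notify" % (ip, labels[role])
    return subject, body


if __name__ == "__main__":
    subject, body = build_notification("master", "192.168.1.178", "192.168.1.96", "node1")
    print(subject)
    print(body)
```

Raising ValueError for an unknown role plays the part of print_help() in the script above.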

master keepalived:

global_defs {
        notification_email {
                [email protected]
        }

        notification_email_from [email protected]
        smtp_server 127.0.0.1
        smtp_connect_timeout 30
        router_id LVS_DEVEL
}
## With the settings above, mail is only delivered to the local machine; read it with the mail command


vrrp_script chk_http_port {
        script "</dev/tcp/127.0.0.1/80"
        interval 2
        weight -10
}


vrrp_instance VI_1 {
        state BACKUP        ############ MASTER|BACKUP
        interface ens160
        virtual_router_id 11
        mcast_src_ip 192.168.1.178
        priority 100                 ########### must be higher than the backup's priority
        advert_int 2

        authentication {
                auth_type PASS
                auth_pass 8u90u3fhE3FQ
        }

        track_script { 
                chk_http_port ### the check script to track
        }

        virtual_ipaddress {
                192.168.1.96
        }

        notify_master "/bin/python /script/keepalived_notify.py master 192.168.1.178 192.168.1.96"
        notify_backup "/bin/python /script/keepalived_notify.py backup 192.168.1.178 192.168.1.96"

}
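
The chk_http_port script in this configuration relies on bash's /dev/tcp redirection: opening </dev/tcp/127.0.0.1/80 succeeds only if a TCP connection to port 80 can be made, and the exit status tells keepalived whether the check passed. Where that bash feature is not available, an equivalent check could be sketched in Python as below. port_open is a hypothetical helper, and wiring it into vrrp_script as a small script that exits 0 or 1 is an assumption, not something the original setup does.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # Report the result for the same address the vrrp_script checks.
    print("port 80 open:", port_open("127.0.0.1", 80))
```

A wrapper script would end with sys.exit(0 if port_open("127.0.0.1", 80) else 1) so that the result reaches keepalived as an exit status.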

backup keepalived:

global_defs {
        notification_email {
                [email protected]
        }

        notification_email_from [email protected]
        smtp_server 127.0.0.1
        smtp_connect_timeout 30
        router_id LVS_DEVEL
}
## With the settings above, mail is only delivered to the local machine; read it with the mail command


vrrp_script chk_http_port {
        script "</dev/tcp/127.0.0.1/80"
        interval 2
        weight -10
}


vrrp_instance VI_1 {
        state BACKUP        ############ MASTER|BACKUP
        interface ens160
        virtual_router_id 11
        mcast_src_ip 192.168.1.174
        priority 99                   ########### must be lower than the master's priority
        advert_int 2

        authentication {
                auth_type PASS
                auth_pass 8u90u3fhE3FQ
        }

        track_script { 
                chk_http_port ### the check script to track
        }

        virtual_ipaddress {
                192.168.1.96
        }

        notify_master "/bin/python /script/keepalived_notify.py master 192.168.1.174 192.168.1.96"
        notify_backup "/bin/python /script/keepalived_notify.py backup 192.168.1.174 192.168.1.96"

}
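
How priority and weight -10 interact is what makes the failover happen: with a negative weight, a failing check lowers the node's priority by that amount, and with a positive weight a passing check raises it. The sketch below is a simplified model of that rule, using 100 and 99 as example priorities. The function names and the node names a and b are hypothetical, and the model ignores preemption settings and tie-breaking.

```python
def effective_priority(base, weight, check_ok):
    """Priority a node advertises given one tracked script (simplified model)."""
    if weight < 0 and not check_ok:
        return base + weight  # failing check lowers the priority
    if weight > 0 and check_ok:
        return base + weight  # passing check raises the priority
    return base


def elected_master(nodes):
    """nodes: {name: (base_priority, weight, check_ok)}; return the highest-priority name."""
    return max(nodes, key=lambda name: effective_priority(*nodes[name]))


if __name__ == "__main__":
    healthy = {"a": (100, -10, True), "b": (99, -10, True)}
    web_down_on_a = {"a": (100, -10, False), "b": (99, -10, True)}
    print(elected_master(healthy), elected_master(web_down_on_a))
```

With these example numbers, a failed check on a drops it to 90, below b's 99, so b takes over the virtual IP and its notify_master command runs.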