January 2018 – Page 3 – Linux System Operations Log

Installing the Nginx repository

Run the following command:

rpm -ivh http://nginx.org/packages/centos/7/noarch/RPMS/nginx-release-centos-7-0.el7.ngx.noarch.rpm

After installing this rpm, a file named nginx.repo appears in the /etc/yum.repos.d/ directory.

Installing Nginx

With the Nginx repository in place, Nginx itself can be installed:

yum install -y nginx

Nginx default directories

Run:

whereis nginx

The output looks similar to the following:

nginx: /usr/sbin/nginx /usr/lib64/nginx /etc/nginx /usr/share/nginx

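Nginx uses these default paths:

(1) Configuration directory: /etc/nginx/
(2) PID file: /var/run/nginx.pid
(3) Error log: /var/log/nginx/error.log
(4) Access log: /var/log/nginx/access.log
(5) Default site root: /usr/share/nginx/html

In practice you only need to remember the configuration directory; every other path can be looked up in /etc/nginx/nginx.conf and /etc/nginx/conf.d/default.conf.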
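As a rough illustration of looking those paths up, the sketch below pulls a few path directives out of a configuration text with a regular expression. The sample text is a hypothetical excerpt, not the real file, and find_paths is an illustrative helper name; on a real host you would read /etc/nginx/nginx.conf instead.

```python
import re

# Hypothetical excerpt of /etc/nginx/nginx.conf, used so the sketch runs anywhere.
SAMPLE_CONF = """
user  nginx;
error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;
http {
    access_log  /var/log/nginx/access.log  main;
    include /etc/nginx/conf.d/*.conf;
}
"""

def find_paths(conf_text, directives=("pid", "error_log", "access_log", "root")):
    """Return {directive: first argument} for each directive present in the text."""
    found = {}
    for name in directives:
        pattern = r"^\s*%s\s+([^\s;]+)" % re.escape(name)
        match = re.search(pattern, conf_text, re.MULTILINE)
        if match:
            found[name] = match.group(1)
    return found

paths = find_paths(SAMPLE_CONF)
for directive, path in sorted(paths.items()):
    print(directive, path)
```

Directives that are absent from the text (root in this sample, which normally lives in conf.d/default.conf) are simply left out of the result.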

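Common commands

(1) Start:

nginx

(2) Test whether the Nginx configuration is valid:

nginx -t

(3) Graceful reload:

nginx -s reload

(4) Find the nginx process IDs:

ps -ef | grep nginx

(5) Stop the nginx service:

nginx -s stop

If that does not work, force-kill the process as a last resort, replacing pid with the process ID found above:

kill -9 pid

Nginx can also be built and installed from source. That takes a few more steps but is still fairly simple, so it is not covered here.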

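Upgrading CentOS 6/7 to the latest kernel and enabling Google BBR

Google BBR is a TCP congestion control algorithm that can speed up TCP connections, but it requires Linux kernel 4.9 or later. I previously shared a one-click method at https://www.xiaoz.me/archives/7945 ; it is convenient, but the upgrade failed on a Raksmart server, so I tried upgrading the kernel manually.

Upgrading CentOS 7 to the latest kernel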

# Import the ELRepo public key
wget https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm --import RPM-GPG-KEY-elrepo.org
# Install the ELRepo repository
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
# Install the latest mainline kernel
yum --enablerepo=elrepo-kernel install kernel-ml -y

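After the upgrade the old and new kernels are both installed. CentOS 7 boots with grub2, so the newest kernel has to be made the default entry. First run cat /boot/grub2/grub.cfg | grep menuentry to list all kernel entries, then find and note the full name of the newest one.

# Set the newest kernel as default (use the entry name found above)
grub2-set-default "CentOS Linux (4.14.14-1.el7.elrepo.x86_64) 7 (Core)"
# Then check that the setting was saved
grub2-editenv list
[root@test2018119 ~]# grub2-editenv list
saved_entry=CentOS Linux (4.14.14-1.el7.elrepo.x86_64) 7 (Core)
# If it looks right, reboot the server to apply it
reboot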

Upgrading CentOS 6 to the latest kernel

# Import the ELRepo public key
wget https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm --import RPM-GPG-KEY-elrepo.org
# Install the ELRepo repository
rpm -Uvh http://www.elrepo.org/elrepo-release-6-8.el6.elrepo.noarch.rpm
# Install the latest mainline kernel
yum --enablerepo=elrepo-kernel install kernel-ml -y

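After the upgrade, edit /etc/grub.conf so that default= selects the new kernel's entry (the author changed default=0 to default=1; entries are numbered from 0 in file order, so check which index the new kernel has), then reboot the server.

Checking whether the kernel upgrade succeeded

Run uname -r to see the running kernel; version 4.9 or later means the upgrade worked. If the network is unreachable afterwards, the upgrade has probably failed. In that case the only option is to log in through the VNC console and switch back to the old kernel using the method described above.

# Kernel is 4.9 or later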
[root@test2018119 ~]# uname -r
4.14.14-1.el7.elrepo.x86_64
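The version check can also be scripted. The sketch below compares a uname -r style string against the 4.9 minimum; kernel_at_least is an illustrative helper name, and on a live system the release string could come from platform.release() in the standard library.

```python
import re

def kernel_at_least(release, minimum=(4, 9)):
    """True if a `uname -r` style release string is at or above (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError("unrecognized kernel release: %r" % release)
    return (int(match.group(1)), int(match.group(2))) >= minimum

# The upgraded kernel from this article, and a stock CentOS 7 kernel for contrast.
print(kernel_at_least("4.14.14-1.el7.elrepo.x86_64"))
print(kernel_at_least("3.10.0-693.el7.x86_64"))
```

Comparing (major, minor) as a tuple of integers avoids the string-comparison trap where "4.14" sorts before "4.9".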

Enabling BBR

Copy and run the commands below:

# Append the settings
cat >>/etc/sysctl.conf << EOF
net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr
EOF
# Apply the settings
sysctl -p

Run the commands below to verify; if the output contains bbr, BBR is enabled.

[root@test2018119 ~]# sysctl net.ipv4.tcp_available_congestion_control
net.ipv4.tcp_available_congestion_control = bbr cubic reno
[root@test2018119 ~]# lsmod | grep bbr
tcp_bbr                20480  0
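The available list only shows that the kernel offers bbr; the algorithm actually in use is reported by net.ipv4.tcp_congestion_control. A small sketch of checking both from sysctl output follows. The second input line is an assumed sample, not output captured in this article, and both helper names are illustrative.

```python
def parse_sysctl_line(line):
    """Split 'key = v1 v2 ...' into (key, [values])."""
    key, _, value = line.partition("=")
    return key.strip(), value.split()

def bbr_active(available_line, current_line):
    """True if bbr is offered by the kernel and selected as the current algorithm."""
    _, available = parse_sysctl_line(available_line)
    _, current = parse_sysctl_line(current_line)
    return "bbr" in available and current == ["bbr"]

print(bbr_active(
    "net.ipv4.tcp_available_congestion_control = bbr cubic reno",
    "net.ipv4.tcp_congestion_control = bbr",  # assumed sample line
))
```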

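Summary

The recommended first choice is 秋水逸冰's one-click kernel upgrade script ( https://www.xiaoz.me/archives/7945 ); if that fails, try the manual upgrade described above. This method applies to KVM/Xen virtualization. Do not try it on OpenVZ VPSes, where it generally fails, and avoid running it in production to prevent outages.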

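Installing ZooKeeper in cluster mode on CentOS

This article sets up ZooKeeper on a four-node cluster. ZooKeeper can run in three modes: standalone, pseudo-cluster, and cluster. Cluster mode is used here.

Environment

Virtual machine: VMware Workstation 12 Player
Linux version: CentOS release 6.4 (Final)
ZooKeeper version: zookeeper-3.4.5-cdh5.7.6.tar.gz
Cluster nodes:
- master:192.168.137.11 1 GB RAM
- slave1:192.168.137.12 512 MB RAM
- slave2:192.168.137.13 512 MB RAM
- slave3:192.168.137.14 512 MB RAM
Prerequisites: Java is installed, passwordless SSH login is configured, and the firewall is stopped.

Uploading the package

Upload the downloaded zookeeper-3.4.5-cdh5.7.6.tar.gz to a directory on the CentOS host, for example /opt.
There are many ways to upload; here the rz command is used in SecureCRT.

Extract the package: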

tar -zxf zookeeper-3.4.5-cdh5.7.6.tar.gz

Rename the directory:

mv zookeeper-3.4.5-cdh5.7.6 zookeeper

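Editing the configuration file

The sample configuration is conf/zoo_sample.cfg under the installation directory; copy it to a new file named zoo.cfg first: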

[root@master conf]# pwd
/opt/zookeeper/conf
[root@master conf]# cp zoo_sample.cfg zoo.cfg
[root@master conf]# ll
total 16
-rw-rw-r--. 1 root root  535 Feb 22  2017 configuration.xsl
-rw-rw-r--. 1 root root 2693 Feb 22  2017 log4j.properties
-rw-r--r--. 1 root root  808 Jan 23 10:06 zoo.cfg
-rw-rw-r--. 1 root root  808 Feb 22  2017 zoo_sample.cfg

Edit zoo.cfg:

tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/zookeeper/tmp
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
dataLogDir=/opt/zookeeper/logs
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
server.4=slave3:2888:3888

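Parameters:

tickTime: the basic time unit used by ZooKeeper, in milliseconds.
dataDir: the data directory; any directory can be used.
dataLogDir: the transaction log directory; again any directory can be used. If it is not set, the dataDir setting is used.
clientPort: the port that listens for client connections.
initLimit: a ZooKeeper cluster has several servers, one of which is the leader while the rest are followers. initLimit is the maximum time, in ticks, that a follower may take to connect to and sync with the leader at startup. It is set to 10 here, so the limit is 10 × tickTime = 10 × 2000 ms = 20000 ms = 20 s.
syncLimit: the maximum time, in ticks, allowed between a request and its acknowledgement in messages between leader and followers. It is set to 5 here, so the limit is 5 × tickTime = 10000 ms.
server.X=A:B:C: X is a number identifying the server. A is the server's hostname or IP address. B is the port the server uses to exchange messages with the leader. C is the port used for leader election.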
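To make the tick arithmetic concrete, here is a minimal sketch that reads key=value settings from a zoo.cfg-style text and converts initLimit and syncLimit to milliseconds. The sample text repeats the three values configured above; parse_zoo_cfg is an illustrative helper, not part of ZooKeeper.

```python
def parse_zoo_cfg(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

SAMPLE = """
# values taken from the zoo.cfg shown above
tickTime=2000
initLimit=10
syncLimit=5
"""

cfg = parse_zoo_cfg(SAMPLE)
tick_ms = int(cfg["tickTime"])
init_ms = int(cfg["initLimit"]) * tick_ms   # follower connect/sync limit
sync_ms = int(cfg["syncLimit"]) * tick_ms   # request/acknowledgement limit
print(init_ms, sync_ms)
```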

Because dataDir and dataLogDir were changed, create both directories under the zookeeper directory; the myid file is created in dataDir later:

mkdir /opt/zookeeper/tmp

mkdir /opt/zookeeper/logs

Copying the installation to the other nodes

Copy the zookeeper directory to the other three servers:

scp -r /opt/zookeeper/ root@slave1:/opt
scp -r /opt/zookeeper/ root@slave2:/opt
scp -r /opt/zookeeper/ root@slave3:/opt

On the master node, run the following commands to create the myid file on every node; each id must match the corresponding server.X entry in zoo.cfg:

[root@master zookeeper]# echo 1 > /opt/zookeeper/tmp/myid
[root@master zookeeper]# ssh slave1 "echo 2 > /opt/zookeeper/tmp/myid"
[root@master zookeeper]# ssh slave2 "echo 3 > /opt/zookeeper/tmp/myid"
[root@master zookeeper]# ssh slave3 "echo 4 > /opt/zookeeper/tmp/myid"

Starting ZooKeeper

The environment variables are not configured, so the full path is needed:

[root@master zookeeper]# /opt/zookeeper/bin/zkServer.sh start
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

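I set dataLogDir expecting the startup log to go to that directory, but it did not: zookeeper.out was still created in the ZooKeeper installation directory. (dataLogDir only controls where transaction logs are stored; in ZooKeeper 3.4, zkServer.sh writes zookeeper.out to the directory named by ZOO_LOG_DIR, which defaults to the directory the script is started from.)

zookeeper.out shows an error: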

2018-01-23 10:48:35,470 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /opt/zookeeper/bin/../conf/zoo.cfg
2018-01-23 10:48:35,484 [myid:] - WARN  [main:QuorumPeerConfig@290] - Non-optimial configuration, consider an odd number of servers.
2018-01-23 10:48:35,484 [myid:] - INFO  [main:QuorumPeerConfig@334] - Defaulting to majority quorums
2018-01-23 10:48:35,512 [myid:4] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2018-01-23 10:48:35,513 [myid:4] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2018-01-23 10:48:35,513 [myid:4] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2018-01-23 10:48:35,536 [myid:4] - INFO  [main:QuorumPeerMain@132] - Starting quorum peer
2018-01-23 10:48:35,587 [myid:4] - INFO  [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2018-01-23 10:48:35,611 [myid:4] - INFO  [main:QuorumPeer@913] - tickTime set to 2000
2018-01-23 10:48:35,612 [myid:4] - INFO  [main:QuorumPeer@933] - minSessionTimeout set to -1
2018-01-23 10:48:35,612 [myid:4] - INFO  [main:QuorumPeer@944] - maxSessionTimeout set to -1
2018-01-23 10:48:35,612 [myid:4] - INFO  [main:QuorumPeer@959] - initLimit set to 10
2018-01-23 10:48:35,639 [myid:4] - INFO  [main:QuorumPeer@429] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2018-01-23 10:48:35,643 [myid:4] - INFO  [main:QuorumPeer@444] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2018-01-23 10:48:35,652 [myid:4] - INFO  [Thread-1:QuorumCnxManager$Listener@486] - My election bind port: 0.0.0.0/0.0.0.0:3888
2018-01-23 10:48:35,674 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@670] - LOOKING
2018-01-23 10:48:35,679 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@740] - New election. My id =  4, proposed zxid=0x0
2018-01-23 10:48:35,692 [myid:4] - INFO  [slave3/192.168.137.14:3888:QuorumCnxManager$Listener@493] - Received connection request /192.168.137.11:34491
2018-01-23 10:48:35,704 [myid:4] - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 4 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)
2018-01-23 10:48:35,706 [myid:4] - WARN  [WorkerSender[myid=4]:QuorumCnxManager@368] - Cannot open channel to 2 at election address slave1/192.168.137.12:3888
java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
    at java.lang.Thread.run(Thread.java:748)

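The log reports a Connection refused exception. There is no need to search for this error right away: the other nodes are not running yet, so start ZooKeeper on every node first and then look at the status again. At this point the status check reports that the service is not running, and the corresponding process cannot be found: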

[root@master zookeeper]# /opt/zookeeper/bin/zkServer.sh start
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@master zookeeper]# /opt/zookeeper/bin/zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.

Start ZooKeeper on all the other nodes, wait a moment, then check the status on each server:

[root@master zookeeper]# /opt/zookeeper/bin/zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[root@master zookeeper]# jps
5488 QuorumPeerMain
5539 Jps

If the status shows a Mode line and jps lists QuorumPeerMain, the node has started successfully.
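The startup log above also warned "consider an odd number of servers". With majority quorums, which the log says are in use, a quick calculation shows why a fourth node adds no fault tolerance over three. The helper names below are illustrative.

```python
def quorum_size(servers):
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1

def tolerated_failures(servers):
    """How many servers can fail while a majority is still reachable."""
    return servers - quorum_size(servers)

for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Three and four servers both tolerate a single failure; only the fifth server raises that to two.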

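To shut ZooKeeper down, run this on every node:

/opt/zookeeper/bin/zkServer.sh stop

Starting with the following command instead runs ZooKeeper in the foreground and prints the log to the console: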

/opt/zookeeper/bin/zkServer.sh start-foreground

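Batch start and stop

Running the command on one server at a time is tedious, so here is a script that does it on all of them:

#!/bin/bash
# Set this to the ZooKeeper installation directory
zooHome=/opt/zookeeper
# Quote $1 so the test also works when no argument is given
if [ -n "$1" ]
    then
        confFile=$zooHome/conf/zoo.cfg
        # Extract the host names from the server.X=host:port:port lines
        slaves=$(sed '/^server/!d;s/^.*=//;s/:.*$//g;/^$/d' "$confFile")
        for slave in $slaves ; do
            ssh "$slave" "$zooHome/bin/zkServer.sh $1"
        done
    else
        echo "parameter empty! parameter:start|stop"
fi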

Save the script above as zooManager and run it:

sh zooManager start

sh zooManager stop

[root@master opt]# sh zooManager start
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

All nodes here use the root user, so permissions were not considered; in a real deployment they need to be.

Reference: http://coolxing.iteye.com/blog/1871009
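For readers who find the sed expression in the batch script hard to follow, the same host extraction can be sketched in Python. This is an equivalent illustration under the assumption that every server line has the form server.N=host:port:port; it is not part of the original script, and server_hosts is an illustrative name.

```python
import re

def server_hosts(cfg_text):
    """Return the host part of every server.N=host:port:port line, in file order."""
    hosts = []
    for line in cfg_text.splitlines():
        match = re.match(r"server\.\d+=([^:]+):", line.strip())
        if match:
            hosts.append(match.group(1))
    return hosts

SAMPLE = """
clientPort=2181
dataLogDir=/opt/zookeeper/logs
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
server.4=slave3:2888:3888
"""

print(server_hosts(SAMPLE))
```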

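CentOS 7: removing /home to enlarge the root filesystem

Background: while checking the GitLab backup server, I found that the backup files from the last few days had not been uploaded to it as configured.

The server is fairly new and holds nothing except the received backups, so running out of inodes could be ruled out first. The backups total only a little over 40 GB, though, and the server had been allocated 100 GB.

So I ran the backup command manually on the GitLab server and got "No space left on device".

Out of space? I checked disk usage on the backup server:

df -h

Although this CentOS machine was given a 100 GB disk, the root filesystem had only 50 GB, and most of the rest had been allocated to /home.

After some research I learned that the CentOS 7 installer caps the default root volume at 50 GB, so when the disk is larger than that, most of the remaining space goes to /home.

If software is installed under /usr/local and its data files also live on the root filesystem, the root volume has to be resized after installation; otherwise the disk is likely to fill up after a while.

So the problem really was a lack of disk space. This virtual machine does not need any other services, so removing /home and giving all of its space to root would solve it. After looking into it, I put together the following procedure for my situation.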

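1. Removing /home

1.1 Back up the files on the /home partition

tar cvf /tmp/home.tar /home

1.2 Edit fstab (this step is critical; do not skip it)

CentOS checks every entry in /etc/fstab at boot. fstab contains /home by default, so if the entry is left in place after the volume is removed, CentOS will fail to boot after the next restart.

So before unmounting, comment out the /home entry so that the system no longer checks /home at boot.

yum install -y vim
vim /etc/fstab

Put a comment character in front of the /home line and save with :wq.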

#/dev/mapper/centos-home /home                   xfs     defaults        0 0

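1.3 Install psmisc

yum install -y psmisc

// The psmisc package provides programs that use the /proc filesystem to manage processes: fuser, killall, pstree, and pstree.x11 (a link to pstree).

// fuser shows the PIDs of processes that are using a given file or filesystem.

// killall kills processes by name; it sends a signal to all processes running the given command.

// pstree displays running processes as a tree.

// pstree.x11 works like pstree but waits for confirmation before exiting.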

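1.4 Unmount the /home filesystem

umount /home

If the unmount is refused, some process is still using /home. The following command kills the processes that are using it:

fuser -km /home/

1.5 Remove the logical volume that holds /home

lvremove /dev/mapper/centos-home

A confirmation prompt appears; type "y" and press Enter.

2. Enlarging root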

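2.1 Extend the logical volume that holds the root filesystem (/)

/home had been using 47G, so I planned to add all of it to the root volume.

lvextend -L +47G /dev/mapper/centos-root

This failed because slightly less than 47G was actually free; the size shown had presumably been rounded. I reduced the request a little, to 48100MB.

I did not work out the exact free space because I was not familiar with how to query it, and I would rather waste a little space than spend time on it (a few dozen MB of spinning disk is not worth much, even on enterprise drives). For reference, vgs reports the free space in the volume group, and lvextend -l +100%FREE assigns all of it.

lvextend -L +48100M /dev/mapper/centos-root

The output confirms that the logical volume holding the root filesystem was extended to 96.97GB.

2.2 Grow the root filesystem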

xfs_growfs /dev/mapper/centos-root

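The values 13107200 and 25420800 in the xfs_growfs output did not seem to match 50G and 97G when I converted them as sizes. I am leaving this question open for now and will look it up later.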
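One plausible reading, offered as an assumption because the original screenshot is not available: xfs_growfs reports the data section in filesystem blocks, and with the common XFS block size of 4096 bytes the two numbers line up with the sizes seen elsewhere in this article.

```python
BLOCK_SIZE = 4096   # assumed XFS block size in bytes (the bsize value in xfs_growfs output)
GIB = 1024 ** 3

def blocks_to_gib(blocks, block_size=BLOCK_SIZE):
    """Convert a filesystem block count to GiB."""
    return blocks * block_size / GIB

before = blocks_to_gib(13107200)   # block count before growing
after = blocks_to_gib(25420800)    # block count after growing
added = 48100 / 1024               # the 48100 MiB passed to lvextend, in GiB

print(before)
print(round(after, 2))
print(round(before + added, 2))
```

Under that assumption the old count is exactly 50 GiB and the new one about 96.97 GiB, which is 50 GiB plus the 48100 MiB added by lvextend.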

2.3 Check the size of the root filesystem

df -h

The root filesystem has grown from 50G to 97G.

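Three ways to install Python 3 on CentOS

CentOS 7 ships with Python 2.7 by default. If a project needs Python 3.x, any of the three methods below can be used to install it.

Note: the examples in this article install Python 3.5.

1. Building Python from source

Install yum-utils, a set of tools for managing repositories and packages (mainly repositories):

$ sudo yum install yum-utils

Use yum-builddep to set up the build environment for Python 3; the command below installs the missing build dependencies automatically.

$ sudo yum-builddep python

Then download the Python 3 source tarball (Python 3.5 is used here). The source directory is https://www.python.org/ftp/python/ ; as of the day this post was published, the latest Python 3 release was 3.7.0.

$ curl -O https://www.python.org/ftp/python/3.5.0/Python-3.5.0.tgz

Finally, build and install Python 3. The default installation prefix is /usr/local; to use another directory, pass "--prefix=/alternative/path" to configure before running make.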

$ tar xf Python-3.5.0.tgz
$ cd Python-3.5.0
$ ./configure
$ make
$ sudo make install

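Python 3, pip3, and setuptools are now installed on the CentOS system. Check the Python version:

$ python3 -V

To make Python 3 the default python command, add an alias line to your bashrc file:

alias python='/usr/local/bin/python3.5'

On CentOS 7 it is recommended to leave /etc/bashrc alone and put custom settings in the /etc/profile.d/ directory instead:

vi /etc/profile.d/python.sh

Enter the alias line alias python='/usr/local/bin/python3.5', then save and exit.

If the file was created by a non-root user, set its permissions:

chmod 755 /etc/profile.d/python.sh

Reload the file so that the setting takes effect in the current session (new login sessions pick it up automatically):

source /etc/profile.d/python.sh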

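2. Installing from the EPEL repository

The current EPEL 7 repository provides Python 3 (Python 3.4). On CentOS 7 or later it can be installed easily with the steps below.

Install the latest EPEL release package:

$ sudo yum install epel-release

Install Python 3.4 with yum:

$ sudo yum install python34

Note: this method does not install pip or setuptools. To install them, use the following commands: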

$ curl -O https://bootstrap.pypa.io/get-pip.py
$ sudo /usr/bin/python3.4 get-pip.py

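3. Installing from the SCL (Software Collections) repository

The last method uses the Software Collections (SCL) repository. Note that SCL supports only CentOS 6.5 and later, and the SCL release used here provides Python 3.3. With the SCL repository enabled on the system, install it with:

$ sudo yum install python33

To use Python 3 from SCL, enable it for a single command:

$ scl enable python33 <command>

You can also start a bash shell with the Python 3 collection enabled:

$ scl enable python33 bash

Summary

I recommend the first two methods: experienced users can build from source with method 1, while beginners can use the yum binary install from method 2, which is simple and convenient.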

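Introduction to playbooks

Main purpose: run a predefined playbook against groups of hosts.
play: a set of tasks mapped to hosts; a task is usually a call to an Ansible module.
playbook: several plays combined together.
Playbooks are written in YAML and follow YAML syntax.

About YAML:

    YAML is a highly readable format for serializing data.
    YAML drew on several other languages and formats, including XML, C, Python, Perl, and the e-mail format of RFC 2822.
    Clark Evans first published the language in 2001; Ingy döt Net and Oren Ben-Kiki are its co-designers.

YAML features:

    - YAML is easy to read
    - YAML interacts well with scripting languages
    - YAML uses the native data types of the implementing language
    - YAML has a consistent information model
    - YAML is easy to implement
    - YAML can be processed as a stream
    - YAML is expressive and extensible

YAML syntax:

    - Within a single file, three hyphens (---) separate multiple documents; optionally, three dots (...) mark the end of a document
    - The playbook content starts on the next line; it is good practice to state what the playbook does
    - Use # for comments
    - Indentation must be consistent and must use spaces; tabs are not allowed for indentation
    - Indentation levels must also be consistent: the same indentation means the same level, and the parser determines nesting from indentation together with line breaks
    - YAML is case-sensitive, like Linux; both keys and values are case-sensitive
    - Key/value pairs can be written on one line or across lines; on one line, use : as the separator
    - A value can be a string or another list
    - A complete task needs at least a name and a module call
    - Each name covers exactly one task
    - YAML files usually have the extension yml or yaml
    - Dictionary: usually made up of several key/value pairs; the key: value pairs can also be placed inside {} and separated by commas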
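Basic playbook components

hosts: specifies the hosts on which the tasks run; they must already be defined in the inventory.

Example:

- hosts: websrvs:dbsrvs

remote_user: the user to run as

(1) Can be set at the play (hosts) level or on an individual task.
(2) Tasks can be run on the remote host through sudo, either for the whole play or for a single task.
(3) With sudo, sudo_user selects the user to switch to.

Example:

- hosts: websrvs
  remote_user: root
  tasks:
   - name: test connection
     ping:
     remote_user: fz.hou
     sudo: yes       # sudo switches to root by default
     sudo_user: fl   # sudo to user fl instead (newer Ansible releases use become / become_user)

tasks: the task list

Format:
(1) action: module arguments
(2) module: arguments (recommended)
Note: the shell and command modules take a command line, not key=value pairs.
Example: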

tasks:
 - name: disable selinux
   command: /sbin/setenforce 0

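notify and handlers:
When a task finishes with the status changed, it can use "notify" to trigger the matching handlers, which then run the tasks defined under them.

tags: labels
A task can be labelled with "tags", and the -t option of ansible-playbook then runs only the tasks with the given tag.
Note: if several tasks share a tag, all of them run when that tag is selected.

Example: install httpd, modify the httpd configuration file, and restart the service.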

- hosts: webservers
  remote_user: root

  tasks:
    - name: install httpd
      yum: name=httpd
    - name: modify config
      copy: src=~/httpd.conf dest=/etc/httpd/conf/httpd.conf
      tags: modify
      notify: restart httpd
    - name: start httpd
      service: name=httpd state=started enabled=yes

  handlers:
    - name: restart httpd
      service: name=httpd state=restarted

Example result: [screenshot]

Note: if a command or script may exit with a non-zero code, the failure can be avoided like this:

tasks:
  - name: run this command and ignore the result
    shell: /usr/bin/somecommand || /bin/true

Alternatively, use ignore_errors to ignore the failure:

tasks:
  - name: run this command and ignore the result
    shell: /usr/bin/somecommand
    ignore_errors: True

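Running a playbook

Command syntax:

ansible-playbook <filename.yml> ... [options]

Common options:
--check: only report the changes that would be made, without actually making them
--list-hosts: list the hosts the tasks would run on
--limit <host list>: run only on the hosts in the given list
-v: show the process; -vv and -vvv give more detail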

Playbook variables

Variable names: may contain only letters, digits, and underscores, and must start with a letter.
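The naming rule as stated here can be written as a short check. This sketch encodes exactly the rule in the text (letters, digits, underscores, first character a letter); it is not taken from Ansible's own validation code, and is_valid_var_name is an illustrative name.

```python
import re

# Rule as stated in the text: letters, digits, underscores; first character a letter.
VAR_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_valid_var_name(name):
    """True if the whole name matches the rule above."""
    return VAR_NAME.fullmatch(name) is not None

for candidate in ("http_port", "hname", "2fast", "my-var"):
    print(candidate, is_valid_var_name(candidate))
```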
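Variable sources:

1. Ansible setup facts: all facts gathered from the remote host can be used directly as variables

Example:

ansible myhosts -m setup -a 'filter=ansible_nodename'

filter matches fact names against the given pattern, which may contain shell-style wildcards.
The output can also be filtered with grep; the -C option shows lines of context (for example -C 3 for three lines).

Example result: [screenshot]

2. Defined in /etc/ansible/hosts

Host variables: defined per host within a group; they take priority over group variables
Group (shared) variables: defined once for all hosts in a group
Host variable example, defined in /etc/ansible/hosts: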

[myhosts]
172.18.18.22 http_port=85 hname=nginx
172.18.18.23 http_port=86 hname=httpd

Write the playbook:

cat /root/ansible/vars4.yml

  ---
  - hosts: myhosts
    remote_user: root

    tasks:
     - name: set hostname
       hostname: name={{ hname }}-{{ http_port }}

Example result: [screenshot]

Group (shared) variable example, defined in /etc/ansible/hosts:

[myhosts:vars]
myh=HFZ

Write the playbook:

cat /root/ansible/vars5.yml

  ---
  - hosts: myhosts
    remote_user: root

    tasks:
     - name: set hostname
       hostname: name={{ myh }}-{{ hname }}-{{ http_port }}

Example result: [screenshot]

3. Variables passed on the command line, which have the highest priority

ansible-playbook -e varname=value

Example:

cat /root/ansible/vars.yml

  ---
  - hosts: myhosts
    remote_user: root

    tasks:
     - name: install package
       yum: name={{ pkname }}

Example result: [screenshot]

4. Defined in the playbook

Example:

cat vars2.yml

  ---
  - hosts: myhosts
    remote_user: root
    vars:
     - username: user1
     - groupname: group1

    tasks:
     - name: create group
       group: name={{ groupname }} state=present
     - name: create user
       user: name={{ username }} group={{ groupname }} home=/home/{{ username }}dir

Example result: [screenshot]

5. Variables can be defined in a file that the playbook then loads.

Example: define the variables in vars.yml

hi: hello
wd: world

Write the playbook:

- hosts: myhosts
  remote_user: root
  vars_files:
   - vars.yml

  tasks:
   - name: create file
     file: name=/root/{{ hi }}-{{ wd }}.log state=touch

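Example result: [screenshot]

6. Defined in a role

Templates in playbooks

What templates are:

Text files written in the Jinja2 language, with embedded script logic.

What templates do:

Generate the corresponding configuration file dynamically from a template file.

Template layout:

Template files go in the templates directory and are named with a .j2 suffix.

The yaml/yml playbook file sits at the same level as the templates directory, like this:

./
 ├── temnginx.yml
 └── templates
   └── nginx.conf.j2

The Jinja2 language:

Literals:

    Strings: single or double quotes
    Numbers: integers and floats
    Lists: [item1, item2, ...]
    Tuples: (item1, item2, ...)
    Dictionaries: {key1:value1, key2:value2, ...}
    Booleans: true/false
Arithmetic: +, -, *, /, //, %, **
Comparison: ==, !=, >, >=, <, <=
Logic: and, or, not
Control flow: for and if in templates; when is the Ansible task keyword for conditions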
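Example: install httpd on CentOS 6 and CentOS 7 hosts and deploy the matching configuration file for each.

1. Create the directory

mkdir ~/ansible/templates -pv

2. Fetch the httpd configuration files from the CentOS 6 and CentOS 7 hosts to the control node

ansible myhosts -m fetch -a 'src=/etc/httpd/conf/httpd.conf dest=~/ansible/'

3. Move the files into the templates directory under new names, then edit their contents. fetch stores each file under dest/<host>/<remote path>; the commands assume 172.18.18.22 is the CentOS 7 host and 172.18.18.23 the CentOS 6 host.

mv ~/ansible/172.18.18.22/etc/httpd/conf/httpd.conf ~/ansible/templates/httpd-7.conf.j2
mv ~/ansible/172.18.18.23/etc/httpd/conf/httpd.conf ~/ansible/templates/httpd-6.conf.j2

4. Write the playbook; httpd.yml must be at the same level as the templates directory

cat httpd.yml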
    - hosts: myhosts
      remote_user: root

      tasks:
        - name: install httpd
          yum: name=httpd

        - name: templates-7
          template: src=httpd-7.conf.j2 dest=/etc/httpd/conf/httpd.conf
          when: ansible_distribution_major_version == "7"
          notify: restart httpd
          tags: conf

        - name: templates-6
          template: src=httpd-6.conf.j2 dest=/etc/httpd/conf/httpd.conf
          when: ansible_distribution_major_version == "6"
          notify: restart httpd
          tags: conf

        - name: start httpd
          service: name=httpd state=started

      handlers:
         - name: restart httpd
           service: name=httpd state=restarted

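Example run: [screenshot]

Playbook iteration

Iteration: when a task has to be repeated, use the loop mechanism.
The current element is referenced through the fixed variable name "item".
Use with_items in the task to supply the list of elements to iterate over.
List elements can be:
strings
dictionaries

Example: create fixed groups and add new users to them.

cat items.yml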
    - hosts: myhosts
      remote_user: root

      tasks: 
        - name: create groups
          group: name={{item}}
          with_items:
            - itemgroup1
            - itemgroup2
            - itemgroup3
        - name: create users
          user: name={{item.username}} group={{item.groupname}}
          with_items:
            - {username: 'testuser1',groupname: 'itemgroup1'}
            - {username: 'testuser2',groupname: 'itemgroup2'}
            - {username: 'testuser3',groupname: 'itemgroup3'}

Example result: [screenshot]

Using for and if in playbook templates

Example: a playbook that uses a template with for and if

cat for-if.yml
    - hosts: myhosts
      remote_user: root
      vars:
        hosts:
          - {listen_prot: 8080,web: nginx1,name: web1.fz.com}
          - {listen_prot: 8081,web: nginx2,name: web2.fz.com}
          - {listen_prot: 8082,web: nginx3}

      tasks:
        - name: for-if
          template: src=for-if.j2 dest=/root/for-if

cat templates/for-if.j2
    {% for host in hosts %}
    server{
            listen: {{host.listen_prot}};
    {%if host.name is defined%}
            name: {{host.name}};
    {%endif%}
            web: {{host.web}};
    }
    {%endfor%}
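To show what the for loop and the "is defined" test in this template do, here is the same logic written out in plain Python. It is an illustration only: it does not use Jinja2, and the whitespace of a real rendered file may differ. The data repeats the hosts variable from the playbook above.

```python
# The hosts variable from for-if.yml; the third entry has no "name" key.
hosts = [
    {"listen_prot": 8080, "web": "nginx1", "name": "web1.fz.com"},
    {"listen_prot": 8081, "web": "nginx2", "name": "web2.fz.com"},
    {"listen_prot": 8082, "web": "nginx3"},
]

def render(host_list):
    """Mirror the template: one server block per host, name line only if defined."""
    lines = []
    for host in host_list:                      # {% for host in hosts %}
        lines.append("server{")
        lines.append("        listen: %s;" % host["listen_prot"])
        if "name" in host:                      # {% if host.name is defined %}
            lines.append("        name: %s;" % host["name"])
        lines.append("        web: %s;" % host["web"])
        lines.append("}")
    return "\n".join(lines)

output = render(hosts)
print(output)
```

The third server block comes out without a name line, which is the case the if test exists to handle.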

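Example result: [screenshot]

Encrypting playbooks

    - ansible-vault: manages encryption and decryption of yml files
    - ansible-vault encrypt hello.yml: encrypt
    - ansible-vault decrypt hello.yml: decrypt
    - ansible-vault view hello.yml: view
    - ansible-vault edit hello.yml: edit an encrypted file
    - ansible-vault rekey hello.yml: change the password
    - ansible-vault create new.yml: create a new encrypted file

Checking whether vsftpd is installed

ps -ef | grep vsftpd

If the only match is the grep command itself, vsftpd is not running, which here means it is not installed.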

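Installing vsftpd

yum install vsftpd -y # -y answers yes to the install prompt

Disabling anonymous login

vi /etc/vsftpd/vsftpd.conf

Find anonymous_enable and set it to NO, then press Esc and type :wq to save and exit.

Starting the service

systemctl start vsftpd

Checking that the service is running

ps -ef | grep vsftpd # check that the service is running

Enabling start at boot

systemctl is-enabled vsftpd # check whether the service starts at boot
systemctl enable vsftpd # enable start at boot
systemctl is-enabled vsftpd # prints enabled once this has succeeded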

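Introduction to Ansible

Ansible is an operations tool written in Python. I came across it through work and often integrate other things into it, so I have gradually learned more about how it works.

So what is Ansible? As I understand it: previously you had to log in to a server and run a series of commands to get something done; Ansible runs those commands for you. It can also control many machines and orchestrate and execute tasks on them, which Ansible calls a playbook.

How does Ansible do this? Put simply, Ansible turns the commands to be run into a script, uploads the script over sftp to the target server, runs it over ssh, and returns the result.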
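The sequence just described can be pictured as an ordered plan. The sketch below only builds and prints such a plan; it executes nothing, and the directory, file names, and commands are hypothetical stand-ins rather than the commands Ansible really issues.

```python
import shlex

def plan_remote_run(host, local_script, remote_dir="/tmp/ansible-sketch"):
    """Return the steps as (transport, host, action) tuples; nothing is executed."""
    name = local_script.rsplit("/", 1)[-1]
    remote_script = "%s/%s" % (remote_dir, name)
    return [
        ("ssh", host, "mkdir -p %s" % shlex.quote(remote_dir)),        # temp dir
        ("sftp", host, "put %s %s" % (local_script, remote_script)),   # upload script
        ("ssh", host, "python %s" % shlex.quote(remote_script)),       # run, read result
        ("ssh", host, "rm -rf %s" % shlex.quote(remote_dir)),          # clean up
    ]

steps = plan_remote_run("slave1", "/tmp/ping_module.py")
for transport, host, action in steps:
    print(transport, host, action)
```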

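How exactly does it do that? The sections below look at modules and plugins to see how Ansible executes a module.

PS: the analysis below comes from reading the source code after gaining some hands-on experience with Ansible. It assumes the reader already knows Ansible to some degree, or at least its basic concepts such as inventory, modules, and playbooks.

Ansible modules

A module is the smallest unit of execution in Ansible. It can be written in Python, in shell, or in another language. A module defines the concrete steps to perform and the parameters needed when it is used.

The script that gets executed is an executable script generated from the module.

How, then, does Ansible upload this script to the server, run it, and collect the result?

Ansible plugins

connection plugin

Connects to the target server using the given ssh parameters and provides the interface through which commands are actually executed.

shell plugin

Builds the command that the connection will execute, according to the remote shell type.

strategy plugin

The execution strategy. The default is the linear strategy, which runs one task after another; this plugin hands tasks to the executor.

action plugin

Action plugins implement everything a task's module does. If a module has no dedicated action plugin, normal or async is used by default (chosen according to whether the task is async). normal and async define the module's execution steps, such as creating a local temporary file, uploading it, running the script, and deleting the script. To add special steps to every module, extend Ansible with your own action plugin.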

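How Ansible executes a module

The ansible command is implemented in ansible/cli/adhoc.py, which also collects the command-line arguments:
- It sets up the Play and runs it through TaskQueueManager.
- TaskQueueManager needs the Inventory (node inventory), variable_manager (collected variables), options (arguments given on the command line), and stdout_callback (callback).
In task_queue_manager.py, run does the following:
- Sets up the queues during initialization.
- Builds a PlayContext (playbooks/playcontext.py) from options, variable_manager, passwords, and related information.
- Loads plugin information through callback_loader (callbacks), strategy_loader (execution strategy), and module_loader (task modules).
- Calls run on the strategy plugin obtained from strategy_loader (the default strategy is linear) to execute all tasks in order; executing one module may involve several tasks.
- After the strategy plugin's run starts, the action type is checked. Meta actions, which are not real Ansible modules, are executed separately; other tasks are added to the queue through _queue_task.
- The queue is processed by WorkerProcess, whose run uses TaskExecutor to execute the task.
- TaskExecutor sets up the connection plugin and picks the action plugin for the task type (module, include, and so on). If the module has its own action plugin, that one is run; otherwise normal or async is used, depending on the task's async attribute.
The action plugin defines the order of execution and the concrete operations, such as creating the temporary directory and generating the temporary script. To add extra processing in a uniform way, override methods of the action plugin.
The individual steps of the action are executed through the connection plugin.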

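Example: extending Ansible

Extending the Python environment on managed nodes

Some of the Ansible modules we wrote need third-party libraries, but installing those libraries on every node is hard to manage. Since running a module really means running the generated script in the node's Python environment, our approach is to point the node at a specific Python environment and share one Python environment on the LAN over NFS. An extended action plugin mounts the NFS share on the node before execution and unmounts it once execution has finished. The steps are as follows:

Extension code:

Override the _execute_module method of ActionBase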

# execute_module

from __future__ import (absolute_import, division, print_function)
__metaclass__ = type

import json
import pipes

from ansible.compat.six import text_type, iteritems

from ansible import constants as C
from ansible.errors import AnsibleError
from ansible.release import __version__

try:
    from __main__ import display
except ImportError:
    from ansible.utils.display import Display
    display = Display()


class MagicStackBase(object):

    def _mount_nfs(self, ansible_nfs_src, ansible_nfs_dest):
        cmd = ['mount',ansible_nfs_src, ansible_nfs_dest]
        cmd = [pipes.quote(c) for c in cmd]
        cmd = ' '.join(cmd)
        result = self._low_level_execute_command(cmd=cmd, sudoable=True)
        return result

    def _umount_nfs(self, ansible_nfs_dest):
        cmd = ['umount', ansible_nfs_dest]
        cmd = [pipes.quote(c) for c in cmd]
        cmd = ' '.join(cmd)
        result = self._low_level_execute_command(cmd=cmd, sudoable=True)
        return result

    def _execute_module(self, module_name=None, module_args=None, tmp=None, task_vars=None, persist_files=False, delete_remote_tmp=True):
        '''
        Transfer and run a module along with its arguments.
        '''

        # display.v(task_vars)

        if task_vars is None:
            task_vars = dict()

        # if a module name was not specified for this execution, use
        # the action from the task
        if module_name is None:
            module_name = self._task.action
        if module_args is None:
            module_args = self._task.args

        # set check mode in the module arguments, if required
        if self._play_context.check_mode:
            if not self._supports_check_mode:
                raise AnsibleError("check mode is not supported for this operation")
            module_args['_ansible_check_mode'] = True
        else:
            module_args['_ansible_check_mode'] = False

        # Get the connection user for permission checks
        remote_user = task_vars.get('ansible_ssh_user') or self._play_context.remote_user

        # set no log in the module arguments, if required
        module_args['_ansible_no_log'] = self._play_context.no_log or C.DEFAULT_NO_TARGET_SYSLOG

        # set debug in the module arguments, if required
        module_args['_ansible_debug'] = C.DEFAULT_DEBUG

        # let module know we are in diff mode
        module_args['_ansible_diff'] = self._play_context.diff

        # let module know our verbosity
        module_args['_ansible_verbosity'] = display.verbosity

        # give the module information about the ansible version
        module_args['_ansible_version'] = __version__

        # set the syslog facility to be used in the module
        module_args['_ansible_syslog_facility'] = task_vars.get('ansible_syslog_facility', C.DEFAULT_SYSLOG_FACILITY)

        # let module know about filesystems that selinux treats specially
        module_args['_ansible_selinux_special_fs'] = C.DEFAULT_SELINUX_SPECIAL_FS

        (module_style, shebang, module_data) = self._configure_module(module_name=module_name, module_args=module_args, task_vars=task_vars)
        if not shebang:
            raise AnsibleError("module (%s) is missing interpreter line" % module_name)

        # get nfs info for mount python packages
        ansible_nfs_src = task_vars.get("ansible_nfs_src", None)
        ansible_nfs_dest = task_vars.get("ansible_nfs_dest", None)

        # a remote tmp path may be necessary and not already created
        remote_module_path = None
        args_file_path = None
        if not tmp and self._late_needs_tmp_path(tmp, module_style):
            tmp = self._make_tmp_path(remote_user)

        if tmp:
            remote_module_filename = self._connection._shell.get_remote_filename(module_name)
            remote_module_path = self._connection._shell.join_path(tmp, remote_module_filename)
            if module_style in ['old', 'non_native_want_json']:
                # we'll also need a temp file to hold our module arguments
                args_file_path = self._connection._shell.join_path(tmp, 'args')

        if remote_module_path or module_style != 'new':
            display.debug("transferring module to remote")
            self._transfer_data(remote_module_path, module_data)
            if module_style == 'old':
                # we need to dump the module args to a k=v string in a file on
                # the remote system, which can be read and parsed by the module
                args_data = ""
                for k,v in iteritems(module_args):
                    args_data += '%s=%s ' % (k, pipes.quote(text_type(v)))
                self._transfer_data(args_file_path, args_data)
            elif module_style == 'non_native_want_json':
                self._transfer_data(args_file_path, json.dumps(module_args))
            display.debug("done transferring module to remote")

        environment_string = self._compute_environment_string()

        remote_files = None

        if args_file_path:
            remote_files = tmp, remote_module_path, args_file_path
        elif remote_module_path:
            remote_files = tmp, remote_module_path

        # Fix permissions of the tmp path and tmp files.  This should be
        # called after all files have been transferred.
        if remote_files:
            self._fixup_perms2(remote_files, remote_user)


        # mount nfs
        if ansible_nfs_src and ansible_nfs_dest:
            result = self._mount_nfs(ansible_nfs_src, ansible_nfs_dest)
            if result['rc'] != 0:
                raise AnsibleError("mount nfs failed!!! {0}".format(result['stderr']))

        cmd = ""
        in_data = None

        if self._connection.has_pipelining and self._play_context.pipelining and not C.DEFAULT_KEEP_REMOTE_FILES and module_style == 'new':
            in_data = module_data
        else:
            if remote_module_path:
                cmd = remote_module_path

        rm_tmp = None
        if tmp and "tmp" in tmp and not C.DEFAULT_KEEP_REMOTE_FILES and not persist_files and delete_remote_tmp:
            if not self._play_context.become or self._play_context.become_user == 'root':
                # not sudoing or sudoing to root, so can cleanup files in the same step
                rm_tmp = tmp

        cmd = self._connection._shell.build_module_command(environment_string, shebang, cmd, arg_path=args_file_path, rm_tmp=rm_tmp)
        cmd = cmd.strip()
        sudoable = True
        if module_name == "accelerate":
            # always run the accelerate module as the user
            # specified in the play, not the sudo_user
            sudoable = False


        res = self._low_level_execute_command(cmd, sudoable=sudoable, in_data=in_data)

        # umount nfs
        if ansible_nfs_src and ansible_nfs_dest:
            result = self._umount_nfs(ansible_nfs_dest)
            if result['rc'] != 0:
                raise AnsibleError("umount nfs failed!!! {0}".format(result['stderr']))

        if tmp and "tmp" in tmp and not C.DEFAULT_KEEP_REMOTE_FILES and not persist_files and delete_remote_tmp:
            if self._play_context.become and self._play_context.become_user != 'root':
                # not sudoing to root, so maybe can't delete files as that other user
                # have to clean up temp files as original user in a second step
                tmp_rm_cmd = self._connection._shell.remove(tmp, recurse=True)
                tmp_rm_res = self._low_level_execute_command(tmp_rm_cmd, sudoable=False)
                tmp_rm_data = self._parse_returned_data(tmp_rm_res)
                if tmp_rm_data.get('rc', 0) != 0:
                    display.warning('Error deleting remote temporary files (rc: {0}, stderr: {1})'.format(tmp_rm_res.get('rc'), tmp_rm_res.get('stderr', 'No error string available.')))

        # parse the main result
        data = self._parse_returned_data(res)

        # pre-split stdout into lines, if stdout is in the data and there
        # isn't already a stdout_lines value there
        if 'stdout' in data and 'stdout_lines' not in data:
            data['stdout_lines'] = data.get('stdout', u'').splitlines()

        display.debug("done with _execute_module (%s, %s)" % (module_name, module_args))
        return data
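
In the listing above, the NFS share is unmounted only if the command in between returns normally. A minimal sketch of one way to guarantee the unmount, assuming hypothetical mount and umount callables that return a dict with rc and stderr, as _mount_nfs and _umount_nfs appear to do. The names nfs_mounted and MountError and the destination path are illustrative and not part of the original plugin.

```python
from contextlib import contextmanager


class MountError(Exception):
    """Raised when the (hypothetical) mount or unmount step reports a failure."""


@contextmanager
def nfs_mounted(mount, umount, src, dest):
    """Mount src on dest for the duration of the with-block, then always unmount."""
    result = mount(src, dest)
    if result["rc"] != 0:
        raise MountError("mount nfs failed: {0}".format(result["stderr"]))
    try:
        yield
    finally:
        # Runs even when the body of the with-block raised.
        result = umount(dest)
        if result["rc"] != 0:
            raise MountError("umount nfs failed: {0}".format(result["stderr"]))


# Fake runners standing in for the real remote commands (assumption for the demo).
calls = []


def fake_mount(src, dest):
    calls.append(("mount", src, dest))
    return {"rc": 0, "stderr": ""}


def fake_umount(dest):
    calls.append(("umount", dest))
    return {"rc": 0, "stderr": ""}


try:
    with nfs_mounted(fake_mount, fake_umount, "172.16.30.170:/export", "/mnt/site-packages"):
        raise RuntimeError("module execution failed")
except RuntimeError:
    pass

print([name for name, *_ in calls])
```

Even though the body raised, the unmount step still ran, which is the property the straight-line version lacks.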

Integrate this into normal.py and async.py, and remember to configure both plugins in ansible.cfg.

from __future__ import (absolute_import, division, print_function)
__metaclass__ = type

from ansible.plugins.action import ActionBase
from ansible.utils.vars import merge_hash

from common.ansible_plugins import MagicStackBase


class ActionModule(MagicStackBase, ActionBase):

    def run(self, tmp=None, task_vars=None):
        if task_vars is None:
            task_vars = dict()

        results = super(ActionModule, self).run(tmp, task_vars)
        # remove as modules might hide due to nolog
        del results['invocation']['module_args']
        results = merge_hash(results, self._execute_module(tmp=tmp, task_vars=task_vars))
        # Remove special fields from the result, which can only be set
        # internally by the executor engine. We do this only here in
        # the 'normal' action, as other action plugins may set this.
        #
        # We don't want modules to determine that running the module fires
        # notify handlers.  That's for the playbook to decide.
        for field in ('_ansible_notify',):
            if field in results:
                results.pop(field)

        return results

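Configure ansible.cfg so that the extended plugins are loaded as Ansible's action plugins.
Override the plugin methods; the key one is _execute_module.
The command must specify the Python environment and pass in the parameters for mounting and unmounting NFS.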
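
As a rough sketch of the ansible.cfg step: action_plugins under [defaults] is the setting Ansible reads for extra action plugin directories. The directory below is a hypothetical location for the customized normal.py and async.py, not one taken from the original setup.

```python
import configparser
import io

# Hypothetical directory holding the customized normal.py and async.py.
PLUGIN_DIR = "/opt/magicstack/action_plugins"


def render_ansible_cfg(plugin_dir):
    """Return ansible.cfg text that adds plugin_dir to the action plugin search path."""
    cfg = configparser.ConfigParser()
    cfg["defaults"] = {"action_plugins": plugin_dir}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


cfg_text = render_ansible_cfg(PLUGIN_DIR)
print(cfg_text)
```

The printed text can be saved as ansible.cfg next to the playbooks or merged into an existing [defaults] section.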

ansible 51 -m mysql_db -a "state=dump name=all target=/tmp/test.sql" -i hosts -u root -v -e "ansible_nfs_src=172.16.30.170:/web/proxy_env/lib64/python2.7/site-packages ansible_nfs_dest=/root/.pyenv/versions/2.7.10/lib/python2.7/site-packages ansible_python_interpreter=/root/.pyenv/versions/2.7.10/bin/python"

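Some basic concepts

What is Ansible?
It is a "configuration management tool" and an "automated operations tool". If you have never used a configuration management tool, don't worry: after reading this article you will have a feel for what Ansible is.

What can Ansible do?
Like other configuration management tools, Ansible helps us carry out batch tasks and jobs that have to be repeated often.
For example: installing nginx on 100 servers at once and starting it after installation.
For example: copying a file to 100 servers in one go.
For example: deploying redis on every new server that joins the environment, which means repeating the same work again and again.
Ansible fits all of these scenarios.

At this point you might ask: I can write a few scripts to cover these scenarios, so why use Ansible? True, scripts can do the job, but I still recommend Ansible because it has some excellent properties, such as "idempotence". What does that mean? Suppose you want to copy a file into a directory on a target host, but you are not sure whether the file is already there. With Ansible this is simple: if the directory already holds the file with the same content, Ansible does nothing; if it does not, Ansible copies the file over. In other words, Ansible is "result-oriented": we declare a "target state", Ansible checks whether the "current state" matches it, does nothing if it does, and otherwise changes the current state into the target state. That is idempotence: running the same operation repeatedly yields the same result. In many scenarios this gives Ansible an edge over scripts. It may be hard to appreciate from a description alone; you will get a feel for it once you actually use it later, so don't dwell on it here and read on.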
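
The "target state" idea can be sketched in a few lines of plain Python. This is only an illustration of idempotence on the local filesystem, not how Ansible's copy module is implemented; ensure_file is a made-up helper name.

```python
import filecmp
import os
import shutil
import tempfile


def ensure_file(src, dest):
    """Copy src to dest only if dest is missing or differs; return True if changed."""
    if os.path.exists(dest) and filecmp.cmp(src, dest, shallow=False):
        return False  # current state already matches the target state
    shutil.copyfile(src, dest)
    return True


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "src.txt")
dest = os.path.join(workdir, "dest.txt")
with open(src, "w") as f:
    f.write("hello\n")

first = ensure_file(src, dest)   # dest is missing, so the file is copied
second = ensure_file(src, dest)  # dest already matches, so nothing happens
print(first, second)  # prints: True False
```

Running the same call twice leaves the same end state, and the second run reports that nothing changed.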

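If you know other configuration management tools such as puppet or saltstack, you will know that managing 100 hosts with puppet means installing the puppet agent (the client-side program) on all 100 hosts. Ansible is different: it only relies on ssh and needs no agent on the managed hosts. As long as you can reach a host over ssh, you can manage it with Ansible.

By now you should have a rough first impression of Ansible:
Ansible is a configuration management tool that helps with batch or repetitive work. It manages other hosts over ssh and offers features such as idempotence, playbooks, templates and roles. We will introduce these features, and how to use Ansible, step by step.

How do we use Ansible? Let's get to know it through a simple command, shown below.
Note: the command only works after some configuration has been done. That configuration is described later, so skip over it for now.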

ansible 10.1.1.60 -m ping

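The command above tells Ansible to ping the host 10.1.1.60, which is easy enough to understand.
"ping" is an Ansible module, and "-m ping" means "call the ping module". Despite the name, it does not send ICMP packets like the ping command: it connects to the host over ssh, checks that Ansible can log in and run Python there, and replies with pong. Ansible has many more modules than this one, and different modules do different jobs. As you have probably guessed, in practice we will use all sorts of modules, because Ansible carries out its actual work through them.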
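
To make the idea of a "module" more concrete, here is a toy stand-in written for this article, not Ansible's own source: a module takes some arguments and hands back a JSON-serializable result. The function name ping_module is invented for the illustration.

```python
import json


def ping_module(data="pong"):
    """Toy stand-in for a module: take arguments, return a JSON-serializable result."""
    # A successful ping changes nothing on the host, so changed is False.
    return {"changed": False, "ping": data}


result = ping_module()
print(json.dumps(result, sort_keys=True))
```

Real modules follow the same contract, only with far more work between receiving the arguments and returning the result.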

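We just used a simple ansible command as an example, but for it to run, two basic conditions must both hold:
Condition 1: the host running Ansible can connect to the managed host over ssh.
Condition 2: the managed host's IP address and related information have been added to Ansible's "inventory".

As mentioned, Ansible needs no agent on the managed hosts but does depend on ssh, so condition 1 is easy to understand. Condition 2 must hold as well: even if the Ansible host can reach the managed host over ssh, the managed host's IP address, ssh port and so on still have to be added to a configuration file called the "inventory". If a host is not in the inventory, Ansible cannot operate on it. Configuring the inventory is covered in detail later.

That's enough basic concepts for now; time to get hands-on.

Some basic configuration

The first thing to do is install Ansible.
Before that, let me introduce my demo environment.
I have four hosts with the following IP addresses: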

10.1.1.71

10.1.1.70

10.1.1.61

10.1.1.60

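I use host 10.1.1.71 (71 for short below) as the configuration management host, so Ansible is installed on 71 and the remaining hosts are managed hosts. Hosts 71 and 70 run centos7.4; hosts 61 and 60 run centos6.9.

I install Ansible from a yum repository. Installing Ansible requires the epel repository, so I configured the Aliyun epel mirror and the Aliyun centos7 base mirror. The yum repository configuration is as follows: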

# pwd
/etc/yum.repos.d

# cat aliBase.repo
[aliBase]
name=aliBase
baseurl=https://mirrors.aliyun.com/centos/$releasever/os/$basearch/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/centos/$releasever/os/$basearch/RPM-GPG-KEY-CentOS-$releasever

# cat aliEpel.repo
[aliEpel]
name=aliEpel
baseurl=https://mirrors.aliyun.com/epel/$releaseverServer/$basearch/
enabled=1
gpgcheck=0

With the yum repositories configured, install Ansible:

yum install ansible

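At the time of writing, the version in the yum repository is ansible-2.4.2.0-1.

Installation is done, but hold on: some more basic configuration is needed. As said when introducing the concepts, to manage a host with Ansible you must add its information to the inventory; hosts that are not in the inventory cannot be managed. So let's look at the inventory. After installation, Ansible provides a default inventory, /etc/ansible/hosts. Open it and you will see some example entries in the familiar INI style. Let's see how to configure it.

In our demo environment we want the Ansible host to manage host 60, so the most direct way is to write its IP address into /etc/ansible/hosts. Add the following IP at the bottom of /etc/ansible/hosts:
10.1.1.60
It's that simple. But is this enough to manage 10.1.1.60 from the Ansible host? Let's try it and see what happens.

Run the earlier example command: ansible 10.1.1.60 -m ping
Using Ansible to ping host 10.1.1.60 returns the following: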

[Figure: output of ansible 10.1.1.60 -m ping, reporting 10.1.1.60 as unreachable]

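The output shows that 10.1.1.60 is unreachable, meaning Ansible could not connect to host 60 over ssh.
This is expected: the Ansible host does not know the user name and password for 10.1.1.60, so it cannot connect over ssh.
So we also need to configure the ssh information for 10.1.1.60 in the inventory before the connection can succeed. An example configuration: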

[Figure: inventory entry for 10.1.1.60 with ansible_port, ansible_user and ansible_ssh_pass set]

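Edit the inventory file and append the ssh settings after the host IP added earlier, as shown in the figure above.
ansible_port sets the port of the sshd service on the host. In production, hosts often do not use the default port 22, so this parameter specifies the actual port.
ansible_user sets the user name used when connecting to the host.
ansible_ssh_pass sets that user's password.
So the configuration in the figure means: sshd on 10.1.1.60 listens on port 22, Ansible connects to host 60 as host 60's root user, and that root user's password is 123123.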
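
The inventory entry just described can be written out and taken apart as follows. The line is reconstructed from the description (port 22, user root, password 123123), since the original screenshot is not preserved, and parse_host_line is a simplified, hypothetical stand-in for Ansible's real inventory parser.

```python
import shlex

# Reconstructed from the description in the text, not copied from the screenshot.
INVENTORY_LINE = "10.1.1.60 ansible_port=22 ansible_user=root ansible_ssh_pass=123123"


def parse_host_line(line):
    """Split an INI-style inventory host line into (name, variables)."""
    tokens = shlex.split(line)
    name, pairs = tokens[0], tokens[1:]
    variables = dict(pair.split("=", 1) for pair in pairs)
    return name, variables


name, variables = parse_host_line(INVENTORY_LINE)
print(name)
print(variables)
```

The first token is the host as Ansible will know it; everything after it is a key=value connection variable for that host.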
Good, the ssh information for host 60 is configured. Let's try again and see whether the earlier command now runs:

[Figure: output of ansible 10.1.1.60 -m ping, now succeeding]

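As you can see, the command now runs normally and the Ansible host successfully pinged 10.1.1.60. From now on we can manage 10.1.1.60 from the Ansible host.

For convenience, Ansible also supports host aliases. Once a host has an alias, we can manage it by that alias.
For example, to give 10.1.1.60 the alias test60, configure the inventory as follows: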

[Figure: inventory entry defining the alias test60, with the IP given through ansible_host]

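As shown in the figure above, when a host is given an alias, its IP address must be specified with the ansible_host keyword; otherwise Ansible cannot identify the host correctly.
Once the alias is configured, the host can be managed by its alias, for example: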

[Figure: the ping module run against the alias test60]

However, if the host is configured only this way, it can no longer be managed by its IP, unless you add two entries for the host, one by alias and one by IP.
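
The effect of an alias can be sketched like this. The inventory text is an assumed reconstruction of the alias entry (the screenshot is not preserved), and load_hosts and connection_address are hypothetical helpers that only mimic the lookup: the inventory name is what you type on the command line, while ansible_host is where the ssh connection goes.

```python
# Assumed reconstruction of the alias entry described in the text.
INVENTORY = """\
test60 ansible_host=10.1.1.60 ansible_port=22 ansible_user=root ansible_ssh_pass=123123
"""


def load_hosts(text):
    """Map each inventory name to its key=value variables (simplified parser)."""
    hosts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, *pairs = line.split()
        hosts[name] = dict(pair.split("=", 1) for pair in pairs)
    return hosts


def connection_address(hosts, name):
    """Address to ssh to: ansible_host if set, otherwise the inventory name itself."""
    return hosts[name].get("ansible_host", name)


hosts = load_hosts(INVENTORY)
address = connection_address(hosts, "test60")
ip_is_known = "10.1.1.60" in hosts
print(address)      # prints: 10.1.1.60
print(ip_is_known)  # prints: False
```

The IP is reachable through the alias, but it is not itself an inventory name, which is why a command addressed to 10.1.1.60 would not match any host.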

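Note: the parameters above use the syntax introduced in Ansible 2.0. Before 2.0, write them as follows:

ansible_port should be written as ansible_ssh_port
ansible_user should be written as ansible_ssh_user
ansible_host should be written as ansible_ssh_host

The demo environment runs Ansible 2.4, so we use the new syntax here; 2.4 still accepts the old syntax as well.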
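
The renaming listed above is mechanical, so it can be expressed as a small lookup table. This is a sketch for converting old inventory variables by hand; modernize is a made-up helper, and Ansible does not need it to run since the old names are still accepted.

```python
# Pre-2.0 names mapped to the names used since Ansible 2.0 (from the list above).
LEGACY_TO_CURRENT = {
    "ansible_ssh_port": "ansible_port",
    "ansible_ssh_user": "ansible_user",
    "ansible_ssh_host": "ansible_host",
}


def modernize(variables):
    """Return a copy of variables with pre-2.0 names replaced by the current ones."""
    return {LEGACY_TO_CURRENT.get(key, key): value for key, value in variables.items()}


old = {
    "ansible_ssh_host": "10.1.1.60",
    "ansible_ssh_port": "22",
    "ansible_ssh_user": "root",
    "ansible_ssh_pass": "123123",  # not in the rename list, so it is left untouched
}
new = modernize(old)
print(new)
```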

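All of these parameters exist to set up the ssh connection. As we know, ssh can authenticate by password or by key, and in production we usually use key-based authentication for better security, sometimes disabling password authentication altogether. So can the Ansible host authenticate by key when it connects to managed hosts? Of course it can.
In practice we usually generate a key pair on the configuration management host (the Ansible host) and connect to the managed hosts with public key authentication. If you are not yet familiar with key-based authentication, see the following article; the configuration is not described in detail here: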

http://www.zsythink.net/archives/2375

So let's generate a key pair on the Ansible host and configure it.
First, generate a key pair in the default format, private key and public key.

# ssh-keygen

Then add the generated public key to the authorized keys of 10.1.1.60:

# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.1.1.60

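Good, public key authentication is now set up, and the Ansible host can connect to host 60 without a password.

With key authentication in place, ssh connections no longer need a password, so the inventory no longer has to carry the host's user name and password. The entry can be trimmed down to the host itself, written either as the bare IP or as an alias with ansible_host, plus its ansible_port.

Of course, if sshd on your managed server uses the default port 22, ansible_port can be omitted too. To keep the demonstrations simple, all managed hosts in the demo environment use the default sshd port.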
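
The trimmed entries themselves are not preserved in this copy of the article. Assuming they simply drop the credentials and, optionally, the default port, the simplification can be sketched as below; simplify and both sample lines are illustrative reconstructions rather than the author's exact configuration.

```python
# Variables that key-based authentication makes unnecessary in this setup.
CREDENTIAL_VARS = {"ansible_user", "ansible_ssh_pass"}


def simplify(line, drop_default_port=False):
    """Remove credential variables (and optionally ansible_port=22) from a host line."""
    name, *pairs = line.split()
    kept = []
    for pair in pairs:
        key, value = pair.split("=", 1)
        if key in CREDENTIAL_VARS:
            continue
        if drop_default_port and key == "ansible_port" and value == "22":
            continue
        kept.append(pair)
    return " ".join([name] + kept)


full_ip = "10.1.1.60 ansible_port=22 ansible_user=root ansible_ssh_pass=123123"
full_alias = "test60 ansible_host=10.1.1.60 ansible_port=22 ansible_user=root ansible_ssh_pass=123123"

by_ip = simplify(full_ip)
by_alias = simplify(full_alias)
by_ip_no_port = simplify(full_ip, drop_default_port=True)
by_alias_no_port = simplify(full_alias, drop_default_port=True)

print(by_ip)             # 10.1.1.60 ansible_port=22
print(by_alias)          # test60 ansible_host=10.1.1.60 ansible_port=22
print(by_ip_no_port)     # 10.1.1.60
print(by_alias_no_port)  # test60 ansible_host=10.1.1.60
```

Dropping ansible_user works here because the Ansible host and the managed host both use root; with a different remote user the variable would still be needed.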

If the Ansible host holds several key pairs and you need different keys for different managed hosts, ssh-agent can help manage them. If you are not familiar with ssh-agent, see the following article:

http://www.zsythink.net/archives/2407

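If you would rather not use ssh-agent, you can use the ansible_ssh_private_key_file parameter to specify the private key for a given host. The demo environment does not use multiple key pairs, so this is not covered further here.

In later demonstrations, key-based authentication is used by default. I will set up key authentication on every managed host beforehand and will not describe that process again.
That was a lot of ground to cover. You should now understand the basic concepts of Ansible and its most basic configuration. In the following articles we will introduce Ansible gradually, step by step.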