September 2017 - Page 2 - Linux System Operations Log

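Periodically cleaning up hung npm processes left by GitLab Runner

When GitLab Runner runs npm install to install modules, npm sometimes never exits. The hung processes use up the runner's available job slots, so later pipelines stay in the pending state and never start. The script below periodically looks for npm processes that have been running for more than an hour and kills them.
The script is as follows: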

#!/bin/bash
# Kill every npm process that has been running for more than 3600 seconds.
for pid in $(ps -eo pid,etimes,cmd | awk '$2 > 3600 && /npm/ {print $1}'); do
    kill "$pid"
done

Add it to cron so that it runs every 10 minutes.

*/10 * * * * /data/sh/clean_death_npm.sh
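
The selection rule used by the script (elapsed time above a threshold, command line mentioning npm) can be sketched in Python for testing against captured ps output. This is only an illustration: the function name stale_pids and the sample text are made up here, and nothing is actually killed.

```python
def stale_pids(ps_output, keyword="npm", max_age=3600):
    """Return PIDs whose elapsed time exceeds max_age and whose command mentions keyword."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, etimes, full command line
        if len(parts) < 3 or not parts[1].isdigit():
            continue  # skips the header row and malformed lines
        pid, etimes, cmd = parts
        if int(etimes) > max_age and keyword in cmd:
            pids.append(int(pid))
    return pids


# Hypothetical output of: ps -eo pid,etimes,cmd
sample = """\
  PID ELAPSED CMD
 1201    7250 npm install
 1377      42 npm install
 1502    9000 /usr/sbin/sshd -D
"""
print(stale_pids(sample))
```

Only the first npm process qualifies: the second is too young and the third is not npm.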

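String extraction commands in the shell: cut, printf, awk, sed

I. cut

cut splits on exactly one delimiter character (TAB by default, or whatever -d specifies). A space can be the delimiter, but every single space then counts as a field boundary, so cut is a poor fit for columns padded with a variable number of spaces; use awk for those.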

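1. Options

-b : show only the given byte range of each line;
-c : show only the given character range of each line;
-d : set the field delimiter; the default is TAB;
-f : show only the given fields;
-n : used with -b; do not split multibyte characters;
--complement : invert the selection, i.e. show the bytes, characters or fields that were not selected;
--output-delimiter=<delimiter> : set the field delimiter used in the output;
--help : show help;
--version : show version information.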

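2. Usage

-d : delimiter (--delimiter; split columns on the given delimiter)
-b : bytes
-c : characters
-f : fields, i.e. column numbers (--fields; which columns to extract)
N- : from the Nth byte, character or field to the end of the line
N-M : from the Nth byte, character or field to the Mth
-M : from the first byte, character or field to the Mth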

$> cat user.txt
01, zhang, M, 18
02, wang, M, 20
03, li, M, 21

# Split on "," and show column 2
$> cut -d "," -f 2 user.txt
 zhang
 wang
 li

# Split on "," and show columns 1 and 3
$> cut -d "," -f 1,3 user.txt
01, M
02, M
03, M

# Split on "," and show columns 1 to 3
$> cut -d "," -f 1-3 user.txt
01, zhang, M
02, wang, M
03, li, M

# Split on "," and show every column except column 1
$> cut -d "," -f 1 --complement user.txt
 zhang, M, 18
 wang, M, 20
 li, M, 21
## Note the leading space in each line!

# By character position
$> cut -c1-5 user.txt
01, z
02, w
03, l
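
To see why the extracted fields keep their leading space, here is a rough Python imitation of cut -d ... -f .... It is a simplified sketch: the helper cut_fields is hypothetical, it always emits fields in ascending order as cut does, and it ignores cut's special handling of lines that contain no delimiter.

```python
def cut_fields(line, delimiter, fields):
    """Mimic `cut -d DELIM -f LIST`: pick 1-based fields and rejoin them with the delimiter."""
    parts = line.split(delimiter)
    wanted = sorted(set(fields))  # cut prints fields in input order, whatever order was requested
    chosen = [parts[i - 1] for i in wanted if 1 <= i <= len(parts)]
    return delimiter.join(chosen)


rows = ["01, zhang, M, 18", "02, wang, M, 20", "03, li, M, 21"]
for row in rows:
    print(repr(cut_fields(row, ",", [2])), repr(cut_fields(row, ",", [1, 3])))
```

The split happens on the comma alone, so the space after each comma stays attached to the field that follows it.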

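II. printf

printf 'format string' arguments

1. Conversion specifiers

%ns : print a string. n is a number giving the minimum field width; shorter strings are padded
%ni : print an integer. n is a number giving the minimum field width
%m.nf : print a floating-point number. m is the total field width and n the number of decimal places. For example, %8.2f prints 8 characters in total: 2 decimal places, the decimal point, and 5 for the integer part.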

2. Escape sequences

\a : alert (bell)
\b : backspace
\f : form feed
\n : newline
\r : carriage return
\t : horizontal tab
\v : vertical tab

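printf %s 1 2 3 4 5 6 # prints 123456 as one run of text, with no separators
printf %s %s %s 1 2 3 4 5 6 # only the first %s is the format, so this prints %s%s123456
printf '%s ' 1 2 3 4 5 6 # prints 1 2 3 4 5 6, each item followed by a space
printf '%s\n' 1 2 3 4 5 6 # prints one item per line
printf '%s %s %s' 1 2 3 4 5 6 # consumes the arguments three at a time and prints 1 2 34 5 6
printf '%s %s %s\n' 1 2 3 4 5 6 # prints three items per line
printf '%s' $(cat user.txt) # prints the words of the file run together
printf '%s\t %s\t %s\t %s\n' $(cat user.txt) # prints the file as tab-separated columns, four words per line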

Run man printf to see the available formats; they follow C's printf.
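
The examples rely on one rule: when there are more arguments than conversion specifiers, the shell's printf reuses the format until the arguments run out. A small Python sketch of that rule follows. It handles only %s, and shell_printf is a made-up name for illustration, not a real API.

```python
import re


def shell_printf(fmt, *args):
    """Reuse fmt until all args are consumed, the way the shell's printf does (only %s is handled)."""
    slots = len(re.findall(r"%s", fmt))
    if slots == 0:
        return fmt
    out = []
    items = list(args) or [""]
    for start in range(0, len(items), slots):
        chunk = items[start:start + slots]
        chunk += [""] * (slots - len(chunk))  # missing arguments become empty strings
        out.append(fmt % tuple(chunk))
    return "".join(out)


print(repr(shell_printf("%s %s %s", 1, 2, 3, 4, 5, 6)))
print(repr(shell_printf("%s %s %s\n", 1, 2, 3, 4, 5, 6)))
```

The first call shows where the "34" in the output comes from: the format is applied twice and nothing separates the two passes.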

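III. awk

1. Command

awk 'condition1 {action1} condition2 {action2} ...' filename
(if condition1 holds, run action1; if condition2 holds, run action2)

2. Conditions (patterns): usually a relational expression

x > 10 : true when x is greater than 10
x >= 10 : greater than or equal to
x <= 10 : less than or equal to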

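3. Actions

Formatted output
Flow-control statements

4. Examples

# With no condition before the braces the action runs for every line. printf here is awk's own printf; $2 is the second column, $3 the third, and $0 the whole line
awk '{printf $2 "\t" $3 "\n"}' user.txt

# Print three columns. print is an awk statement (there is no system print command), so it only works inside awk; unlike printf, print appends a newline automatically
df -h | awk '{print $1 "\t" $5 "\t" $6 "\t"}'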

# Extract the used disk space percentage; the result can be stored in a variable and compared with a threshold to raise an alert
df -h | grep sda3 | awk '{print $5}' | cut -d '%' -f 1

# Show free memory without the M unit suffix
free -h | grep Mem | awk '{print $4}' | cut -d 'M' -f 1

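5. Notes

grep selects rows, awk selects columns by condition, cut selects columns by delimiter.
BEGIN: runs its action once, before any input is read

awk 'BEGIN{print "this is a text"} {print $2 "\t" $3}' user.txt

END: runs its action once, after all input has been processed
FS built-in variable: sets the field separator. To set it inside the program, assign it in a BEGIN block so that it takes effect before the first line is split;

awk 'BEGIN{FS=":"} END{print "this is end text"} {print $1 "\t" $3}' /etc/passwd

BEGIN and END are conditions too.
Relational operators:

# Skip the line containing ID in user.txt, then print column 2 of every line whose column 4 is greater than 18
cat user.txt | grep -v ID | awk '$4 > 18 {printf $2 "\n"}'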

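IV. sed

sed selects, replaces, deletes and inserts data. It can also process input coming from a pipe.

1. Command

sed [options] '[action]' filename
sed has two forms: sed [options] 'command' file(s); sed [options] -f scriptfile file(s)

2. Options

-n : by default sed prints every line; with this option it prints only the lines a command explicitly outputs.
- sed -n '2p' user.txt # print the second line
-e : apply several sed commands to the input
-f : read the commands from a script file
-i : write the result back to the input file instead of printing it

3. Actions (wrap them in quotes)

`a` : append one or more lines after the current line. When appending several lines, end every line except the last with "\" to show that more data follows.
`c` : change, i.e. replace the current line with the text after c. When the replacement spans several lines, end every line except the last with "\".
`i` : insert one or more lines before the current line. When inserting several lines, end every line except the last with "\".
d : delete the given lines.
p : print the given lines.
s : substitute one string for another. The form is "range s/old/new/g" (much like substitution in vim)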

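4. Examples

sed -n '2p' user.txt    # print the second line; p is normally used with -n, otherwise every line is printed as well
df -h | sed -n '2p'    # use piped output as the input
sed '2,4d' user.txt    # delete lines 2 to 4 and show the rest; without -i the file itself is not changed
sed '2a hello' user.txt    # append hello after line 2; only the output changes
sed '2i hello \
world' user.txt    # insert two lines before line 2; only the output changes
sed '2c No person' user.txt    # replace line 2 with No person
sed '2s/M/F/g' user.txt    # replace M with F on line 2 and print the result
sed -i '2s/M/F/g' user.txt    # write the substitution back to the file
sed -e 's/zhang//g ; s/wang//g' user.txt    # -e runs several commands in order, separated by semicolons; with no line number before s the command applies to every line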

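Normally sed reads the line to be processed into the pattern space, runs the commands of the script on it one after another until the script ends, prints the line and clears the pattern space. It then repeats the cycle with the next line until the whole file has been processed. Sometimes that flow is not what the user wants: a command should run only under some condition, or the pattern space should be kept for the next cycle. sed provides a set of advanced commands for these cases. To learn them, you first need to know two buffers:

1. Pattern space: the buffer that holds the line sed has just read from the input.
2. Hold space: a buffer for stashing data temporarily while the pattern space is being processed.

A few related commands:

g: copy the hold space into the pattern space, overwriting what was in the pattern space
G: append a newline and then the hold space to the pattern space
h: copy the pattern space into the hold space, overwriting what was in the hold space
H: append a newline and then the pattern space to the hold space
x: swap the contents of the pattern space and the hold space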

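Suppose we want to print a file in reverse line order. The file is:

$ cat tmp
  1-line
  2-line
  3-line

Run the following command:

$ sed '2,$G;h;$!d' tmp
  3-line
  2-line
  1-line

Let us walk through what happens.

I. The three commands

2,$G: run G on every line from the second to the last
h: run h
$!d: delete every line except the last

II. Step by step

1. Reading the first line

1-line is read into the pattern space; G does not apply to line 1, so the pattern space is still 1-line;
h runs, so the hold space is now 1-line;
d runs and deletes the only line in the pattern space, leaving it empty and printing nothing

2. Reading the second line

2-line is read into the pattern space;
G appends the hold space (1-line) after 2-line, so the pattern space is 2-line\n1-line;
h runs, so the hold space is now 2-line\n1-line;
d runs and empties the pattern space

3. Reading the third line

3-line is read into the pattern space;
G appends the hold space (2-line\n1-line) after 3-line, so the pattern space is 3-line\n2-line\n1-line;
h runs, so the hold space is now 3-line\n2-line\n1-line;
$!d does not run, because this is the last line

4. The pattern space, 3-line\n2-line\n1-line, is printed

The command sed '1!G;h;$!d' tmp has the same effect.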
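
The walkthrough can be replayed with two ordinary variables standing in for the two buffers. The following Python sketch simulates sed '1!G;h;$!d' only; it is not a general sed interpreter, and the function name sed_reverse is made up for the example.

```python
def sed_reverse(lines):
    """Simulate sed '1!G;h;$!d' with an explicit pattern space and hold space."""
    hold = ""
    printed = []
    last = len(lines)
    for number, line in enumerate(lines, start=1):
        pattern = line                       # sed reads the next line into the pattern space
        if number != 1:
            pattern = pattern + "\n" + hold  # G: append a newline plus the hold space
        hold = pattern                       # h: copy the pattern space to the hold space
        if number != last:
            continue                         # $!d: delete the pattern space, print nothing
        printed.append(pattern)              # last line: the script ends and sed auto-prints
    return "\n".join(printed).split("\n") if printed else []


print(sed_reverse(["1-line", "2-line", "3-line"]))
```

Each cycle pushes the new line in front of everything seen so far, which is why the hold space ends up holding the file in reverse.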

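Setting up a backup machine for SaltStack

High availability is a basic requirement of operations work, so the operations tooling itself has to meet it first. That means giving SaltStack a primary and a backup master. The process is very simple:

I. Sync the primary master's files to the backup machine

rsync -av /etc/salt/* slave_salt_host:/etc/salt/

II. Start salt-master on the backup machine

/etc/init.d/salt-master restart

III. Edit the minion configuration and add the backup machine's IP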

master:

– 10.2..

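Deploying a Python web environment

In this post we build a simple web application inside a virtual environment, based on the Flask framework, with Gunicorn as the WSGI container and Supervisor managing the processes. We then use the OneAPM Python agent to monitor application performance, which closes the loop. First, a quick description of the environment:

System environment: Ubuntu 14.04, Python 2.7.6

Installing the packages

The first step is to install the required packages. Because we will use a virtual environment and pip to install and manage Python components, update the local package index first and then install: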

sudo apt-get update
sudo apt-get install python-pip python-dev nginx

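Creating a virtual environment with virtualenv

virtualenv creates isolated Python environments on one system that do not affect each other. To keep the system clean, we run the application in a virtualenv: install virtualenv, create an easily recognizable directory, create the environment super inside it, and activate it: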

sudo pip install virtualenv
mkdir ~/supervisor && cd ~/supervisor
virtualenv super
source super/bin/activate

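Installing the Flask framework

Now that we are inside the virtual environment, install Flask. Flask depends on two libraries, Werkzeug and Jinja2, and pip installs them along with it. pip is the standard tool for managing Python packages:

pip install flask

Next, write a simple web service with Flask, myweb.py. It defines several routes because we will run some tests against it later: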

from flask import Flask
app = Flask(__name__)
@app.route('/')
def index():
    return 'hello world  supervisor gunicorn '
@app.route('/1')
def index1():
    return 'hello world  supervisor gunicorn  ffffff'
@app.route('/qw/1')
def indexqw():
    return 'hello world  supervisor gunicorn fdfdfbdfbfb '
if __name__ == '__main__':
    app.debug = True
    app.run()

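Start Flask and take a look:

python myweb.py

Open http://127.0.0.1:5000 in a browser to see the result (try each of the paths).

Deploying the Python web app with Gunicorn

So far we have started the web service with Flask's built-in server. In production that server cannot meet performance requirements, so we use Gunicorn as the WSGI container for the application. Install Gunicorn first: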

pip install gunicorn

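With Gunicorn installed, we start Flask through it. When Flask runs on its built-in server, the code under if __name__ == '__main__' in myweb.py calls app.run(). Under Gunicorn, myweb.py is simply a module that Gunicorn imports, so it is started like this: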

gunicorn -w 4 -b 0.0.0.0:8000 myweb:app

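Here myweb refers to myweb.py and app is the name of the WSGI callable inside it. This listens on port 8000; the earlier port 5000 is not used. -w sets the number of worker processes and -b sets the address Gunicorn binds to.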
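
To make "WSGI callable" concrete, here is a minimal sketch that uses only the standard library. The names app and call_once are illustrative and this is not the Flask application from the post; it only shows the interface a server such as Gunicorn calls for every request (a Flask object implements the same interface).

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A bare WSGI callable: the kind of object `module:name` must point to."""
    body = b"hello world"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_once(wsgi_app, path="/"):
    """Invoke a WSGI callable once, roughly as a server worker would, and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys with test values
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    chunks = wsgi_app(environ, start_response)
    return captured["status"], b"".join(chunks)


print(call_once(app))
```

Saved as a module, a callable like this could be served the same way as myweb:app, since Gunicorn only needs the module name and the name of the callable.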

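To stop Gunicorn, run pkill gunicorn, although sometimes you still have to find the pid with ps and kill it by hand. That is tedious for a developer, which is where Supervisor comes in: a tool dedicated to managing processes, including system services.

Installing Supervisor

pip install supervisor

echo_supervisord_conf > supervisor.conf  # generate the default supervisor configuration file
gedit  supervisor.conf                   # edit the configuration file and add process management for gunicorn

Append the configuration for myweb.py at the bottom of supervisor.conf (/home/wang/supervisor/super is the virtualenv inside my project directory):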

[program:myweb]
command=/home/wang/supervisor/super/bin/gunicorn -w 4 -b 0.0.0.0:8000 myweb:app                                                                    
directory=/home/wang/supervisor                                            
startsecs=0                                                                  
stopwaitsecs=0                                                                  
autostart=false                                                                
autorestart=false                                                                
user=wang                                                                    
stdout_logfile=/home/wang/supervisor/log/gunicorn.log                  
stderr_logfile=/home/wang/supervisor/log/gunicorn.err

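Basic supervisor commands:

supervisord -c supervisor.conf                           # start supervisord with this configuration file
supervisorctl -c supervisor.conf status                  # show the status of the managed processes
supervisorctl -c supervisor.conf reload                  # reload the configuration file
supervisorctl -c supervisor.conf start [all]|[appname]   # start one or all managed processes
supervisorctl -c supervisor.conf stop [all]|[appname]    # stop one or all managed processes

Supervisor also has a web management interface that can be enabled by changing the configuration: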

[inet_http_server]     ; inet (TCP) server disabled by default
port=127.0.0.1:9001    ; (ip_address:port specifier, *:port for alliface)
username=wang          ; (default is no username (open server)
password=123           ; (default is no password (open server))
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL  for a unix socket
serverurl=http://127.0.0.1:9001       ; use an http:// url to specify an inet socket
username=wang                         ; should be same as http_username if set
password=123                          ; should be same as http_password if set
;prompt=mysupervisor                  ; cmd line prompt (default "supervisor")
;history_file=~/.sc_history           ; use readline history if available

Now supervisor can start gunicorn. Run:

supervisord -c supervisor.conf

Open http://127.0.0.1:9001 in a browser for supervisor's web management interface, and http://127.0.0.1:8000 for the page served by gunicorn.

Configuring Nginx

We already installed Nginx system-wide; its binary is in /usr/sbin/. Next we let Supervisor manage Nginx. Watch out for permissions here: Nginx was installed with sudo and starts as root, so supervisor now has to be started as root as well. Add the following to supervisor.conf:

[program:nginx]
command=/usr/sbin/nginx -g 'daemon off;'
startsecs=0
stopwaitsecs=0
autostart=false
autorestart=false
stdout_logfile=/home/wang/supervisor/log/nginx.log
stderr_logfile=/home/wang/supervisor/log/nginx.err

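With everything configured, start supervisor:

supervisord -c supervisor.conf

Visit the pages, or run a load test with ab:

ab -c 100 -n 100 http://127.0.0.1:8000/qw/1

-c sets the number of concurrent requests and -n sets the total number of requests.

Installing the Python agent

With the web app running, we want to monitor it in real time. Let's try the OneAPM Python agent.
First, install it: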

pip install -i http://pypi.oneapm.com/simple --upgrade blueware

Generate the configuration file from your License Key:

blueware-admin generate-config (License Key) = blueware.ini

Because we are in a virtual environment, pay special attention to paths. Change two settings in supervisor.conf:

[program:myweb]
command = /home/wang/supervisor/super/bin/blueware-admin run-program /home/wang/supervisor/super/bin/gunicorn -w 4 -b 0.0.0.0:8000 myweb:app
environment = BLUEWARE_CONFIG_FILE=blueware.ini

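Restart the application

supervisorctl    # enter the interactive shell
supervisor>  reload    # reload

Visit the pages as before, or run a load test with ab.
After a few minutes the dashboard (screenshot not preserved) shows page load time, web transactions and page throughput, followed by the custom Business Transaction settings.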

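Installing luarocks under OpenResty

OpenResty projects often rely on third-party packages. To manage them we can use opm, OpenResty's official package manager, or luarocks, Lua's package manager. opm does not have many packages yet, so luarocks is still the more widely used of the two; hopefully the opm community keeps growing.

Installing luarocks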

wget https://luarocks.org/releases/luarocks-2.4.1.tar.gz
tar -xzvf luarocks-2.4.1.tar.gz
cd luarocks-2.4.1/

./configure --prefix=/usr/local/openresty/luajit \
    --with-lua=/usr/local/openresty/luajit/ \
    --lua-suffix=jit \
    --with-lua-include=/usr/local/openresty/luajit/include/luajit-2.1
make build
# installing requires root
sudo make install

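A few notes:

--prefix sets the directory luarocks is installed into
--with-lua is the root directory of the Lua installation on the system
--lua-suffix is the interpreter name suffix; OpenResty's Lua interpreter is luajit, so it has to be jit here
--with-lua-include sets the directory containing Lua's header files

After this, the luarocks command is installed under /usr/local/openresty/luajit/bin.

Then add it to PATH:

vi ~/.bash_profile

export PATH=$PATH:/usr/local/openresty/luajit/bin

Run luarocks install package to install a Lua package.

luarocks install package --tree=path also lets you choose where the package is installed.

For more commands, run luarocks help.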

One-step nginx installation with Docker

I. Prepare the Dockerfile

FROM hub.c.163.com/library/centos:latest

RUN echo "Asia/Shanghai" > /etc/timezone
RUN cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

COPY nginx-1.8.1.sh /usr/local/
COPY run.sh /usr/local

II. Create the install script nginx-1.8.1.sh

#!/bin/bash
#install nginx-1.8.1

# installation directories
INSTALL_DIR=/opt/
SRC_DIR=/opt/software

[ ! -d ${INSTALL_DIR} ] && mkdir -p ${INSTALL_DIR}
[ ! -d ${SRC_DIR} ] && mkdir -p ${SRC_DIR}

# Check if user is root
if [ $(id -u) != "0" ]; then
    echo "Error: You must be root to run this script!!"
    exit 1
fi

# install build dependencies
for Package in unzip wget gcc gcc-c++ autoconf automake zlib zlib-devel openssl openssl-devel pcre pcre-devel
do
    yum -y install $Package
done

Install_Nginx()
{
# versions to install
NGINX="nginx-1.8.1"
PCRE="pcre-8.35"
ZLIB="zlib1211"
OPENSSL="openssl-1.0.1i"

NGINXFEATURES="--prefix=${INSTALL_DIR}nginx 
--user=nginx 
--group=nginx 
--with-http_ssl_module 
--with-http_gzip_static_module 
--with-http_stub_status_module 
--with-http_realip_module 
--pid-path=/var/run/nginx.pid 
--with-pcre=${SRC_DIR}/${PCRE} 
--with-zlib=${SRC_DIR}/zlib-1.2.11 
--with-openssl=${SRC_DIR}/${OPENSSL}
"

cd ${SRC_DIR}
# download the source archives
echo 'Downloading NGINX'
if [ ! -f ${NGINX}.tar.gz ]
then
  wget -c http://nginx.org/download/${NGINX}.tar.gz
else
  echo 'Skipping: NGINX already downloaded'
fi

echo 'Downloading PCRE'
if [ ! -f ${PCRE}.tar.gz ]
then
  wget -c https://sourceforge.net/projects/pcre/files/pcre/8.35/${PCRE}.tar.gz
else
  echo 'Skipping: PCRE already downloaded'
fi

echo 'Downloading ZLIB'
if [ ! -f ${ZLIB}.zip ]
then
  wget -c http://zlib.net/${ZLIB}.zip
else
  echo 'Skipping: ZLIB already downloaded'
fi

echo 'Downloading OPENSSL'
if [ ! -f ${OPENSSL}.tar.gz ]
then
  wget -c http://www.openssl.org/source/${OPENSSL}.tar.gz
else
  echo 'Skipping: OPENSSL already downloaded'
fi

echo '----------Unpacking downloaded archives. This process may take several minutes---------'

echo "Extracting ${NGINX}..."
tar xzf ${NGINX}.tar.gz
echo 'Done.'

echo "Extracting ${PCRE}..."
tar xzf ${PCRE}.tar.gz
echo 'Done.'

echo "Extracting ${ZLIB}..."
unzip ${ZLIB}.zip
echo 'Done.'

echo "Extracting ${OPENSSL}..."
tar xzf ${OPENSSL}.tar.gz
echo 'Done.'

# add the nginx user and group
groupadd -r nginx
useradd -r -g nginx nginx

# compile
echo '###################'
echo 'Compile NGINX'
echo '###################'
cd ${SRC_DIR}/${NGINX}
./configure ${NGINXFEATURES}
make
make install
cd ../

mkdir -p ${INSTALL_DIR}/nginx/conf/vhosts

}

Install_Nginx

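III. Create the run script run.sh

#!/bin/bash
source /etc/profile

echo `pwd`

# run the install script copied in by the Dockerfile
sh /usr/local/nginx-1.8.1.sh

# the install script uses --prefix=/opt/nginx
/opt/nginx/sbin/nginx

while true; do sleep 1; done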

IV. Build

docker build -t nginx:0.1 .

V. Prepare the YAML file nginx.yaml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  minReadySeconds: 10
  template:
    metadata:
      labels:
        app: nginx
        version: V20170907134852
    spec:
      volumes:
        - name: logs
          hostPath:
            path: /data/log
        - name: jdk
          hostPath:
            path: /usr/local/java/jdk1.8.0_102
        - name: project
          hostPath:
            path: /data/appdeploy
      containers:
        - name: nginx
          image: nginx:0.1
          ports:
            - containerPort: 80
          volumeMounts:
          - mountPath: /data/log
            name: logs
          - mountPath: /usr/local/java/jdk1.8.0_102
            name: jdk
            readOnly: true
          - mountPath: /data/appdeploy
            name: project
          env:
            - name: DEPLOYMENT_DEMO_VER
              value: V20170907134852
          command:
            - /usr/local/run.sh
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - name: http
    port: 80
    nodePort: 80
    targetPort: 80
    protocol: TCP

VI. Apply the YAML

kubectl create -f nginx.yaml

Done!

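Fixing a client IP that always shows as 127.0.0.1 by configuring nginx proxy_set_header

I. Background

To stop the resources on this site (小木人印象, www.xwood.net) from being downloaded abusively, I recently added a security control module. It analyses, per client IP, how many downloads are reasonable within a validity period and uses that as the blacklist rule. However, every IP address returned by the existing HttpClientIpUtils helper class was 127.0.0.1. The real address of the end user could not be obtained, so the blacklist could not be built.

II. Solution

The cause is the reverse proxy that nginx provides in front of the application server. The previous proxy configuration was: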

location ^~/open-api/{
    proxy_pass   http://127.0.0.1:8080/openapi/;  
}

It should be changed as follows (adding the directive proxy_set_header x-forwarded-for $remote_addr;):

location ^~/open-api/{
    proxy_pass   http://127.0.0.1:8080/openapi/;
    proxy_set_header x-forwarded-for  $remote_addr;
}
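
On the application side, the address then has to be read from that header instead of from the socket peer. The source does not show HttpClientIpUtils, so the following is only a Python sketch of the idea; the function name client_ip and the sample addresses are assumptions. The header should be trusted only when it is set by your own proxy.

```python
def client_ip(headers, peer_addr):
    """Prefer the address the proxy put in X-Forwarded-For; fall back to the socket peer."""
    lowered = {name.lower(): value for name, value in headers.items()}
    forwarded = lowered.get("x-forwarded-for", "")
    first = forwarded.split(",")[0].strip()  # the header may carry a comma-separated chain
    return first or peer_addr


print(client_ip({"X-Forwarded-For": "203.0.113.7"}, "127.0.0.1"))
print(client_ip({}, "127.0.0.1"))
```

Without the header the function returns the proxy's own address, which is the 127.0.0.1 symptom described above.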

III. The blacklist code

1. The client security control class ClientUserController:

public class ClientUserController {

    private static final Logger logger = Logger.getLogger(ClientUserController.class);
    private  static  ConcurrentMap<String,ClientUser>  downloadUsers=new ConcurrentHashMap<String,ClientUser>();
    private  static  List<String>  blackIplist=new CopyOnWriteArrayList<String>();

    // maximum number of downloads within 12 hours
    private  static   int   maxDayDownloadTimes=1000;

    // validity period, in seconds
    private  static   long  validTimeSec=12*60*60;

    public  static  void  register(String ip){

        if(StringUtils.isEmpty(ip)||"127.0.0.1".equalsIgnoreCase(ip))
            return ;

        if(!isPermission(ip))
            return ;

        if(downloadUsers.containsKey(ip)){
            downloadUsers.get(ip).setDownloadTimes(downloadUsers.get(ip).getDownloadTimes()+1);
            logger.info(" downloadUser login --------------:"+ip+" times----------------:"+downloadUsers.get(ip).toString());
        }else{
            downloadUsers.put(ip,new ClientUser(ip));
            logger.info(" New downloadUser  register --------------:"+ip+" times----------------:1");
        }

    }


    public  static  boolean  isPermission(String ip){

        if(StringUtils.isEmpty(ip)){
            logger.info(" downloadUser  isPermission  false, because you don't have a clientIp <<<<<<<<<<<<<<<<<<<<<<<< ");
            return  false;
        }

        if("127.0.0.1".equalsIgnoreCase(ip)){
            logger.info(" downloadUser can't  get ip ; ======================================== 127.0.0.1 ");
            return true;
        }


        if(blackIplist.contains(ip)){
            logger.info(" downloadUser@"+ip+"@  is danger downloadUser  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  ");
            logger.info(" downloadUser@"+ip+"@  is danger downloadUser  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  ");
            logger.info(" downloadUser@"+ip+"@  is danger downloadUser  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  ");
            return false;
        }

        if(downloadUsers.containsKey(ip)){

            ClientUser  checkClientUser=downloadUsers.get(ip);

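            // currentTimeMillis() is in milliseconds, validTimeSec in seconds: convert before comparing
            if((System.currentTimeMillis()-checkClientUser.getLastTime())/1000>=validTimeSec){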

                if(checkClientUser.getDownloadTimes()>=maxDayDownloadTimes){
                    blackIplist.add(ip);
                    logger.info(" downloadUser@"+ip+"@  add  to  blacklist !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  ");
                    logger.info(" downloadUser@"+ip+"@  add  to  blacklist !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  ");
                    logger.info(" downloadUser@"+ip+"@  add  to  blacklist !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  ");
                    return false;
                }else{
                    downloadUsers.remove(ip);
                }

            }else{

                if(checkClientUser.getDownloadTimes()>=maxDayDownloadTimes){
                    blackIplist.add(ip);
                    logger.info(" downloadUser@"+ip+"@  add  to  blacklist !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  ");
                    logger.info(" downloadUser@"+ip+"@  add  to  blacklist !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  ");
                    logger.info(" downloadUser@"+ip+"@  add  to  blacklist !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  ");
                    logger.info(" downloadUser@"+ip+"@  add  to  blacklist !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  ");
                    logger.info(" downloadUser@"+ip+"@  add  to  blacklist !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  ");
                    logger.info(" downloadUser@"+ip+"@  add  to  blacklist !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  ");
                    return false;
                }

            }


        }

        return true;
    }


}

2. The client user class ClientUser:

public class ClientUser {

    private  String ip;

    private  Integer downloadTimes=1;

    private  Long  lastTime;

    public ClientUser() {
        super();
        lastTime=System.currentTimeMillis();
    }

    public ClientUser(String ip) {
        super();
        this.ip = ip;
        lastTime=System.currentTimeMillis();
    }

    public String getIp() {
        return ip;
    }

    public void setIp(String ip) {
        this.ip = ip;
    }

    public Integer getDownloadTimes() {
        return downloadTimes;
    }

    public void setDownloadTimes(Integer downloadTimes) {
        this.downloadTimes = downloadTimes;
    }

    public Long getLastTime() {
        return lastTime;
    }

    public void setLastTime(Long lastTime) {
        this.lastTime = lastTime;
    }


    public static  void  main(String[] args) throws Exception{
        ClientUser  u=new ClientUser();
        u.lastTime=System.currentTimeMillis();
        Thread.sleep(2000);
        System.out.println((System.currentTimeMillis()-u.lastTime)/1000);
    }

    @Override
    public String toString() {
        return "ClientUser [ip=" + ip + "]";
    }

    @Override
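    public boolean equals(Object obj) {

        // Guard against null and foreign types before casting
        if(!(obj instanceof ClientUser))
            return false;

        ClientUser other=(ClientUser)obj;
        if(this.getIp()==null)
            return other.getIp()==null;

        return this.getIp().equalsIgnoreCase(other.getIp());
    }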

}
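
The rule behind the two classes (count downloads per IP within a time window, blacklist an IP that exceeds the limit) can be condensed into a short Python sketch. This is a simplified fixed-window variant written for illustration, not a port of the Java code above; the class name DownloadGuard, its parameters and the injectable clock are all assumptions.

```python
import time


class DownloadGuard:
    def __init__(self, max_downloads=1000, window_sec=12 * 60 * 60, clock=time.time):
        self.max_downloads = max_downloads
        self.window_sec = window_sec
        self.clock = clock          # injectable so the logic can be tested without waiting
        self.window_start = {}      # ip -> time the current window started
        self.counts = {}            # ip -> downloads in the current window
        self.blacklist = set()

    def allow(self, ip):
        """Count one download for ip and report whether it is still permitted."""
        if ip in self.blacklist:
            return False
        now = self.clock()
        if ip not in self.window_start or now - self.window_start[ip] >= self.window_sec:
            self.window_start[ip] = now  # start a fresh window
            self.counts[ip] = 0
        self.counts[ip] += 1
        if self.counts[ip] > self.max_downloads:
            self.blacklist.add(ip)
            return False
        return True


guard = DownloadGuard(max_downloads=2, clock=lambda: 0)
print([guard.allow("203.0.113.7") for _ in range(3)])
```

With a limit of 2, the third download in the same window is refused and the address stays blacklisted afterwards.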

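Reporting request traffic to Kafka with nginx + Lua

Environment

The environment from the earlier posts 26, 27 and 28 is sufficient. Reporting to Kafka only requires changes on the application-layer nginx servers (192.168.0.16, 192.168.0.17).

Reporting requests to Kafka is simple. The rough plan is:

Download lua-resty-kafka, the library that lets Lua talk to Kafka
In Lua, read the nginx request parameters and assemble the report object
Encode the report object as JSON with cjson
Send it to Kafka from Lua

Implementation

Install the lua-resty-kafka library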

yum install -y unzip
cd /usr/local/servers/ && wget https://github.com/doujiang24/lua-resty-kafka/archive/master.zip
unzip master.zip
cp -rf lua-resty-kafka-master/lib/resty/kafka /usr/local/test/lualib/resty/
/usr/local/servers/nginx/sbin/nginx -s reload

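In Lua, read the request, assemble the report object, encode it and send it. (Note: only the traffic-reporting code below is commented; for the rest, see the earlier post 28 on implementing the application layer of the two-layer "distribution layer + application layer" nginx architecture.)

vim /usr/local/test/lua/test.lua

The code is as follows: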

-- load the Kafka producer library
local producer = require("resty.kafka.producer")
-- load the JSON library
local cjson = require("cjson")
-- Kafka cluster broker nodes
local broker_list = {
    { host = "192.168.0.16", port = 9092},
    { host = "192.168.0.17", port = 9092},
    { host = "192.168.0.18", port = 9092}
}
-- the report object
local log_obj = {}
-- custom module name
log_obj["request_module"] = "product_detail_info"
-- request headers
log_obj["headers"] = ngx.req.get_headers()
-- request URI arguments
log_obj["uri_args"] = ngx.req.get_uri_args()
-- read the request body into memory; read_body() itself returns nothing
ngx.req.read_body()
-- HTTP protocol version of the request
log_obj["http_version"] = ngx.req.http_version()
-- request method
log_obj["method"] = ngx.req.get_method()
-- raw, unparsed request header string
log_obj["raw_reader"] = ngx.req.raw_header()
-- request body as a string (available after read_body())
log_obj["body_data"] = ngx.req.get_body_data()
-- encode the report object as a JSON string
local message = cjson.encode(log_obj)

local uri_args = ngx.req.get_uri_args()
local product_id = uri_args["productId"]
local shop_id = uri_args["shopId"]
-- create the Kafka producer; producer_type = "async" makes it asynchronous
local async_producer = producer:new(broker_list, {producer_type = "async"})
-- send to Kafka: async_producer:send(a, b, c), where a is the topic name, b the partition key (so that messages with the same id all go to the same Kafka partition, in order) and c the message (the report data)
local ok, err = async_producer:send("access-log", product_id, message)
-- handle send errors
if not ok then
   ngx.log(ngx.ERR, "kafka send err:", err)
   return
end
local cache_ngx = ngx.shared.test_cache
local product_cache_key = "product_info_"..product_id
local shop_cache_key = "shop_info_"..shop_id
local product_cache = cache_ngx:get(product_cache_key)
local shop_cache = cache_ngx:get(shop_cache_key)
if product_cache == "" or product_cache == nil then
      local http = require("resty.http")
      local httpc = http.new()

      local resp, err = httpc:request_uri("http://192.168.0.3:81",{
        method = "GET",
            path = "/getProductInfo?productId="..product_id
      })
      product_cache = resp.body
      cache_ngx:set(product_cache_key, product_cache, 10 * 60)
end
if shop_cache == "" or shop_cache == nil then
      local http = require("resty.http")
      local httpc = http.new()
      local resp, err = httpc:request_uri("http://192.168.0.3:81",{
        method = "GET",
            path = "/getShopInfo?shopId="..shop_id
      })
      shop_cache = resp.body
      cache_ngx:set(shop_cache_key, shop_cache, 10 * 60)
end
local product_cache_json = cjson.decode(product_cache)
local shop_cache_json = cjson.decode(shop_cache)
local context = {
      productId = product_cache_json.id,
      productName = product_cache_json.name,
      productPrice = product_cache_json.price,
      productPictureList = product_cache_json.pictureList,
      productSecification = product_cache_json.secification,
      productService = product_cache_json.service,
      productColor = product_cache_json.color,
      productSize = product_cache_json.size,
      shopId = shop_cache_json.id,
      shopName = shop_cache_json.name,
      shopLevel = shop_cache_json.level,
      shopRate = shop_cache_json.rate
}
local template = require("resty.template")
template.render("product.html", context)
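
The script passes product_id as the key so that all messages for one product land on the same partition. The general idea of key-based partitioning can be sketched in Python as follows. This is an illustration only: pick_partition is a hypothetical helper and the CRC32-modulo rule is an assumption, not necessarily the partitioner that lua-resty-kafka or Kafka itself uses.

```python
import zlib


def pick_partition(key, partition_count):
    """Map a message key to a partition index; equal keys always map to the same index."""
    return zlib.crc32(str(key).encode("utf-8")) % partition_count


for product_id in ["1", "2", "1"]:
    print(product_id, pick_partition(product_id, 3))
```

Because the mapping depends only on the key and the partition count, messages for the same product keep their relative order within one partition.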

Configure an nginx DNS resolver to avoid DNS resolution failures

vim /usr/local/servers/nginx/conf/nginx.conf

Add the following inside the http block:

resolver 8.8.8.8;

(Note: do this on both nginx application servers, 192.168.0.16 and 192.168.0.17.)

Set Kafka's advertised.host.name parameter, so that brokers are not advertised by a hostname that clients cannot resolve (configure this on every Kafka node)

advertised.host.name = <this machine's IP>

vim /usr/local/kafka/config/server.properties

Validate and reload nginx

/usr/local/servers/nginx/sbin/nginx -t && /usr/local/servers/nginx/sbin/nginx -s reload

Start Kafka (make sure ZooKeeper is already running)

cd /usr/local/kafka && nohup bin/kafka-server-start.sh config/server.properties &

Create the access-log topic in Kafka

cd /usr/local/kafka && bin/kafka-topics.sh --zookeeper my-cache1:2181,my-cache2:2181,my-cache3:2181 --topic access-log --replication-factor 1 --partitions 1 --create

Open a Kafka console consumer to watch the data

bin/kafka-console-consumer.sh --zookeeper my-cache1:2181,my-cache2:2181,my-cache3:2181 --topic access-log --from-beginning

Send a request to nginx from a browser; the reported request data then appears in the Kafka console consumer in the shell.

That is all it takes to report request traffic to Kafka with nginx + Lua.

That is the end of this chapter. If anything is wrong, corrections are welcome. Thank you!

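Using nginx + Lua in an account system

Our account system serves several products, so it has to handle high concurrency. When a user changes their password in project A, project B must log that user out automatically the next time it calls an API.

The account system does not use OAuth 2.0. It handles user authentication with plain JWT (JSON Web Token), so it has to expose an API that checks whether the user's password has changed.

I will not go into JWT in detail here; search for it if you are not familiar with it. A JWT has three segments, xxx.yyy.zzz. The important information goes in the payload, the yyy segment, which can be decoded with base64 and resembles the user information we would keep in a session. The payload can also be encrypted.

The payload usually carries some standard fields. sub identifies the subject; it can hold the user's id, for example, or a phone number. Besides the public fields we can define private ones, such as seq, which a single application can use to enforce single-device login. Here we add a global field that represents the password state, or more generally the user state: frozen, unfrozen, forced logout. We solve the password state first by adding a field passwd_seq with an initial value of 1. Every password change increments passwd_seq by one. Every authenticated API in every application has to check the password state, so they all call this endpoint (/token). When the token is no longer valid, the endpoint returns 401 and the user logs in again.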
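
The passwd_seq check can be sketched in Python with the standard library alone. This is an illustrative sketch under stated assumptions: the helper names are made up, the demo token is unsigned, and the code only decodes the payload. A real service must verify the signature before trusting any claim.

```python
import base64
import json


def b64url_decode(segment):
    padding = "=" * (-len(segment) % 4)  # JWT segments drop the base64 padding
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def read_payload(token):
    """Decode the middle segment of a JWT. This does NOT verify the signature."""
    header, payload, signature = token.split(".")
    return json.loads(b64url_decode(payload))


def password_still_valid(token, current_passwd_seq):
    """True when the passwd_seq claim in the token matches the server-side value."""
    return read_payload(token).get("passwd_seq") == current_passwd_seq


# Build an unsigned demo token so the example is self-contained.
demo = ".".join([
    b64url_encode(b'{"typ":"JWT","alg":"HS256"}'),
    b64url_encode(json.dumps({"sub": 2, "passwd_seq": 1}).encode("utf-8")),
    "signature-placeholder",
])
print(read_payload(demo))
print(password_still_valid(demo, 1), password_still_valid(demo, 2))
```

Once the server-side passwd_seq moves to 2, a token still carrying 1 fails the check, which is the case where the endpoint would answer 401.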

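Initial research shows that project A makes about 100,000 API calls per day (across several hundred endpoints); registration accounts for a negligible share. In other words, /token will be called at least 100,000 times a day.

I tested on my own computer (the screenshot of its specification is not preserved).

redis benchmark data: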

$ redis-benchmark -t set,get
====== SET ======
  100000 requests completed in 2.65 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

32.18% <= 1 milliseconds
98.80% <= 2 milliseconds
99.62% <= 3 milliseconds
99.71% <= 4 milliseconds
99.79% <= 5 milliseconds
99.93% <= 6 milliseconds
99.95% <= 10 milliseconds
99.95% <= 11 milliseconds
99.96% <= 12 milliseconds
99.97% <= 13 milliseconds
99.98% <= 14 milliseconds
99.99% <= 15 milliseconds
99.99% <= 16 milliseconds
100.00% <= 16 milliseconds
37664.79 requests per second

====== GET ======
  100000 requests completed in 2.60 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

34.93% <= 1 milliseconds
99.62% <= 2 milliseconds
99.91% <= 3 milliseconds
100.00% <= 3 milliseconds
38491.14 requests per second

Test of the nginx + Lua endpoint that reads and writes redis, on a single core.

The nginx configuration used for the test:

worker_processes  1;   # we could enlarge this setting on a multi-core machine
error_log  logs/error.log warn;

events {
    worker_connections  2048;
}

http {
    lua_package_path 'conf/?.lua;;';

    server {
        listen       8080;
        server_name  localhost;

        #lua_code_cache on;

        location /test {

            access_by_lua_block {
                local jwt = require "resty.jwt"
                local foo = require "foo"

                local err_msg = {
                   x_token = {err = "set x-token in request, please!"},
                   payload = {err = "payload not found"},
                   user_key = {err = "用户配置信息有问题"},
                   password = {err = "密码已修改,请重新登录"},
                   ok = {ok = "this is my own error page content"},
                }

                -- check that the token was sent
                local req_headers = ngx.req.get_headers()
                if req_headers.x_token == nil then
                   foo:response_json(422, err_msg.x_token)
                   return
                end

                local jwt_obj = jwt:load_jwt(req_headers.x_token)

                local redis = require "resty.redis"
                local red = redis:new()

                red:set_timeout(1000) -- 1 sec

                local ok, err = red:connect("127.0.0.1", 6379)
                if not ok then
                    ngx.say("failed to connect: ", err)
                    return
                end

                -- authenticate only on a fresh connection; pooled connections are already authenticated
                local count
                count, err = red:get_reused_times()
                if 0 == count then
                    ok, err = red:auth("test123456")
                    if not ok then
                        ngx.say("failed to auth: ", err)
                        return
                    end
                elseif err then
                    ngx.say("failed to get reused times: ", err)
                    return
                end

                if jwt_obj.payload == nil then
                    foo:response_json(422, err_msg.payload)
                    return
                end
                local sub = jwt_obj.payload.sub
                local user_key
                user_key, err = red:get('user:' .. sub)

                -- return the connection to the pool before responding: pool size 200, max idle time 10 seconds
                ok, err = red:set_keepalive(10000, 200)
                if not ok then
                    ngx.log(ngx.ERR, "failed to set keepalive: ", err)
                end

                if not user_key or user_key == ngx.null then
                    foo:response_json(500, err_msg.user_key)
                    return
                elseif tonumber(user_key) > 3  then
                    foo:response_json(401, err_msg.password)
                    return
                else
                    foo:response_json(200, err_msg.ok)
                    return
                end
    }
        }
    }
}

I have not yet found a suitable tool for formatting the configuration above.

The test results:

$   ab -c10 -n5000 -H 'x-token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwOi8vYWNjb3VudC1hcGkuc3VwZXJtYW4yMDE0LmNvbS9sb2dpbiIsImlhdCI6MTUwNTQ2Njg5OSwiZXhwIjoxNTA1NDcwNDk5LCJuYmYiOjE1MDU0NjY4OTksImp0aSI6ImJ0TWFISmltYmtxSGVUdTEiLCJzdWIiOjIsInBydiI6Ijg3MTNkZTA0NTllYTk1YjA0OTk4NmFjNThlYmY1NmNkYjEwMGY4NTUifQ.yiXqkHBZrYXuxtUlSiy5Ialle--q_88G32lxUsDZO0k'  http://127.0.0.1:8080/token
This is ApacheBench, Version 2.3 <$Revision: 1757674 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests


Server Software:        openresty/1.11.2.5
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /token
Document Length:        175 bytes

Concurrency Level:      10
Time taken for tests:   0.681 seconds
Complete requests:      5000
Failed requests:        0
Non-2xx responses:      5000
Total transferred:      1655000 bytes
HTML transferred:       875000 bytes
Requests per second:    7344.73 [#/sec] (mean)
Time per request:       1.362 [ms] (mean)
Time per request:       0.136 [ms] (mean, across all concurrent requests)
Transfer rate:          2374.13 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.2      1       4
Processing:     0    1   0.4      1       5
Waiting:        0    1   0.4      1       4
Total:          1    1   0.5      1       6

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      2
  95%      2
  98%      3
  99%      4
 100%      6 (longest request)

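A single core handles more than 7k requests per second, with almost no optimization of the Lua code. The earlier PHP test measured about 60 (try it yourself).

At this rate (7,000 requests per second over 86,400 seconds), a single core on a single machine tops out at 604,800k requests per day, i.e. about 600 million requests a day.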
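
The daily figure is plain arithmetic and easy to recompute for other throughput numbers. A small sketch follows; the function name daily_capacity is made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_capacity(requests_per_second):
    """Requests one worker could serve in a day at a sustained rate."""
    return requests_per_second * SECONDS_PER_DAY


print(daily_capacity(7000))            # the rounded 7k figure used in the text
print(round(daily_capacity(7344.73)))  # the rate ab actually measured
print(100000 / SECONDS_PER_DAY)        # average rate needed for 100,000 calls a day
```

The last line puts the expected load of 100,000 calls a day at a little over one request per second on average, far below the measured capacity.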

After some tuning, with lua_code_cache enabled in nginx and 2 workers running, the results are:

 ab -c10 -n5000 -H 'x-token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwOi8vYWNjb3VudC1hcGkuc3VwZXJtYW4yMDE0LmNvbS9sb2dpbiIsImlhdCI6MTUwNTQ2Njg5OSwiZXhwIjoxNTA1NDcwNDk5LCJuYmYiOjE1MDU0NjY4OTksImp0aSI6ImJ0TWFISmltYmtxSGVUdTEiLCJzdWIiOjIsInBydiI6Ijg3MTNkZTA0NTllYTk1YjA0OTk4NmFjNThlYmY1NmNkYjEwMGY4NTUifQ.yiXqkHBZrYXuxtUlSiy5Ialle--q_88G32lxUsDZO0k'  http://127.0.0.1:8080/token
This is ApacheBench, Version 2.3 <$Revision: 1757674 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests


Server Software:        openresty/1.11.2.5
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /token
Document Length:        175 bytes

Concurrency Level:      10
Time taken for tests:   0.608 seconds
Complete requests:      5000
Failed requests:        0
Non-2xx responses:      5000
Total transferred:      1655000 bytes
HTML transferred:       875000 bytes
Requests per second:    8217.29 [#/sec] (mean)
Time per request:       1.217 [ms] (mean)
Time per request:       0.122 [ms] (mean, across all concurrent requests)
Transfer rate:          2656.18 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.2      1       2
Processing:     0    1   0.2      1       2
Waiting:        0    1   0.2      1       2
Total:          1    1   0.3      1       3

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      1
  95%      1
  98%      2
  99%      3
 100%      3 (longest request)

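Besides ab, you can use boom (written in Python) or hey (written in Go); both are on GitHub. Their results are more stable than ab's.

For deployment the open-source project walle (https://github.com/meolu/walle-web) is an option, but it does not support Docker deployments. Docker deployments usually work in one of two ways: bake the code into the Docker image, or mount it as a directory. I made a patch against walle v1.2.0 for this (the patch itself is not included in this copy).

For the development and deployment environment we can use the openresty image. We can clone that project and adapt it, or build directly on top of the image.

It is best to load Lua files by absolute path in the project.

A typical project layout: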

.
├── config
│   ├── domain
│   │   └── nginx_token.conf
│   ├── nginx.conf
│   └── resources.properties.json
├── lua
│   ├── init.lua
│   └── token_controller.lua
└── lualib
    └── luka
        └── response.lua

5 directories, 6 files

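Lua feels like a good scripting language: its performance is not far behind compiled languages, while development is considerably faster. Later I plan to move the middleware currently written in Laravel to Lua + nginx. Handing the various validation checks to Lua should improve performance noticeably, and some of the simpler endpoints could be rewritten in Lua as well.

Lua really is a fine tool. Riding on nginx's performance, it becomes one of the fastest scripting environments, and building web applications with it is already very convenient. OpenResty provides plenty of libraries, for example mysql, redis, memcache, jwt and json.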