Building XtraBackup with support for MySQL 5.1.73

1. Background

The MySQL 5.1 source tree ships two versions of the InnoDB storage engine: innobase (the built-in engine) and innodb_plugin. At compile time you choose which to build via --with-plugins=innobase,innodb_plugin; exactly what difference each choice makes to the compiled MySQL will be analysed later. Percona XtraBackup, meanwhile, has three major release lines (2.0, 2.1 and 2.2), and each build target corresponds to a specific InnoDB version released with MySQL. For example:
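For context, a MySQL 5.1 source build that compiles both engine variants looks roughly like the following sketch (the tarball name and install prefix are placeholders, not taken from the original setup):

```shell
# Sketch only: build MySQL 5.1 with both the built-in InnoDB (innobase)
# and the InnoDB plugin compiled in; --prefix is a placeholder.
tar xf mysql-5.1.73.tar.gz
cd mysql-5.1.73
./configure --prefix=/usr/local/mysql \
            --with-plugins=innobase,innodb_plugin
make && make install
```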

percona-xtrabackup-2.0.8's BUILD.txt documents the utils/build.sh argument as follows:

The script needs the codebase for which the building is targeted, you must
provide it with one of the following values or aliases:

  ================== =========  =============================================
  Value              Alias      Server
  ================== =========  =============================================
  innodb51_builtin   5.1        build against built-in InnoDB in MySQL 5.1
  innodb51           plugin     build against InnoDB plugin in MySQL 5.1
  innodb55           5.5        build against InnoDB in MySQL 5.5
  xtradb51           xtradb     build against Percona Server with XtraDB 5.1
  xtradb55           xtradb55   build against Percona Server with XtraDB 5.5
  innodb56           5.6        build against InnoDB in MySQL 5.6
  ================== =========  =============================================

while percona-xtrabackup-2.1.9's BUILD.txt lists:

The script needs the codebase for which the building is targeted, you must
provide it with one of the following values or aliases:

  ================== =========  =============================================
  Value              Alias      Server
  ================== =========  =============================================
  innodb51           plugin     build against InnoDB plugin in MySQL 5.1
  innodb55           5.5        build against InnoDB in MySQL 5.5
  xtradb51           xtradb     build against Percona Server with XtraDB 5.1
  xtradb55           xtradb55   build against Percona Server with XtraDB 5.5
  innodb56           5.6        build against InnoDB in MySQL 5.6
  ================== =========  =============================================

Note: starting with Percona XtraBackup 2.1, support for the built-in InnoDB was dropped, so backing up a MySQL 5.1 built-in-InnoDB server with XtraBackup 2.1 fails with:

innobackupex: Error: Support for MySQL 5.1 with builtin InnoDB (not the plugin) was removed in Percona XtraBackup 2.1. The last version to support MySQL 5.1 with builtin InnoDB was Percona XtraBackup 2.0.

2. Building and installing Percona XtraBackup 2.0.8

cd /tmp
wget https://github.com/percona/percona-xtrabackup/archive/percona-xtrabackup-2.0.8.tar.gz
tar xf percona-xtrabackup-2.0.8.tar.gz
cd percona-xtrabackup-percona-xtrabackup-2.0.8
wget http://downloads.mysql.com/archives/mysql-5.1/mysql-5.1.59.tar.gz 
./utils/build.sh 5.1

When the build finishes, the backup binary is left at src/xtrabackup_51. Copy xtrabackup_51 and innobackupex into /usr/local/bin on the target server and you can run backups.

Implementing tail -n in Lua to read the last n lines of a file

I recently needed to read the last n lines of a file from Lua without shelling out to the Linux tail command, so here is a pure-Lua implementation.

Approach

  1. Seek the file pointer x bytes back from the end of the file
  2. Read x bytes
  3. Scan those x bytes for newline characters; once the n-th newline is found, seek the file pointer to its position and print everything after it
  4. If there were not enough newlines, move the pointer another x bytes toward the start of the file
  5. Return to step 2 until enough newlines are found or the start of the file is reached

The Lua code

tail.lua

#!/usr/bin/lua

if arg[1] == "-n" then
    tail_lines = tonumber(arg[2])
    filepath = arg[3]
else
    tail_lines = 10
    filepath = arg[1]
end

-- read 512 bytes per pass
read_byte_once = 512
offset = 0
fp = io.open(filepath,"r")
if fp == nil then
    print("open file "..filepath.." failed.")
    os.exit(0)
end
line_num = 0
while true do
    -- step back another read_byte_once bytes
    offset = offset - read_byte_once
    -- seek relative to the end of the file
    if fp:seek("end",offset) == nil then
        -- seeking before the start of the file fails; on the first pass
        -- just seek to the start, otherwise give up and print everything
        if offset + read_byte_once == 0 then
            fp:seek("set")
        else
            break
        end
    end
    data = fp:read(read_byte_once)
    -- reverse the chunk so find() can scan newlines from the file tail forward
    data = data:reverse()
    index = 1
    while true do
        -- look for a newline (plain find, no pattern matching)
        start = data:find("\n", index, true)
        if start == nil then
            break
        end
        -- count it
        line_num = line_num + 1
        -- found enough newlines
        if tail_lines + 1 == line_num then
            -- seek to just after the line_num-th newline from the end
            fp:seek("end", offset + read_byte_once - start + 1)
            io.write(fp:read("*all"))
            fp:close()
            os.exit(0)
        end
        index = start + 1
    end
end

-- not enough lines in the file: print the whole thing
fp:seek("set")
io.write(fp:read("*all"))
fp:close()

Usage

Print the last 10 lines of centos.log:

./tail.lua centos.log

Print the last 20 lines of centos.log:

./tail.lua -n 20 centos.log

Getting Nginx worker CPU usage with OpenResty (Nginx Lua)

In the previous post we covered three ways to obtain a process's CPU usage. This post uses OpenResty to measure the CPU usage of all nginx workers and exposes the result through an HTTP endpoint. A CPU usage sample requires two readings with a wait in between, so computing it on demand would make the endpoint slow; instead a background timer collects the statistics periodically and the endpoint simply reads the stored value.
The overall steps:

  1. In the init_worker phase, record every worker's pid
  2. In the init_worker phase, start a timer that periodically measures all nginx workers' CPU usage and stores it in a shared dict
  3. The endpoint reads the result from the shared dict and returns it to the client

Collecting CPU usage on a timer in init_worker

http {
    [...]
    lua_shared_dict dict 10m;
    init_worker_by_lua_block {
        -- record this worker's pid in the shared dict, keyed by worker id
        local worker_pid = ngx.worker.pid()
        local worker_id = ngx.worker.id()
        ngx.shared.dict:set(worker_id,worker_pid)

        -- sample cpu usage
        local function count_cpu_usage(premature)
            -- first reading of the cpu times
            local worker_cpu_total1 = 0
            local cpu_total1 = 0

            local worker_count = ngx.worker.count()

            for i=0, worker_count - 1 do
                local worker_pid = ngx.shared.dict:get(i)
                local fp = io.open("/proc/"..worker_pid.."/stat","r")
                local data = fp:read("*all")
                fp:close()
                -- fields 14 and 15 of /proc/<pid>/stat are utime and stime
                local res, err = ngx.re.match(data, "(.*? ){13}(.*?) (.*?) ", "jio")
                local worker_cpu = res[2] + res[3]
                worker_cpu_total1 = worker_cpu_total1 + worker_cpu
            end

            local fp = io.open("/proc/stat","r")
            local cpu_line = fp:read()
            fp:close()
            -- sum all numeric fields of the aggregate "cpu" line
            local iterator, err = ngx.re.gmatch(cpu_line, [[(\d+)]])
            while true do
                local m, err = iterator()
                if not m then
                    break
                end

                cpu_total1 = cpu_total1 + m[0]
            end

            -- second reading after a short wait
            ngx.sleep(0.5)
            local worker_cpu_total2 = 0
            local cpu_total2 = 0

            for i=0, worker_count - 1 do
                local worker_pid = ngx.shared.dict:get(i)
                local fp = io.open("/proc/"..worker_pid.."/stat","r")
                local data = fp:read("*all")
                fp:close()
                local res, err = ngx.re.match(data, "(.*? ){13}(.*?) (.*?) ", "jio")
                local worker_cpu = res[2] + res[3]
                worker_cpu_total2 = worker_cpu_total2 + worker_cpu
            end

            local fp = io.open("/proc/stat","r")
            local cpu_line = fp:read()
            fp:close()
            local iterator, err = ngx.re.gmatch(cpu_line, [[(\d+)]])
            while true do
                local m, err = iterator()
                if not m then
                    break
                end

                cpu_total2 = cpu_total2 + m[0]
            end

            -- count the cpu cores
            local cpu_core = 0
            local fp = io.open("/proc/cpuinfo")
            local data = fp:read("*all")
            fp:close()
            local iterator, err = ngx.re.gmatch(data, "processor","jio")
            while true do
                local m, err = iterator()
                if not m then
                    break
                end
                cpu_core = cpu_core + 1
            end

            -- compute the usage percentage
            local nginx_workers_cpu_time = ((worker_cpu_total2 - worker_cpu_total1) / (cpu_total2 - cpu_total1)) * 100 * cpu_core
            nginx_workers_cpu_time = string.format("%d", nginx_workers_cpu_time)
            ngx.shared.dict:set("nginx_workers_cpu_time",nginx_workers_cpu_time)
        end


        -- recurring timer
        local function count_cpu_usage_timed_job()
            -- interval between runs, in seconds
            local delay = 2
            local count
            count = function(premature)
                if not premature then
                    local ok, err = pcall(count_cpu_usage, premature)
                    if not ok then
                        ngx.log(ngx.ERR, "count cpu usage error: ", err)
                    end
                    local ok, err = ngx.timer.at(delay, count)
                    if not ok then
                        return
                    end
                end
            end
            local ok, err = ngx.timer.at(delay, count)
            if not ok then
                return
            end
        end

        -- start the recurring job
        count_cpu_usage_timed_job()
    }
    [...]
}
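The `ngx.re.match` pattern above skips the first 13 space-separated fields of /proc/&lt;pid&gt;/stat and captures fields 14 and 15 (utime and stime, in clock ticks). The equivalent extraction can be sanity-checked with awk on a made-up stat line (note that a comm value containing spaces would break naive field splitting):

```shell
# A made-up /proc/<pid>/stat line; field 2 is the comm "(nginx)", and
# fields 14 and 15 (here 250 and 130) are utime and stime in clock ticks.
stat='1234 (nginx) S 1 1234 1234 0 -1 4194624 100 0 0 0 250 130 0 0 20 0 1 0 5000 10000000 300'
echo "$stat" | awk '{print $14 + $15}'
# → 380
```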

The endpoint that exposes the CPU usage

location /cpu {
    content_by_lua_block {
        local nginx_workers_cpu_time = ngx.shared.dict:get("nginx_workers_cpu_time")
        ngx.header.content_type = 'text/plain'
        ngx.say(nginx_workers_cpu_time)
    }
}

Three ways to get a process's CPU usage

A process's CPU usage is an important performance indicator: it tells you how busy the process currently is and helps you spot abnormal CPU consumption. Below are three ways to obtain it.

Via Zabbix

Starting with Zabbix 3.0, a built-in item reports process CPU utilisation:

proc.cpu.util[<name>,<user>,<type>,<cmdline>,<mode>,<zone>]

The parameters are:

name - process name (default is all processes)
user - user name (default is all users)
type - CPU utilisation type:
total (default), user, system
cmdline - filter by command line (it is a regular expression) 
mode - data gathering mode: avg1 (default), avg5, avg15 
zone - target zone: current (default), all. This parameter is supported only on Solaris platform. Since Zabbix 3.0.3 if agent has been compiled on Solaris without zone support but is running on a newer Solaris where zones are supported and <zone> parameter is default or current then the agent will return NOTSUPPORTED (the agent cannot limit results to only current zone). However, <zone> parameter value all is supported in this case.

Usage examples:

Examples:
⇒ proc.cpu.util[,root] → CPU utilisation of all processes running under the “root” user
⇒ proc.cpu.util[zabbix_server,zabbix] → CPU utilisation of all zabbix_server processes running under the zabbix user

The returned value is based on single CPU core utilisation percentage. For example CPU utilisation of a process fully using two cores is 200%. 

Source: the Zabbix documentation

Using the top command

top gives a direct view of every process's stats, such as memory and CPU usage. The following uses top in batch mode to grab a process's current CPU usage:

pid="123970"
top -b -n 1 -p $pid  2>&1 | awk -v pid=$pid '{if ($1 == pid)print $9}'

Reading from /proc

Read the process's cpu time from /proc/[pid]/stat and the system's total cpu time from /proc/stat, wait one second, then take both readings again. The usage is then ((cpu_time2 - cpu_time1) / (total_time2 - total_time1)) * 100 * cpu_core, where cpu_time is the process's cpu time, total_time the system's total cpu time, and cpu_core the number of cpu cores (from /proc/cpuinfo). Code:

pid=3456
cpu_core=$(grep -c processor /proc/cpuinfo)
total_time1=$(awk '{if ($1 == "cpu") {sum = $2 + $3 + $4 + $5 + $6 + $7 + $8 + $9 + $10 + $11;print sum}}' /proc/stat)
cpu_time1=$(awk '{sum=$14 + $15;print sum}' /proc/$pid/stat)
sleep 1
total_time2=$(awk '{if ($1 == "cpu") {sum = $2 + $3 + $4 + $5 + $6 + $7 + $8 + $9 + $10 + $11;print sum}}' /proc/stat)
cpu_time2=$(awk '{sum=$14 + $15;print sum}' /proc/$pid/stat)
awk -v cpu_time1=$cpu_time1 -v total_time1=$total_time1 -v cpu_time2=$cpu_time2 -v total_time2=$total_time2 -v cpu_core=$cpu_core 'BEGIN{cpu=((cpu_time2 - cpu_time1) / (total_time2 - total_time1)) * 100*cpu_core;print cpu}'
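As a quick sanity check of the formula, plug in made-up sample readings: 50 process ticks over a window in which the system advanced 400 ticks, on a 4-core machine:

```shell
# Hypothetical readings, not real /proc values:
#   cpu_time1=1000    cpu_time2=1050    (process consumed 50 ticks)
#   total_time1=40000 total_time2=40400 (system advanced 400 ticks)
#   cpu_core=4
awk 'BEGIN{
    cpu = ((1050 - 1000) / (40400 - 40000)) * 100 * 4
    print cpu
}'
# → 50
```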

Extracting the last minute of Nginx/Apache access logs with shell and awk

In an earlier post we showed how to extract the last minute of a site's access logs, but that script assumed tab-separated fields; with space-separated logs it fails. Here is a shell script that handles space-separated log formats.

# log directory
LOG_DIR="/etc/apache2/logs/domlogs/"
# temp directory
TEMP_DIR="/tmp/log/"
mkdir -p $TEMP_DIR
cd $LOG_DIR

log_names=`find ./ -maxdepth 1 -type f | grep -v -E "bytes_log|offsetftpbytes"`

for log_name in $log_names;
do

# set paths
split_log="$TEMP_DIR/$log_name"
access_log="${LOG_DIR}/$log_name"

# extract the last minute of entries
tac $access_log  | awk '
BEGIN{
cmd="date -d \"1 minute ago\" +%s"
cmd|getline oneMinuteAgo
close(cmd)
}
{
day = substr($4,2,2)
month = substr($4,5,3)
year = substr($4,9,4)
time = substr($4,14,8)
time_str = day" "month" "year" "time
cmd="date -d \""time_str"\" +%s"
cmd|getline log_date
close(cmd)
if (log_date>=oneMinuteAgo){
print
} else {
exit;
}
}' > $split_log

done

# remove empty split files from the temp directory
find $TEMP_DIR -size 0 -exec rm {} \;
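The timestamp parsing inside the awk block can be checked in isolation. This sketch takes a made-up Apache-style timestamp field ($4 in a log line) and extracts the same substrings with bash parameter expansion, mirroring the awk substr() calls:

```shell
# Made-up log timestamp field, as it appears in a space-separated log line.
ts='[12/Mar/2015:10:05:03'
day=${ts:1:2}      # "12"
month=${ts:4:3}    # "Mar"
year=${ts:8:4}     # "2015"
time=${ts:13:8}    # "10:05:03"
TZ=UTC date -d "$day $month $year $time" +%s
# → 1426154703
```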