awk: a pattern scanning and data processing language
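Description: awk is a programming language for scanning and processing text and data on Linux/Unix. Input can come from standard input, files, or pipes. awk reads the input line by line and checks each line against the given pattern: if the line matches, the associated action runs; otherwise the line is left alone. If a pattern has no action, matching lines are written to standard output (the default action is print). If an action has no pattern, it applies to every line.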
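The two defaults described above (a missing action means print, a missing pattern means every line) can be sketched as follows. The fruit data is made up for illustration and is generated inline, so no input file is needed.

```shell
# Pattern only: the default action prints each matching line.
printf 'apple 3\nbanana 7\ncherry 5\n' | awk '/an/'
# banana 7

# Action only: with no pattern, the action runs for every line.
printf 'apple 3\nbanana 7\ncherry 5\n' | awk '{print $1}'
# apple
# banana
# cherry

# Pattern and action together: print the name when the count exceeds 4.
printf 'apple 3\nbanana 7\ncherry 5\n' | awk '$2 > 4 {print $1}'
# banana
# cherry
```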
Syntax:
awk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
awk [ POSIX or GNU style options ] [ -- ] program-text file ...
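Options:
- -F fs: use fs as the input field separator (by default, runs of spaces and tabs)
- -v var=val: assign the value val to the variable var before the program starts running
- -f program-file: read the awk program from program-file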
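Built-in variables:
- ARGC: number of command-line arguments
- ARGV: array of command-line arguments, indexed from 0 to ARGC-1
- ARGIND: index in ARGV of the file currently being processed (gawk extension)
- FILENAME: name of the current input file
- FNR: record number within the current input file
- NR: number of the current record in the whole input stream (the line number by default)
- FS: input field separator
- NF: number of fields in the current record
- OFS: output field separator, a space by default
- RS: input record separator, a newline (\n) by default
- ORS: output record separator, a newline (\n) by default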
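A minimal sketch of how several of these variables behave. The file names one.txt and two.txt are hypothetical and are created in a temporary directory only for this illustration.

```shell
# Create two small throwaway input files (names are illustrative).
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\n'    > "$tmpdir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
(cd "$tmpdir" && awk '{print FILENAME, NR, FNR, NF}' one.txt two.txt)
# one.txt 1 1 1
# one.txt 2 2 1
# two.txt 3 1 1

# RS and ORS: read comma-separated records, write them space-separated.
printf 'x,y,z' | awk 'BEGIN{RS=","; ORS=" "} {print NR ":" $0}'
# 1:x 2:y 3:z   (each record is followed by a space instead of a newline)

rm -r "$tmpdir"
```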
AWK patterns may be one of the following:
BEGIN
END
/regular expression/
relational expression
pattern && pattern
pattern || pattern
pattern ? pattern : pattern
(pattern)
! pattern
pattern1, pattern2
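A few of the pattern forms listed above, sketched on made-up inline input. Each command relies on the default print action.

```shell
# Relational expression as a pattern: print only the second record.
printf 'one\ntwo\nthree\n' | awk 'NR == 2'
# two

# Grouping with parentheses combined with negation.
printf 'one\ntwo\nthree\n' | awk '!(/one/ || /two/)'
# three

# Range pattern built from two relational expressions.
printf 'one\ntwo\nthree\nfour\n' | awk 'NR == 2, NR == 3'
# two
# three
```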
Examples:
1. -F: specify the field separator
[root@python ~]# cat test.txt
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
[root@python ~]# awk '{print $2}' test.txt
48049/tcp
48128/udp
48129/tcp
48129/udp
48556/udp
48619/tcp
[root@python ~]# awk -F'/' '{print $2}' test.txt
tcp # 3GPP Cell Broadcast Service Protocol
udp # Image Systems Network Services
tcp # Bloomberg locator
udp # Bloomberg locator
udp # com-bardac-dw
tcp # iqobject
# Use a space or / as the field separator; + means one or more repetitions of the preceding bracket expression
[root@python ~]# awk -F'[ /]+' '{print $2}' test.txt
48049
48128
48129
48129
48556
48619
2. -v: assign a variable
[root@python ~]# awk -v a=2 '{print $a}' test.txt
48049/tcp
48128/udp
48129/tcp
48129/udp
48556/udp
48619/tcp
[root@python ~]# awk -v a=342 'BEGIN{print a}'
342
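A common use of -v is passing a shell variable into the awk program. The sketch below assumes a hypothetical shell variable named limit; the awk-side names max and greeting are also illustrative.

```shell
# "limit" is a hypothetical shell variable used only for this sketch.
limit=3
seq 5 | awk -v max="$limit" '$1 <= max {print $1}'
# 1
# 2
# 3

# Values given with -v are already set when the BEGIN block runs.
awk -v greeting="hello" 'BEGIN{print greeting, length(greeting)}'
# hello 5
```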
3. -f: read the awk program from a file
# Contents of the awk script file
[root@python ~]# cat a.txt
/^$/ {print "BLANK LINE"}
# Prints BLANK LINE once for each blank line in the input
[root@python ~]# awk -f a.txt /etc/ssh/ssh_config
BLANK LINE
BLANK LINE
BLANK LINE
BLANK LINE
4. Records and fields
# $0 is the whole current record (the entire line)
[root@python ~]# echo "I am a bird"|awk '{print $0}'
I am a bird
[root@python ~]# echo "I am a bird"|awk '{print $1,$2}'
I am
# NF is the total number of fields (think of it as the number of columns)
[root@python ~]# echo "I am a bird"|awk '{print NF}'
4
# $NF is the last field
[root@python ~]# echo "I am a bird"|awk '{print $NF}'
bird
# NR is the number of the current record in the input stream (the line number)
[root@python ~]# echo "I am a bird"|awk '{print NR}'
1
[root@python ~]# echo -e "I am a bird\nhello"|awk '{print NR}'
1
2
5. OFS: set the output field separator
[root@python ~]# echo "I am a bird"|awk 'BEGIN{OFS="#"}{print $1,$2,$3,$4}'
I#am#a#bird
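OFS is only applied when print receives comma-separated arguments or when awk rebuilds $0 after a field assignment. A short sketch of the three cases:

```shell
# print $0 writes the record unchanged, so OFS has no visible effect.
echo "I am a bird" | awk 'BEGIN{OFS="#"} {print $0}'
# I am a bird

# Assigning to a field makes awk rebuild $0 using OFS.
echo "I am a bird" | awk 'BEGIN{OFS="#"} {$1=$1; print $0}'
# I#am#a#bird

# Without commas, print concatenates its arguments and ignores OFS.
echo "I am a bird" | awk 'BEGIN{OFS="#"} {print $1 $2}'
# Iam
```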
6. Regular expression matching
# Lines containing tcp
[root@python ~]# awk '/tcp/{print $0}' test.txt
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
blp5 48129/tcp # Bloomberg locator
iqobject 48619/tcp # iqobject
# Lines starting with blp5
[root@python ~]# awk '/^blp5/{print $0}' test.txt
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
# Lines ending with tor
[root@python ~]# awk '/tor$/{print $0}' test.txt
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
# Logical OR (||)
[root@python ~]# awk '/blp5/||/3gpp/{print $0}' test.txt
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
# Logical AND (&&)
[root@python ~]# awk '/blp5/&&/tcp/{print $0}' test.txt
blp5 48129/tcp # Bloomberg locator
# Logical NOT (!)
[root@python ~]# awk '!/blp5/&& !/3gpp/{print $0}' test.txt
isnetserv 48128/udp # Image Systems Network Services
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
7. Range patterns
[root@python ~]# awk '/3gpp/,/blp5/{print $0}' test.txt
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
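A range pattern selects every line from one matching the first pattern through the next line matching the second, and it can trigger more than once in the same input. A small sketch on made-up input with start and end markers:

```shell
# Lines between each start/end pair are printed; the lone "b" is skipped.
printf 'start\na\nend\nb\nstart\nc\nend\n' | awk '/start/,/end/'
# start
# a
# end
# start
# c
# end
```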
8. BEGIN and END blocks (printing a header and footer)
# Print a header before the first line
[root@python ~]# awk 'BEGIN{print "SERVICE\t\tPORT\t\t\tDESCRIPTION"}{print $0}' test.txt
SERVICE PORT DESCRIPTION
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
# Print a footer after the last line
[root@python ~]# awk '{print $0}END{print "The ending..."}' test.txt
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
The ending...
# Print both a header and a footer
[root@python ~]# awk 'BEGIN{print "SERVICE\t\tPORT\t\t\tDESCRIPTION"}{print $0}END{print "The ending..."}' test.txt
SERVICE PORT DESCRIPTION
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
The ending...
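9. Expressions and operators: a variable that has not been initialized in awk has the value empty string or 0, depending on context. String literals must be quoted (a="How are you").
# Count blank lines, printing the running total at each one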
[root@python ~]# awk '/^$/{print x+=1}' /etc/ssh/ssh_config
1
2
3
4
# Print only the total number of blank lines
[root@python ~]# awk '/^$/{x+=1}END{print x}' /etc/ssh/ssh_config
4
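# ~ (matches) and !~ (does not match): print root's UID by selecting lines whose first field matches the regex root and printing the third field (use $1=="root" for an exact match)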
[root@python ~]# awk -F':' '$1~/root/{print $3}' /etc/passwd
0
# Print users whose UID is greater than 400
[root@python ~]# awk -F':' '$3>400{print $1}' /etc/passwd
saslauth
mysql
dianel
[root@python ~]# awk -F':' '$3>400{x+=1}END{print x}' /etc/passwd
3
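The points in this section can be sketched on self-contained input. The passwd-style records below are made up for illustration and are not taken from a real /etc/passwd.

```shell
# An unset variable is "" in string context and 0 in numeric context.
awk 'BEGIN{print "[" s "]", n + 0}'
# [] 0

# String literals need double quotes; an unquoted word is a variable name.
awk 'BEGIN{a = "How are you"; print a; print length(a)}'
# How are you
# 11

# ~ tests a regex match, so it also selects "chroot".
printf 'root:x:0\nchroot:x:500\n' | awk -F':' '$1 ~ /root/ {print $3}'
# 0
# 500

# == tests exact string equality.
printf 'root:x:0\nchroot:x:500\n' | awk -F':' '$1 == "root" {print $3}'
# 0
```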
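10. Advanced awk usage
if conditionals
# Alert when the available space on the boot partition is below about 20 MB (df reports 1K blocks by default on Linux), otherwise print ok
[root@python ~]# df|grep 'boot'|awk '{if($4<20000)print "alert";else print "ok"}'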
ok
[root@python ~]# seq 5|awk '{if($0==3)print $0}'
3
[root@python ~]# seq 5|awk '{if($0==3)print $0;else print "no"}'
no
no
3
no
no
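The same check can be written with an else if chain and an awk pattern instead of grep. The sketch below runs on hypothetical df-style lines (filesystem, size, used, available in KB, use%, mount point) rather than real df output, and the 100000 KB warning threshold is an assumption chosen for illustration.

```shell
# Made-up df-style input; column 4 is available space, column 6 the mount point.
printf '/dev/sda1 200000 190000 10000 95%% /boot\n/dev/sda2 900000 100000 800000 12%% /\n' |
awk '{
    if ($4 < 20000)        print $6, "alert"
    else if ($4 < 100000)  print $6, "warning"
    else                   print $6, "ok"
}'
# /boot alert
# / ok
```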
while loops: syntax
while (condition) statement
do statement while (condition)
Examples of both forms:
# i and total are never initialized, so both start at 0
[root@python ~]# awk 'BEGIN{do {i++;total+=i}while(i<100);print total}'
5050
[root@python ~]# awk 'BEGIN{while(i<100){i++;total+=i}print total}'
5050
for loops: syntax
1. for (expr1; expr2; expr3) statement
2. for (var in array) statement
[root@python ~]# awk 'BEGIN{for(i=0;i<101;i++){total+=i} print total}'
5050
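The second form, for (var in array), iterates over the indices of an associative array. A minimal sketch that counts protocols on made-up port/protocol lines; the array name count is illustrative, and the output is piped through sort because awk does not guarantee the iteration order.

```shell
# Count how many lines use each protocol (the text after the slash).
printf 'a 1/tcp\nb 2/udp\nc 3/tcp\n' |
awk -F'/' '{count[$2]++} END{for (proto in count) print proto, count[proto]}' |
sort
# tcp 2
# udp 1
```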
break and continue
- break: exit the enclosing loop immediately
- continue: skip the rest of the current iteration and move on to the next one
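A short sketch of both statements in one loop: continue skips the even numbers, and break ends the loop once the counter passes 7.

```shell
awk 'BEGIN{
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue   # skip even numbers
        if (i > 7) break           # leave the loop once i exceeds 7
        print i
    }
}'
# 1
# 3
# 5
# 7
```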
Print the IP address:
[root@python ~]# ifconfig eth0|awk '/Bcast/'|awk -F'[ :]+' '{print $4}'
192.168.1.13
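The two awk invocations can be merged into one by combining the pattern with -F. The sketch below runs on a sample line in the older net-tools ifconfig format; that format, and the addresses in it, are assumptions for illustration, and newer ifconfig or ip output is laid out differently.

```shell
# Sample line in the old ifconfig format (an assumption about the input).
line='          inet addr:192.168.1.13  Bcast:192.168.1.255  Mask:255.255.255.0'

# With a regex separator the leading spaces yield an empty $1,
# so $2 is "inet", $3 is "addr", and $4 is the address.
printf '%s\n' "$line" | awk -F'[ :]+' '/Bcast/ {print $4}'
# 192.168.1.13
```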