An Introduction to awk Usage with Examples

awk: a pattern scanning and data processing language

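Description: awk is a programming language for scanning and processing text and data on Linux/Unix. Input can come from standard input, files, or a pipe. awk reads its input one line (record) at a time and tests each line against the given pattern: if the line matches, the associated action is executed; otherwise the line is skipped. If a pattern has no action, matching lines are printed to standard output (the default action is print). If no pattern is given, the action applies to every line.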
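The pattern/action defaults described above can be seen in a small sketch. The three-line sample input and the shell variable names are made up for illustration and do not come from the article's test files.

```shell
# Hypothetical three-line sample input (not from the article).
sample=$(printf 'alpha 1\nbeta 2\ngamma 3\n')

# Pattern only: the default action prints every matching line.
pattern_only=$(printf '%s\n' "$sample" | awk '/eta/')

# Action only: with no pattern, the action runs for every line.
action_only=$(printf '%s\n' "$sample" | awk '{print $1}')

# Pattern and action: act only on lines whose second field is >= 2.
both=$(printf '%s\n' "$sample" | awk '$2 >= 2 {print $1}')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```

The first command prints only the line containing "eta", the second prints the first field of every line, and the third combines both ideas.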

Syntax:

awk  [ POSIX or GNU style options ] -f program-file [ --  ] file ...
awk [ POSIX or GNU style options ] [ --  ]  program-text file ...

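Options:

  • -F fs Use fs as the input field separator (by default, fields are separated by runs of spaces and tabs)

  • -v var=val Assign the value val to the variable var before the program starts

  • -f program-file Read the awk program from a file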

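Built-in variables:

  • ARGC Number of command-line arguments

  • ARGV Array of command-line arguments, indexed from 0 to ARGC-1

  • ARGIND Index in ARGV of the file currently being processed (a gawk extension)

  • FILENAME Name of the current input file

  • FNR Record number within the current input file

  • NR Number of records read so far across all input (the running line number)

  • FS Input field separator

  • NF Number of fields in the current record

  • OFS Output field separator, a space by default

  • RS Input record separator, a newline (\n) by default

  • ORS Output record separator, a newline (\n) by default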
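A minimal sketch of how some of these variables behave, assuming two small throwaway files created in a temporary directory (the file names one.txt and two.txt are hypothetical). It shows that FNR restarts for each file while NR keeps counting, and that RS and ORS control how records are split on input and terminated on output.

```shell
# Create two hypothetical input files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\n' > "$tmpdir/two.txt"

# FILENAME, FNR (per-file record number) and NR (overall record number).
counters=$(cd "$tmpdir" && awk '{print FILENAME, FNR, NR}' one.txt two.txt)
printf '%s\n' "$counters"

# RS splits the input on commas; ORS ends each printed record with "|".
joined=$(printf 'x,y,z' | awk 'BEGIN{RS=","; ORS="|"} {print}')
printf '%s\n' "$joined"

rm -rf "$tmpdir"
```

The first command prints `one.txt 1 1`, `one.txt 2 2`, and `two.txt 1 3`; the second prints `x|y|z|`.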

AWK patterns may be one of the following:

BEGIN
END
/regular expression/
relational expression
pattern && pattern
pattern || pattern
pattern ? pattern : pattern
(pattern)
! pattern
pattern1, pattern2
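Two of the pattern forms listed above, the relational expression and the conditional pattern, are sketched here. The four-line fruit list is a made-up sample.

```shell
# Hypothetical four-line sample input.
fruits=$(printf 'apple\nberry\ncherry\ndate\n')

# Relational expression as a pattern: print the odd-numbered lines.
odd_lines=$(printf '%s\n' "$fruits" | awk 'NR % 2 == 1')

# pattern ? pattern : pattern: for the first two lines test /a/,
# for the remaining lines test /e$/ instead.
picked=$(printf '%s\n' "$fruits" | awk '(NR <= 2) ? /a/ : /e$/')

printf '%s\n' "$odd_lines" "$picked"
```

The first command prints apple and cherry; the second prints apple (line 1 contains "a") and date (line 4 ends in "e").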

Examples:

1. -F: specifying the field separator

[root@python ~]# cat test.txt
3gpp-cbsp       48049/tcp               # 3GPP Cell Broadcast Service Protocol  
isnetserv       48128/udp               # Image Systems Network Services
blp5            48129/tcp               # Bloomberg locator
blp5            48129/udp               # Bloomberg locator
com-bardac-dw   48556/udp               # com-bardac-dw
iqobject        48619/tcp               # iqobject
[root@python ~]# awk '{print $2}' test.txt
48049/tcp
48128/udp
48129/tcp
48129/udp
48556/udp
48619/tcp
[root@python ~]# awk -F'/' '{print $2}' test.txt
tcp               # 3GPP Cell Broadcast Service Protocol
udp               # Image Systems Network Services
tcp               # Bloomberg locator
udp               # Bloomberg locator
udp               # com-bardac-dw
tcp               # iqobject

# Use a space or / as the separator; + means one or more repetitions of the preceding bracket expression
[root@python ~]# awk -F'[ /]+' '{print $2}' test.txt
48049
48128
48129
48129
48556
48619

2. -v: assigning variables

[root@python ~]# awk -v a=2 '{print $a}' test.txt
48049/tcp
48128/udp
48129/tcp
48129/udp
48556/udp
48619/tcp
[root@python ~]# awk -v a=342 'BEGIN{print a}'
342
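A common use of -v is to pass a shell variable into an awk program. The sketch below assumes a hypothetical threshold named limit and feeds awk two lines in the same format as test.txt.

```shell
# Hypothetical threshold chosen for illustration.
limit=48200

# Pass the shell variable into awk with -v, then compare the port number
# (field 2 when splitting on spaces and slashes) against it.
above=$(printf 'blp5            48129/tcp\niqobject        48619/tcp\n' |
  awk -F'[ /]+' -v limit="$limit" '$2 > limit {print $1, $2}')

printf '%s\n' "$above"
```

Only the iqobject line has a port above the threshold, so the output is `iqobject 48619`.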

3. -f: reading the awk program from a file

# The awk script file
[root@python ~]# cat a.txt 
/^$/ {print "BLANK LINE"}
# BLANK LINE is printed once for each empty line in the input
[root@python ~]# awk -f a.txt /etc/ssh/ssh_config
BLANK LINE
BLANK LINE
BLANK LINE
BLANK LINE

4. Records and fields

# $0 is the whole current record (the entire matching line)
[root@python ~]# echo "I am a bird"|awk '{print $0}'
I am a bird
[root@python ~]# echo "I am a bird"|awk '{print $1,$2}'
I am

# NF is the number of fields in the current record (think of it as the column count)
[root@python ~]# echo "I am a bird"|awk '{print NF}'
4

# $NF is the last field
[root@python ~]# echo "I am a bird"|awk '{print $NF}'
bird

# NR is the number of the current input record (the line number of the line being processed)
[root@python ~]# echo "I am a bird"|awk '{print NR}'
1
[root@python ~]# echo -e "I am a bird\nhello"|awk '{print NR}'
1
2

5. OFS: setting the output field separator

[root@python ~]# echo "I am a bird"|awk 'BEGIN{OFS="#"}{print $1,$2,$3,$4}'
I#am#a#bird
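One detail worth knowing: OFS is used when fields are printed with commas or when the record is rebuilt, not when an unmodified $0 is printed. A minimal sketch, reusing the same input line as above (the shell variable names are only for illustration):

```shell
# Printing $0 unchanged ignores OFS, because the record was never rebuilt.
unchanged=$(echo "I am a bird" | awk 'BEGIN{OFS="#"} {print $0}')

# Assigning to any field (even $1 = $1) rebuilds $0 using OFS.
rebuilt=$(echo "I am a bird" | awk 'BEGIN{OFS="#"} {$1 = $1; print $0}')

printf '%s\n' "$unchanged" "$rebuilt"
```

The first command prints `I am a bird`; the second prints `I#am#a#bird`.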

6. Regular expression matching

# Match lines that contain tcp
[root@python ~]# awk '/tcp/{print $0}' test.txt
3gpp-cbsp       48049/tcp               # 3GPP Cell Broadcast Service Protocol
blp5            48129/tcp               # Bloomberg locator
iqobject        48619/tcp               # iqobject

# Match lines that start with blp5
[root@python ~]# awk '/^blp5/{print $0}' test.txt
blp5            48129/tcp               # Bloomberg locator
blp5            48129/udp               # Bloomberg locator

# Match lines that end with tor
[root@python ~]# awk '/tor$/{print $0}' test.txt
blp5            48129/tcp               # Bloomberg locator
blp5            48129/udp               # Bloomberg locator

# Logical OR (||)
[root@python ~]# awk '/blp5/||/3gpp/{print $0}' test.txt
3gpp-cbsp       48049/tcp               # 3GPP Cell Broadcast Service Protocol
blp5            48129/tcp               # Bloomberg locator
blp5            48129/udp               # Bloomberg locator

# Logical AND (&&)
[root@python ~]# awk '/blp5/&&/tcp/{print $0}' test.txt
blp5            48129/tcp               # Bloomberg locator

# Logical NOT (!)
[root@python ~]# awk '!/blp5/&& !/3gpp/{print $0}' test.txt
isnetserv       48128/udp               # Image Systems Network Services
com-bardac-dw   48556/udp               # com-bardac-dw
iqobject        48619/tcp               # iqobject

7. Range patterns

[root@python ~]# awk '/3gpp/,/blp5/{print $0}' test.txt
3gpp-cbsp       48049/tcp               # 3GPP Cell Broadcast Service Protocol
isnetserv       48128/udp               # Image Systems Network Services
blp5            48129/tcp               # Bloomberg locator
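A range ends at the first line that matches the second pattern, which is why the output above stops at the first blp5 line. The range starts again if the first pattern matches later in the input. A small sketch with made-up input lines:

```shell
# Hypothetical input containing two start/end blocks with a stray line between.
ranges=$(printf 'start\nx\nend\ny\nstart\nz\nend\n' | awk '/start/,/end/')

printf '%s\n' "$ranges"
```

Both blocks are printed in full, and the line `y` between them is skipped.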

8. BEGIN and END blocks (printing a header and a footer)

# Print a header line
[root@python ~]# awk 'BEGIN{print "SERVICE\t\tPORT\t\t\tDESCRIPTION"}{print $0}' test.txt
SERVICE         PORT                    DESCRIPTION
3gpp-cbsp       48049/tcp               # 3GPP Cell Broadcast Service Protocol
isnetserv       48128/udp               # Image Systems Network Services
blp5            48129/tcp               # Bloomberg locator
blp5            48129/udp               # Bloomberg locator
com-bardac-dw   48556/udp               # com-bardac-dw
iqobject        48619/tcp               # iqobject

# Print a footer line
[root@python ~]# awk '{print $0}END{print "The ending..."}' test.txt
3gpp-cbsp       48049/tcp               # 3GPP Cell Broadcast Service Protocol
isnetserv       48128/udp               # Image Systems Network Services
blp5            48129/tcp               # Bloomberg locator
blp5            48129/udp               # Bloomberg locator
com-bardac-dw   48556/udp               # com-bardac-dw
iqobject        48619/tcp               # iqobject
The ending...

# Print both a header and a footer
[root@python ~]# awk 'BEGIN{print "SERVICE\t\tPORT\t\t\tDESCRIPTION"}{print $0}END{print "The ending..."}' test.txt
SERVICE         PORT                    DESCRIPTION
3gpp-cbsp       48049/tcp               # 3GPP Cell Broadcast Service Protocol
isnetserv       48128/udp               # Image Systems Network Services
blp5            48129/tcp               # Bloomberg locator
blp5            48129/udp               # Bloomberg locator
com-bardac-dw   48556/udp               # com-bardac-dw
iqobject        48619/tcp               # iqobject
The ending...

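9. Expressions and operators: a variable that has not been initialized in awk has the value of an empty string or 0, depending on context. String literals must be quoted (a="How are you").

# Number the blank lines as they are found (prints a running count)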
[root@python ~]# awk '/^$/{print x+=1}' /etc/ssh/ssh_config
1
2
3
4

# Print the total number of blank lines
[root@python ~]# awk '/^$/{x+=1}END{print x}' /etc/ssh/ssh_config
4

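# ~ (matches) and !~ (does not match). Print root's UID: select lines whose first field matches the regex root and print the third field
# (the regex is unanchored, so any user name that contains "root" would also match)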
[root@python ~]# awk -F':' '$1~/root/{print $3}' /etc/passwd
0

# Print users whose UID is greater than 400
[root@python ~]# awk -F':' '$3>400{print $1}' /etc/passwd
saslauth
mysql
dianel
[root@python ~]# awk -F':' '$3>400{x+=1}END{print x}' /etc/passwd
3
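When an exact match on a field is wanted, a string comparison with == (or an anchored regex such as /^root$/) is safer than an unanchored regex. The sketch below uses two made-up passwd-style records; the user rootless is hypothetical.

```shell
# Hypothetical passwd-style records (not read from /etc/passwd).
users=$(printf 'root:x:0:0:root:/root:/bin/bash\nrootless:x:1001:1001::/home/rootless:/bin/bash\n')

# Unanchored regex: matches every user name that contains "root".
regex_match=$(printf '%s\n' "$users" | awk -F':' '$1 ~ /root/ {print $1, $3}')

# String equality: matches only the user named exactly "root".
exact_match=$(printf '%s\n' "$users" | awk -F':' '$1 == "root" {print $1, $3}')

printf '%s\n' "$regex_match" "$exact_match"
```

The regex version prints both `root 0` and `rootless 1001`, while the equality version prints only `root 0`.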

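10. Advanced awk usage

if conditionals

# Print an alert when the boot partition has less than about 20 MB available (assuming df reports 1K blocks); otherwise print ok
[root@python ~]# df|grep 'boot'|awk '{if($4<20000)print "alert";else print "ok"}'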
ok

[root@python ~]# seq 5|awk '{if($0==3)print $0}'
3
[root@python ~]# seq 5|awk '{if($0==3)print $0;else print "no"}'
no
no
3
no
no

while loops: syntax

while (condition) statement
do statement while (condition)

Examples of both forms:

# i and total are never initialized, so both start at 0
[root@python ~]# awk 'BEGIN{do {i++;total+=i}while(i<100);print total}' 
5050

[root@python ~]# awk 'BEGIN{while(i<100){i++;total+=i}print total}' 
5050

for loops: syntax

1.for (expr1; expr2; expr3) statement
2.for (var in array) statement

[root@python ~]# awk 'BEGIN{for(i=0;i<101;i++){total+=i} print total}'
5050
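The second form, for (var in array), iterates over the keys of an associative array. As a sketch, the port/protocol column of test.txt is reproduced inline here and the protocols are counted. The array name count is arbitrary, and the output is piped through sort because awk does not guarantee the iteration order.

```shell
# Count how often each protocol appears, then loop over the array keys.
proto_counts=$(printf '48049/tcp\n48128/udp\n48129/tcp\n48129/udp\n48556/udp\n48619/tcp\n' |
  awk -F'/' '{count[$2]++} END{for (p in count) print p, count[p]}' | sort)

printf '%s\n' "$proto_counts"
```

This prints `tcp 3` and `udp 3`.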

break and continue

  • break: exits the loop

  • continue: skips the rest of the current iteration and moves on to the next one
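A minimal sketch of both statements in one loop; the loop bounds and the cutoff of 7 are arbitrary values picked for the example.

```shell
odds=$(awk 'BEGIN {
  for (i = 1; i <= 10; i++) {
    if (i % 2 == 0) continue   # skip even numbers and go on to the next i
    if (i > 7) break           # leave the loop entirely
    print i
  }
}')

printf '%s\n' "$odds"
```

The loop prints 1, 3, 5, and 7: even values are skipped by continue, and break ends the loop when i reaches 9.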

Printing the IP address:

[root@python ~]# ifconfig eth0|awk '/Bcast/'|awk -F'[ :]+' '{print $4}'
192.168.1.13
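The two awk invocations can also be merged into one, since a single awk program can both select the line and split it. The sketch below runs against a hard-coded sample line rather than a live interface. The line imitates the older net-tools ifconfig output format, and its Bcast and Mask values are assumptions made for illustration.

```shell
# Hypothetical line in the older net-tools ifconfig format.
line='          inet addr:192.168.1.13  Bcast:192.168.1.255  Mask:255.255.255.0'

# Select the Bcast line and split on runs of spaces and colons in one step.
# The leading spaces produce an empty $1, so the address is $4.
ip=$(printf '%s\n' "$line" | awk -F'[ :]+' '/Bcast/ {print $4}')

printf '%s\n' "$ip"
```

Newer ifconfig versions and the ip command format their output differently, so the pattern and field number would need adjusting there.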