Linux System Operations Log

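Automatic DRBD primary/secondary failover with heartbeat

A brief introduction to heartbeat and DRBD.
If the primary server goes down, the cost can be severe. Keeping the service running without interruption requires server redundancy. Among the many redundancy solutions, heartbeat offers an inexpensive, scalable high-availability cluster. Here we combine heartbeat and DRBD to build a high-availability (HA) cluster on Linux.

DRBD is a block device intended for HA setups; it works like RAID-1 over the network. When data is written to the local filesystem, it is also sent to another host on the network and recorded there in the same form, so the local (primary) node and the remote (secondary) node stay synchronized in real time. If the local system fails, the remote host still holds an identical copy of the data and can carry on. In an HA setup DRBD can therefore replace a shared disk array: because the data exists on both hosts, after a failover the remote host simply continues serving from its own copy.

Now let's deploy this setup. First install heartbeat with yum install heartbeat. Compiling heartbeat from source is not recommended, because it takes a very long time and is error-prone. Then install DRBD as described at http://devops.webres.wang/2012/02/drbd-compile-install-deploy/; the only difference is adding --with-heartbeat to the ./configure command. After installation the files drbddisk and drbdupper appear in /usr/local/drbd/etc/ha.d/resource.d. Copy both into heartbeat's resource.d directory (/etc/ha.d/resource.d for the yum package) with: cp -R /usr/local/drbd/etc/ha.d/resource.d/* /etc/ha.d/resource.d
The primary host's IP is 192.168.79.130, the secondary's is 192.168.79.131, the virtual IP is 192.168.79.135, the DRBD-replicated partition is /dev/sdb1, and the mount point is /data.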

DRBD configuration

1. First create the partition /dev/sdb1 on /dev/sdb, and create the directory /data.
2. Configure global and resource.
Configure drbd.conf:

vi /usr/local/drbd/etc/drbd.conf

Contents:

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

Configure global_common.conf:

vi /usr/local/drbd/etc/drbd.d/global_common.conf

Contents:

global {
    usage-count yes;
}
common {
    net {
        protocol C;
    }
}

Configure the r0 resource:

vi /usr/local/drbd/etc/drbd.d/r0.res

Contents:

resource r0 {
    on node1 {
        device /dev/drbd1;
        disk /dev/sdb1;
        address 192.168.79.130:7789;
        meta-disk internal;
    }
    on node2 {
        device /dev/drbd1;
        disk /dev/sdb1;
        address 192.168.79.131:7789;
        meta-disk internal;
    }
}

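3. Set the hostname.

vi /etc/sysconfig/network

Change HOSTNAME to node1.
Edit hosts:

vi /etc/hosts

Add:

192.168.79.130 node1
192.168.79.131 node2

Apply the node1 hostname to the running system (this command alone does not persist across reboots):

hostname node1

Configure node2 the same way.
4. Set up the resource
Run the following on both node1 and node2.

modprobe drbd    # load the drbd module
dd if=/dev/zero of=/dev/sdb1 bs=1M count=100    # overwrite the start of sdb1 (destroys existing data; otherwise create-md may fail)
drbdadm create-md r0    # create the metadata for resource r0
drbdadm up r0    # bring up resource r0

5. Set the primary node
Run the following on node1 only.
Make node1 the primary node:

drbdadm primary --force r0

6. Create the filesystem on the DRBD device
Run the following on node1 only.
/dev/drbd1 is now initialized, so format it as an ext3 filesystem:

mkfs.ext3 /dev/drbd1

Then mount /dev/drbd1 on the /data directory created earlier:

mount /dev/drbd1 /data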

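heartbeat configuration

Three files need to be configured:
ha.cf: the monitoring configuration file
haresources: the resource management file
authkeys: the authentication file for the heartbeat link
1. Synchronize the time on both nodes

rm -rf /etc/localtime
cp -f /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
yum install -y ntp
ntpdate cn.pool.ntp.org

2. Configure ha.cf

vi /etc/ha.d/ha.cf

debugfile /var/log/ha-debug #write debug messages to this file
keepalive 2 #send a heartbeat every 2 seconds
deadtime 10 #declare the peer dead after 10 seconds without a heartbeat
warntime 6 #seconds before a late-heartbeat warning (ideally between 2 and 10)
initdead 120 #at startup, 120 seconds without contact is still treated as normal;
#heartbeat waits this long before starting any resources
udpport 694 #use UDP port 694
ucast eth0 192.168.79.131 #unicast heartbeat (each node lists the other node's IP)
node node1 #primary node (the hostname from uname -n, not a domain name)
node node2 #secondary node (the hostname from uname -n, not a domain name)
auto_failback on #resources move back automatically once the primary recovers; the original note advises against enabling this
respawn hacluster /usr/lib/heartbeat/ipfail #monitor the ipfail process and restart it if it dies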

3. Configure authkeys

vi /etc/ha.d/authkeys

Contents:

auth 1
1 crc
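
The crc method only detects corrupted packets and provides no authentication. As a hedged alternative, the sketch below writes an authkeys file that uses sha1 with a random key and restricts it to mode 600. The function name make_authkeys and the way the key is generated are assumptions for illustration, not part of the original setup.

```shell
# Write a heartbeat authkeys file that uses sha1 with a random key.
# make_authkeys is a hypothetical helper name.
# Usage: make_authkeys /etc/ha.d/authkeys
make_authkeys() {
    target=$1
    # 32 random bytes rendered as 64 hex characters serve as the shared key.
    key=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
    # Create the file inside a subshell so the caller's umask is untouched.
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$target" )
    chmod 600 "$target"
}
```

Both nodes need the same key, so generate the file once on node1 and copy it to node2 rather than running the function on each node.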

4. Configure haresources

vi /etc/ha.d/haresources

Contents:

node1 IPaddr::192.168.79.135/24/eth0 drbddisk::r0 Filesystem::/dev/drbd1::/data::ext3

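node1: hostname of the master node
IPaddr::192.168.79.135/24/eth0: sets the virtual IP
drbddisk::r0: manages the DRBD resource r0
Filesystem::/dev/drbd1::/data::ext3: performs the mount and unmount operations
The configuration on node2 is essentially the same; the only difference is that 192.168.79.131 in ha.cf becomes 192.168.79.130.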
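
To make the "::" notation concrete, here is a small sketch that splits one haresources entry into the resource script name and the arguments passed to it. The helper name split_resource and its output format are made up for this illustration; heartbeat does this parsing internally.

```shell
# Print the script name and its arguments for one haresources entry.
# "::" separates the resource script from each of its arguments.
split_resource() {
    printf '%s\n' "$1" | awk -F'::' '{
        printf "script=%s", $1
        for (i = 2; i <= NF; i++) printf " arg%d=%s", i - 1, $i
        printf "\n"
    }'
}

split_resource 'IPaddr::192.168.79.135/24/eth0'
split_resource 'Filesystem::/dev/drbd1::/data::ext3'
```

The second call shows that the Filesystem script receives the device, the mount point and the filesystem type as three separate arguments.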

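Testing automatic DRBD failover

Start heartbeat on node1 first, then on node2. Once node2 is fully up, node1 brings up the virtual IP, starts DRBD and promotes it to primary, and mounts /dev/drbd1 on /data, in that order. The start command is:

service heartbeat start

Running ip a now shows an additional address, 192.168.79.135, which is the virtual IP. cat /proc/drbd shows the Primary/Secondary state, and df -h shows /dev/drbd1 mounted on /data.
To test automatic failover, stop the heartbeat service on node1 or disconnect its network, then check the state on node2 a few seconds later.
Finally, restore the heartbeat service or the network connection on node1 and check its state.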
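
When repeating this test, reading the role pair out of /proc/drbd by eye gets tedious. The sketch below extracts it with sed. It assumes the DRBD 8.3/8.4 status format, in which each resource line carries a field such as ro:Primary/Secondary (older releases used a different field name), and it assumes a single resource. The function names are hypothetical.

```shell
# Extract the local/peer roles (the "ro:" field) from /proc/drbd-style text on stdin.
drbd_roles() {
    sed -n 's/.* ro:\([A-Za-z]*\/[A-Za-z]*\).*/\1/p'
}

# Succeed when the local role (the part before the slash) is Primary.
# Assumes a single DRBD resource.
is_primary() {
    roles=$(drbd_roles)
    [ "${roles%%/*}" = "Primary" ]
}
```

On a node, drbd_roles < /proc/drbd prints the role pair, and is_primary < /proc/drbd can be used in a script that waits for the takeover to finish.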

You will need re2c 0.13.4 or later if you want to regenerate PHP parsers

Compiling PHP may fail with the message: You will need re2c 0.13.4 or later if you want to regenerate PHP parsers. The fix is to install re2c, or upgrade it, to version 0.13.4 or later.
Below we install it from an rpm package.
CentOS 5, 32-bit: http://pkgs.repoforge.org/re2c/re2c-0.13.5-1.el5.rf.i386.rpm
CentOS 5, 64-bit: http://pkgs.repoforge.org/re2c/re2c-0.13.5-1.el5.rf.x86_64.rpm
CentOS 6, 32-bit: http://pkgs.repoforge.org/re2c/re2c-0.13.5-1.el6.rf.i686.rpm
CentOS 6, 64-bit: http://pkgs.repoforge.org/re2c/re2c-0.13.5-1.el6.rf.x86_64.rpm
Download the rpm that matches your system, then run rpm -i xxx.rpm to install re2c.
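
To check whether an installed version already meets the 0.13.4 minimum, a version comparison along the following lines may help. This is a sketch that relies on the -V (version sort) option of GNU sort, which older systems may lack; version_ge is a hypothetical helper name.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

version_ge 0.13.5 0.13.4 && echo "0.13.5 is new enough"
```

The version to test would come from re2c --version; how to cut the number out of that output depends on its exact format, so that step is left out here.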

heartbeat configuration files with Chinese and English side by side

ha.cf

#
# There are lots of options in this file. All you have to have is a set
# of nodes listed {“node …} one of {serial, bcast, mcast, or ucast},
# and a value for “auto_failback”.
# 这文件下面有很多的选项，你必须设置的有节点列表集{node …}，{serial,bcast,mcast,或ucast}中的一个，auto_failback的值
#
# ATTENTION: As the configuration file is read line by line,
# THE ORDER OF DIRECTIVE MATTERS!
# 注意：配置文件是逐行读取的，并且选项的顺序是会影响最终结果的。
#
# In particular, make sure that the udpport, serial baud rate
# etc. are set before the heartbeat media are defined!
# debug and log file directives go into effect when they
# are encountered.
# 特别注意，确保udpport,serial baud rate等配置在心跳检测媒体（heartbeat media）前！他们将影响debug和log file指令。
# 也就是是在定义网卡，串口等心跳检测接口前先要定义端口号。
#
# All will be fine if you keep them ordered as in this example.
# 如果你保持他们在此例子中的顺序的话一切都不会有问题。
#
# Note on logging:
# If all of debugfile, logfile and logfacility are not defined,
# logging is the same as use_logd yes. In other case, they are
# respectively effective. if detering the logging to syslog,
# logfacility must be “none”.
# 记录日志方面的注意事项：
# 如果debugfile,logfile和logfacility都没有定义，日志记录就相当于use_logd yes。否则，他们将分别生效。如果要阻止记录日志到syslog，那么logfacility必须设置为“none”
#
# File to write debug messages to
# 写入debug消息的文件
#debugfile /var/log/ha-debug
#
#
# File to write other messages to
# 写入其他消息的文件
#logfile /var/log/ha-log
#
#
# Facility to use for syslog()/logger
# 用于syslog()/logger的设备
logfacility local0
#
#
# A note on specifying “how long” times below…
# 在下面指定多长时间时应该注意
# The default time unit is seconds
# 缺省的时间单位是秒
# 10 means ten seconds
# 10就代表10秒
#
# You can also specify them in milliseconds
# 1500ms means 1.5 seconds
# 你也可以指定他们以毫秒为单位
# 1500ms表示 1.5秒
#
# keepalive: how long between heartbeats?
# keepalive: 在heartbeat之间连接保持多久
#keepalive 2
#
# deadtime: how long-to-declare-host-dead?
# deadtime：
# If you set this too low you will get the problematic
# split-brain (or cluster partition) problem.
# See the FAQ for how to use warntime to tune deadtime.
# 如果这个时间值设置得太低可能会导致出现很难判断的问题，如何使用warntime来调节deadtime请查看FAQ。
#
#deadtime 30
#
# warntime: how long before issuing “late heartbeat” warning?
# See the FAQ for how to use warntime to tune deadtime.
#
#warntime 10
#
#
# Very first dead time (initdead)
#
# On some machines/OSes, etc. the network takes a while to come up
# and start working right after you’ve been rebooted. As a result
# we have a separate dead time for when things first come up.
# It should be at least twice the normal dead time.
# 在某些机器/操作系统等中，网络在机器重启后需要花一定的时间启动并正常工作。因此我们必须分开他们初次起来的dead time，这个值应该最少设置为两倍的正常dead time。
#
#initdead 120
#
#
# What UDP port to use for bcast/ucast communication?
# 用于bacst/ucast通讯的UDP端口
#
#udpport 694
#
# Baud rate for serial ports…
# 串口的波特率
#baud 19200
#
# serial serialportname …
# serial 串口名称
#serial /dev/ttyS0 # Linux
#serial /dev/cuaa0 # FreeBSD
#serial /dev/cuad0 # FreeBSD 6.x
#serial /dev/cua/a # Solaris
#
#
# What interfaces to broadcast heartbeats over?
# 广播heartbeats的接口
#
#bcast eth0 # Linux
#bcast eth1 eth2 # Linux
#bcast le0 # Solaris
#bcast le1 le2 # Solaris
#
# Set up a multicast heartbeat medium
# 设置一个多播心跳介质
# mcast [dev] [mcast group] [port] [ttl] [loop]
#
# [dev] device to send/rcv heartbeats on 发送/接收heartbeats的设备
# [mcast group] multicast group to join (class D multicast address 224.0.0.0 – 239.255.255.255) 加入到的多播组（D类多播地址224.0.0.0 – 239.255.255.255）
# [port] udp port to sendto/rcvfrom udp(set this value to the same value as “udpport” above) 端口用于发送/接收udp（设置这个值跟上面的udpport为相同值）
# [ttl] the ttl value for outbound heartbeats. this effects how far the multicast packet will propagate. (0-255) Must be greater than zero.
# 外流的heartbeats的ttl值。这个影响多播包能传播多远。（0-255）必须要大于0 。
# [loop] toggles loopback for outbound multicast heartbeats. if enabled, an outbound packet will be looped back and received by the interface it was sent on. (0 or 1) Set this value to zero.
# 为多播heartbeat开关loopback。如果enabled，一个外流的包将被回环到原处并由发送它的接口接收。（0或者1）设置这个值为0。
#
#mcast eth0 225.0.0.1 694 1 0
#
# Set up a unicast / udp heartbeat medium
# 配置一个unicast / udp heartbeat 介质
# ucast [dev] [peer-ip-addr]
#
# [dev] device to send/rcv heartbeats on 用于发送/接收heartbeat的设备
# [peer-ip-addr] IP address of peer to send packets to 包被发送到的对等的IP地址
#
#ucast eth0 192.168.1.2
#
#
# About boolean values…
# 关于boolean值
# Any of the following case-insensitive values will work for true:
# 下面的非大小写敏感的值将认为是true：
# true, on, yes, y, 1
# Any of the following case-insensitive values will work for false:
# 下面的非大小写敏感的值将认为是false：
# false, off, no, n, 0
#
#
#
# auto_failback: determines whether a resource will
# automatically fail back to its “primary” node, or remain
# on whatever node is serving it until that node fails, or
# an administrator intervenes.
# auto_failback: 决定一个resource是否自动恢复到它的primary节点，或者不管什么节点，都继续运行在上面直到节点出现故障或管理员进行干预。
#
#
# The possible values for auto_failback are:
# auto_failback 的可能值有：
# on – enable automatic failbacks
# on – 允许自动failbacks
# off – disable automatic failbacks
# off – 禁止自动failbacks
# legacy – enable automatic failbacks in systems where all nodes do not yet support the auto_failback option.
# legacy – 在所有节点都还不支持auto_failback的选项中允许自动failbacks
# auto_failback “on” and “off” are backwards compatible with the old “nice_failback on” setting.
# auto_failback “on”和”off”向后兼容旧的”nice_failback on”设置。
#
# See the FAQ for information on how to convert from “legacy” to “on” without a flash cut.
# (i.e., using a “rolling upgrade” process)
# 查看FAQ获取如何从”legacy”转为到”on”并不会闪断的信息。
#
#
# The default value for auto_failback is “legacy”, which
# will issue a warning at startup. So, make sure you put
# an auto_failback directive in your ha.cf file.
# (note: auto_failback can be any boolean or “legacy”)
# 缺省的auto_failback值是“legacy”，它在启动的时候会发送一个警告。因此，确保你在ha.cf文件中配置了auto_failback指令。
#
auto_failback on
#
#
# Basic STONITH support
# Using this directive assumes that there is one stonith
# device in the cluster. Parameters to this device are
# read from a configuration file. The format of this line is:
# 基本上STONITH支持
# 使用这个指令假设有一个stonith设备在集群中。这个设备的参数从一个配置文件中读取，这行的格式是：
#
# stonith <stonith_type> <configfile>
#
# NOTE: it is up to you to maintain this file on each node in the
# cluster!
# 注意：在集群中的每个节点上的这个文件都靠你去维护。
#
#stonith baytech /etc/ha.d/conf/stonith.baytech
#
# STONITH support
# You can configure multiple stonith devices using this directive.
# 你可以使用这个指令配置多个stonith设备：
# The format of the line is:
# 这行的格式是：
# stonith_host <hostfrom> <stonith_type> <params...>
# <hostfrom> is the machine the stonith device is attached to or * to mean it is accessible from any host.
# <hostfrom> 表示stonith设备联结到的机器或者用*来表示从任何主机都可以访问。
# <stonith_type> is the type of stonith device (a list of supported drives is in /usr/lib/stonith.)
# <stonith_type> 是stonith设备的类型（支持的设备的列表在/usr/lib/stonith中）
# <params...> are driver specific parameters. To see the format for a particular device, run:
# <params...> 是驱动指定的参数，要查看特定设备的格式，运行：
# stonith -l -t <stonith_type>
#
#
# Note that if you put your stonith device access information in
# here, and you make this file publically readable, you’re asking
# for a denial of service attack
# 需要注意如果你将你的stonith设备的访问信息放在这里，并且你让这个文件开放读权限，那么你是在召唤一个DoS攻击。
#
# To get a list of supported stonith devices, run
# 要得到支持的stonith设备的列表，运行
# stonith -L
#
# For detailed information on which stonith devices are supported
# and their detailed configuration options, run this command:
# 要哪个stonith设备是支持的详细信息和它们详细的配置选项，运行这个命令：
# stonith -h
#
#stonith_host * baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3 rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
#
# Watchdog is the watchdog timer. If our own heart doesn’t beat for
# a minute, then our machine will reboot.
# Watchdog是一个watchdog计时器，如果我们的心超过一分钟不跳，我们的机器将会reboot。
#
# NOTE: If you are using the software watchdog, you very likely
# wish to load the module with the parameter “nowayout=0″ or
# compile it without CONFIG_WATCHDOG_NOWAYOUT set. Otherwise even
# an orderly shutdown of heartbeat will trigger a reboot, which is
# very likely NOT what you want.
# 注意：如果你使用软件watchdog，你很可能希望用参数“nowayout=0”来加载这个模块或编译它的时候去掉
# CONFIG_WATCHDOG_NOWAYOUT设置。否则，即使一个有序的关闭heartbeat也会触发重启，这很可能不是你想要的。
#
#watchdog /dev/watchdog
#
# Tell what machines are in the cluster
# 说明说明机器在这个集群里面
# node nodename … — must match uname -n
# node nodename … –必须要匹配uname -n
#node ken3
#node kathy
#
# Less common options…
# 非常用的选项
# Treats 10.10.10.254 as a psuedo-cluster-member
# Used together with ipfail below…
# note: don’t use a cluster node as ping node
# 将10.10.10.254看成一个伪集群成员，与下面的ipfail一起使用。
# 注意：不要使用一个集群节点作为ping节点
#
#ping 10.10.10.254
#
# Treats 10.10.10.254 and 10.10.10.253 as a psuedo-cluster-member
# called group1. If either 10.10.10.254 or 10.10.10.253 are up
# then group1 is up
# Used together with ipfail below…
# 将10.10.10.254和10.10.10.253看成一个叫group1的伪集群成员。如果10.10.10.254或10.10.10.253是up的，那么group1为up
# 与下面的ipfail一起使用。
#
#ping_group group1 10.10.10.254 10.10.10.253
#
# HBA ping derective for Fiber Channel
# Treats fc-card-name as psudo-cluster-member
# used with ipfail below …
# 用于Fiber Channel的HBA ping指令，将fc-card-name看成是伪集群成员，与下面的ipfail一起使用。
#
# You can obtain HBAAPI from http://hbaapi.sourceforge.net. You need
# to get the library specific to your HBA directly from the vender
# To install HBAAPI stuff, all You need to do is to compile the common
# part you obtained from the sourceforge. This will produce libHBAAPI.so
# which you need to copy to /usr/lib. You need also copy hbaapi.h to
# /usr/include.
# 你可以从http://hbaapi.sourceforge.net获取HBAAPI，你需要从vender获得用于你的HBA指令的特定的库来安装HBAAPI。
# 你所需要做的是编译你从sourceforge获得的通用部分，它会生成libHBAAPI.so，然后你要将它拷贝到/usr/lib目录。同时
# 你也要吧hbaapi.h拷贝到/usr/include 。
#
# The fc-card-name is the name obtained from the hbaapitest program
# that is part of the hbaapi package. Running hbaapitest will produce
# a verbose output. One of the first line is similar to:
# Apapter number 0 is named: qlogic-qla2200-0
# Here fc-card-name is qlogic-qla2200-0.
# fc-card-name是从hbaapitest程序获取的名字，它是hbaapi包的一部分。运行hbaapitest将生成一个冗长的输出，其中第一行类似：
# Apapter number 0 is named: qlogic-qla2200-0
# 在这里fc-card-name是qlogic-qla2200-0
#
#hbaping fc-card-name
#
#
# Processes started and stopped with heartbeat. Restarted unless
# they exit with rc=100
# 与heartbeat一起启动和停止的进程。重启，除非它们的以rc=100退出。
#
#respawn userid /path/name/to/run
#respawn hacluster /usr/lib/heartbeat/ipfail
#
# Access control for client api
# default is no access
# 用于客户端api的访问控制，缺省为不可访问。
#
#apiauth client-name gid=gidlist uid=uidlist
#apiauth ipfail gid=haclient uid=hacluster
###########################
#
# Unusual options.
# 非常选项
###########################
#
# hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
#
# deadping – dead time for ping nodes 上面设置的用来ping的节点的死亡时间
#deadping 30
#
# hbgenmethod – Heartbeat generation number creation method，Normally these are stored on disk and incremented as needed.
# hbgenmethod – Heartbeat产生数字的生产方法。通常执行存储在磁盘上并在需要时进行增量。
#
#hbgenmethod time
#
# realtime – enable/disable realtime execution (high priority, etc.) defaults to on
# realtime – 允许/禁止实时执行（高优先级）缺省为on
#realtime off
#
# debug – set debug level .defaults to zero
# debug – 设置debug等级，缺省为0
#debug 1
#
# API Authentication – replaces the fifo-permissions-based system of the past
# API认证 – 代替以前的fifo-permission-base系统
#
# You can put a uid list and/or a gid list.If you put both, then a process is authorized if it qualifies under either the uid list, or under the gid list.
# 可以放上一个uid列表和/或gid列表。如果两个都放，那么符合uid列表或gid列表中的进程都将通过验证
#
#
# The groupname “default” has special meaning. If it is specified, then
# this will be used for authorizing groupless clients, and any client groups
# not otherwise specified.
# 组名“default”有特定的意思。如果它被指定，那么它将用于验证无组的客户端和任何没有另外指定的客户组
#
# There is a subtle exception to this. “default” will never be used in the
# following cases (actual default auth directives noted in brackets)
# 这是一个复杂的表达式，“default”将从不用于下面的情况（现实中缺省的验证指令记录在括号中）
# ipfail (uid=HA_CCMUSER)
# ccm (uid=HA_CCMUSER)
# ping (gid=HA_APIGROUP)
# cl_status (gid=HA_APIGROUP)
#
# This is done to avoid creating a gaping security hole and matches the most likely desired configuration.
# 它避免生成一个安全漏洞缺口并匹配到了可能很多人最渴望的配置。
#
#apiauth ipfail uid=hacluster
#apiauth ccm uid=hacluster
#apiauth cms uid=hacluster
#apiauth ping gid=haclient uid=alanr,root
#apiauth default gid=haclient
# message format in the wire, it can be classic or netstring,
# default: classic
# 网线中的信息格式，可以是classic或netstring
#
#msgfmt classic/netstring
#
# Do we use logging daemon?
# If logging daemon is used, logfile/debugfile/logfacility in this file
# are not meaningful any longer. You should check the config file for logging
# daemon (the default is /etc/logd.cf)
# more infomartion can be fould in http://www.linux-ha.org/ha_2ecf_2fUseLogdDirective
# Setting use_logd to “yes” is recommended
# 我们是否使用记录监控？
# 如果使用了记录监控，此文件里面的logfile/debugfile/logfacility将不再有意义。你应该检查在配置文件中是否有记录监控（缺省为/etc/logd.cf）
# 更多的信息可以在http://www.linux-ha.org/ha_2ecf_2fUseLogdDirective中找到。推荐配置use_logd为yes。
#
# use_logd yes/no
#
# the interval we reconnect to logging daemon if the previous connection failed
# default: 60 seconds
# 如果前一个连接失败了，我们再次连接到记录监控器的间隔。
#conn_logd_time 60
#
#
# Configure compression module
# It could be zlib or bz2, depending on whether u have the corresponding
# library in the system.
# 配置压缩模块
# 它可以为zlib或bz2，基于我们的系统中是否有相应的库。
#
#compression bz2
#
# Confiugre compression threshold
# This value determines the threshold to compress a message,
# e.g. if the threshold is 1, then any message with size greater than 1 KB
# will be compressed, the default is 2 (KB)
# 配置压缩的限度
# 这个值决定压缩一个信息的限度，例如：如果限度为1，那么任何大于1KB的消息都会被压缩，缺省为2（KB）
#compression_threshold 2

haresources

#
# This is a list of resources that move from machine to machine as
# nodes go down and come up in the cluster. Do not include
# “administrative” or fixed IP addresses in this file.
# 这是当集群中的节点拓机和启动时从一台机器转移到另一台机器的resources列表，不要包含管理或已用IP地址在这个文件中。
#
#
# The haresources files MUST BE IDENTICAL on all nodes of the cluster.
# 此haresources文件在所有的集群节点中都必须相同
# The node names listed in front of the resource group information
# is the name of the preferred node to run the service. It is
# not necessarily the name of the current machine. If you are running
# auto_failback ON (or legacy), then these services will be started
# up on the preferred nodes – any time they’re up.
# 列在resource组信息前的节点名称是优先运行服务的节点名称，它不需要是当前机器的名称，如果你运行auto_failback on(或者
# legacy)，那么这些服务将会在优先节点启动，只要它们是运行的。
#
# If you are running with auto_failback OFF, then the node information
# will be used in the case of a simultaneous start-up, or when using
# the hb_standby {foreign,local} command.
# 如果你运行auto_failback off，那么节点信息将使用在同时启动的情况，或当使用hb_standby {foreign,local}命令时。
#
# BUT FOR ALL OF THESE CASES, the haresources files MUST BE IDENTICAL.
# If your files are different then almost certainly something
# won’t work right.
# 但是对于所有的这些情况，此haresources文件都必须相同。如果你的文件不同那么肯定有某些东西将不能正常工作。
#
#
#
# We refer to this file when we’re coming up, and when a machine is being
# taken over after going down.
# 我们在起动的时候和一个机器停机后被接管的时候参考这个文件。
#
# You need to make this right for your installation, then install it in
# /etc/ha.d
# 你必须让它符合你的安装，然后安装它到/etc/ha.d目录。
#
# Each logical line in the file constitutes a “resource group”.
# A resource group is a list of resources which move together from
# one node to another – in the order listed. It is assumed that there
# is no relationship between different resource groups. These
# resource in a resource group are started left-to-right, and stopped
# right-to-left. Long lists of resources can be continued from line
# to line by ending the lines with backslashes ("\").
# 在文件里面的每个逻辑行组成一个“resource group”。一个resource group就是从一个节点移动到另一个的resources的列表。
# 可以假设不同的resource groups之间是没有关系的。resource group的resource启动时是从左到右的。关闭时是从右到左的。
# 长的resources列表可以以反斜杠（"\"）结尾来续行。
#
# These resources in this file are either IP addresses, or the name
# of scripts to run to “start” or “stop” the given resource.
# 在这个文件里面的resources可以是IP地址，也可以是用于“start”或“stop”给定的resource的脚本名称
#
# The format is like this:
#
#node-name resource1 resource2 … resourceN
#
#
# If the resource name contains an :: in the middle of it, the
# part after the :: is passed to the resource script as an argument.
# Multiple arguments are separated by the :: delimeter
# 如果resource的名称包含一个::在它的中间，在::后面的部分会传递给resource的脚本中作为一个参数，多个参数会以::分割。
#
# In the case of IP addresses, the resource script name IPaddr is implied.
# 在IP地址的情况中，resource脚本名称IPaddr是隐含的。
#
# For example, the IP address 135.9.8.7 could also be represented
# as IPaddr::135.9.8.7
# 例如：IP地址135.9.8.7也可以被表现为IPaddr::135.9.8.7
#
# THIS IS IMPORTANT!! vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#
# The given IP address is directed to an interface which has a route
# to the given address. This means you have to have a net route
# set up outside of the High-Availability structure. We don’t set it
# up here — we key off of it.
# 给定的IP地址会直接连到有路由到给定的地址的接口上，这也就意味着你必须要在 High-Availability 外部配置一个网络路由。我们不在这里配置，我们切断它。
#
# The broadcast address for the IP alias that is created to support
# an IP address defaults to the highest address on the subnet.
# IP别名的广播地址将被缺省创建为支持IP地址的子网里的最高地址
#
# The netmask for the IP alias that is created defaults to the same
# netmask as the route that it selected in in the step above.
# IP别名的子网掩码将被缺省创建为与上面选择的路由相同的子网掩码
#
# The base interface for the IPalias that is created defaults to the
# same netmask as the route that it selected in the step above.
# IP别名的基础接口将被缺省创建为与上面选择的路由相同的子网掩码
#
# If you want to specify that this IP address is to be brought up
# on a subnet with a netmask of 255.255.255.0, you would specify
# this as IPaddr::135.9.8.7/24 .
# 如果你想要指定某个IP地址用指定的子网掩码来启动，那么像这样指定它 IPaddr::135.9.8.7/24
#
# If you wished to tell it that the broadcast address for this subnet
# was 135.9.8.210, then you would specify that this way:
# IPaddr::135.9.8.7/24/135.9.8.210
# 如果你想要指明这个子网的广播地址为135.9.8.210，那么可以像这样指定 IPaddr::135.9.8.7/24/135.9.8.210
#
# If you wished to tell it that the interface to add the address to
# is eth0, then you would need to specify it this way:
# IPaddr::135.9.8.7/24/eth0
# 如果你希望指明要增加地址的接口是eth0，那么你需要像这样指定 IPaddr::135.9.8.7/24/eth0
#
# And this way to specify both the broadcast address and the
# interface:
# IPaddr::135.9.8.7/24/eth0/135.9.8.210
# 同时指定广播地址和接口的方法为：
# IPaddr::135.9.8.7/24/eth0/135.9.8.210
#
# The IP addresses you list in this file are called “service” addresses,
# since they’re the publicly advertised addresses that clients
# use to get at highly available services.
# 列表在这个文件中的IP地址叫做服务地址，它们是客户端用于获取高可用服务的公共通告地址
#
# For a hot/standby (non load-sharing) 2-node system with only a single service address,
# you will probably only put one system name and one IP address in here.
# The name you give the address to is the name of the default “hot”
# system.
# 对于一个hot/standby（非共享负载）单服务地址的双节点系统，你可能只需要放置一个系统名称和一个IP地址在这里。你给定的地址对应的名字就是缺省的hot系统的名字。
#
# Where the nodename is the name of the node which “normally” owns the
# resource. If this machine is up, it will always have the resource
# it is shown as owning.
# 节点名称就是正常情况下拥有resource的节点的名称。如果此机器是up的，他将一直拥有以拥有显示的resource。
#
# The string you put in for nodename must match the uname -n name
# of your machine. Depending on how you have it administered, it could
# be a short name or a FQDN.
# 设置作为节点名称的字符串必须匹配在机器上使用uname -n获得的名字。基于你如果进行管理，它可能是一个缩写名称或一个FQDN。
#
#——————————————————————-
#
# Simple case: One service address, default subnet and netmask
# No servers that go up and down with the IP address
# 简单情况：一个服务地址，缺省子网和掩码，没有服务与IP地址一起启动和关闭
#
#just.linux-ha.org 135.9.216.110
#
#——————————————————————-
#
# Assuming the adminstrative addresses are on the same subnet…
# A little more complex case: One service address, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address…
# 假定管理地址在相同的子网…
# 稍微复杂一些的情况：一个服务地址，缺省子网和子网掩码，同时你要在获得IP地址的时候启动和停止http。
#
#just.linux-ha.org 135.9.216.110 http
#——————————————————————-
#
# A little more complex case: Three service addresses, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address…
# 稍微复杂一些的情况：三个服务地址，缺省子网和掩码，同时你要在获得IP地址的时候启动和停止http。
#
#just.linux-ha.org 135.9.216.110 135.9.215.111 135.9.216.112 httpd
#——————————————————————-
#
# One service address, with the subnet, interface and bcast addr
# explicitly defined.
# 一个服务地址，显式指定子网，接口，广播地址
#
#just.linux-ha.org 135.9.216.3/28/eth0/135.9.216.12 httpd
#
#——————————————————————-
#
# An example where a shared filesystem is to be used.
# Note that multiple aguments are passed to this script using
# the delimiter ‘::’ to separate each argument.
# 一个使用共享文件系统的例子
# 需要注意用’::’分隔的多个参数被传递到了这个脚本
#
#node1 10.0.0.170 Filesystem::/dev/sda1::/data1::ext2
#
# Regarding the node-names in this file:
# 关于这个文件中的节点名称：
# They must match the names of the nodes listed in ha.cf, which in turn
# must match the `uname -n` of some node in the cluster. So they aren’t
# virtual in any sense of the word.
# 它们必须匹配在ha.cf中列出的节点名称，依次必须匹配集群中的某些节点`uname -n`的结果。所以它们不是对于词的虚假感觉。
#

authkeys

#
# Authentication file. Must be mode 600
# 验证文件。模式必须为600
#
# Must have exactly one auth directive at the front.
# auth send authentication using this method-id
# 必须有且只有一个auth指令在前面
# auth method-id 使用这个方法id发送验证
#
# Then, list the method and key that go with that method-id
# 然后列出方法和该方法的密钥
#
# Available methods: crc sha1, md5. Crc doesn’t need/want a key.
# 可用的模块：crc、sha1、md5。其中crc不需要一个密钥。
#
# You normally only have one authentication method-id listed in this file
# 通常只放置一个验证方法id在这个文件中
#
# Put more than one to make a smooth transition when changing auth
# methods and/or keys.
# 可以放置多于一个来使得进行验证方法和/或密钥更改的过渡变得平滑
#
#
# sha1 is believed to be the “best”, md5 next best.
# sha1被认为是最好的，md5第二。
#
# crc adds no security, except from packet corruption.
# Use only on physically secure networks.
# 除了防止包格式改变，crc不加安全保护。只能使用在物理上的安全网络。
#
#auth 1
#1 crc
#2 sha1 HI!
#3 md5 Hello!
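Reposted from: "HA configuration files in Chinese and English: ha.cf"
"HA configuration files in Chinese and English: haresources"
"HA configuration files in Chinese and English: authkeys"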

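heartbeat configuration files ha.cf, haresources and authkeys explained

Before Heartbeat can be started, three files must be configured after installation (create them by hand if they do not exist): ha.cf, haresources and authkeys. They belong in /etc/ha.d, but none of them exists there by default. You can download them from the official site or take them from the source package, in the doc subdirectory of the source tree.

1 Configuring ha.cf

The first file is ha.cf, located in the /etc/ha.d directory created during installation. It tells Heartbeat which media paths to use and how to configure them. The ha.cf in the source tree lists every available option; the main ones are described below:
serial /dev/ttyS0
Use a serial heartbeat. If you do not use a serial heartbeat, you must use another medium, such as a bcast (Ethernet) heartbeat. Replace /dev/ttyS0 with the appropriate device file.
watchdog /dev/watchdog

Optional. The watchdog function provides a minimal safeguard: a system that stops producing heartbeats is rebooted after one minute in that abnormal state. This helps avoid the case where a machine resumes its heartbeat after it has already been declared dead. If that happened and a disk mount had already failed over, two nodes could end up mounting the same disk at once. To use this feature, in addition to this line you need to load the "softdog" kernel module and create the matching device file. Load the module with "insmod softdog". Then run "grep misc /proc/devices" and note the number (it should be 10). Then run "cat /proc/misc | grep watchdog" and note the number (it should be 130). With these values, create the device file: "mknod /dev/watchdog c 10 130".
bcast eth1
Use broadcast heartbeats on interface eth1 (replace eth1 with eth0, eth2, or whichever interface you use).
keepalive 2
Set the interval between heartbeats to 2 seconds.
warntime 10
Time in seconds to wait before logging a "late heartbeat" warning.
deadtime 30
Declare a node dead after 30 seconds.
initdead 120
In some configurations the network takes a while to start working after a reboot. This separate "deadtime" handles that case. It should be at least twice the normal deadtime.
baud 19200
Baud rate, the speed of serial communication.
udpport 694
Use port 694 for bcast and ucast communication. This is the default, and the port officially registered with IANA.
auto_failback on
Required. For those familiar with Tru64 Unix, heartbeat works much like "favored member" mode. Before a failover, the primary node listed in haresources holds all resources; afterwards the secondary node takes them over. With auto_failback set to on, the primary takes all resources back from the secondary as soon as it comes online again. With off, the primary does not reclaim them. The option resembles the deprecated nice_failback option. When upgrading a cluster that had nice_failback set to off to this or a later version, take special care to avoid a flash cut; see the FAQ section on handling this case.
node primary.mydomain.com
Required. The hostname of a machine in the cluster, identical to the output of "uname -n".
node backup.mydomain.com
Required. As above.
respawn
Optional. Lists a command to run and monitor. For example, to run the ccm daemon, add:
respawn hacluster /usr/lib/heartbeat/ccm
Heartbeat then runs the process as the given userid (hacluster in this example) and monitors it, restarting it if it dies. For ipfail the line would be:
respawn hacluster /usr/lib/heartbeat/ipfail
Note: a process that exits with code 100 is not restarted.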
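
The timing options interact, so a quick consistency check can catch mistakes before heartbeat is started. The sketch below encodes one rule stated in this section (initdead at least twice deadtime) plus two ordering rules that are assumptions of this example: keepalive below warntime, and warntime below deadtime. The function name check_timing is hypothetical and the values are whole seconds.

```shell
# Sanity-check ha.cf timing values given in whole seconds.
# Usage: check_timing KEEPALIVE WARNTIME DEADTIME INITDEAD
check_timing() {
    keepalive=$1; warntime=$2; deadtime=$3; initdead=$4
    status=0
    # Assumption: a late-heartbeat warning only makes sense after at least one interval.
    if [ "$warntime" -le "$keepalive" ]; then
        echo "warntime ($warntime) should be greater than keepalive ($keepalive)"
        status=1
    fi
    # Assumption: the warning should fire before the node is declared dead.
    if [ "$warntime" -ge "$deadtime" ]; then
        echo "warntime ($warntime) should be less than deadtime ($deadtime)"
        status=1
    fi
    # Stated in this section: initdead should be at least twice deadtime.
    if [ "$initdead" -lt $((2 * deadtime)) ]; then
        echo "initdead ($initdead) should be at least $((2 * deadtime))"
        status=1
    fi
    return "$status"
}

check_timing 2 10 30 120 && echo "timing values look consistent"
```

The values in the call are the ones used as examples in this section.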

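2 Configuring haresources

After ha.cf comes haresources. This file lists the services the cluster provides and their default owner. Note: the file must be identical on both cluster nodes. The cluster IP address is mandatory and must not be configured anywhere outside haresources. The file specifies the primary node of the two-node system, the cluster IP, the netmask, the broadcast address, and the services to start. The format of an entry is:
node-name network-config <resource-group>
node-name is the primary node of the two-node system. It must match one of the hostnames set with the node option in ha.cf; the other hostname set there becomes the secondary node. network-config holds the network settings: the cluster IP, netmask, broadcast address and so on. resource-group lists the services that heartbeat starts, which the two-node system ultimately offers through the cluster IP. In this article we assume the HA services are Apache and Samba.

haresources needs the following line:

primary.mydomain.com 192.168.85.3 httpd smb

This line states that on startup the node primary.mydomain.com acquires the IP address 192.168.85.3 and starts Apache and Samba. On shutdown, Heartbeat first stops smb, then Apache, and finally releases 192.168.85.3. This assumes that "uname -n" prints "primary.mydomain.com"; if it prints "primary", use "primary" instead.

Once haresources is correct, copy ha.cf and haresources to /etc/ha.d.
Note: every command named in the resource file must be available in /etc/ha.d/resource.d/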

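3 Configuring authkeys

The third file, authkeys, determines your authentication key. There are three authentication methods: crc, md5 and sha1. Which one should you use? In short: if Heartbeat runs over a secure network, such as the crossover cable in this example, crc will do, and it is the cheapest method in terms of resources. If the network is not secure but you also want to keep CPU usage down, use md5. Finally, if you want the strongest authentication regardless of CPU cost, use sha1, the hardest of the three to break.

The file format is:

auth <number>
<number> <authmethod> [<authkey>]

So for sha1, a sample /etc/ha.d/authkeys could be:

auth 1
1 sha1 key-for-sha1-any-text-you-want

For md5, just replace sha1 with md5 above. For crc, configure it like this:

auth 2
2 crc

Whatever index you give after the auth keyword must appear again as the key of a later line. If you specify "auth 4", there must be a later line that begins with "4".

Make sure the file has safe permissions, such as 600.
Reposted from: http://blog.csdn.net/ndcs_dhf2008/article/details/5570219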

configure.ac:63: require Automake 1.10.1, but have 1.9.6

Installing Resource Agents fails with the error: configure.ac:63: require Automake 1.10.1, but have 1.9.6. The fix:

wget http://ftp.gnu.org/gnu/automake/automake-1.11.2.tar.gz
tar xzf automake-1.11.2.tar.gz
cd automake-1.11.2
./configure
make && make install

configure.ac:9: error: Autoconf version 2.63 or higher is required

Installing Resource Agents fails with the error: configure.ac:9: error: Autoconf version 2.63 or higher is required. The installed autoconf is too old, so a newer version has to be installed.

wget http://ftp.gnu.org/gnu/autoconf/autoconf-2.68.tar.gz
tar xzf autoconf-2.68.tar.gz
cd autoconf-2.68
./configure
make && make install
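
Because ./configure defaults to the /usr/local prefix, the older system copy can still be the one found first on PATH after the build. A quick way to confirm which version is active is to read the first line of the tool's --version output. The helper below is a sketch that assumes the version number is the last word on that line, as in "automake (GNU automake) 1.11.2"; first_line_version is a hypothetical name.

```shell
# Read "--version" output on stdin and print the version number,
# taken as the last word of the first line.
first_line_version() {
    sed -n '1s/.* //p'
}

printf 'autoconf (GNU Autoconf) 2.68\nCopyright notice follows\n' | first_line_version
```

On a real system the input would come from autoconf --version or automake --version. If the old version is still reported, run hash -r or open a new shell, and check that /usr/local/bin comes before /usr/bin in PATH.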

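Setting up NAT networking for a CentOS virtual machine in VMware

Today I installed heartbeat in a VMware virtual machine and chose to compile it in order to get the latest version. The build has to download files from sites blocked by the firewall, which requires a VPN. The virtual machine was using bridged networking and could not use the host's VPN, so I switched to NAT. I ran into a few problems along the way, so here is how to set it up.
Host settings:
I am using the English edition of VMware 8.0.1.
1. First check the VM NAT settings.
Open VMware, choose Edit->Virtual Network Editor from the menu, select VMnet8 in the dialog, and check that DHCP is enabled and that the subnet address and subnet mask are set.

2. Set the virtual machine's network connection to NAT.
3. Set the host's VMware Network Adapter VMnet8 adapter to obtain its IP address and DNS server automatically.
4. Check that the host services VMware DHCP Service and VMware NAT Service are running.
CentOS guest settings:
First get the details of the VMware Network Adapter VMnet8 adapter on the host; on Windows, run ipconfig in cmd.

In this case the adapter's IP address is 192.168.79.1 and the netmask is 255.255.255.0. VMware's NAT gateway sits at the .2 address of the VMnet8 subnet by default, so we set the guest's gateway to 192.168.79.2 and its netmask to 255.255.255.0.
Configure the guest's network interface:

vi /etc/sysconfig/network-scripts/ifcfg-eth0

Set:

BOOTPROTO=dhcp
GATEWAY=192.168.79.2
NETMASK=255.255.255.0
ONBOOT=yes

Then restart networking:

service network restart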
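
A gateway is only reachable if it lies in the same subnet as the guest's own address. The sketch below checks that with plain shell arithmetic. The function names are hypothetical, and the guest address 192.168.79.130 in the example call is just one address from the 192.168.79.0/24 range used on this site.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed when two addresses fall in the same subnet under the given netmask.
# Usage: same_subnet ADDRESS_A ADDRESS_B NETMASK
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

same_subnet 192.168.79.130 192.168.79.2 255.255.255.0 && echo "gateway is on the local subnet"
```

If the check fails for the address the guest actually received, the gateway or netmask in ifcfg-eth0 does not match the VMnet8 subnet.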

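Connecting to a CentOS desktop with FreeNX or VNC

Here we look at connecting to a CentOS remote desktop with FreeNX or VNC.

About FreeNX

FreeNX is a remote-control solution that appeared in recent years, after VNC. It works by compressing the X Window traffic and sending it to the remote client for display, whereas VNC captures the screen image and transmits that. Over the same link, FreeNX can therefore feel more responsive than VNC.

About VNC

VNC is a capable remote-control tool developed at AT&T's European research laboratory. It is free, open-source software for UNIX and Linux systems, with powerful and efficient remote-control features; its performance is comparable to remote-control software on Windows and Mac.

Installing the GNOME desktop

If no desktop is installed, install one first.

yum -y groupinstall 'GNOME Desktop Environment' 'X Window System'

Installing and configuring FreeNX

1. Install freenx

yum -y install nx freenx

2. If sshd on your machine is set to PasswordAuthentication no (password authentication disabled), add the following line below that setting:

AllowUsers nxuser

nxuser is the FreeNX user.
3. Edit node.conf

vi /etc/nxserver/node.conf

Change #ENABLE_PASSDB_AUTHENTICATION="0" to ENABLE_PASSDB_AUTHENTICATION="1".
4. Add an nxserver user.

useradd myuser
passwd myuser
nxserver --adduser myuser
nxserver --passwd myuser

5. Download the NX client, install it and start it. Enter a session name (anything will do), then the host and port, click Next, pick GNOME from the drop-down list, click Next again and then Finish. A login dialog appears: click Configure, then Key, paste the contents of /etc/nxserver/client.id_dsa.key from the server into the text box, and save. Back in the login dialog, enter the user name and password to log in.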

VNC安装配置

1、安装VNC

yum install vnc-server

2、添加用户

useradd vnc
passwd vnc

3、设置用户的vnc密码

su vnc
vncpasswd
exit

4、编辑vnc服务器配置文件

vi /etc/sysconfig/vncservers

在最后加上:

VNCSERVERS="1:vnc"
VNCSERVERARGS[1]="-geometry 1024×768"

5、创建xstartup脚本(centos-6用户忽视此步)

/sbin/service vncserver start
/sbin/service vncserver stop
su vnc
vi ~/.vnc/xstartup

加入如下代码：

#!/bin/sh
# Add the following line to ensure you always have an xterm available.
( while true ; do xterm ; done ) &
# Uncomment the following two lines for normal desktop:
unset SESSION_MANAGER
exec /etc/X11/xinit/xinitrc
[ -x /etc/vnc/xstartup ] && exec /etc/vnc/xstartup
[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources
xsetroot -solid grey
vncconfig -iconic &
xterm -geometry 80x24+10+10 -ls -title "$VNCDESKTOP Desktop" &
twm &

Return to root:

exit

6. Start VNC

/sbin/service vncserver start

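7. Test VNC
7.1 Connect to vncserver with the Java viewer
Open http://192.168.0.10:5801 in a browser to reach the desktop.
7.2 Connect to vncserver with VNC Viewer
Open VNC Viewer and enter 192.168.0.10:1 in the Server field to connect.
Note: replace 192.168.0.10 with your own server's IP.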
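
The 5801 in the browser URL and the :1 in the viewer address both come from the display number set in VNCSERVERS. By VNC convention, display N listens on TCP port 5900+N for viewers and 5800+N for the Java/HTTP viewer. A small sketch of that arithmetic (vnc_ports is a hypothetical helper name):

```shell
#!/bin/sh
# Print the TCP ports that a given VNC display number uses by convention.
vnc_ports() {
    display=$1
    echo "vnc=$((5900 + display)) http=$((5800 + display))"
}

vnc_ports 1
```

These are the ports to open in the firewall if the viewer cannot connect.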

Summary

I tried both methods and found the desktop over FreeNX sharp and smooth, while VNC was noticeably worse. I therefore strongly recommend FreeNX.

Linux Security Settings

User management

User permissions

1) Restrict root

echo "tty1" > /etc/securetty
chmod 700 /root

2) Password policy

echo "Passwords expire every 180 days"
perl -pi -e 's/PASS_MAX_DAYS\s+99999/PASS_MAX_DAYS 180/' /etc/login.defs
echo "Passwords may only be changed once a day"
perl -pi -e 's/PASS_MIN_DAYS\s+0/PASS_MIN_DAYS 1/g' /etc/login.defs

Protect passwords with SHA-512 instead of MD5

authconfig --passalgo=sha512 --update

3) umask restriction
Change the umask to 077

perl -pi -e 's/umask\s+0\d2/umask 077/g' /etc/bashrc
perl -pi -e 's/umask\s+0\d2/umask 077/g' /etc/csh.cshrc

4) PAM changes

touch /var/log/tallylog

cat << 'EOF' > /etc/pam.d/system-auth
#%PAM-1.0
# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth required pam_env.so
auth sufficient pam_unix.so nullok try_first_pass
auth requisite pam_succeed_if.so uid >= 500 quiet
auth required pam_deny.so
auth required pam_tally2.so deny=3 onerr=fail unlock_time=60
account required pam_unix.so
account sufficient pam_succeed_if.so uid < 500 quiet
account required pam_permit.so
account required pam_tally2.so per_user
password requisite pam_cracklib.so try_first_pass retry=3 minlen=9 lcredit=-2 ucredit=-2 dcredit=-2 ocredit=-2
password sufficient pam_unix.so sha512 shadow nullok try_first_pass use_authtok remember=10
password required pam_deny.so
session optional pam_keyinit.so revoke
session required pam_limits.so
session [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session required pam_unix.so
EOF

/var/log/tallylog is a binary log that records authentication failures. A locked account can be unlocked with pam_tally2 --reset -u username.
5) Log out idle users

echo "Idle users will be removed after 15 minutes"
echo "readonly TMOUT=900" >> /etc/profile.d/os-security.sh
echo "readonly HISTFILE" >> /etc/profile.d/os-security.sh
chmod +x /etc/profile.d/os-security.sh

6) cron and at restrictions

echo "Locking down Cron"
touch /etc/cron.allow
chmod 600 /etc/cron.allow
awk -F: '{print $1}' /etc/passwd | grep -v root > /etc/cron.deny
echo "Locking down AT"
touch /etc/at.allow
chmod 600 /etc/at.allow
awk -F: '{print $1}' /etc/passwd | grep -v root > /etc/at.deny
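
One detail of the deny lists above: grep -v root drops every account whose name merely contains "root", not only root itself. A slightly stricter variant, offered as a sketch, compares the name field exactly. non_root_users is a hypothetical helper name.

```shell
#!/bin/sh
# Print every account name in a passwd-format file except the one named exactly "root".
# Unlike `grep -v root`, names that merely contain "root" stay in the output.
non_root_users() {
    awk -F: '$1 != "root" { print $1 }' "$1"
}
```

It would be used as non_root_users /etc/passwd > /etc/cron.deny, and likewise for /etc/at.deny.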

Delete special system users and groups

userdel username
userdel adm
userdel lp
userdel sync
userdel shutdown
userdel halt
userdel news
userdel uucp
userdel operator
userdel games
userdel gopher

The users deleted above are created by default at install time. Typical servers rarely need them, yet attackers often target these accounts.

groupdel username
groupdel adm
groupdel lp
groupdel news
groupdel uucp
groupdel games
groupdel dip

Likewise, the groups deleted above are created by default when the system is installed. Removing them reduces the attack surface.
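
Not every system has all of these accounts, and userdel reports an error for a missing one. As a sketch, the helper below lists which of the given names actually exist in a passwd-format file so only those are removed. existing_accounts is a hypothetical name.

```shell
#!/bin/sh
# Print those of the given account names that are present in a passwd-format file.
existing_accounts() {
    passwd_file=$1
    shift
    for name in "$@"; do
        # An entry starts with "name:" at the beginning of a line.
        if grep -q "^${name}:" "$passwd_file"; then
            echo "$name"
        fi
    done
}
```

For example: for u in $(existing_accounts /etc/passwd adm lp sync news uucp games gopher); do userdel "$u"; done. The same function works on /etc/group ahead of groupdel.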

Service management

Disable services the system does not use

chkconfig --level 35 apmd off
chkconfig --level 35 netfs off
chkconfig --level 35 yppasswdd off
chkconfig --level 35 ypserv off
chkconfig --level 35 dhcpd off
chkconfig --level 35 portmap off
chkconfig --level 35 lpd off
chkconfig --level 35 nfs off
chkconfig --level 35 sendmail off
chkconfig --level 35 snmpd off
chkconfig --level 35 rstatd off
chkconfig --level 35 atd off
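
The repeated commands can be generated from a list, which makes the set of services easier to review. The sketch below only prints the commands (a dry run); disable_cmds is a hypothetical helper name, and runlevels 3 and 5 are taken from the commands above.

```shell
#!/bin/sh
# Print one "chkconfig --level 35 <service> off" command per service name.
disable_cmds() {
    for svc in "$@"; do
        echo "chkconfig --level 35 $svc off"
    done
}

disable_cmds apmd netfs portmap
```

Once the printed list looks right, pipe it to sh as root to apply it, for example disable_cmds apmd netfs portmap | sh.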

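Update the system regularly

Run yum -y update regularly; it can be added as a cron job.

SSH service security

Log in to the system with key-based (certificate) authentication. The details are not covered here; see this article: http://devops.webres.wang/2012/02/strengthen-ssh-security-login-with-certificate/

LAMP security

System file permissions

Restrict the permissions on the init scripts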

chmod -R 700 /etc/init.d/*

Remove the SUID and SGID bits from selected system files

chmod a-s /usr/bin/chage
chmod a-s /usr/bin/gpasswd
chmod a-s /usr/bin/wall
chmod a-s /usr/bin/chfn
chmod a-s /usr/bin/chsh
chmod a-s /usr/bin/newgrp
chmod a-s /usr/bin/write
chmod a-s /usr/sbin/usernetctl
chmod a-s /usr/sbin/traceroute
chmod a-s /bin/mount
chmod a-s /bin/umount
chmod a-s /bin/ping
chmod a-s /sbin/netreport
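
Before stripping the bits, it can help to see which files on the system actually carry them. The sketch below lists regular files with the SUID or SGID bit set under a directory; list_setid_files is a hypothetical helper name.

```shell
#!/bin/sh
# List regular files under a directory that have the SUID or SGID bit set.
# -xdev keeps find on the starting filesystem.
list_setid_files() {
    find "$1" -xdev -type f \( -perm -4000 -o -perm -2000 \) -print 2>/dev/null
}
```

Running list_setid_files / as root gives an inventory to compare against the chmod a-s list above.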

Protect the boot loader configuration

chmod 600 /etc/grub.conf
chattr +i /etc/grub.conf

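Log management

1. Boot log

dmesg
The dmesg command gives a quick view of the messages from the most recent boot. The output is usually long, so you will often want to pipe it to a pager.

2. Runtime logs

A. Linux logs are stored in the /var/log directory.
Several log files here are maintained by the system, and other services and programs may put their logs here as well. Most logs are readable only by root, but changing a file's permissions makes it readable by others.
Common system log files and their descriptions:
lastlog records each user's last successful login time
loginlog records failed login attempts
messages records messages written to the system console and messages generated by the syslog service
utmp records every user currently logged in
utmpx extended utmp
wtmp records the history of every user login and logout
wtmpx extended wtmp
vold.log records errors encountered with removable media
xferlog records FTP transfers
sulog records use of the su command
acct records the commands each user has run
aculog records outgoing automatic dial-up calls
B. /var/log/messages
The messages log is the core system log file. It contains the boot messages from system startup as well as other status messages while the system is running. I/O errors, network errors, and other system errors are all recorded here. Other events, such as someone switching to root, are listed here too. If a service such as a DHCP server is running, you can watch its activity in the messages file. /var/log/messages is usually the first file to check when troubleshooting.
C. /var/log/XFree86.0.log
This log records the result of the most recent run of the XFree86 X Window server. If you have trouble starting graphical mode, the reason for the failure can usually be found in this file.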

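Network security

Using TCP_WRAPPERS

TCP_WRAPPERS helps protect your system against outside intrusion. The best policy is to block all hosts (add "ALL: ALL@ALL, PARANOID" to "/etc/hosts.deny") and then list every host that is allowed access in "/etc/hosts.allow".
Step 1:
Edit hosts.deny (vi /etc/hosts.deny) and add the following lines
# Deny access to everyone.
ALL: ALL@ALL, PARANOID
This blocks all services and addresses unless the address is included in the list of allowed hosts.
Step 2:
Edit hosts.allow (vi /etc/hosts.allow) and add the list of allowed hosts, for example:
ftp: 202.54.15.99 foo.com
202.54.15.99 and foo.com are the IP address and host name allowed to access the ftp service.
Step 3:
tcpdchk is the tcpd wrapper configuration checker. It examines your TCP wrapper settings and reports the potential and actual problems it finds. After finishing the configuration, run this command:
[Root@kapil /]# tcpdchk

Using the iptables firewall

This is not covered in detail here; see:
1. iptables rules suited to a web server
2. A detailed iptables tutorial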

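Configuring BIND as a Caching DNS Server

Configuring a caching DNS server is very simple. First, of course, install BIND 9. Assuming BIND is installed under /usr/local/bind/, create the main configuration file named.conf.

vi /usr/local/bind/etc/named.conf

Write the following into it:

options {
directory "/usr/local/bind/etc/";
forward only; // forward all requests to the forwarders list
forwarders { 8.8.8.8;8.8.4.4; }; // IP addresses that requests are forwarded to
allow-query {any;}; // allow queries from all clients
};

That completes the configuration of the caching DNS server.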
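
If the forwarders need to differ between machines, the same file can be generated from parameters. The sketch below prints a forward-only configuration like the one above; make_named_conf is a hypothetical helper name, and the directory and forwarder addresses are whatever you pass in.

```shell
#!/bin/sh
# Print a forward-only caching named.conf for a working directory and a list of forwarders.
make_named_conf() {
    dir=$1
    shift
    fwd=""
    # Build the forwarders list as " ip1; ip2;".
    for ip in "$@"; do
        fwd="$fwd $ip;"
    done
    cat <<EOF
options {
    directory "$dir";
    forward only;
    forwarders {$fwd };
    allow-query { any; };
};
EOF
}
```

For example, make_named_conf /usr/local/bind/etc/ 8.8.8.8 8.8.4.4 > /usr/local/bind/etc/named.conf reproduces the configuration shown earlier. The result can be checked with BIND's named-checkconf and, once named is running, with a query such as dig @127.0.0.1 example.com.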