Using an ACL in Varnish to restrict access by IP address

Step 1: Define the ACL. We keep the IP addresses in an external file.

    acl forbidden {
        include "/etc/varnish/chinaip.dat";
    }

    ############ chinaip.dat ############
    "192.168.1.0"/24;
    "10.0.0.0"/24;

 

Step 2: Add the policy in vcl_recv, at the very top.

    if (client.ip ~ forbidden) {
        error 505 "Forbidden";
    }

 

Step 3 (optional): Customize the error page
# Take different actions depending on the error code:
# a 750 status redirects to Google; a 505 status returns the error page directly.

    sub vcl_error {
        set obj.http.Content-Type = "text/html; charset=utf-8";
        if (obj.status == 750) {
            set obj.http.Location = "http://www.google.com/";
            set obj.status = 302;
            return (deliver);
        }
        else {
            synthetic {"
    <?xml version="1.0" encoding="utf-8"?>
    <html>
      <head>
        <title>"} obj.status " " obj.response {"</title>
      </head>
      <body>
        <h1>Error "} obj.status " " obj.response {"</h1>
        <p>"} obj.response {"</p>
      </body>
    </html>
    "};
        }
        return (deliver);
    }

 

Step 4: Verify that the configuration is valid

    varnishd -d -f /etc/varnish/my.vcl

 

Step 5: Restart Varnish

    service varnish restart

 

Step 6: Test
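A quick way to verify the block, as a sketch (the hostname is a placeholder; run it from a client whose address falls inside one of the blocked ranges):

    curl -I http://your.varnish.host/
    # expect the status configured above, e.g. HTTP/1.1 505 Forbidden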

I forgot to mention step 0: back up your configuration file before you change anything. It matters.
Source: http://blog.poesylife.com/

Common NFS errors

Error 1: Cannot register service: RPC

service nfs restart

Shutting down NFS mountd: [ OK ]

Shutting down NFS daemon: [ OK ]

Shutting down NFS quotas: [ OK ]

Shutting down NFS services: [ OK ]

Starting NFS services: [ OK ]

Starting NFS quotas: Cannot register service: RPC: Unable to receive; errno = Connection refused

rpc.rquotad: unable to register (RQUOTAPROG, RQUOTAVERS, udp).

[FAILED]

# Solution:

service portmap start

# portmap must be started first

Error 2: Address already in use

tail -f /var/log/messages

Apr :: bogon nfsd[]: nfssvc: Setting version failed: errno (Device or resource busy)

Apr :: bogon nfsd[]: nfssvc: unable to bind UDP socket: errno (Address already in use)

Apr :: bogon nfsd[]: nfssvc: Setting version failed: errno (Device or resource busy)

Apr :: bogon nfsd[]: nfssvc: unable to bind UDP socket: errno (Address already in use)

Apr :: bogon nfsd[]: nfssvc: Setting version failed: errno (Device or resource busy)

# Solution:

ps aux | grep nfs

# then kill those processes (see the sketch below)
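As a sketch, one way to do that in a single line (the [n] keeps grep from matching itself; review the PID list before using -9, and note that kernel nfsd threads may only go away via service nfs stop):

    ps aux | grep '[n]fs' | awk '{print $2}' | xargs -r kill -9
    service portmap start
    service nfs start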

Error 3: mount: …:/nfsdata failed, reason given by server: Permission denied

# Solution:

a. Add the client's IP address to /etc/exports on the server (see the example below).

b. Keep the naming convention consistent between server and client: either use hostnames everywhere (and mind each machine's hosts file), or use IP addresses everywhere.
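A sketch of a matching /etc/exports entry on the server (the path, client IP, and options are placeholders for your environment):

    # /etc/exports
    /nfsdata 192.168.1.100(rw,sync,no_root_squash)

After editing, re-export with exportfs -r (or restart the nfs service).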

Error 4: Client mount times out

tail -f /var/log/messages

Apr :: localhost kernel: portmap: server localhost not responding, timed out

Apr :: localhost kernel: RPC: failed to contact portmap (errno -).

Apr :: localhost kernel: RPC: failed to contact portmap (errno -).

Apr :: localhost kernel: lockd_up: makesock failed, error=-

Apr :: localhost kernel: RPC: failed to contact portmap (errno -).

# Solution:

service portmap restart

service nfs restart

Error 5: Error: RPC MTAB does not exist.

service nfs start

Starting NFS services: [ OK ]

Starting NFS quotas: [ OK ]

Starting NFS daemon: [ OK ]

Starting NFS mountd: [ OK ]

Starting RPC idmapd: Error: RPC MTAB does not exist.

# Solution:

# Run the mount manually:

mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs/

# To make this persistent across reboots, add the following two lines to /etc/fstab:

rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs defaults 0 0

nfsd /proc/fs/nfsd nfsd defaults 0 0

You will need re2c 0.13.4 or later if you want to regenerate PHP parsers

A PHP build can fail with the error "You will need re2c 0.13.4 or later if you want to regenerate PHP parsers". The fix is to install or upgrade re2c to version 0.13.4 or later.
Here we install it from an RPM package:
CentOS 5, 32-bit: http://pkgs.repoforge.org/re2c/re2c-0.13.5-1.el5.rf.i386.rpm
CentOS 5, 64-bit: http://pkgs.repoforge.org/re2c/re2c-0.13.5-1.el5.rf.x86_64.rpm
CentOS 6, 32-bit: http://pkgs.repoforge.org/re2c/re2c-0.13.5-1.el6.rf.i686.rpm
CentOS 6, 64-bit: http://pkgs.repoforge.org/re2c/re2c-0.13.5-1.el6.rf.x86_64.rpm
Download the RPM that matches your system, then install re2c with rpm -i xxx.rpm.
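For example, on 64-bit CentOS 6 (a sketch; substitute the URL for your system from the list above):

    wget http://pkgs.repoforge.org/re2c/re2c-0.13.5-1.el6.rf.x86_64.rpm
    rpm -i re2c-0.13.5-1.el6.rf.x86_64.rpm
    re2c -v    # should report version 0.13.5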

Annotated heartbeat configuration files

ha.cf

#
# There are lots of options in this file. All you have to have is a set
# of nodes listed {"node ...}, one of {serial, bcast, mcast, or ucast},
# and a value for "auto_failback".
#
# ATTENTION: As the configuration file is read line by line,
# THE ORDER OF DIRECTIVE MATTERS!
#
# In particular, make sure that the udpport, serial baud rate
# etc. are set before the heartbeat media are defined!
# debug and log file directives go into effect when they
# are encountered.
#
# All will be fine if you keep them ordered as in this example.
#
# Note on logging:
# If all of debugfile, logfile and logfacility are not defined,
# logging is the same as use_logd yes. In other cases, they are
# respectively effective. To suppress logging to syslog,
# logfacility must be "none".
#
# File to write debug messages to
#debugfile /var/log/ha-debug
#
#
# File to write other messages to
#logfile /var/log/ha-log
#
#
# Facility to use for syslog()/logger
logfacility local0
#
#
# A note on specifying "how long" times below...
#
# The default time unit is seconds
# 10 means ten seconds
#
# You can also specify them in milliseconds
# 1500ms means 1.5 seconds
#
# keepalive: how long between heartbeats?
#keepalive 2
#
# deadtime: how long-to-declare-host-dead?
#
# If you set this too low you will get the problematic
# split-brain (or cluster partition) problem.
# See the FAQ for how to use warntime to tune deadtime.
#
#deadtime 30
#
# warntime: how long before issuing "late heartbeat" warning?
# See the FAQ for how to use warntime to tune deadtime.
#
#warntime 10
#
#
# Very first dead time (initdead)
#
# On some machines/OSes, etc. the network takes a while to come up
# and start working right after you've been rebooted. As a result
# we have a separate dead time for when things first come up.
# It should be at least twice the normal dead time.
#
#initdead 120
#
#
# What UDP port to use for bcast/ucast communication?
#
#udpport 694
#
# Baud rate for serial ports...
#baud 19200
#
# serial serialportname ...
#serial /dev/ttyS0 # Linux
#serial /dev/cuaa0 # FreeBSD
#serial /dev/cuad0 # FreeBSD 6.x
#serial /dev/cua/a # Solaris
#
#
# What interfaces to broadcast heartbeats over?
#
#bcast eth0 # Linux
#bcast eth1 eth2 # Linux
#bcast le0 # Solaris
#bcast le1 le2 # Solaris
#
# Set up a multicast heartbeat medium
# mcast [dev] [mcast group] [port] [ttl] [loop]
#
# [dev] device to send/rcv heartbeats on
# [mcast group] multicast group to join (class D multicast address
#               224.0.0.0 - 239.255.255.255)
# [port] udp port to sendto/rcvfrom (set this value to the same
#        value as "udpport" above)
# [ttl] the ttl value for outbound heartbeats. This affects how far
#       the multicast packet will propagate. (0-255)
#       Must be greater than zero.
# [loop] toggles loopback for outbound multicast heartbeats. If enabled,
#        an outbound packet will be looped back and received by the
#        interface it was sent on. (0 or 1) Set this value to zero.
#
#mcast eth0 225.0.0.1 694 1 0
#
# Set up a unicast / udp heartbeat medium
# ucast [dev] [peer-ip-addr]
#
# [dev] device to send/rcv heartbeats on
# [peer-ip-addr] IP address of peer to send packets to
#
#ucast eth0 192.168.1.2
#
#
# About boolean values...
#
# Any of the following case-insensitive values will work for true:
# true, on, yes, y, 1
# Any of the following case-insensitive values will work for false:
# false, off, no, n, 0
#
#
#
# auto_failback: determines whether a resource will
# automatically fail back to its "primary" node, or remain
# on whatever node is serving it until that node fails, or
# an administrator intervenes.
#
#
# The possible values for auto_failback are:
# on - enable automatic failbacks
# off - disable automatic failbacks
# legacy - enable automatic failbacks in systems where all nodes
#          do not yet support the auto_failback option.
#
# auto_failback "on" and "off" are backwards compatible with the old
# "nice_failback on" setting.
#
# See the FAQ for information on how to convert from "legacy" to "on"
# without a flash cut (i.e., using a "rolling upgrade" process).
#
#
# The default value for auto_failback is "legacy", which
# will issue a warning at startup. So, make sure you put
# an auto_failback directive in your ha.cf file.
# (note: auto_failback can be any boolean or "legacy")
#
auto_failback on
#
#
# Basic STONITH support
# Using this directive assumes that there is one stonith
# device in the cluster. Parameters to this device are
# read from a configuration file. The format of this line is:
#
# stonith <stonith_type> <configfile>
#
# NOTE: it is up to you to maintain this file on each node in the
# cluster!
#
#stonith baytech /etc/ha.d/conf/stonith.baytech
#
# STONITH support
# You can configure multiple stonith devices using this directive.
# The format of the line is:
# stonith_host <hostfrom> <stonith_type> <params...>
#
# <hostfrom> is the machine the stonith device is attached to,
#            or * to mean it is accessible from any host.
# <stonith_type> is the type of stonith device (a list of supported
#                drives is in /usr/lib/stonith.)
# <params...> are driver specific parameters. To see the format for a
#             particular device, run:
# stonith -l -t <stonith_type>
#
#
# Note that if you put your stonith device access information in
# here, and you make this file publicly readable, you're asking
# for a denial of service attack.
#
# To get a list of supported stonith devices, run
# stonith -L
#
# For detailed information on which stonith devices are supported
# and their detailed configuration options, run this command:
# stonith -h
#
#stonith_host * baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3 rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
#
# Watchdog is the watchdog timer. If our own heart doesn't beat for
# a minute, then our machine will reboot.
#
# NOTE: If you are using the software watchdog, you very likely
# wish to load the module with the parameter "nowayout=0" or
# compile it without CONFIG_WATCHDOG_NOWAYOUT set. Otherwise even
# an orderly shutdown of heartbeat will trigger a reboot, which is
# very likely NOT what you want.
#
#watchdog /dev/watchdog
#
# Tell what machines are in the cluster
# node nodename ... -- must match uname -n
#
#node ken3
#node kathy
#
# Less common options...
#
# Treats 10.10.10.254 as a pseudo-cluster-member
# Used together with ipfail below...
# note: don't use a cluster node as ping node
#
#ping 10.10.10.254
#
# Treats 10.10.10.254 and 10.10.10.253 as a pseudo-cluster-member
# called group1. If either 10.10.10.254 or 10.10.10.253 are up
# then group1 is up
# Used together with ipfail below...
#
#ping_group group1 10.10.10.254 10.10.10.253
#
# HBA ping directive for Fibre Channel
# Treats fc-card-name as pseudo-cluster-member
# used with ipfail below ...
#
# You can obtain HBAAPI from http://hbaapi.sourceforge.net. You need
# to get the library specific to your HBA directly from the vendor.
# To install the HBAAPI stuff, all you need to do is to compile the
# common part you obtained from sourceforge. This will produce
# libHBAAPI.so, which you need to copy to /usr/lib. You also need to
# copy hbaapi.h to /usr/include.
#
# The fc-card-name is the name obtained from the hbaapitest program
# that is part of the hbaapi package. Running hbaapitest will produce
# a verbose output. One of the first lines is similar to:
# Adapter number 0 is named: qlogic-qla2200-0
# Here fc-card-name is qlogic-qla2200-0.
#
#hbaping fc-card-name
#
#
# Processes started and stopped with heartbeat. Restarted unless
# they exit with rc=100
#
#respawn userid /path/name/to/run
#respawn hacluster /usr/lib/heartbeat/ipfail
#
# Access control for client api
# default is no access
#
#apiauth client-name gid=gidlist uid=uidlist
#apiauth ipfail gid=haclient uid=hacluster
###########################
#
# Unusual options.
#
###########################
#
# hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
#
# deadping - dead time for the ping nodes configured above
#deadping 30
#
# hbgenmethod - Heartbeat generation number creation method.
# Normally these are stored on disk and incremented as needed.
#
#hbgenmethod time
#
# realtime - enable/disable realtime execution (high priority, etc.),
# defaults to on
#realtime off
#
# debug - set debug level, defaults to zero
#debug 1
#
# API Authentication - replaces the fifo-permissions-based system of the past
#
# You can put a uid list and/or a gid list. If you put both, then a
# process is authorized if it qualifies under either the uid list or
# under the gid list.
#
#
# The groupname "default" has special meaning. If it is specified, then
# this will be used for authorizing groupless clients, and any client
# groups not otherwise specified.
#
# There is a subtle exception to this. "default" will never be used in the
# following cases (actual default auth directives noted in brackets):
# ipfail (uid=HA_CCMUSER)
# ccm (uid=HA_CCMUSER)
# ping (gid=HA_APIGROUP)
# cl_status (gid=HA_APIGROUP)
#
# This is done to avoid creating a gaping security hole and matches the
# most likely desired configuration.
#
#apiauth ipfail uid=hacluster
#apiauth ccm uid=hacluster
#apiauth cms uid=hacluster
#apiauth ping gid=haclient uid=alanr,root
#apiauth default gid=haclient
#
# msgfmt - message format on the wire; it can be classic or netstring,
# default: classic
#
#msgfmt classic/netstring
#
# Do we use the logging daemon?
# If the logging daemon is used, logfile/debugfile/logfacility in this
# file are no longer meaningful. You should check the config file for
# the logging daemon (the default is /etc/logd.cf).
# More information can be found at
# http://www.linux-ha.org/ha_2ecf_2fUseLogdDirective
# Setting use_logd to "yes" is recommended.
#
# use_logd yes/no
#
# The interval at which we reconnect to the logging daemon if the
# previous connection failed; default: 60 seconds
#conn_logd_time 60
#
#
# Configure compression module
# It could be zlib or bz2, depending on whether you have the
# corresponding library on the system.
#
#compression bz2
#
# Configure compression threshold
# This value determines the threshold to compress a message,
# e.g. if the threshold is 1, then any message with size greater
# than 1 KB will be compressed; the default is 2 (KB)
#compression_threshold 2
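Distilled from the annotated sample above, a minimal two-node ha.cf might look like this (a sketch: node names, interface, and ping address are placeholders that must match your environment):

    logfacility local0
    keepalive 2
    deadtime 30
    warntime 10
    initdead 120
    udpport 694
    bcast eth0
    auto_failback on
    node node1.webres.wang     # must match `uname -n` on that host
    node node2.webres.wang
    ping 10.10.10.254          # e.g. the default gateway; never a cluster node
    respawn hacluster /usr/lib/heartbeat/ipfail
    apiauth ipfail gid=haclient uid=hacluster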

haresources

#
# This is a list of resources that move from machine to machine as
# nodes go down and come up in the cluster. Do not include
# "administrative" or fixed IP addresses in this file.
#
#
# The haresources files MUST BE IDENTICAL on all nodes of the cluster.
#
# The node names listed in front of the resource group information
# is the name of the preferred node to run the service. It is
# not necessarily the name of the current machine. If you are running
# auto_failback ON (or legacy), then these services will be started
# up on the preferred nodes - any time they're up.
#
# If you are running with auto_failback OFF, then the node information
# will be used in the case of a simultaneous start-up, or when using
# the hb_standby {foreign,local} command.
#
# BUT FOR ALL OF THESE CASES, the haresources files MUST BE IDENTICAL.
# If your files are different then almost certainly something
# won't work right.
#
#
#
# We refer to this file when we're coming up, and when a machine is being
# taken over after going down.
#
# You need to make this right for your installation, then install it in
# /etc/ha.d
#
# Each logical line in the file constitutes a "resource group".
# A resource group is a list of resources which move together from
# one node to another - in the order listed. It is assumed that there
# is no relationship between different resource groups. The resources
# in a resource group are started left-to-right, and stopped
# right-to-left. Long lists of resources can be continued from line
# to line by ending the lines with backslashes ("\").
#
# The resources in this file are either IP addresses, or the name
# of scripts to run to "start" or "stop" the given resource.
#
# The format is like this:
#
#node-name resource1 resource2 ... resourceN
#
#
# If the resource name contains an :: in the middle of it, the
# part after the :: is passed to the resource script as an argument.
# Multiple arguments are separated by the :: delimiter.
#
# In the case of IP addresses, the resource script name IPaddr is implied.
#
# For example, the IP address 135.9.8.7 could also be represented
# as IPaddr::135.9.8.7
#
# THIS IS IMPORTANT!! vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#
# The given IP address is directed to an interface which has a route
# to the given address. This means you have to have a net route
# set up outside of the High-Availability structure. We don't set it
# up here -- we key off of it.
#
# The broadcast address for the IP alias that is created to support
# an IP address defaults to the highest address on the subnet.
#
# The netmask for the IP alias that is created defaults to the same
# netmask as the route that it selected in the step above.
#
# The base interface for the IP alias that is created defaults to the
# same interface as the route that it selected in the step above.
#
# If you want to specify that this IP address is to be brought up
# on a subnet with a netmask of 255.255.255.0, you would specify
# this as IPaddr::135.9.8.7/24 .
#
# If you wished to tell it that the broadcast address for this subnet
# was 135.9.8.210, then you would specify that this way:
# IPaddr::135.9.8.7/24/135.9.8.210
#
# If you wished to tell it that the interface to add the address to
# is eth0, then you would need to specify it this way:
# IPaddr::135.9.8.7/24/eth0
#
# And this way to specify both the broadcast address and the
# interface:
# IPaddr::135.9.8.7/24/eth0/135.9.8.210
#
# The IP addresses you list in this file are called "service" addresses,
# since they're the publicly advertised addresses that clients
# use to get at highly available services.
#
# For a hot/standby (non load-sharing) 2-node system with only a single
# service address, you will probably only put one system name and one
# IP address in here. The name you give the address to is the name of
# the default "hot" system.
#
# Where the nodename is the name of the node which "normally" owns the
# resource. If this machine is up, it will always have the resource
# it is shown as owning.
#
# The string you put in for nodename must match the uname -n name
# of your machine. Depending on how you have it administered, it could
# be a short name or a FQDN.
#
#-------------------------------------------------------------------
#
# Simple case: One service address, default subnet and netmask
# No servers that go up and down with the IP address
#
#just.linux-ha.org 135.9.216.110
#
#-------------------------------------------------------------------
#
# Assuming the administrative addresses are on the same subnet...
# A little more complex case: One service address, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#
#just.linux-ha.org 135.9.216.110 http
#-------------------------------------------------------------------
#
# A little more complex case: Three service addresses, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#
#just.linux-ha.org 135.9.216.110 135.9.215.111 135.9.216.112 httpd
#-------------------------------------------------------------------
#
# One service address, with the subnet, interface and bcast addr
# explicitly defined.
#
#just.linux-ha.org 135.9.216.3/28/eth0/135.9.216.12 httpd
#
#-------------------------------------------------------------------
#
# An example where a shared filesystem is to be used.
# Note that multiple arguments are passed to this script using
# the delimiter '::' to separate each argument.
#
#node1 10.0.0.170 Filesystem::/dev/sda1::/data1::ext2
#
# Regarding the node-names in this file:
#
# They must match the names of the nodes listed in ha.cf, which in turn
# must match the `uname -n` of some node in the cluster. So they aren't
# virtual in any sense of the word.
#
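Tying this to the DRBD notes later in this document, a plausible resource-group line for a two-node DRBD-backed web server might be (a sketch: the node name, service IP, DRBD resource name, device, and mount point are all placeholders):

    node1.webres.wang 192.168.1.100 drbddisk::ha Filesystem::/dev/drbd0::/data::ext3 httpd

Here drbddisk is the resource script shipped with DRBD for promoting the resource to Primary; resources start left-to-right and stop right-to-left, as described above.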

authkeys

#
# Authentication file. Must be mode 600
#
# Must have exactly one auth directive at the front.
# auth <method-id> -- send authentication using this method-id
#
# Then, list the method and key that go with that method-id
#
# Available methods: crc, sha1, md5. Crc doesn't need/want a key.
#
# You normally only have one authentication method-id listed in this file
#
# Put more than one to make a smooth transition when changing auth
# methods and/or keys.
#
#
# sha1 is believed to be the "best", md5 next best.
#
# crc adds no security, except from packet corruption.
# Use only on physically secure networks.
#
#auth 1
#1 crc
#2 sha1 HI!
#3 md5 Hello!
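A working authkeys is short. As a sketch (the sha1 key is a placeholder; substitute your own random string):

    auth 2
    2 sha1 0c9a63b2e7f1d84a

Then make sure the file mode is 600:

    chmod 600 /etc/ha.d/authkeys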
Source: the "HA配置文件中英对照" series covering ha.cf,
haresources,
and authkeys.

Configuring NAT networking for a CentOS guest in VMware

Today I was installing heartbeat in a VMware guest. To get the latest version I compiled it from source, and during the build I needed to download files from blocked sites, which means going through a VPN. With bridged networking the guest could not use the host's VPN, so I switched to NAT networking. I hit a few problems while setting it up, so here is a record of the steps.
Host settings:
I am running VMware Workstation 8.0.1 (English edition).
1. First, check the VMware NAT settings.
Open VMware, choose Edit -> Virtual Network Editor from the menu, select VMnet8 in the window that appears, and verify that DHCP is enabled and that the subnet address and subnet mask are set.
2. Set the virtual machine's networking mode to NAT.
3. Set the host's VMware Network Adapter VMnet8 interface to obtain its IP address and DNS automatically.
4. Check that the VMware DHCP Service and VMware NAT Service are both running on the host.
CentOS guest settings:
First, get the details of the VMware Network Adapter VMnet8 interface on the host; on Windows, run ipconfig in cmd.
From that output, the host-side IP address is 192.168.79.1 with mask 255.255.255.0, so we set the guest's gateway to 192.168.79.2 and its netmask to 255.255.255.0.
Configure the guest NIC:

    vi /etc/sysconfig/network-scripts/ifcfg-eth0

Set it to:

    BOOTPROTO=dhcp
    GATEWAY=192.168.79.2
    NETMASK=255.255.255.0
    ONBOOT=yes

Then restart networking:

    service network restart
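To confirm the guest really picked up the NAT network, a quick check (addresses follow the 192.168.79.x example above):

    ip addr show eth0           # should show a 192.168.79.x address from VMnet8's DHCP
    ping -c 3 192.168.79.2      # the VMnet8 NAT gateway
    ping -c 3 www.google.com    # verifies DNS resolution and outbound NAT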

A summary of errors hit while configuring DRBD

Q1. 'ha' ignored, since this host (node2.webres.wang) is not mentioned with an 'on' keyword?

Error message:

Running drbdadm create-md ha produces the following:

    'ha' ignored, since this host (node2.webres.wang) is not mentioned with an 'on' keyword.

Ans:

The 'on' entries in drbd.conf were written as just node1 and node2; change them to node1.webres.wang and node2.webres.wang so they match each host's real name (see the sketch below).
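A sketch of the corrected drbd.conf stanza (the device, disk, and IP addresses are placeholders; the point is the on <hostname> lines):

    resource ha {
      on node1.webres.wang {        # must match `uname -n` on node1
        device    /dev/drbd0;
        disk      /dev/hdb1;
        address   192.168.1.1:7789;
        meta-disk internal;
      }
      on node2.webres.wang {        # must match `uname -n` on node2
        device    /dev/drbd0;
        disk      /dev/hdb1;
        address   192.168.1.2:7789;
        meta-disk internal;
      }
    }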

Q2. drbdadm create-md ha: exited with code 20?

Error message:

Running drbdadm create-md ha produces the following:

    open(/dev/hdb1) failed: No such file or directory
    Command 'drbdmeta 0 v08 /dev/hdb1 internal create-md' terminated with exit code 20
    drbdadm create-md ha: exited with code 20

Ans:

This happens when fdisk /dev/hdb was never run to create the partition. Create it as follows and the command will then succeed:

#fdisk /dev/hdb    // create a partition on hdb
The number of cylinders for this disk is set to 20805.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): n    // n = create a new partition
Command action
e extended
p primary partition (1-4)
p    // p = primary partition
Partition number (1-4): 1    // 1 = number for this primary partition
First cylinder (1-20805, default 1):    // first cylinder, just press Enter
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-20805, default 20805):    // last cylinder, just press Enter
Using default value 20805
Command (m for help): w    // w = write the changes to disk
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@node1 yum.repos.d]# partprobe    // make the new partition table take effect
Q3. drbdadm create-md ha: exited with code 40?

Error message:

Running drbdadm create-md ha produces the following:

    Device size would be truncated, which
    would corrupt data and result in
    'access beyond end of device' errors.
    You need to either
       * use external meta data (recommended)
       * shrink that filesystem first
       * zero out the device (destroy the filesystem)
    Operation refused.
    Command 'drbdmeta 0 v08 /dev/hdb1 internal create-md' terminated with exit code 40
    drbdadm create-md ha: exited with code 40

Ans:

Write some zeros over the start of /dev/hdb1 with dd, after which drbdadm create-md ha runs cleanly:

#dd if=/dev/zero of=/dev/hdb1 bs=1M count=100
Q4. DRBD status stays Secondary/Unknown?

Error message:

After starting DRBD, the status on Node1 and Node2 stays Secondary/Unknown:

#service drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
m:res cs ro ds p mounted fstype
0:ha WFConnection Secondary/Unknown Inconsistent/DUnknown C

Ans:

1. Node1/Node2 have not opened the required port. Open it in the firewall, or stop the iptables service for a quick test.
2. A split brain may have occurred; this usually happens during an HA failover. To recover:
On one node run:
drbdadm secondary <resource>
drbdadm connect --discard-my-data <resource>
On the other node run:
drbdadm connect <resource>
Q5. 1: Failure: (104) Can not open backing device

Error message:

Running drbdadm up r0 produces the following:

    1: Failure: (104) Can not open backing device.
    Command 'drbdsetup attach 1 /dev/sdb1 /dev/sdb1 internal' terminated with exit code 10

Ans:

Most likely /dev/sdb1 is currently mounted; run umount /dev/sdb1 and try again.

pils.c:245: error: initialization from incompatible pointer type

Compiling cluster glue fails with:

    cc1: warnings being treated as errors
    pils.c:244: error: initialization from incompatible pointer type
    pils.c:245: error: initialization from incompatible pointer type
    make[2]: *** [pils.lo] Error 1
    make[2]: se sale del directorio
    `/usr/src/Heartbeat-STABLE-2-1-STABLE-2.1.4/lib/pils'
    make[1]: *** [all-recursive] Error 1
    make[1]: se sale del directorio
    `/usr/src/Heartbeat-STABLE-2-1-STABLE-2.1.4/lib'
    make: *** [all-recursive] Error 1

Solution:
Open lib/pils/Makefile and delete the -Werror flag, then build again.
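The same fix as a one-liner, assuming you are at the top of the source tree shown in the log:

    sed -i 's/-Werror//g' lib/pils/Makefile
    make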