Backing Up and Restoring the etcd Distributed Key-Value Store

  • etcd is an open-source distributed, consistent key-value store maintained by CoreOS; see the official documentation for a full introduction.

  • The latest etcd release at the time of writing is v3.1.1, but its API comes in v2 and v3 flavors; when the community says "v2" or "v3", it usually means the API version. Starting with etcd 2.3, an experimental implementation of the brand-new v3 API was introduced. The v2 and v3 APIs use different storage engines, so their client commands are completely different as well.

# etcdctl --version
etcdctl version: 3.0.4
API version: 2
  • Officially, etcd v2 and v3 data cannot be stored together, and backups of the v2 and v3 stores are handled separately.

  • Note:

    • If v2 data exists when backing up with the v3 tooling, the restore is unaffected
    • If v3 data exists when backing up with the v2 tooling, the restore fails

Backup and restore with API v2

Official v2 admin guide (https://github.com/coreos/etcd/blob/master/Documentation/v2/admin_guide.md#disaster-recovery)

By default, etcd stores its data under the working directory of the process. Inside the data directory you will find two subdirectories:

  • snap: snapshot data; etcd takes snapshots so the WAL files do not grow without bound, and each snapshot records the etcd data state at a point in time.
  • wal: the write-ahead log, whose main purpose is recording the complete history of all data changes; in etcd, every modification must be written to the WAL before it is committed.
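As a rough sketch of what this looks like on disk (the directory is mocked up here and the file names are illustrative, not taken from a real cluster; real snap/wal file names contain sequence numbers):

```shell
# Mock up the typical layout of a v2-era etcd data directory:
# one snap/ directory for snapshots, one wal/ directory for the write-ahead log.
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR/snap" "$DATA_DIR/wal"
touch "$DATA_DIR/snap/0000000000000002-0000000000049425.snap"
touch "$DATA_DIR/wal/0000000000000000-0000000000000000.wal"
find "$DATA_DIR" -mindepth 1 | sort
```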
Back up the v2 data:

# etcdctl backup --data-dir /home/etcd/ --backup-dir /home/etcd_backup

Restore by starting etcd from the backup directory:

# etcd -data-dir=/home/etcd_backup/ -force-new-cluster

The restore overwrites the snapshot's metadata (member ID and cluster ID), so the node must be started as a new cluster (hence -force-new-cluster).

Backup and restore with API v3

Official v3 admin guide (https://github.com/coreos/etcd/blob/master/Documentation/op-guide/recovery.md)

When using API v3, the API version must be selected explicitly via the ETCDCTL_API environment variable.
Set it on the command line:

# export ETCDCTL_API=3

Back up the data:

# etcdctl --endpoints localhost:2379 snapshot save snapshot.db

Restore:

# etcdctl snapshot restore snapshot.db --name m3 --data-dir=/home/etcd_data

After restoring, change the ownership of the restored files to etcd:etcd.

  • --name: the member name of the restored node; if omitted it defaults to default, and the default data directory becomes default.etcd
  • --data-dir: the data directory

In practice, leave --name unset but specify --data-dir explicitly, pointing it at the same data-dir configured for the etcd service.

etcd clusters run with at least 3 machines, and the official docs state the failure tolerance is (N-1)/2, so backups are rarely needed in practice; still, given the recent GitLab data-loss incident, backups deserve serious attention.
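The (N-1)/2 rule above is plain integer arithmetic; a quick sketch:

```shell
# An N-member etcd cluster stays available as long as a majority (quorum)
# of members is up; it therefore tolerates (N-1)/2 member failures.
tolerance() { echo $(( ($1 - 1) / 2 )); }
quorum()    { echo $(( $1 / 2 + 1 )); }

tolerance 3   # → 1 (3 members tolerate 1 failure)
tolerance 5   # → 2 (5 members tolerate 2 failures)
```

This is also why adding a fourth member buys nothing: tolerance is still 1, but quorum rises to 3.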

Building an etcd cluster by hand, step by step

Goal

Build a three-node highly available etcd cluster

Resources

  • Prepare three Linux servers:
192.168.9.1

192.168.9.2

192.168.9.3
  • Download etcd-v3.2.5

  • etcd: because of how the Raft algorithm reaches consensus, the cluster should have an odd number of members (an even count adds no extra failure tolerance)

Server initialization

  • On each of the three servers, create an etcd user with gid etcd

etcd cluster configuration

  • On each of the three servers, create the conf, data, and bin directories:
etcd@XXXX$ mkdir -p /home/etcd/{conf,data,bin}
  • Upload etcd-v3.2.5-linux-amd64.tar.gz to each server, unpack it, copy etcdctl and etcd into /home/etcd/bin, and add /home/etcd/bin to the system PATH

  • On each of the three servers, create the systemd unit file:

As root, edit: vi /usr/lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
WorkingDirectory=/home/etcd/data
EnvironmentFile=-/home/etcd/conf/etcd.conf
User=etcd
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /home/etcd/bin/etcd \
  --name ${ETCD_NAME} \
  --initial-advertise-peer-urls ${ETCD_INITIAL_ADVERTISE_PEER_URLS} \
  --listen-peer-urls ${ETCD_LISTEN_PEER_URLS} \
  --listen-client-urls ${ETCD_LISTEN_CLIENT_URLS},http://127.0.0.1:2379 \
  --advertise-client-urls ${ETCD_ADVERTISE_CLIENT_URLS} \
  --initial-cluster-token ${ETCD_INITIAL_CLUSTER_TOKEN} \
  --initial-cluster ${ETCD_CLUSTER_ADDRESS} \
  --initial-cluster-state new \
  --data-dir=${ETCD_DATA_DIR}"
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
  • As the etcd user, edit the etcd startup parameter file:
vi /home/etcd/conf/etcd.conf
  • Plan the etcd names of the three nodes as etcd1, etcd2, and etcd3

  • In the configuration below, replace each placeholder value with the actual information of the current server, and adjust ETCD_CLUSTER_ADDRESS to your real IPs as well; do not copy it verbatim and use it as-is

# [member]
ETCD_NAME=<etcd name of this node, e.g. etcd1>
ETCD_DATA_DIR="/home/etcd/data"
ETCD_LISTEN_PEER_URLS="http://<IP of this server>:2380"
ETCD_LISTEN_CLIENT_URLS="http://<IP of this server>:2379"
#ETCD_WAL_DIR=""
#ETCD_SNAPSHOT_COUNT="10000"
#ETCD_HEARTBEAT_INTERVAL="100"
#ETCD_ELECTION_TIMEOUT="1000"
#ETCD_MAX_SNAPSHOTS="5"
#ETCD_MAX_WALS="5"
#ETCD_CORS=""

#[cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://<IP of this server>:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_ADVERTISE_CLIENT_URLS="http://<IP of this server>:2379"
ETCD_CLUSTER_ADDRESS="etcd1=http://192.168.9.1:2380,etcd2=http://192.168.9.2:2380,etcd3=http://192.168.9.3:2380"

#ETCD_INITIAL_ADVERTISE_PEER_URLS="http://localhost:2380"
# if you use different ETCD_NAME (e.g. test), set ETCD_INITIAL_CLUSTER value for this name, i.e. "test=http://..."
#ETCD_INITIAL_CLUSTER="default=http://localhost:2380"
#ETCD_INITIAL_CLUSTER_STATE="new"
#ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
#ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379"
#ETCD_DISCOVERY=""
#ETCD_DISCOVERY_SRV=""
#ETCD_DISCOVERY_FALLBACK="proxy"
#ETCD_DISCOVERY_PROXY=""
#ETCD_STRICT_RECONFIG_CHECK="false"
#ETCD_AUTO_COMPACTION_RETENTION="0"
#
#[proxy]
#ETCD_PROXY="off"
#ETCD_PROXY_FAILURE_WAIT="5000"
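To avoid hand-editing the file on every node, the per-node substitution can be scripted; a minimal sketch (the output path is illustrative, and only the per-node keys from the configuration above are emitted, using this section's 192.168.9.x addresses):

```shell
# Generate a minimal per-node etcd.conf from the node's name and IP.
gen_etcd_conf() {
  local name=$1 ip=$2
  cat <<EOF
ETCD_NAME=${name}
ETCD_DATA_DIR="/home/etcd/data"
ETCD_LISTEN_PEER_URLS="http://${ip}:2380"
ETCD_LISTEN_CLIENT_URLS="http://${ip}:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${ip}:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_ADVERTISE_CLIENT_URLS="http://${ip}:2379"
ETCD_CLUSTER_ADDRESS="etcd1=http://192.168.9.1:2380,etcd2=http://192.168.9.2:2380,etcd3=http://192.168.9.3:2380"
EOF
}

# Example: write the config for node etcd1.
gen_etcd_conf etcd1 192.168.9.1 > /tmp/etcd1.conf
```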
  • Start etcd through systemctl; on each of the three servers, run as a user in the root group:
systemctl daemon-reload
systemctl start etcd
  • Cluster health check

On any of the servers, run as the etcd user: etcdctl cluster-health
If all three members report is healthy, the etcd cluster is up.

Building a secure, disaster-tolerant, highly available etcd cluster on CentOS 7

[Editor's note] etcd is an open-source project started by the CoreOS team and implemented in Go. As a distributed key-value store, it provides reliable distributed coordination through distributed locks, leader election, and write barriers.

The goal of this article is to deploy an etcd cluster that is secure (TLS with self-signed certificates), supports fast disaster recovery (snapshots), and is highly available.

Preparation

版本信息:

  • OS: CentOS Linux release 7.3.1611 (Core)
  • etcd Version: 3.2.4
  • Git SHA: c31bec0
  • Go Version: go1.8.3
  • Go OS/Arch: linux/amd64

Machine configuration

CoreOS officially recommends a cluster size of 5; to keep this article simple, only 3 nodes are used:

NAME       ADDRESS             HOSTNAME                    CONFIGURATION
infra0  192.168.16.227  bjo-ep-kub-01.dev.fwmrm.net  8 CPUs, 16GB RAM, 500GB disk
infra1  192.168.16.228  bjo-ep-kub-02.dev.fwmrm.net  8 CPUs, 16GB RAM, 500GB disk
infra2  192.168.16.229  bjo-ep-kub-03.dev.fwmrm.net  8 CPUs, 16GB RAM, 500GB disk

Officially suggested configurations

Hardware      Typical workload          Heavy load
CPU           2-4 cores                 8-18 cores 
Memory        8GB                       16GB-64GB
Disk          50 sequential IOPS        500 sequential IOPS
Network       1GbE                      10GbE

Note: "heavy load" here means, for example in CPU terms, serving thousands of client requests per second. For recommended AWS and GCE configurations, see: Example hardware configurations on AWS and GCE

Building the etcd cluster

There are three ways to bootstrap an etcd cluster: Static, etcd Discovery, and DNS Discovery. For the Discovery variants, see the official docs at https://coreos.com/etcd/docs/l … .html; they are not repeated here. This article walks through one cluster bootstrap using the Static method.
The etcd configuration for each node is as follows:

$ /export/etcd/etcd --name infra0 --initial-advertise-peer-urls http://192.168.16.227:2380 \
--listen-peer-urls http://192.168.16.227:2380 \
--listen-client-urls http://192.168.16.227:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.16.227:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://192.168.16.227:2380,infra1=http://192.168.16.228:2380,infra2=http://192.168.16.229:2380 \
--initial-cluster-state new
$ /export/etcd/etcd --name infra1 --initial-advertise-peer-urls http://192.168.16.228:2380 \
--listen-peer-urls http://192.168.16.228:2380 \
--listen-client-urls http://192.168.16.228:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.16.228:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://192.168.16.227:2380,infra1=http://192.168.16.228:2380,infra2=http://192.168.16.229:2380 \
--initial-cluster-state new
$ /export/etcd/etcd --name infra2 --initial-advertise-peer-urls http://192.168.16.229:2380 \
--listen-peer-urls http://192.168.16.229:2380 \
--listen-client-urls http://192.168.16.229:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.16.229:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://192.168.16.227:2380,infra1=http://192.168.16.228:2380,infra2=http://192.168.16.229:2380 \
--initial-cluster-state new

TLS

etcd supports encrypted communication over TLS. TLS channels can be used to encrypt both peer traffic within the cluster and client traffic. There are two forms of authentication, self-signed certificates and automatic certificates; self-signed certificates both encrypt traffic and authenticate its connections. This article uses self-signed certificates, which are easy to generate for the whole cluster with Cloudflare's cfssl.
First, install Go and set the GOPATH environment variable:

$ cd /export
$ wget https://storage.googleapis.com/golang/go1.8.3.linux-amd64.tar.gz
$ tar -xzf go1.8.3.linux-amd64.tar.gz

$ sudo vim ~/.profile
$ export GOPATH=/export/go_path
$ export GOROOT=/export/go/
$ export CFSSL=/export/go_path/
$ export PATH=$PATH:$GOROOT/bin:$CFSSL/bin

$ source ~/.profile

Download and build the cfssl tools; they are installed under $GOPATH/bin, e.g. cfssl and cfssljson end up in the /export/go_path/bin directory.

$ go get -u github.com/cloudflare/cfssl/cmd/cfssl
$ go get -u github.com/cloudflare/cfssl/cmd/cfssljson

Initialize the certificate authority:

$ mkdir ~/cfssl
$ cd ~/cfssl
$ cfssl print-defaults config > ca-config.json
$ cfssl print-defaults csr > ca-csr.json

Configure the CA options; the ca-config.json file looks like this:

{
"signing": {
    "default": {
        "expiry": "43800h"
    },
    "profiles": {
        "server": {
            "expiry": "43800h",
            "usages": [
                "signing",
                "key encipherment",
                "server auth"
            ]
        },
        "client": {
            "expiry": "43800h",
            "usages": [
                "signing",
                "key encipherment",
                "client auth"
            ]
        },
        "peer": {
            "expiry": "43800h",
            "usages": [
                "signing",
                "key encipherment",
                "server auth",
                "client auth"
            ]
        }
    }
    }
}

ca-csr.json, the Certificate Signing Request (CSR) file, looks like this:

{
"CN": "My own CA",
"key": {
    "algo": "rsa",
    "size": 2048
},
"names": [
    {
        "C": "US",
        "L": "CA",
        "O": "My Company Name",
        "ST": "San Francisco",
        "OU": "Org Unit 1",
        "OU": "Org Unit 2"
    }
]
}
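Before feeding these JSON files to cfssl it is worth a quick syntax check (note that with duplicate keys, such as the two OU entries above, most JSON parsers keep only the last value). A minimal sketch, assuming python3 is available on the host:

```shell
# Validate a cfssl JSON config file; exits non-zero on a syntax error.
validate_json() { python3 -m json.tool "$1" > /dev/null && echo "valid: $1"; }

# Check a small stand-in CSR file (illustrative path and contents).
printf '{"CN": "My own CA", "key": {"algo": "rsa", "size": 2048}}\n' > /tmp/ca-csr-check.json
validate_json /tmp/ca-csr-check.json
```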

Generate the CA with the options defined above:

$ cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
2017/08/02 00:56:03 [INFO] generating a new CA key and certificate from CSR
2017/08/02 00:56:03 [INFO] generate received request
2017/08/02 00:56:03 [INFO] received CSR
2017/08/02 00:56:03 [INFO] generating key: rsa-2048
2017/08/02 00:56:04 [INFO] encoded CSR
2017/08/02 00:56:04 [INFO] signed certificate with serial number 81101109133309828380726760425799837279517519090

This generates the following files in the current directory:

ca-key.pem
ca.csr
ca.pem

Note: keep the ca-key.pem file safe; anyone who holds it can sign certificates that the cluster trusts.
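One cheap safeguard on Linux is to make the key readable only by its owner (a sketch; the /tmp path is a stand-in for wherever you keep the real key):

```shell
# Restrict the CA private key to owner read/write only.
touch /tmp/ca-key.pem            # stand-in for the real ca-key.pem
chmod 600 /tmp/ca-key.pem
stat -c '%a' /tmp/ca-key.pem     # → 600
```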

Generate the server certificate:

$ cfssl print-defaults csr > server.json

server.json contents:

{
"CN": "server",
"hosts": [
    "127.0.0.1",
    "192.168.16.227",
    "192.168.16.228",
    "192.168.16.229",
    "bjo-ep-kub-01.dev.fwmrm.net",
    "bjo-ep-kub-02.dev.fwmrm.net",
    "bjo-ep-kub-03.dev.fwmrm.net"
],
"key": {
    "algo": "ecdsa",
    "size": 256
},
"names": [
    {
        "C": "US",
        "L": "CA",
        "ST": "San Francisco"
    }
]

}

Next, generate the server certificate and private key:

$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server server.json | cfssljson -bare server
2017/08/02 00:57:12 [INFO] generate received request
2017/08/02 00:57:12 [INFO] received CSR
2017/08/02 00:57:12 [INFO] generating key: ecdsa-256
2017/08/02 00:57:12 [INFO] encoded CSR
2017/08/02 00:57:12 [INFO] signed certificate with serial number 138149747694684969550285630966539823697635905885

This will generate the following files:

server-key.pem
server.csr
server.pem

Generate the peer certificates

$ cfssl print-defaults csr > member1.json

Replace the CN and hosts values, as follows:

{
"CN": "member1",
"hosts": [
    "127.0.0.1",
    "192.168.16.227",
    "192.168.16.228",
    "192.168.16.229",
    "bjo-ep-kub-01.dev.fwmrm.net",
    "bjo-ep-kub-02.dev.fwmrm.net",
    "bjo-ep-kub-03.dev.fwmrm.net"
],
"key": {
    "algo": "rsa",
    "size": 2048
},
"names": [
    {
        "C": "US",
        "ST": "CA",
        "L": "San Francisco"
    }
]

}

Generate the member1 certificate and private key:

$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer member1.json | cfssljson -bare member1
2017/08/02 00:59:12 [INFO] generate received request
2017/08/02 00:59:12 [INFO] received CSR
2017/08/02 00:59:12 [INFO] generating key: rsa-2048
2017/08/02 00:59:13 [INFO] encoded CSR
2017/08/02 00:59:13 [INFO] signed certificate with serial number 222573666682951886940627822839805508037201209158

This yields the following files:

member1-key.pem
member1.csr
member1.pem

Repeat the steps above on the other cluster nodes.

Generate the client certificate

$ cfssl print-defaults csr > client.json

client.json contents:

{
"CN": "client",
"hosts": [
    "127.0.0.1",
    "192.168.16.227",
    "192.168.16.228",
    "192.168.16.229"
],
"key": {
    "algo": "rsa",
    "size": 2048
},
"names": [
    {
        "C": "US",
        "ST": "CA",
        "L": "San Francisco"
    }
]

}

Generate the certificate:

$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client 

This will produce the following files:

client-key.pem
client.csr
client.pem

Copy the certificates generated on node 1 to all nodes and place them all under the /etc/ssl/etcd/ directory. TLS certificate generation is now complete.
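Placing the files with sane permissions can be scripted; a sketch using hypothetical source and target paths, which gives private keys mode 600 and public certificates mode 644:

```shell
# Install certs into a target directory: keys owner-only, certs world-readable.
install_certs() {
  local src=$1 dst=$2
  mkdir -p "$dst"
  for f in "$src"/*.pem; do
    case "$f" in
      *-key.pem) install -m 600 "$f" "$dst" ;;   # private keys
      *)         install -m 644 "$f" "$dst" ;;   # public certificates
    esac
  done
}

# Demonstration with stand-in files (real usage: install_certs ~/cfssl /etc/ssl/etcd).
mkdir -p /tmp/certsrc && touch /tmp/certsrc/server.pem /tmp/certsrc/server-key.pem
install_certs /tmp/certsrc /tmp/etcd-ssl
```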

Testing TLS

Example 1: client-to-server HTTPS with client certificate authentication

Start the etcd service:

$ /export/etcd/etcd -name infra0 --data-dir infra0 \
--client-cert-auth --trusted-ca-file=/etc/ssl/etcd/ca.pem --cert-file=/etc/ssl/etcd/server.pem --key-file=/etc/ssl/etcd/server-key.pem \
--advertise-client-urls=https://127.0.0.1:2379 --listen-client-urls=https://127.0.0.1:2379

Insert data:

$ curl --cacert /etc/ssl/etcd/ca.pem --cert /etc/ssl/etcd/client.pem --key /etc/ssl/etcd/client-key.pem -L https://127.0.0.1:2379/v2/keys/foo -XPUT -d value=bar -v

Reading the data back succeeds:

$ curl --cacert /etc/ssl/etcd/ca.pem --cert /etc/ssl/etcd/client.pem --key /etc/ssl/etcd/client-key.pem -L https://127.0.0.1:2379/v2/keys/foo
{"action":"get","node":{"key":"/foo","value":"bar","modifiedIndex":12,"createdIndex":12

Example 2: using self-signed certificates to both encrypt traffic and authenticate its connections

The etcd configuration for each node is as follows:

$ /export/etcd/etcd \
--name infra0 \
--initial-advertise-peer-urls https://192.168.16.227:2380 \
--listen-peer-urls https://192.168.16.227:2380 \
--listen-client-urls https://192.168.16.227:2379,https://127.0.0.1:2379 \
--advertise-client-urls https://192.168.16.227:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=https://192.168.16.227:2380,infra1=https://192.168.16.228:2380,infra2=https://192.168.16.229:2380 \
--initial-cluster-state new \
--client-cert-auth --trusted-ca-file=/etc/ssl/etcd/ca.pem \
--cert-file=/etc/ssl/etcd/server.pem --key-file=/etc/ssl/etcd/server-key.pem \
--peer-client-cert-auth --peer-trusted-ca-file=/etc/ssl/etcd/ca.pem \
--peer-cert-file=/etc/ssl/etcd/member1.pem --peer-key-file=/etc/ssl/etcd/member1-key.pem
$ /export/etcd/etcd \
--name infra1 \
--initial-advertise-peer-urls https://192.168.16.228:2380 \
--listen-peer-urls https://192.168.16.228:2380 \
--listen-client-urls https://192.168.16.228:2379,https://127.0.0.1:2379 \
--advertise-client-urls https://192.168.16.228:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=https://192.168.16.227:2380,infra1=https://192.168.16.228:2380,infra2=https://192.168.16.229:2380 \
--initial-cluster-state new \
--client-cert-auth --trusted-ca-file=/etc/ssl/etcd/ca.pem \
--cert-file=/etc/ssl/etcd/server.pem --key-file=/etc/ssl/etcd/server-key.pem \
--peer-client-cert-auth --peer-trusted-ca-file=/etc/ssl/etcd/ca.pem \
--peer-cert-file=/etc/ssl/etcd/member2.pem --peer-key-file=/etc/ssl/etcd/member2-key.pem
$ /export/etcd/etcd \
--name infra2 \
--initial-advertise-peer-urls https://192.168.16.229:2380 \
--listen-peer-urls https://192.168.16.229:2380 \
--listen-client-urls https://192.168.16.229:2379,https://127.0.0.1:2379 \
--advertise-client-urls https://192.168.16.229:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=https://192.168.16.227:2380,infra1=https://192.168.16.228:2380,infra2=https://192.168.16.229:2380 \
--initial-cluster-state new \
--client-cert-auth --trusted-ca-file=/etc/ssl/etcd/ca.pem \
--cert-file=/etc/ssl/etcd/server.pem --key-file=/etc/ssl/etcd/server-key.pem \
--peer-client-cert-auth --peer-trusted-ca-file=/etc/ssl/etcd/ca.pem \
--peer-cert-file=/etc/ssl/etcd/member3.pem --peer-key-file=/etc/ssl/etcd/member3-key.pem

Prepare test data:

$ curl --cacert /etc/ssl/etcd/ca.pem --cert /etc/ssl/etcd/client.pem --key /etc/ssl/etcd/client-key.pem -L https://127.0.0.1:2379/v2/keys/fristname -XPUT -d value=Xia -v

$ ETCDCTL_API=3 /export/etcd/etcdctl --cacert /etc/ssl/etcd/ca.pem --cert /etc/ssl/etcd/client.pem --key /etc/ssl/etcd/client-key.pem --endpoints=https://192.168.16.227:2379,https://192.168.16.228:2379,https://192.168.16.229:2379 put lasttname 'Zhang'

Verify the test results:

$ curl --cacert /etc/ssl/etcd/ca.pem --cert /etc/ssl/etcd/client.pem --key /etc/ssl/etcd/client-key.pem -L https://127.0.0.1:2379/v2/keys/
{"action":"get","node":{"dir":true,"nodes":[{"key":"/foo","value":"bar","modifiedIndex":19,"createdIndex":19},{"key":"/fristname","value":"Xia","modifiedIndex":20,"createdIndex":20},{"key":"/lasttname","value":"Zhang","modifiedIndex":21,"createdIndex":21}]

etcd Troubleshooting

etcd failures fall into five main categories:

  1. Failure of a minority of followers
  2. Leader failure
  3. Failure of a majority of members
  4. Network partition
  5. Failure on bootstrap

The rest of this section handles case 3 above, which is what is usually meant by disaster recovery.

Disaster Recovery

This section demonstrates one complete etcd disaster recovery using etcd v3 snapshots.
First, while etcd is working normally, take a backup either with the etcdctl snapshot save command or by copying the member/snap/db file out of the etcd data directory. Using the former:
$ ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshot.db

With TLS enabled, the command becomes:

$ ETCDCTL_API=3 /export/etcd/etcdctl --endpoints=https://192.168.16.227:2379,https://192.168.16.228:2379,https://192.168.16.229:2379 snapshot save snapshot.db --cacert=/etc/ssl/etcd/ca.pem --cert=/etc/ssl/etcd/client.pem --key=/etc/ssl/etcd/client-key.pem

Snapshot saved at snapshot.db

Copy the generated snapshot to the other two nodes of the cluster; every node's recovery uses the same snapshot.
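It is worth confirming that each node received an identical copy before restoring; a sketch using sha256sum (the file below is a stand-in for the real snapshot, and the "node2" copy simulates transfer to another machine):

```shell
# Record the checksum of the snapshot before distribution, then compare
# the copy on each node against it.
printf 'fake snapshot contents\n' > /tmp/snapshot.db       # stand-in file
sum_src=$(sha256sum /tmp/snapshot.db | awk '{print $1}')

cp /tmp/snapshot.db /tmp/snapshot-node2.db                 # simulate the copy
sum_dst=$(sha256sum /tmp/snapshot-node2.db | awk '{print $1}')

[ "$sum_src" = "$sum_dst" ] && echo "snapshot copies match"
```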

Insert some data to exercise the disaster recovery:

$ curl --cacert /etc/ssl/etcd/ca.pem --cert /etc/ssl/etcd/client.pem --key /etc/ssl/etcd/client-key.pem -L https://127.0.0.1:2379/v2/keys/fristname -XPUT -d value=Xia -v

$ ETCDCTL_API=3 /export/etcd/etcdctl --cacert /etc/ssl/etcd/ca.pem --cert /etc/ssl/etcd/client.pem --key /etc/ssl/etcd/client-key.pem --endpoints=https://192.168.16.227:2379,https://192.168.16.228:2379,https://192.168.16.229:2379 put lasttname 'Zhang'

The test data has been inserted successfully:

$ ETCDCTL_API=3 /export/etcd/etcdctl --cacert /etc/ssl/etcd/ca.pem --cert /etc/ssl/etcd/client.pem --key /etc/ssl/etcd/client-key.pem --endpoints=https://192.168.16.227:2379,https://192.168.16.228:2379,https://192.168.16.229:2379  get  firstname

$ curl --cacert /etc/ssl/etcd/ca.pem --cert /etc/ssl/etcd/client.pem --key /etc/ssl/etcd/client-key.pem -L https://127.0.0.1:2379/v2/keys/
{"action":"get","node":{"dir":true,"nodes":[{"key":"/foo","value":"bar","modifiedIndex":19,"createdIndex":19},{"key":"/fristname","value":"Xia","modifiedIndex":20,"createdIndex":20},{"key":"/lasttname","value":"Zhang","modifiedIndex":21,"createdIndex":21}]

Stop the etcd service on all three machines and delete the etcd data directory on every node.
To restore the data (with TLS enabled in this example), run the following on each of the three nodes:

$ ETCDCTL_API=3 /export/etcd/etcdctl snapshot restore snapshot.db \
--name infra0 \
--initial-cluster infra0=https://192.168.16.227:2380,infra1=https://192.168.16.228:2380,infra2=https://192.168.16.229:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls https://192.168.16.227:2380 \
--cacert /etc/ssl/etcd/ca.pem \
--cert /etc/ssl/etcd/client.pem \
--key /etc/ssl/etcd/client-key.pem
$ ETCDCTL_API=3 /export/etcd/etcdctl snapshot restore snapshot.db \
--name infra1 \
--initial-cluster infra0=https://192.168.16.227:2380,infra1=https://192.168.16.228:2380,infra2=https://192.168.16.229:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls https://192.168.16.228:2380 \
--cacert /etc/ssl/etcd/ca.pem \
--cert /etc/ssl/etcd/client.pem \
--key /etc/ssl/etcd/client-key.pem
$ ETCDCTL_API=3 /export/etcd/etcdctl snapshot restore snapshot.db \
--name infra2 \
--initial-cluster infra0=https://192.168.16.227:2380,infra1=https://192.168.16.228:2380,infra2=https://192.168.16.229:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls https://192.168.16.229:2380 \
--cacert /etc/ssl/etcd/ca.pem \
--cert /etc/ssl/etcd/client.pem \
--key /etc/ssl/etcd/client-key.pem

Sample log from the restore:

$ ETCDCTL_API=3 /export/etcd/etcdctl snapshot restore snapshot.db   --name infra0   --initial-cluster infra0=https://192.168.16.227:2380,infra1=https://192.168.16.228:2380,infra2=https://192.168.16.229:2380   --initial-cluster-token etcd-cluster-1   --initial-advertise-peer-urls https://192.168.16.227:2380   --cacert /etc/ssl/etcd/ca.pem   --cert /etc/ssl/etcd/client.pem   --key /etc/ssl/etcd/client-key.pem
2017-08-06 04:09:12.853510 I | etcdserver/membership: added member 3e5097be4ea17ebe [https://192.168.16.229:2380] to cluster cabc8098aa3afc98
2017-08-06 04:09:12.853567 I | etcdserver/membership: added member 67d47e92a1704b1a [https://192.168.16.227:2380] to cluster cabc8098aa3afc98
2017-08-06 04:09:12.853583 I | etcdserver/membership: added member b4725a5341abf1a0 [https://192.168.16.228:2380] to cluster cabc8098aa3afc98

Next, on each of the three nodes, run:

$ /export/etcd/etcd \
--name infra0 \
--initial-advertise-peer-urls https://192.168.16.227:2380 \
--listen-peer-urls https://192.168.16.227:2380 \
--listen-client-urls https://192.168.16.227:2379,https://127.0.0.1:2379 \
--advertise-client-urls https://192.168.16.227:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=https://192.168.16.227:2380,infra1=https://192.168.16.228:2380,infra2=https://192.168.16.229:2380 \
--initial-cluster-state new \
--client-cert-auth --trusted-ca-file=/etc/ssl/etcd/ca.pem \
--cert-file=/etc/ssl/etcd/server.pem --key-file=/etc/ssl/etcd/server-key.pem \
--peer-client-cert-auth --peer-trusted-ca-file=/etc/ssl/etcd/ca.pem \
--peer-cert-file=/etc/ssl/etcd/member1.pem --peer-key-file=/etc/ssl/etcd/member1-key.pem
$ /export/etcd/etcd \
--name infra1 \
--initial-advertise-peer-urls https://192.168.16.228:2380 \
--listen-peer-urls https://192.168.16.228:2380 \
--listen-client-urls https://192.168.16.228:2379,https://127.0.0.1:2379 \
--advertise-client-urls https://192.168.16.228:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=https://192.168.16.227:2380,infra1=https://192.168.16.228:2380,infra2=https://192.168.16.229:2380 \
--initial-cluster-state new \
--client-cert-auth --trusted-ca-file=/etc/ssl/etcd/ca.pem \
--cert-file=/etc/ssl/etcd/server.pem --key-file=/etc/ssl/etcd/server-key.pem \
--peer-client-cert-auth --peer-trusted-ca-file=/etc/ssl/etcd/ca.pem \
--peer-cert-file=/etc/ssl/etcd/member2.pem --peer-key-file=/etc/ssl/etcd/member2-key.pem
$ /export/etcd/etcd \
--name infra2 \
--initial-advertise-peer-urls https://192.168.16.229:2380 \
--listen-peer-urls https://192.168.16.229:2380 \
--listen-client-urls https://192.168.16.229:2379,https://127.0.0.1:2379 \
--advertise-client-urls https://192.168.16.229:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=https://192.168.16.227:2380,infra1=https://192.168.16.228:2380,infra2=https://192.168.16.229:2380 \
--initial-cluster-state new \
--client-cert-auth --trusted-ca-file=/etc/ssl/etcd/ca.pem \
--cert-file=/etc/ssl/etcd/server.pem --key-file=/etc/ssl/etcd/server-key.pem \
--peer-client-cert-auth --peer-trusted-ca-file=/etc/ssl/etcd/ca.pem \
--peer-cert-file=/etc/ssl/etcd/member3.pem --peer-key-file=/etc/ssl/etcd/member3-key.pem

Verify the disaster recovery: is the original cluster data preserved?

$ ETCDCTL_API=3 /export/etcd/etcdctl --cacert /etc/ssl/etcd/ca.pem --cert /etc/ssl/etcd/client.pem --key /etc/ssl/etcd/client-key.pem --endpoints=https://192.168.16.227:2379,https://192.168.16.228:2379,https://192.168.16.229:2379 get lasttname
lasttname
Zhang

$ ETCDCTL_API=3 /export/etcd/etcdctl --cacert /etc/ssl/etcd/ca.pem --cert /etc/ssl/etcd/client.pem --key /etc/ssl/etcd/client-key.pem --endpoints=https://192.168.16.227:2379,https://192.168.16.228:2379,https://192.168.16.229:2379 get firstname
firstname
Xia

The results above show that the disaster recovery succeeded.

etcd system limits

  1. Request size limit: RPC requests currently support 1MB of data; this may be raised or made configurable in the future
  2. Storage size limit: storage defaults to 2GB and can be extended to 8GB with the --quota-backend-bytes flag
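--quota-backend-bytes takes a raw byte count, so the 8GB maximum has to be spelled out; a quick sketch of the arithmetic (the flag usage in the comment is illustrative):

```shell
# 8GB expressed in bytes, as expected by --quota-backend-bytes,
# e.g.: etcd --quota-backend-bytes=8589934592 ...
quota=$(( 8 * 1024 * 1024 * 1024 ))
echo "$quota"   # → 8589934592
```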

Monitoring

etcd offers a monitoring solution and metrics based on Prometheus + built-in Grafana; see
etcd Metrics: https://coreos.com/etcd/docs/latest/metrics.html
Prometheus + builtin Grafana: https://coreos.com/etcd/docs/latest/op-guide/monitoring.html