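I. Preparation
The previous post deployed only a single-node cluster. In this post we manually deploy a multi-node cluster named mycluster. We have three machines: node1, node2 and node3. node1 can ssh/scp to the other two without a password, and all of our work is done on node1.
Preparation consists of installing the ceph rpm packages on every machine (see Section 1 of the previous post) and modifying the following files on every machine: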
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]
Changes:
Environment=CLUSTER=ceph <--- change to CLUSTER=mycluster
ExecStart=/usr/bin/... --id %i --setuser ceph --setgroup ceph <--- remove --setuser ceph --setgroup ceph
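The two edits above can be scripted rather than made by hand. The following is only a sketch: patch_unit is a hypothetical helper name, it assumes GNU sed (for -i), and it assumes the unit file contains exactly the two lines quoted above.

```shell
# Hypothetical helper: apply the two edits described above to one unit file.
#   $1 = path to the unit file, $2 = new cluster name
patch_unit() {
    unit_file=$1
    new_cluster=$2
    # 1st expression: rewrite the CLUSTER environment line.
    # 2nd expression: drop the setuser/setgroup flags from the ExecStart line only.
    sed -i \
        -e "s/^Environment=CLUSTER=ceph\$/Environment=CLUSTER=${new_cluster}/" \
        -e '/^ExecStart=/s/ --setuser ceph --setgroup ceph//' \
        "$unit_file"
}

# Usage on each machine (the path is a placeholder), followed by a reload
# so that systemd re-reads the changed unit files:
#   patch_unit /path/to/unit.service mycluster
#   systemctl daemon-reload
```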
II. Create a working directory
Create a working directory on node1; all subsequent work is done in this directory on node1.
mkdir /tmp/mk-ceph-cluster
cd /tmp/mk-ceph-cluster
III. Create the configuration file
vim mycluster.conf
[global]
cluster = mycluster
fsid = 116d4de8-fd14-491f-811f-c1bdd8fac141
public network = 192.168.100.0/24
cluster network = 192.168.73.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd pool default size = 3
osd pool default min size = 2
osd pool default pg num = 128
osd pool default pgp num = 128
osd pool default crush rule = 0
osd crush chooseleaf type = 1
admin socket = /var/run/ceph/$cluster-$name.asock
pid file = /var/run/ceph/$cluster-$name.pid
log file = /var/log/ceph/$cluster-$name.log
log to syslog = false
max open files = 131072
ms bind ipv6 = false
[mon]
mon initial members = node1,node2,node3
mon host = 192.168.100.131:6789,192.168.100.132:6789,192.168.100.133:6789
;Yuanguo: the default value of {mon data} is /var/lib/ceph/mon/$cluster-$id,
; we overwrite it.
mon data = /var/lib/ceph/mon/$cluster-$name
mon clock drift allowed = 10
mon clock drift warn backoff = 30
mon osd full ratio = .95
mon osd nearfull ratio = .85
mon osd down out interval = 600
mon osd report timeout = 300
debug ms = 20
debug mon = 20
debug paxos = 20
debug auth = 20
[mon.node1]
host = node1
mon addr = 192.168.100.131:6789
[mon.node2]
host = node2
mon addr = 192.168.100.132:6789
[mon.node3]
host = node3
mon addr = 192.168.100.133:6789
[mgr]
;Yuanguo: the default value of {mgr data} is /var/lib/ceph/mgr/$cluster-$id,
; we overwrite it.
mgr data = /var/lib/ceph/mgr/$cluster-$name
[osd]
;Yuanguo: we wish to overwrite {osd data}, but it seems that 'ceph-disk' forces
; to use the default value, so keep the default now; maybe in later versions
; of ceph the limitation will be eliminated.
osd data = /var/lib/ceph/osd/$cluster-$id
osd recovery max active = 3
osd max backfills = 5
osd max scrubs = 2
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=1024
osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog
filestore max sync interval = 5
osd op threads = 2
debug ms = 100
debug osd = 100
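Note that this configuration file overrides some defaults, such as {mon data} and {mgr data}, but not {osd data}, because ceph-disk appears to force the default value. In addition, the pid and admin socket files are placed in /var/run/ceph/ and named $cluster-$name; the log files are placed in /var/log/ceph/ and are also named $cluster-$name. All of these can be overridden.
IV. Generate the keyrings
As described in the single-node deployment post, there are two ways to manage the cluster's users and their capabilities. Here we use the first: generate the keyring files up front, then bring them in when the cluster is created so that they take effect.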
ceph-authtool --create-keyring mycluster.keyring --gen-key -n mon. --cap mon 'allow *'
ceph-authtool --create-keyring mycluster.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'
ceph-authtool --create-keyring mycluster.client.bootstrap-osd.keyring --gen-key -n client.bootstrap-osd --cap mon 'allow profile bootstrap-osd'
ceph-authtool --create-keyring mycluster.mgr.node1.keyring --gen-key -n mgr.node1 --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
ceph-authtool --create-keyring mycluster.mgr.node2.keyring --gen-key -n mgr.node2 --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
ceph-authtool --create-keyring mycluster.mgr.node3.keyring --gen-key -n mgr.node3 --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
ceph-authtool mycluster.keyring --import-keyring mycluster.client.admin.keyring
ceph-authtool mycluster.keyring --import-keyring mycluster.client.bootstrap-osd.keyring
ceph-authtool mycluster.keyring --import-keyring mycluster.mgr.node1.keyring
ceph-authtool mycluster.keyring --import-keyring mycluster.mgr.node2.keyring
ceph-authtool mycluster.keyring --import-keyring mycluster.mgr.node3.keyring
cat mycluster.keyring
[mon.]
key = AQA525NZsY73ERAAIM1J6wSxglBNma3XAdEcVg==
caps mon = "allow *"
[client.admin]
key = AQBJ25NZznIpEBAAlCdCy+OyUIvxtNq+1DSLqg==
auid = 0
caps mds = "allow *"
caps mgr = "allow *"
caps mon = "allow *"
caps osd = "allow *"
[client.bootstrap-osd]
key = AQBW25NZtl/RBxAACGWafYy1gPWEmx9geCLi6w==
caps mon = "allow profile bootstrap-osd"
[mgr.node1]
key = AQBb25NZ1mIeFhAA/PmRHFY6OgnAMXL1/8pSxw==
caps mds = "allow *"
caps mon = "allow profile mgr"
caps osd = "allow *"
[mgr.node2]
key = AQBg25NZJ6jyHxAAf2GfBAG5tuNwf9YjkhhEWA==
caps mds = "allow *"
caps mon = "allow profile mgr"
caps osd = "allow *"
[mgr.node3]
key = AQBl25NZ7h6CJRAAaFiea7hiTrQNVoZysA7n/g==
caps mds = "allow *"
caps mon = "allow profile mgr"
caps osd = "allow *"
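To confirm that every import landed in mycluster.keyring without reading the whole file, the section headers can be listed on their own. This is a small sketch; list_keyring_entities is a hypothetical helper that assumes the keyring layout shown above, with one [entity] header per section.

```shell
# Hypothetical helper: print each "[entity]" section header of a keyring
# file, without the surrounding brackets.
list_keyring_entities() {
    sed -n 's/^\[\(.*\)]$/\1/p' "$1"
}

# Usage:
#   list_keyring_entities mycluster.keyring
```

For the keyring above this should list the six entities that were imported: mon., client.admin, client.bootstrap-osd and the three mgr.nodeX entries.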
V. Generate the monmap
Generate the monmap and add the three monitors:
monmaptool --create --add node1 192.168.100.131:6789 --add node2 192.168.100.132:6789 --add node3 192.168.100.133:6789 --fsid 116d4de8-fd14-491f-811f-c1bdd8fac141 monmap
monmaptool --print monmap
monmaptool: monmap file monmap
epoch 0
fsid 116d4de8-fd14-491f-811f-c1bdd8fac141
last_changed 2017-08-16 05:45:37.851899
created 2017-08-16 05:45:37.851899
0: 192.168.100.131:6789/0 mon.node1
1: 192.168.100.132:6789/0 mon.node2
2: 192.168.100.133:6789/0 mon.node3
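The fsid passed to monmaptool is the same one written in mycluster.conf. To avoid copying it by hand, it can be read back from the configuration file. The sketch below uses a hypothetical helper, conf_fsid, and assumes the "fsid = ..." line sits in the [global] section as in the file above.

```shell
# Hypothetical helper: print the value of "fsid" from the [global]
# section of a ceph configuration file.
conf_fsid() {
    awk -F= '
        # Track whether the current section is [global].
        /^\[/ { in_global = ($0 == "[global]") }
        # Inside [global], match the key "fsid" and print its value
        # with surrounding blanks removed.
        in_global && $1 ~ /^[ \t]*fsid[ \t]*$/ {
            gsub(/[ \t]/, "", $2); print $2; exit
        }
    ' "$1"
}

# Usage with the monmaptool invocation from this section:
#   monmaptool --create --add node1 192.168.100.131:6789 \
#       --fsid "$(conf_fsid mycluster.conf)" monmap
```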
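VI. Distribute the configuration file, keyrings and monmap
Distribute the configuration file, keyrings and monmap generated in Sections III, IV and V to every machine. The mycluster.mgr.nodeX.keyring files are not needed yet, so we do not distribute them for now (see Section IX).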
cp mycluster.client.admin.keyring mycluster.client.bootstrap-osd.keyring mycluster.keyring mycluster.conf monmap /etc/ceph
scp mycluster.client.admin.keyring mycluster.client.bootstrap-osd.keyring mycluster.keyring mycluster.conf monmap node2:/etc/ceph
scp mycluster.client.admin.keyring mycluster.client.bootstrap-osd.keyring mycluster.keyring mycluster.conf monmap node3:/etc/ceph
VII. Create the cluster
1. Create the {mon data} directories
mkdir /var/lib/ceph/mon/mycluster-mon.node1
ssh node2 mkdir /var/lib/ceph/mon/mycluster-mon.node2
ssh node3 mkdir /var/lib/ceph/mon/mycluster-mon.node3
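Note: in the configuration file mycluster.conf we set {mon data} to /var/lib/ceph/mon/$cluster-$name rather than the default /var/lib/ceph/mon/$cluster-$id;
$cluster-$name expands to mycluster-mon.node1 (and likewise mycluster-mon.node2 and mycluster-mon.node3);
the default $cluster-$id would expand to mycluster-node1 (and likewise for node2 and node3).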
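The expansion rule can be reproduced with a few substitutions. This is only an illustration of the naming scheme, not how ceph itself resolves the variables; expand_meta is a hypothetical helper, and it assumes the cluster, daemon type and id contain no characters that are special to sed (such as / or &).

```shell
# Hypothetical helper: expand $cluster, $name, $type and $id in a path
# template, where $name is "<daemon type>.<id>".
#   $1 = template, $2 = cluster name, $3 = daemon type, $4 = daemon id
expand_meta() {
    template=$1; cluster=$2; dtype=$3; id=$4
    printf '%s\n' "$template" | sed \
        -e 's/\$cluster/'"$cluster"'/g' \
        -e 's/\$name/'"$dtype.$id"'/g' \
        -e 's/\$type/'"$dtype"'/g' \
        -e 's/\$id/'"$id"'/g'
}

# The value set in mycluster.conf:
expand_meta '/var/lib/ceph/mon/$cluster-$name' mycluster mon node1
# The default value:
expand_meta '/var/lib/ceph/mon/$cluster-$id' mycluster mon node1
```

The first call prints /var/lib/ceph/mon/mycluster-mon.node1 and the second prints /var/lib/ceph/mon/mycluster-node1, matching the two expansions described above.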
2. Initialize the monitors
ceph-mon --cluster mycluster --mkfs -i node1 --monmap /etc/ceph/monmap --keyring /etc/ceph/mycluster.keyring
ssh node2 ceph-mon --cluster mycluster --mkfs -i node2 --monmap /etc/ceph/monmap --keyring /etc/ceph/mycluster.keyring
ssh node3 ceph-mon --cluster mycluster --mkfs -i node3 --monmap /etc/ceph/monmap --keyring /etc/ceph/mycluster.keyring
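Note: in the configuration file mycluster.conf we set {mon data} to /var/lib/ceph/mon/$cluster-$name, which expands to /var/lib/ceph/mon/mycluster-mon.node1 (and likewise for node2 and node3). ceph-mon uses --cluster mycluster to locate the configuration file mycluster.conf, resolves {mon data} from it, and initializes that directory.
3. touch done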
touch /var/lib/ceph/mon/mycluster-mon.node1/done
ssh node2 touch /var/lib/ceph/mon/mycluster-mon.node2/done
ssh node3 touch /var/lib/ceph/mon/mycluster-mon.node3/done
4. Start the monitors
systemctl start ceph-mon@node1
ssh node2 systemctl start ceph-mon@node2
ssh node3 systemctl start ceph-mon@node3
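Steps 1 to 4 repeat the same command on each machine, locally on node1 and over ssh on the others. The sketch below only prints the commands in the same order as above (a dry run), so it can be reviewed before anything is executed. on_node and print_mon_bootstrap are hypothetical helper names; the paths and flags are the ones used in this section.

```shell
CLUSTER=mycluster
LOCAL_NODE=node1

# Print a command as it would be run: directly on the local node,
# prefixed with "ssh <node>" on any other node.
on_node() {
    target=$1; shift
    if [ "$target" = "$LOCAL_NODE" ]; then
        echo "$*"
    else
        echo "ssh $target $*"
    fi
}

# Print steps 1-4 for every node given as an argument, one step at a
# time across all nodes, as in the manual procedure.
print_mon_bootstrap() {
    for step in mkdir mkfs touch start; do
        for node in "$@"; do
            data_dir=/var/lib/ceph/mon/$CLUSTER-mon.$node
            case $step in
                mkdir) on_node "$node" mkdir "$data_dir" ;;
                mkfs)  on_node "$node" ceph-mon --cluster "$CLUSTER" --mkfs -i "$node" \
                           --monmap /etc/ceph/monmap --keyring "/etc/ceph/$CLUSTER.keyring" ;;
                touch) on_node "$node" touch "$data_dir/done" ;;
                start) on_node "$node" systemctl start "ceph-mon@$node" ;;
            esac
        done
    done
}

print_mon_bootstrap node1 node2 node3
```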
5. Check the cluster status
ceph --cluster mycluster -s
cluster:
id: 116d4de8-fd14-491f-811f-c1bdd8fac141
health: HEALTH_OK
services:
mon: 3 daemons, quorum node1,node2,node3
mgr: no daemons active
osd: 0 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 bytes
usage: 0 kB used, 0 kB / 0 kB avail
pgs:
VIII. Add the OSDs
Each machine has a /dev/sdb, which we use as an OSD.
1. Delete their partitions
2. prepare
ceph-disk prepare --cluster mycluster --cluster-uuid 116d4de8-fd14-491f-811f-c1bdd8fac141 --bluestore --block.db /dev/sdb --block.wal /dev/sdb /dev/sdb
ssh node2 ceph-disk prepare --cluster mycluster --cluster-uuid 116d4de8-fd14-491f-811f-c1bdd8fac141 --bluestore --block.db /dev/sdb --block.wal /dev/sdb /dev/sdb
ssh node3 ceph-disk prepare --cluster mycluster --cluster-uuid 116d4de8-fd14-491f-811f-c1bdd8fac141 /dev/sdb
Note: when preparing node3:/dev/sdb we omitted the options --bluestore --block.db /dev/sdb --block.wal /dev/sdb; later we will see how it differs from the other two.
3. activate
ceph-disk activate /dev/sdb1 --activate-key /etc/ceph/mycluster.client.bootstrap-osd.keyring
ssh node2 ceph-disk activate /dev/sdb1 --activate-key /etc/ceph/mycluster.client.bootstrap-osd.keyring
ssh node3 ceph-disk activate /dev/sdb1 --activate-key /etc/ceph/mycluster.client.bootstrap-osd.keyring
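Note: ceph-disk appears to have two problems:
- As mentioned earlier, it ignores a custom {osd data} and forces the default value /var/lib/ceph/osd/$cluster-$id.
- It does not seem possible to assign an osd id to a disk; we can only rely on the automatically generated one. Although ceph-disk prepare has an --osd-id option, ceph-disk activate does not use it and generates an id itself. When the two do not match, the following error occurs: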
# ceph-disk activate /dev/sdb1 --activate-key /etc/ceph/mycluster.client.bootstrap-osd.keyring
command_with_stdin: Error EEXIST: entity osd.0 exists but key does not match
mount_activate: Failed to activate
'['ceph', '--cluster', 'mycluster', '--name', 'client.bootstrap-osd', '--keyring', '/etc/ceph/mycluster.client.bootstrap-osd.keyring', '-i', '-', 'osd', 'new', u'ca8aac6a-b442-4b07-8fa6-62ac93b7cd29']' failed with status code 17
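The failed command calls 'osd', 'new' with only the OSD uuid and no id ('-i', '-' merely feeds its input on stdin), so the osd id can only be generated automatically.
4. Check the OSDs
In ceph-disk prepare, node1:/dev/sdb and node2:/dev/sdb were treated the same, both with the options --bluestore --block.db /dev/sdb --block.wal /dev/sdb; node3:/dev/sdb was prepared without them. Let us see how they differ.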
4.1 node1
mount | grep sdb
/dev/sdb1 on /var/lib/ceph/osd/mycluster-0 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
ls /var/lib/ceph/osd/mycluster-0/
activate.monmap block block.db_uuid block.wal bluefs fsid kv_backend mkfs_done systemd whoami
active block.db block_uuid block.wal_uuid ceph_fsid keyring magic ready type
ls -l /var/lib/ceph/osd/mycluster-0/block
lrwxrwxrwx. 1 ceph ceph 58 Aug 16 05:52 /var/lib/ceph/osd/mycluster-0/block -> /dev/disk/by-partuuid/a12dd642-b64c-4fef-b9e6-0b45cff40fa9
ls -l /dev/disk/by-partuuid/a12dd642-b64c-4fef-b9e6-0b45cff40fa9
lrwxrwxrwx. 1 root root 10 Aug 16 05:55 /dev/disk/by-partuuid/a12dd642-b64c-4fef-b9e6-0b45cff40fa9 -> ../../sdb2
blkid /dev/sdb2
/dev/sdb2: PARTLABEL="ceph block" PARTUUID="a12dd642-b64c-4fef-b9e6-0b45cff40fa9"
cat /var/lib/ceph/osd/mycluster-0/block_uuid
a12dd642-b64c-4fef-b9e6-0b45cff40fa9
ls -l /var/lib/ceph/osd/mycluster-0/block.db
lrwxrwxrwx. 1 ceph ceph 58 Aug 16 05:52 /var/lib/ceph/osd/mycluster-0/block.db -> /dev/disk/by-partuuid/1c107775-45e6-4b79-8a2f-1592f5cb03f2
ls -l /dev/disk/by-partuuid/1c107775-45e6-4b79-8a2f-1592f5cb03f2
lrwxrwxrwx. 1 root root 10 Aug 16 05:55 /dev/disk/by-partuuid/1c107775-45e6-4b79-8a2f-1592f5cb03f2 -> ../../sdb3
blkid /dev/sdb3
/dev/sdb3: PARTLABEL="ceph block.db" PARTUUID="1c107775-45e6-4b79-8a2f-1592f5cb03f2"
cat /var/lib/ceph/osd/mycluster-0/block.db_uuid
1c107775-45e6-4b79-8a2f-1592f5cb03f2
ls -l /var/lib/ceph/osd/mycluster-0/block.wal
lrwxrwxrwx. 1 ceph ceph 58 Aug 16 05:52 /var/lib/ceph/osd/mycluster-0/block.wal -> /dev/disk/by-partuuid/76055101-b892-4da9-b80a-c1920f24183f
ls -l /dev/disk/by-partuuid/76055101-b892-4da9-b80a-c1920f24183f
lrwxrwxrwx. 1 root root 10 Aug 16 05:55 /dev/disk/by-partuuid/76055101-b892-4da9-b80a-c1920f24183f -> ../../sdb4
blkid /dev/sdb4
/dev/sdb4: PARTLABEL="ceph block.wal" PARTUUID="76055101-b892-4da9-b80a-c1920f24183f"
cat /var/lib/ceph/osd/mycluster-0/block.wal_uuid
76055101-b892-4da9-b80a-c1920f24183f
As shown, on node1 (and node2) /dev/sdb is split into four partitions:
- /dev/sdb1: metadata
- /dev/sdb2: the main block device
- /dev/sdb3: db
- /dev/sdb4: wal
For details see: ceph-disk prepare --help
4.2 node3
mount | grep sdb
/dev/sdb1 on /var/lib/ceph/osd/mycluster-2 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
ls /var/lib/ceph/osd/mycluster-2
activate.monmap active block block_uuid bluefs ceph_fsid fsid keyring kv_backend magic mkfs_done ready systemd type whoami
ls -l /var/lib/ceph/osd/mycluster-2/block
lrwxrwxrwx. 1 ceph ceph 58 Aug 16 05:54 /var/lib/ceph/osd/mycluster-2/block -> /dev/disk/by-partuuid/0a70b661-43f5-4562-83e0-cbe6bdbd31fb
ls -l /dev/disk/by-partuuid/0a70b661-43f5-4562-83e0-cbe6bdbd31fb
lrwxrwxrwx. 1 root root 10 Aug 16 05:56 /dev/disk/by-partuuid/0a70b661-43f5-4562-83e0-cbe6bdbd31fb -> ../../sdb2
blkid /dev/sdb2
/dev/sdb2: PARTLABEL="ceph block" PARTUUID="0a70b661-43f5-4562-83e0-cbe6bdbd31fb"
cat /var/lib/ceph/osd/mycluster-2/block_uuid
0a70b661-43f5-4562-83e0-cbe6bdbd31fb
As shown, on node3 /dev/sdb is split into two partitions:
- /dev/sdb1: metadata
- /dev/sdb2: the main block device; the db and wal also live on this partition.
For details see: ceph-disk prepare --help
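The chain of ls -l calls used in 4.1 and 4.2 can be condensed into one loop that resolves each of the three links to its final device. This is a sketch; show_osd_devices is a hypothetical helper, and it assumes readlink -f (GNU coreutils) is available.

```shell
# Hypothetical helper: for one OSD data directory, print the device each
# of block, block.db and block.wal finally resolves to, or "(none)" when
# the link does not exist.
show_osd_devices() {
    osd_dir=$1
    for link in block block.db block.wal; do
        if [ -L "$osd_dir/$link" ]; then
            printf '%s -> %s\n' "$link" "$(readlink -f "$osd_dir/$link")"
        else
            printf '%s -> (none)\n' "$link"
        fi
    done
}

# Usage:
#   show_osd_devices /var/lib/ceph/osd/mycluster-0
```

Given the listings above, node1 would show three resolved partitions, while node3 would show only block resolved and "(none)" for block.db and block.wal.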
5. Check the cluster status
ceph --cluster mycluster -s
cluster:
id: 116d4de8-fd14-491f-811f-c1bdd8fac141
health: HEALTH_WARN
no active mgr
services:
mon: 3 daemons, quorum node1,node2,node3
mgr: no daemons active
osd: 3 osds: 3 up, 3 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 bytes
usage: 0 kB used, 0 kB / 0 kB avail
pgs:
The cluster is in the WARN state because no mgr has been added yet.
IX. Add the mgrs
1. Create the {mgr data} directories
mkdir /var/lib/ceph/mgr/mycluster-mgr.node1
ssh node2 mkdir /var/lib/ceph/mgr/mycluster-mgr.node2
ssh node3 mkdir /var/lib/ceph/mgr/mycluster-mgr.node3
Note: as with {mon data}, in the configuration file mycluster.conf we set {mgr data} to /var/lib/ceph/mgr/$cluster-$name rather than the default /var/lib/ceph/mgr/$cluster-$id.
2. Distribute the mgr keyrings
cp mycluster.mgr.node1.keyring /var/lib/ceph/mgr/mycluster-mgr.node1/keyring
scp mycluster.mgr.node2.keyring node2:/var/lib/ceph/mgr/mycluster-mgr.node2/keyring
scp mycluster.mgr.node3.keyring node3:/var/lib/ceph/mgr/mycluster-mgr.node3/keyring
3. Start the mgrs
systemctl start ceph-mgr@node1
ssh node2 systemctl start ceph-mgr@node2
ssh node3 systemctl start ceph-mgr@node3
4. Check the cluster status
ceph --cluster mycluster -s
cluster:
id: 116d4de8-fd14-491f-811f-c1bdd8fac141
health: HEALTH_OK
services:
mon: 3 daemons, quorum node1,node2,node3
mgr: node1(active), standbys: node3, node2
osd: 3 osds: 3 up, 3 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 bytes
usage: 5158 MB used, 113 GB / 118 GB avail
pgs:
As shown, the cluster is in the OK state once the mgrs have been added.
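When this check is scripted instead of read by eye, the one-line ceph health command is easier to test than the full -s output. The sketch below uses a hypothetical helper, cluster_is_healthy, and assumes the health line starts with HEALTH_OK, HEALTH_WARN or HEALTH_ERR.

```shell
# Hypothetical helper: succeed only when the named cluster reports
# HEALTH_OK.
#   $1 = cluster name
cluster_is_healthy() {
    case "$(ceph --cluster "$1" health 2>/dev/null)" in
        HEALTH_OK*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Usage:
#   if cluster_is_healthy mycluster; then echo "mycluster is healthy"; fi
```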