Customizing and Managing a Ceph Cluster with Rook

Overview

The previous article described how to deploy a Ceph cluster on Kubernetes with Rook and how to provide PV services on top of it.

This article goes one step further and looks at the Ceph cluster from a configuration and maintenance perspective: how to deploy, modify, and upgrade the cluster according to actual requirements.

The walkthrough below is based on Rook v0.8.0; the latest Rook v0.9.0 does not differ much.

Ceph OSD Configuration

By default, a Ceph cluster created from cluster.yaml uses filestore on the /var/lib/rook/osd-<id> directory, which is clearly not how Ceph is normally run. The following shows how to configure Ceph OSDs to use bluestore and specific disks.

Using all available disks

To have the Ceph OSDs on specific nodes use all available devices, all with bluestore, the configuration looks like this:

...
---
apiVersion: ceph.rook.io/v1beta1
kind: Cluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v13
    allowUnsupported: false
  dataDirHostPath: /var/lib/rook
  serviceAccount: rook-ceph-cluster
  mon:
    count: 3
    allowMultiplePerNode: true
  dashboard:
    enabled: true
  network:
    hostNetwork: false
  storage: # cluster level storage configuration and selection
    useAllNodes: false
    useAllDevices: true
    deviceFilter:
    location:
    config:
      storeType: bluestore
    nodes:
    - name: "ke-dev1-worker1"
    - name: "ke-dev1-worker3"
    - name: "ke-dev1-worker4"

Using specified disks

To specify exactly which disks each node should use, configure the storage section as follows:

storage:
  useAllNodes: false
  useAllDevices: false
  deviceFilter:
  location:
  config:
    storeType: bluestore
  nodes:
  - name: "ke-dev1-worker1"
    devices:
    - name: "vde"
  - name: "ke-dev1-worker3"
    devices:
    - name: "vde"
  - name: "ke-dev1-worker4"
    devices:
    - name: "vdf"

The specified disks must have a GPT header!

Specifying partitions is not supported! (From the logs, the configured partition information is not passed on to the ceph-osd-prepare step.)

Modifying the Ceph Cluster

After the Ceph cluster has been deployed, its configuration can be changed, for example to add or remove OSDs, with the following command:

# kubectl -n rook-ceph edit cluster rook-ceph
...
spec:
  cephVersion:
    image: ceph/ceph:v13
  dashboard:
    enabled: true
  dataDirHostPath: /var/lib/rook
  mon:
    allowMultiplePerNode: true
    count: 3
  network:
    hostNetwork: false
  serviceAccount: rook-ceph-cluster
  storage:
    config:
      storeType: bluestore
    nodes:
    - config: null
      devices:
      - FullPath: ""
        config: null
        name: vde
      name: ke-dev1-worker1
      resources: {}
    - config: null
      devices:
      - FullPath: ""
        config: null
        name: vde
      name: ke-dev1-worker3
      resources: {}
    - config: null
      devices:
      - FullPath: ""
        config: null
        name: vdf
      name: ke-dev1-worker4
      resources: {}
    useAllDevices: false
...

Make the required changes, then save and exit.

Problems Encountered

When something goes wrong during deployment, the cause can be analyzed by checking logs in the following ways:

  1. Check the log of the rook-ceph-operator pod
  2. Run kubectl describe <pod> to inspect the pod's events and status
  3. Check the logs of the pods of the affected component (see the commands below)
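
For reference, a minimal set of commands for these three checks (a sketch; the pod, hash, and container names are placeholders, take the real names from kubectl get pods):

# kubectl -n rook-ceph-system get pods
# kubectl -n rook-ceph-system logs rook-ceph-operator-<hash>
# kubectl -n rook-ceph describe pod <pod-name>
# kubectl -n rook-ceph logs <pod-name> -c <container>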

ceph-mon never reaches the Running state

Two situations were encountered in which ceph-mon never becomes Running:

  1. The node ran a ceph-mon pod in a previous deployment and leftover data remains under /var/lib/rook/; clearing it out fixes the problem.
  2. The disk holding the mon data on the node has too little free space, so ceph-mon fails to start; freeing up space on that disk fixes it. A quick check for both conditions is sketched below.
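
A minimal sketch for checking and cleaning up on the affected node: check free space on the mon data disk, look for directories left over from a previous deployment, then remove them. This assumes the default dataDirHostPath of /var/lib/rook; removing it wipes any previous Rook/Ceph state on that node, so only do this on a node you intend to redeploy.

# df -h /var/lib/rook
# ls /var/lib/rook
# rm -rf /var/lib/rook/*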

Configured OSD disks are not used

With the following storage configuration in cluster.yaml, no OSDs were deployed on the configured devices:

storage:
  useAllNodes: false
  useAllDevices: false
  deviceFilter:
  location:
  config:
    storeType: bluestore
  nodes:
  - name: "ke-dev1-worker1"
    devices:
    - name: "vde"
  - name: "ke-dev1-worker3"
    devices:
    - name: "vde"
  - name: "ke-dev1-worker4"
    devices:
    - name: "vdf"

The log of the rook-ceph-operator pod shows that the configured vde/vdf devices were recognized:

# kubectl -n rook-ceph-system log rook-ceph-operator-5dc97f5c79-vq7xs
...
2018-11-29 03:28:30.239119 I | exec: nodeep-scrub is set
2018-11-29 03:28:30.252166 I | op-osd: 3 of the 3 storage nodes are valid
2018-11-29 03:28:30.252192 I | op-osd: checking if orchestration is still in progress
2018-11-29 03:28:30.259012 I | op-osd: start provisioning the osds on nodes, if needed
2018-11-29 03:28:30.338514 I | op-osd: avail devices for node ke-dev1-worker1: [{Name:vde FullPath: Config:map[]}]
2018-11-29 03:28:30.354912 I | op-osd: osd provision job started for node ke-dev1-worker1
2018-11-29 03:28:31.050925 I | op-osd: avail devices for node ke-dev1-worker3: [{Name:vde FullPath: Config:map[]}]
2018-11-29 03:28:31.071399 I | op-osd: osd provision job started for node ke-dev1-worker3
2018-11-29 03:28:32.253394 I | op-osd: avail devices for node ke-dev1-worker4: [{Name:vdf FullPath: Config:map[]}]
2018-11-29 03:28:32.269271 I | op-osd: osd provision job started for node ke-dev1-worker4
...

Check the log of the ceph-osd-prepare job:

# kubectl -n rook-ceph get pods -a -o wide
NAME READY STATUS RESTARTS AGE IP NODE
rook-ceph-mgr-a-959d64b9d-hfntv 1/1 Running 0 9m 192.168.32.184 ke-dev1-worker1
rook-ceph-mon-a-b79d8687d-qwcnp 1/1 Running 0 10m 192.168.53.210 ke-dev1-master3
rook-ceph-mon-b-66b895d57d-prfdp 1/1 Running 0 9m 192.168.32.150 ke-dev1-worker1
rook-ceph-mon-c-8489c4bc8b-jwm8v 1/1 Running 0 9m 192.168.2.76 ke-dev1-worker3
rook-ceph-osd-prepare-ke-dev1-worker1-bbm9t 0/2 Completed 0 8m 192.168.32.170 ke-dev1-worker1
rook-ceph-osd-prepare-ke-dev1-worker3-xg2pc 0/2 Completed 0 8m 192.168.2.122 ke-dev1-worker3
rook-ceph-osd-prepare-ke-dev1-worker4-mjlg7 0/2 Completed 0 8m 192.168.217.153 ke-dev1-worker4

# kubectl -n rook-ceph log rook-ceph-osd-prepare-ke-dev1-worker1-bbm9t provision
...
2018-11-29 03:28:36.533532 I | exec: Running command: lsblk /dev/vde --bytes --nodeps --pairs --output SIZE,ROTA,RO,TYPE,PKNAME
2018-11-29 03:28:36.537270 I | exec: Running command: sgdisk --print /dev/vde
2018-11-29 03:28:36.547839 W | inventory: skipping device vde with an unknown uuid. Failed to complete 'get disk vde uuid': exit status 2. ^GCaution: invalid main GPT header, but valid backup; regenerating main header
from backup!

Invalid partition data!

The log reveals why device vde was not recognized: invalid main GPT header.

This disk had just been added and carried no GPT partition table yet. After manually creating a GPT header on each disk, the OSDs deployed normally!
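
For reference, a GPT header can be created with sgdisk. A minimal sketch, assuming the new empty disk is /dev/vde: --zap-all wipes stale MBR/GPT metadata and --clear writes a fresh, empty GPT header (parted -s /dev/vde mklabel gpt achieves the same). These commands destroy any existing partition table, so double-check the device name first.

# sgdisk --zap-all /dev/vde
# sgdisk --clear /dev/vde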

Extended Topics

This section records some additional requirements encountered when deploying Ceph with Rook.

How to configure partitions?

Rook currently does not support using partitions as OSD devices; the device-detection code could use improvement here!

Operator discover check

File: pkg/operator/ceph/cluster/osd/osd.go

func (c *Cluster) startProvisioning(config *provisionConfig) {
    config.devicesToUse = make(map[string][]rookalpha.Device, len(c.Storage.Nodes))

    // start with nodes currently in the storage spec
    for _, node := range c.Storage.Nodes {
        ...
        availDev, deviceErr := discover.GetAvailableDevices(c.context, n.Name, c.Namespace, n.Devices, n.Selection.DeviceFilter, n.Selection.GetUseAllDevices())
        ...
    }
    ...
}

File: pkg/operator/discover/discover.go

// GetAvailableDevices conducts outer join using input filters with free devices that a node has. It marks the devices from join result as in-use.
func GetAvailableDevices(context *clusterd.Context, nodeName, clusterName string, devices []rookalpha.Device, filter string, useAllDevices bool) ([]rookalpha.Device, error) {
    ...
    // find those on the node
    nodeAllDevices, ok := allDevices[nodeName]
    if !ok {
        return results, fmt.Errorf("node %s has no devices", nodeName)
    }
    // find those in use on the node
    devicesInUse, err := ListDevicesInUse(context, namespace, nodeName)
    if err != nil {
        return results, err
    }

    nodeDevices := []sys.LocalDisk{}
    for _, nodeDevice := range nodeAllDevices {
        // TODO: Filter out devices that are in use by another cluster.
        // We need to retain the devices in use for this cluster so the provisioner will continue to configure the same OSDs.
        for _, device := range devicesInUse {
            if nodeDevice.Name == device.Name {
                break
            }
        }
        nodeDevices = append(nodeDevices, nodeDevice)
    }
    claimedDevices := []sys.LocalDisk{}
    // now those left are free to use
    if len(devices) > 0 {
        for i := range devices {
            for j := range nodeDevices {
                // When the configured device is a partition,
                // devices[i].Name is "sdk1" while nodeDevices[j].Name is "sdk",
                // so the list of available devices returned to the caller ends up empty!!
                if devices[i].Name == nodeDevices[j].Name {
                    results = append(results, devices[i])
                    claimedDevices = append(claimedDevices, nodeDevices[j])
                }
            }
        }
    } else if len(filter) >= 0 {
        ...
    } else if useAllDevices {
        ...
    }
    ...
}

The ListDevices function returns disks in the following format:
{Name:sdk ... Partitions:[{Name:sdk1 Size:4000785964544 Label: Filesystem:}] ...}

// ListDevices lists all devices discovered on all nodes or specific node if node name is provided.
func ListDevices(context *clusterd.Context, namespace, nodeName string) (map[string][]sys.LocalDisk, error) {
...
}

OSD daemon check

Once a disk passes the operator's discover checks, it is passed as an argument to the OSD prepare job, as shown below:

File: rook-ceph-osd-prepare-ceph0-bphlv-ceph0.log

2018-12-04 10:18:51.959163 I | rookcmd: starting Rook v0.8.0-320.g3135b1d with arguments '/rook/rook ceph osd provision'
2018-12-04 10:18:51.993500 I | rookcmd: flag values: --cluster-id=c6434de9-f7ad-11e8-bec3-6c92bf2db856, --data-device-filter=, --data-devices=sdk,sdl, --data-directories=, --force-format=false, --help=false, --location=, --log-level=INFO, --metadata-device=, --node-name=ceph0, --osd-database-size=20480, --osd-journal-size=5120, --osd-store=bluestore, --osd-wal-size=576
...

The flags above include --data-devices=sdk,sdl.

File: pkg/daemon/ceph/osd/daemon.go

func getAvailableDevices(context *clusterd.Context, desiredDevices string, metadataDevice string, usingDeviceFilter bool) (*DeviceOsdMapping, error) {
    ...
    for _, device := range context.Devices {
        ownPartitions, fs, err := sys.CheckIfDeviceAvailable(context.Executor, device.Name)
        if err != nil {
            return nil, fmt.Errorf("failed to get device %s info. %+v", device.Name, err)
        }

        // This shows that a usable disk must have no filesystem and no foreign partitions!
        if fs != "" || !ownPartitions {
            // not OK to use the device because it has a filesystem or rook doesn't own all its partitions
            logger.Infof("skipping device %s that is in use (not by rook). fs: %s, ownPartitions: %t", device.Name, fs, ownPartitions)
            continue
        }
        ...
    }
    ...
}

So at the moment there is no way to make Ceph OSDs use a specific disk partition!

How to configure BlueStore with HDD + SSD?

To have a node's OSDs use HDDs for data with an SSD as the metadata device, modify cluster.yaml as follows:

storage:
  useAllNodes: false
  useAllDevices: false
  location:
  config:
    storeType: bluestore
  nodes:
  ...
  - name: "ke-dev1-worker4"
    devices:
    - name: "vdf"
    - name: "vdg"
    config:
      metadataDevice: "vdh"

During deployment, the ceph-osd-prepare log can be used to verify that the configuration was picked up correctly:

# kubectl -n rook-ceph log rook-ceph-osd-prepare-ke-dev1-worker4-456nj provision
2018-11-30 03:30:37.118716 I | rookcmd: starting Rook v0.8.0-304.g0a8e109 with arguments '/rook/rook ceph osd provision'
2018-11-30 03:30:37.124652 I | rookcmd: flag values: --cluster-id=072418f4-f450-11e8-bb3e-fa163e65e579, --data-device-filter=, --data-devices=vdf,vdg, --data-directories=, --force-format=false, --help=false, --location=, --log-level=INFO, --metadata-device=vdh, --node-name=ke-dev1-worker4, --osd-database-size=20480, --osd-journal-size=5120, --osd-store=bluestore, --osd-wal-size=576
...

As the log above shows, the correct parameters passed in are:

  • --data-devices=vdf,vdg
  • --metadata-device=vdh

To specify the size of the WAL/DB partitions carved out of the SSD, add the following configuration:

...
- name: "ke-dev1-worker4"
  devices:
  - name: "vdf"
  - name: "vdg"
  config:
    metadataDevice: "vdh"
    databaseSizeMB: "10240"
    WalSizeMB: "10240"

How to customize ceph.conf?

The default configuration parameters of a Ceph cluster are fixed in the Rook code and are generated when the Cluster is created (see the default Ceph configuration shown earlier).

To customize the Ceph cluster's configuration parameters, modify the rook-config-override ConfigMap.

The default rook-config-override looks like this:

# kubectl -n rook-ceph get configmap rook-config-override -o yaml
apiVersion: v1
data:
  config: ""
kind: ConfigMap
metadata:
  creationTimestamp: 2018-12-03T05:34:58Z
  name: rook-config-override
  namespace: rook-ceph
  ownerReferences:
  - apiVersion: v1beta1
    blockOwnerDeletion: true
    kind: Cluster
    name: rook-ceph
    uid: 229e7106-f6bd-11e8-bec3-6c92bf2db856
  resourceVersion: "40803738"
  selfLink: /api/v1/namespaces/rook-ceph/configmaps/rook-config-override
  uid: 2c489850-f6bd-11e8-bec3-6c92bf2db856

Modifying the configuration of an existing Ceph cluster

1. Modify rook-config-override

# kubectl -n rook-ceph edit configmap rook-config-override -o yaml
apiVersion: v1
data:
  config: |
    [global]
    osd crush update on start = false
    osd pool default size = 2
    [osd]
    bluefs_buffered_io = false
    bluestore_csum_type = none
kind: ConfigMap
metadata:
  creationTimestamp: 2018-12-03T05:34:58Z
  name: rook-config-override
  namespace: rook-ceph
  ownerReferences:
  - apiVersion: v1beta1
    blockOwnerDeletion: true
    kind: Cluster
    name: rook-ceph
    uid: 229e7106-f6bd-11e8-bec3-6c92bf2db856
  resourceVersion: "40803738"
  selfLink: /api/v1/namespaces/rook-ceph/configmaps/rook-config-override
  uid: 2c489850-f6bd-11e8-bec3-6c92bf2db856

2. Restart the Ceph components one by one

# kubectl -n rook-ceph get pods
NAME READY STATUS RESTARTS AGE
rook-ceph-mgr-a-5699bb7984-kpxgp 1/1 Running 0 2h
rook-ceph-mon-a-66854cfb5-m5d9x 1/1 Running 0 15m
rook-ceph-mon-b-c6f58986f-xpnc4 1/1 Running 0 2h
rook-ceph-mon-c-97669b7ff-kgdbp 1/1 Running 0 2h
rook-ceph-osd-0-54bdd844b-wfqk6 1/1 Running 0 2h
rook-ceph-osd-1-789cdb4c5b-rddhh 1/1 Running 0 2h
rook-ceph-osd-2-57c8644749-djs98 1/1 Running 0 2h
rook-ceph-osd-3-7566d48f85-k5mw6 1/1 Running 0 2h

# kubectl -n rook-ceph delete pod rook-ceph-mgr-a-5699bb7984-kpxgp

# kubectl -n rook-ceph delete pod rook-ceph-mon-a-66854cfb5-m5d9x
...

# kubectl -n rook-ceph delete pod rook-ceph-osd-0-54bdd844b-wfqk6

Delete the ceph-mon and ceph-osd pods one by one: wait for the Ceph cluster to return to HEALTH_OK before deleting the next one. A scripted sketch of this loop is shown below.
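
A minimal sketch of the one-by-one restart for the OSD pods (assumptions: the OSD pods carry the app=rook-ceph-osd label, and <toolbox-pod> is the name of a running rook-ceph-tools pod; verify both in your environment first):

for pod in $(kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o name); do
  kubectl -n rook-ceph delete "$pod"
  # wait until the cluster reports HEALTH_OK before restarting the next OSD
  until kubectl -n rook-ceph exec <toolbox-pod> -- ceph health | grep -q HEALTH_OK; do
    sleep 10
  done
done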

3. Check the Ceph components' configuration

# cat /var/lib/rook/osd2/rook-ceph.config
[global]
run dir = /var/lib/rook/osd2
mon initial members = a b c
mon host = 10.96.195.188:6790,10.96.128.73:6790,10.96.51.21:6790
log file = /dev/stderr
mon cluster log file = /dev/stderr
public addr = 192.168.150.252
cluster addr = 192.168.150.252
mon keyvaluedb = rocksdb
mon_allow_pool_delete = true
mon_max_pg_per_osd = 1000
debug default = 0
debug rados = 0
debug mon = 0
debug osd = 0
debug bluestore = 0
debug filestore = 0
debug journal = 0
debug leveldb = 0
filestore_omap_backend = rocksdb
osd pg bits = 11
osd pgp bits = 11
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 100
osd pool default pgp num = 100
osd objectstore = bluestore
crush location = root=default host=ceph5
rbd_default_features = 3
fatal signal handlers = false
osd crush update on start = false

[osd.2]
keyring = /var/lib/rook/osd2/keyring
bluestore block path = /dev/disk/by-partuuid/bad8c220-d4f7-40de-b7ff-fcc2e492ea64
bluestore block wal path = /dev/disk/by-partuuid/5315d8be-f80b-4351-95b5-026889d1dd19
bluestore block db path = /dev/disk/by-partuuid/6d3d494f-0021-4e95-b45f-59a326976cf8

[osd]
bluefs_buffered_io = false
bluestore_csum_type = none

Specifying configuration parameters before creating the Ceph cluster

To set configuration parameters before the Ceph cluster is created, first create a ConfigMap named rook-config-override manually, and then create the Ceph cluster.

1. Create the ConfigMap, then create the cluster

# cat ceph-override-conf.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: rook-ceph
data:
  config: |
    [global]
    osd crush update on start = false
    osd pool default size = 2
    [osd]
    bluefs_buffered_io = false
    bluestore_csum_type = none

# kubectl create -f ceph-override-conf.yaml
# kubectl create -f cluster.yaml
serviceaccount "rook-ceph-cluster" created
role "rook-ceph-cluster" created
rolebinding "rook-ceph-cluster-mgmt" created
rolebinding "rook-ceph-cluster" created
configmap "rook-config-override" created
cluster "rook-ceph" created

2. Check the configuration of the started Ceph components

# cat /var/lib/rook/mon-a/rook-ceph.config
[global]
fsid = e963975a-fe17-4806-b1b1-d4a6fcebd710
run dir = /var/lib/rook/mon-a
mon initial members = a
mon host = 10.96.0.239:6790
log file = /dev/stderr
mon cluster log file = /dev/stderr
public addr = 10.96.0.239
cluster addr = 192.168.239.137
mon keyvaluedb = rocksdb
mon_allow_pool_delete = true
mon_max_pg_per_osd = 1000
debug default = 0
debug rados = 0
debug mon = 0
debug osd = 0
debug bluestore = 0
debug filestore = 0
debug journal = 0
debug leveldb = 0
filestore_omap_backend = rocksdb
osd pg bits = 11
osd pgp bits = 11
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 100
osd pool default pgp num = 100
rbd_default_features = 3
fatal signal handlers = false
osd crush update on start = false

[mon.a]
keyring = /var/lib/rook/mon-a/keyring
public bind addr = 192.168.239.137:6790

[osd]
bluefs_buffered_io = false
bluestore_csum_type = none

How to customize CRUSH rules?

Rook does not provide an API with a crush rule kind, so a CRUSH rule cannot be created the way a Pool is. CRUSH rules are also highly customizable, so it is more practical to manage them through the Ceph CLI or by editing the CRUSH map directly, as sketched below.
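
For example, from the toolbox pod a replicated rule bound to a device class can be created with the CLI, or the full CRUSH map can be exported, edited, and re-imported. A minimal sketch (the rule name replicated_hdd and <pool> are only examples):

# ceph osd crush rule create-replicated replicated_hdd default host hdd
# ceph osd pool set <pool> crush_rule replicated_hdd

# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt
# vi crushmap.txt
# crushtool -c crushmap.txt -o crushmap.new
# ceph osd setcrushmap -i crushmap.new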

How to upgrade the Ceph cluster?

As shown below, create a Cluster with Ceph v12:

# vim cluster.yaml
...
spec:
  cephVersion:
    image: ceph/ceph:v12
    allowUnsupported: false
...

After creation, the cluster reports Ceph version 12.2.9 (note that the toolbox pod runs its own image and may report a different client version):

[root@rook-ceph-mgr-a-558d49cf8c-dk49n /]# ceph -v
ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)

# kubectl create -f toolbox.yaml
deployment "rook-ceph-tools" created
# kubectl -n rook-ceph exec -it rook-ceph-tools-79954fdf9d-s65wm bash
[root@ceph0 /]# ceph -v
ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)

Use kubectl edit to modify the Cluster and point the image at Ceph v13:

# kubectl -n rook-ceph edit cluster rook-ceph
...
spec:
  cephVersion:
    image: ceph/ceph:v13
...

cluster "rook-ceph" edited

Afterwards, the Ceph OSD pods are deleted and recreated one by one, upgrading them to the specified Ceph version:

# kubectl -n rook-ceph get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
rook-ceph-mgr-a-558d49cf8c-dk49n 1/1 Running 0 29m 192.168.239.130 ceph0
rook-ceph-mon-a-6c99f7fc49-rw556 1/1 Running 0 30m 192.168.239.171 ceph0
rook-ceph-mon-b-77bbdd8676-rj22f 1/1 Running 0 29m 192.168.152.189 ceph4
rook-ceph-mon-c-c7dd7bb4b-8qclr 1/1 Running 0 29m 192.168.150.217 ceph5
rook-ceph-osd-0-c5d865db6-5dgl4 1/1 Running 0 1m 192.168.152.190 ceph4
rook-ceph-osd-1-785b4f8c6d-qf9lc 1/1 Running 0 55s 192.168.150.237 ceph5
rook-ceph-osd-2-6679497484-hjf85 0/1 Terminating 0 28m <none> ceph5
rook-ceph-osd-3-87f8d69db-tmrl5 1/1 Running 0 2m 192.168.239.184 ceph0
rook-ceph-tools-79954fdf9d-s65wm 1/1 Running 0 23m 100.64.0.20 ceph0

During the upgrade, the noscrub and nodeep-scrub flags are set automatically:

[root@ceph0 /]# ceph -s
  cluster:
    id:     adb3db57-6f09-4c4a-a3f9-171d6cfe167a
    health: HEALTH_WARN
            noscrub,nodeep-scrub flag(s) set
            1 osds down
            Reduced data availability: 6 pgs inactive, 18 pgs down
            Degraded data redundancy: 2/10 objects degraded (20.000%), 2 pgs degraded
...

Once all OSDs have been upgraded and the cluster is back to HEALTH_OK, note that the Ceph mgr, mon, and mds components are not upgraded automatically:

# kubectl -n rook-ceph get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
rook-ceph-mgr-a-558d49cf8c-dk49n 1/1 Running 0 32m 192.168.239.130 ceph0
rook-ceph-mon-a-6c99f7fc49-rw556 1/1 Running 0 33m 192.168.239.171 ceph0
rook-ceph-mon-b-77bbdd8676-rj22f 1/1 Running 0 32m 192.168.152.189 ceph4
rook-ceph-mon-c-c7dd7bb4b-8qclr 1/1 Running 0 32m 192.168.150.217 ceph5
rook-ceph-osd-0-c5d865db6-5dgl4 1/1 Running 0 4m 192.168.152.190 ceph4
rook-ceph-osd-1-785b4f8c6d-qf9lc 1/1 Running 0 3m 192.168.150.237 ceph5
rook-ceph-osd-2-86bb5594df-tdhx4 1/1 Running 0 2m 192.168.150.244 ceph5
rook-ceph-osd-3-87f8d69db-tmrl5 1/1 Running 0 5m 192.168.239.184 ceph0
rook-ceph-tools-79954fdf9d-s65wm 1/1 Running 0 26m 100.64.0.20 ceph0

In Rook v0.9.0, mgr and mon are upgraded automatically.

The remaining Ceph components then need to be upgraded separately:

# kubectl -n rook-ceph delete pod rook-ceph-mgr-a-558d49cf8c-dk49n
# kubectl -n rook-ceph delete pod rook-ceph-mon-a-6c99f7fc49-rw556
...

However, after these pods restart, they still run the old Ceph version!

The Ceph mgr, mon, and mds components can instead be upgraded by editing their deployments:

# kubectl -n rook-ceph get deployment
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
rook-ceph-mds-cephfs-a 1 1 1 1 22m
rook-ceph-mds-cephfs-b 1 1 1 1 22m
rook-ceph-mds-cephfs-c 1 1 1 1 22m
rook-ceph-mds-cephfs-d 1 1 1 1 22m
rook-ceph-mgr-a 1 1 1 1 25m
rook-ceph-mon-a 1 1 1 1 27m
rook-ceph-mon-b 1 1 1 1 26m
rook-ceph-mon-c 1 1 1 1 26m
rook-ceph-osd-0 1 1 1 1 25m
rook-ceph-osd-1 1 1 1 1 25m
rook-ceph-osd-2 1 1 1 1 25m
rook-ceph-tools 1 1 1 1 14m

# kubectl -n rook-ceph edit deployment rook-ceph-mon-a
...
image: ceph/ceph:v13
...
deployment "rook-ceph-mon-a" edited
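
The same change can be made non-interactively with kubectl set image. A minimal sketch (deployment names are taken from the listing above; '*' updates every container in the deployment, and the mds deployments can be added to the list in the same way):

for d in rook-ceph-mon-a rook-ceph-mon-b rook-ceph-mon-c rook-ceph-mgr-a; do
  kubectl -n rook-ceph set image "deployment/$d" '*=ceph/ceph:v13'
done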

When upgrading the Ceph MDS components, upgrade all of them: MDSs running different Ceph versions cannot form a multi-active MDS cluster.

Summary

Rook's positioning

According to the official Rook documentation, Rook positions itself as a storage provider framework on Kubernetes that can deploy multiple storage systems on Kubernetes, such as Ceph, Minio, CockroachDB, Cassandra, and NFS.

Ceph is simply the first storage solution it offers, currently in beta.

Reference: Storage Provider Framework

Rook's strengths

  1. Integrates with Kubernetes for one-command deployment
  2. Supports creating pools, cephfs, radosgw, monitoring, and more through yaml files
  3. Simple scaling and minor-version upgrades are convenient; a kubectl edit is enough

Rook's shortcomings

  1. Rook is still a young project and the code is not yet mature
  2. OSDs cannot be placed on partitions, so disk usage for OSDs cannot be controlled precisely
  3. Rook can delete a Ceph pool / cephfs / radosgw or even the whole Ceph cluster with a single command, without any confirmation, which is somewhat dangerous
  4. Because everything is containerized, each Ceph component's IO path gains an extra layer, so some performance is lost
  5. Ceph operations now also involve a Kubernetes layer, raising the bar for Ceph operators' skill set

Use-case summary

Overall:

Scenarios where Rook is a good fit

  • POC and test environments
  • Mixed Kubernetes + Ceph deployments
  • Environments without strong Ceph performance requirements
  • Environments that do not need to closely track upstream Ceph releases

Scenarios where Rook is not a good fit

  • Standalone Ceph cluster deployments
  • Environments with strong Ceph performance requirements
  • Environments that track upstream Ceph releases