Problem Description
In a Ceph cluster deployed with Rook, after CephFS is configured, reads and writes fail for files under sufficiently long directory paths.
The symptoms are as follows:
With the ceph-fuse client:

```
root@ceph3:/mnt/test/volumes/kubernetes/kubernetes/kubernetes-dynamic-pvc-9adbb10c-f86a-11e8-96ff-9247c38478e0# cat fox
cat: fox: File name too long
```
With the kernel client:

```
root@ceph3:/mnt/test/volumes/kubernetes/kubernetes/kubernetes-dynamic-pvc-9adbb10c-f86a-11e8-96ff-9247c38478e0# cat fox
cat: fox: File name too long
```
Problem Analysis
Start the ceph-fuse client with debug logging enabled:

```
# ceph-fuse -m 10.10.15.89:6790,10.10.15.198:6790 /mnt/test/ -n client.admin --keyring=./yangguanjun/keyring --debug-client=20
```

Then inspect the log under /var/log/ceph/; it contains err = -36:
```
2018-12-05 18:42:04.908612 7fd5955bc700  3 client.5213 ll_read 0x565512bf8760 0x10000000007 0~4096
2018-12-05 18:42:04.910114 7fd599dc5700 10 client.5213 ms_handle_connect on 172.16.1.54:6800/33886
2018-12-05 18:42:04.910887 7fd5955bc700 10 client.5213 check_pool_perm on pool 2 ns fsvolumens_kubernetes-dynamic-pvc-9adbb10c-f86a-11e8-96ff-9247c38478e0 rd_err = -36 wr_err = -36
2018-12-05 18:42:04.910907 7fd5955bc700 10 client.5213 check_pool_perm on pool 2 ns fsvolumens_kubernetes-dynamic-pvc-9adbb10c-f86a-11e8-96ff-9247c38478e0 rd_err = -36 wr_err = -36
```
Looking at include/uapi/asm-generic/errno.h in the Linux kernel source, error 36 is defined as:

```
#define ENAMETOOLONG    36      /* File name too long */
```
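This mapping can be double-checked from Python's errno module (a quick sanity check; on Linux the value is 36):

```python
import errno
import os

# errno 36 on Linux is ENAMETOOLONG, matching the rd_err/wr_err = -36 in the client log.
print(errno.ENAMETOOLONG)                   # 36 on Linux
print(os.strerror(errno.ENAMETOOLONG))      # the human-readable message, e.g. "File name too long"
print(errno.errorcode[errno.ENAMETOOLONG])  # "ENAMETOOLONG"
```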
Since the error occurs with both ceph-fuse and the kernel client, the problem is likely not in either client but somewhere on the OSD side.
Searching the Ceph code, the OSD performs the following checks:
File: osd/PrimaryLogPG.cc

```cpp
/** do_op - do an op
 * pg lock will be held (if multithreaded)
 * osd_lock NOT held.
 */
void PrimaryLogPG::do_op(OpRequestRef& op)
{
  ...
  // object name too long?
  if (m->get_oid().name.size() > cct->_conf->osd_max_object_name_len) {
    dout(4) << "do_op name is longer than "
            << cct->_conf->osd_max_object_name_len
            << " bytes" << dendl;
    osd->reply_op_error(op, -ENAMETOOLONG);
    return;
  }
  if (m->get_hobj().get_key().size() > cct->_conf->osd_max_object_name_len) {
    dout(4) << "do_op locator is longer than "
            << cct->_conf->osd_max_object_name_len
            << " bytes" << dendl;
    osd->reply_op_error(op, -ENAMETOOLONG);
    return;
  }
  if (m->get_hobj().nspace.size() > cct->_conf->osd_max_object_namespace_len) {
    dout(4) << "do_op namespace is longer than "
            << cct->_conf->osd_max_object_namespace_len
            << " bytes" << dendl;
    osd->reply_op_error(op, -ENAMETOOLONG);
    return;
  }
  ...
}
```
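In Python pseudocode, the three do_op length checks amount to the following (a minimal sketch; `check_op_lengths` and its argument names are hypothetical, and the two limits stand in for `osd_max_object_name_len` / `osd_max_object_namespace_len`):

```python
import errno

def check_op_lengths(name: str, locator: str, namespace: str,
                     max_name_len: int, max_namespace_len: int) -> int:
    """Mirror the do_op length checks: 0 on success, -ENAMETOOLONG on violation."""
    if len(name.encode()) > max_name_len:            # object name too long?
        return -errno.ENAMETOOLONG
    if len(locator.encode()) > max_name_len:         # locator (key) too long?
        return -errno.ENAMETOOLONG
    if len(namespace.encode()) > max_namespace_len:  # namespace too long?
        return -errno.ENAMETOOLONG
    return 0

# With a 64-byte namespace limit, a 70-byte namespace is rejected:
print(check_op_lengths("fox", "", "n" * 70, max_name_len=256, max_namespace_len=64))
```

Any op that fails one of these checks is answered with -ENAMETOOLONG, which is exactly the -36 both clients reported.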
So turn up the OSD log level:

```
[root@rook-ceph-tools /]# ceph tell osd.0 config set debug_osd 5
Set debug_osd to 5/5
[root@rook-ceph-tools /]# ceph tell osd.1 config set debug_osd 5
Set debug_osd to 5/5
```
Then reproduce the problem and grep the OSD logs:

```
# grep "longer than" *
rook-ceph-osd-0-85f5bf454f-64w7d-ceph3.log:2018-12-05 11:14:38.864707 7fbe34194700 4 osd.0 pg_epoch: 24 pg[2.15( empty local-lis/les=21/22 n=0 ec=20/20 lis/c 21/21 les/c/f 22/22/0 21/21/20) [0,1] r=0 lpr=21 crt=0'0 mlcod 0'0 active+clean] do_op namespace is longer than 64 bytes
rook-ceph-osd-0-85f5bf454f-64w7d-ceph3.log:2018-12-05 11:14:38.864853 7fbe3819c700 4 osd.0 pg_epoch: 24 pg[2.15( empty local-lis/les=21/22 n=0 ec=20/20 lis/c 21/21 les/c/f 22/22/0 21/21/20) [0,1] r=0 lpr=21 crt=0'0 mlcod 0'0 active+clean] do_op namespace is longer than 64 bytes
```
The corresponding check in the code is:

```cpp
if (m->get_hobj().nspace.size() > cct->_conf->osd_max_object_namespace_len)
```
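The namespace in the log is `fsvolumens_` plus the Kubernetes PVC name, and that string is indeed longer than the 64 bytes being enforced, which is easy to confirm:

```python
# Namespace from the OSD/client logs: "fsvolumens_" + the Kubernetes PVC name.
ns = "fsvolumens_kubernetes-dynamic-pvc-9adbb10c-f86a-11e8-96ff-9247c38478e0"
print(len(ns))  # 70 bytes, i.e. over the 64-byte limit being enforced
```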
Check the OSD's configuration:

```
[root@rook-ceph-osd-0-85f5bf454f-64w7d-ceph3 ceph]# cat ceph.conf
...
osd max object name len = 256
osd max object namespace len = 64
...
```
(o゜▽゜)o☆ Bingo! That is the cause. But why is this setting there in the first place?
Rook's code contains the following, in pkg/daemon/ceph/osd/device.go:

```go
func writeConfigFile(cfg *osdConfig, context *clusterd.Context, cluster *cephconfig.ClusterInfo, location string) error {
	...
}
```
As the code shows, these settings are added when the OSD is backed by a directory or uses FileStore.
The Ceph documentation on FileStore points out that on ext4, FileStore is constrained by the file system's xattr length limits; in scenarios where object names are known to be short, users can set the following two options to run FileStore on ext4:

```
osd max object name len = 256
osd max object namespace len = 64
```

Reference: http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/#not-recommended
Our own OSD configuration in cluster.yaml looks like this:

```
...
```
However, Rook does not support putting OSDs on partitions. When a disk with partitions is specified, Rook's device check skips it and falls back to creating the OSD under the directory /var/lib/rook/osd<id>/, as seen here:

```
# ll /var/lib/rook/osd0/
total 3344
drwxr--r-- 3 root root         4096 Dec  5 16:44 ./
drwxr-xr-x 5 root root         4096 Dec  5 16:44 ../
lrwxrwxrwx 1 root root           34 Dec  5 16:44 block -> /var/lib/rook/osd0/bluestore-block
lrwxrwxrwx 1 root root           31 Dec  5 16:44 block.db -> /var/lib/rook/osd0/bluestore-db
lrwxrwxrwx 1 root root           32 Dec  5 16:44 block.wal -> /var/lib/rook/osd0/bluestore-wal
-rw-r--r-- 1 root root            2 Dec  5 16:44 bluefs
-rw-r--r-- 1 root root 472432779264 Dec  5 19:22 bluestore-block
-rw-r--r-- 1 root root   1073741824 Dec  5 16:44 bluestore-db
-rw-r--r-- 1 root root    603979776 Dec  5 19:27 bluestore-wal
```
Solution
Temporary workaround
Raise osd_max_object_namespace_len to a larger value:

```
[root@rook-ceph-tools /]# ceph tell osd.0 config set osd_max_object_namespace_len 256
```
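Note that `ceph tell ... config set` only changes the running daemon, so the value is lost when the OSD restarts. To make the workaround persistent, the same option can also be placed in the OSD section of ceph.conf (a sketch; adapt it to however your cluster's configuration is managed):

```
[osd]
osd max object namespace len = 256
```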
Recommended fix
Have the Ceph OSDs use BlueStore: in the cluster.yaml of the Rook-managed Ceph cluster, assign whole disks to the OSDs.

```
...
```

Note: the disks you specify must carry a GPT header and have all partitions removed.