ceph mutex lock monitor

While analyzing Ceph performance, I found that OSD worker threads were spending a long time waiting on the PG lock, and I wondered whether the wait time on the PG lock could be measured the same way the Ceph perf counters do. (Later, after reading the code, I understood the reason for the PG lock contention: see Ceph PGLock and RWState.)

Although this approach ultimately turned out to be unreliable and inconvenient for analysis, the attempt does show the method and steps for adding tracepoints to the Ceph code, recorded as follows.

ceph perf counter

Ceph already supports a perf counter that measures the time spent on mutex locks:

OPTION(mutex_perf_counter, OPT_BOOL, false)   // off by default (src/common/config_opts.h)

# enable / disable at runtime via the admin socket:
ceph daemon /var/run/ceph/ceph-osd.0.asok config set mutex_perf_counter 1
ceph daemon /var/run/ceph/ceph-osd.0.asok config set mutex_perf_counter 0

# ceph daemon /var/run/ceph/ceph-osd.0.asok perf dump | grep mutex
"mutex-FileJournal::completions_lock": {
"mutex-FileJournal::finisher_lock": {
"mutex-FileJournal::write_lock": {
"mutex-FileJournal::writeq_lock": {
"mutex-JOS::ApplyManager::apply_lock": {
"mutex-JOS::ApplyManager::com_lock": {
"mutex-JOS::SubmitManager::lock": {
"mutex-WBThrottle::lock": {
"wait": {
"avgcount": 0,
"sum": 0.000000000
}
},

This measures the time spent across all mutex locks over a period, but it cannot show the time spent on one specific mutex lock.
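For context, here is roughly how that counter works; a paraphrased sketch of Mutex::Lock from the Hammer-era src/common/Mutex.cc, not a verbatim copy:

void Mutex::Lock(bool no_lockdep) {
  if (lockdep && g_lockdep && !no_lockdep)
    _will_lock();

  // fast path: uncontended, nothing to account
  if (TryLock())
    return;

  utime_t start;
  if (logger && cct && cct->_conf->mutex_perf_counter)
    start = ceph_clock_now(cct);
  int r = pthread_mutex_lock(&_m);
  if (logger && cct && cct->_conf->mutex_perf_counter)
    // accumulate the blocked time into this mutex's "wait" counter
    logger->tinc(l_mutex_wait, ceph_clock_now(cct) - start);
  assert(r == 0);

  if (lockdep && g_lockdep)
    _locked();
  _post_lock();
}

Note that the logger (and hence the "mutex-<name>" entry in perf dump) only exists for mutexes constructed with a CephContext, which matters for the PG lock change below.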

Adding tracepoints to mutex

Add tracepoints to the Mutex implementation, so the tracepoint log can be used to analyze the time spent on one specific mutex lock.

Add a corresponding mutex.tp file under the src/tracing directory

#include "include/int_types.h"

TRACEPOINT_EVENT(mutex, lock_enter,
TP_ARGS(
const void *, addr,
const char *, name),
TP_FIELDS(
ctf_integer_hex(unsigned long, addr, addr)
ctf_string(name, name)
)
)

TRACEPOINT_EVENT(mutex, lock_exit,
TP_ARGS(
const void *, addr,
const char *, name),
TP_FIELDS(
ctf_integer_hex(unsigned long, addr, addr)
ctf_string(name, name)
)
)
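At build time this .tp file is converted into a generated mutex.c/mutex.h pair (this is what LTTng's lttng-gen-tp produces), which is why those two file names appear in the Makefile changes below. For orientation, the generated header follows the standard LTTng skeleton, roughly as shown here (a sketch of lttng-gen-tp output, not copied from Ceph):

#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER mutex

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "tracing/mutex.h"

#if !defined(TRACING_MUTEX_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define TRACING_MUTEX_H

#include <lttng/tracepoint.h>

/* the TRACEPOINT_EVENT definitions from mutex.tp are expanded here */

#endif /* TRACING_MUTEX_H */

#include <lttng/tracepoint-event.h>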

Modify src/common/Mutex.cc

Add:

#ifdef WITH_LTTNG
#include "tracing/mutex.h"
#else
#define tracepoint(...)
#endif
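The call sites are not shown above, so here is a minimal sketch of where the two events would fire in Mutex::Lock (simplified, omitting the TryLock fast path and the perf counter branch shown earlier):

void Mutex::Lock(bool no_lockdep) {
  // fired before we may block; records this mutex's address and name
  tracepoint(mutex, lock_enter, this, name.c_str());

  if (lockdep && g_lockdep && !no_lockdep)
    _will_lock();
  int r = pthread_mutex_lock(&_m);
  assert(r == 0);
  if (lockdep && g_lockdep)
    _locked();
  _post_lock();

  // fired once the lock is held; the lock_exit - lock_enter delta in
  // the trace is the wait time of this particular mutex instance
  tracepoint(mutex, lock_exit, this, name.c_str());
}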

Modify src/tracing/Makefile.am

The diff below appears to have been taken with the modified file on the left, so the lines prefixed with < are the ones to add (mutex.c and mutex.h are the files generated from mutex.tp):

14d13
< mutex.tp \
23,24d21
< mutex.c \
< mutex.h \
63,73d59
< nodist_libmutex_tp_la_SOURCES = \
< mutex.c \
< mutex.h
< endif
< libmutex_tp_la_LIBADD = -llttng-ust -ldl
< libmutex_tp_la_CPPFLAGS = -DTRACEPOINT_PROBE_DYNAMIC_LINKAGE
< libmutex_tp_la_CFLAGS = -I$(top_srcdir)/src $(AM_CFLAGS)
< libmutex_tp_la_LDFLAGS =
<
<
< if WITH_LTTNG
78d63
< libmutex_tp.la \
87d71
< mutex.h \
95d78
< $(nodist_libmutex_tp_la_SOURCES) \

Modify src/Makefile-env.am

Add:
LIBMUTEX_TP = tracing/libmutex_tp.la

Modify src/common/Makefile.am

Add:

if WITH_LTTNG
libcommon_la_LIBADD += $(LIBMUTEX_TP)
endif

Modify src/test/Makefile.am

Add:

if WITH_LTTNG
UNITTEST_LDADD += $(LIBMUTEX_TP)
endif

Modify the PG lock

Add perf counter statistics for the PG lock.
Modify:

PG::PG(OSDService *o, OSDMapRef curmap,
       const PGPool &_pool, spg_t p) :
  osd(o),
  cct(o->cct),
  ...
  map_lock("PG::map_lock"),
  osdmap_ref(curmap), last_persisted_osdmap_ref(curmap), pool(_pool),
  _lock((std::string("PG::_lock") + _pool.name.c_str()).c_str(), false, true, false, o->cct),
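For reference, the extra constructor arguments line up with the Hammer-era Mutex signature, paraphrased below from src/common/Mutex.h: passing a non-null CephContext is what creates the per-mutex perf counter, and appending the pool name to "PG::_lock" makes the PG locks of different pools show up as distinct entries in perf dump.

// src/common/Mutex.h (Hammer-era, paraphrased)
Mutex(const std::string &n,   // lock name, also used for the perf counter
      bool r = false,         // recursive?
      bool ld = true,         // register with lockdep?
      bool bt = false,        // record lockdep backtraces?
      CephContext *cct = 0);  // non-null => create the per-mutex counter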

Build and test

With the packages built from this tree installed, the rbd command fails to run with the following error:

# rbd
LTTng-UST: Error (-17) while registering tracepoint probe. Duplicate registration of tracepoint probes having the same name is not allowed.
Aborted (core dumped)

Related reports:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg23627.html
http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20353

The error above is caused by the duplicated PG lock names. It can be ignored for now: Ceph still works correctly, and the mutex trace events can be collected with lttng (e.g. lttng create, then lttng enable-event -u 'mutex:*', then lttng start).

However, because mutexes are used in so many places, the resulting log is far too large to analyze, so a different way of tracking the PG lock is needed.
