Author: Wenbo Zhang (Linux kernel engineer on the EE team at PingCAP)
Transcreator: Caitin Chen; Editor: Tom Dewan
Distributed clusters might encounter performance problems or unpredictable failures, especially when they are running in the cloud. Of all the kinds of failures, kernel failures may be the most difficult to analyze and simulate.
A practical solution is Berkeley Packet Filter (BPF), a highly flexible, efficient virtual machine that runs in the Linux kernel. It allows bytecode to be safely executed at hook points spread across a variety of kernel subsystems. BPF is mainly used for networking, tracing, and security.
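To make the hook model concrete, here is a minimal sketch of a tracing-style BPF program in C. The file name, program name, and hook choice are illustrative only; in practice you would compile it with Clang's BPF target and load it with a loader such as libbpf:

```c
// minimal_open.bpf.c -- an illustrative sketch, not a shipped tool.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

// The SEC() annotation names the hook: here, the tracepoint fired
// whenever any process enters the openat() syscall. The kernel's
// verifier checks the bytecode before it is allowed to run.
SEC("tracepoint/syscalls/sys_enter_openat")
int trace_openat(void *ctx)
{
	// Output is visible in /sys/kernel/debug/tracing/trace_pipe.
	bpf_printk("openat() called");
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```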
Based on BPF, there are two development modes:

* **BCC** (BPF Compiler Collection), which embeds a compiler toolchain and builds each BPF program on the target machine when the tool starts
* **libbpf + BPF CO-RE** (Compile Once – Run Everywhere), which compiles BPF programs once, ahead of time, into portable binaries
In this post, I'll describe why libbpf-tools, a collection of applications based on the libbpf + BPF CO-RE mode, is a better solution than bcc-tools and how we're using libbpf-tools at PingCAP.
BCC embeds LLVM or Clang to rewrite, compile, and load BPF programs. Although it does its best to simplify BPF developers' work, it has these drawbacks:

* Every tool compiles its BPF program on the target machine at startup, which consumes a lot of CPU and memory and can perturb the very workload you are diagnosing.
* It depends on kernel header packages, which must be installed on every target machine.
* Because programs are compiled at runtime, even trivial errors surface only after the tool is deployed and run.
By contrast, BPF CO-RE has these advantages:

* A BPF program is compiled once, at build time, into a small, self-contained binary that runs across kernel versions.
* It gets kernel type information from BTF (BPF Type Format) embedded in the kernel, so no kernel headers are needed on the target machine.
* With no embedded compiler, tools start quickly and use little CPU and memory, and most errors are caught at compile time.
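The "compile once, run everywhere" part hinges on CO-RE relocations: field accesses are recorded symbolically at compile time and fixed up by libbpf against the running kernel's BTF at load time. Here is a minimal sketch; the probe point, `do_unlinkat`, is the canonical libbpf-bootstrap example, and everything else is illustrative:

```c
// core_read.bpf.c -- an illustrative CO-RE sketch.
#include "vmlinux.h"            // kernel types generated from BTF; no kernel headers needed
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

SEC("kprobe/do_unlinkat")
int BPF_KPROBE(do_unlinkat, int dfd, struct filename *name)
{
	// BPF_CORE_READ emits a relocation instead of a hard-coded field
	// offset, so the same binary works even if struct filename's
	// layout differs on the target kernel.
	const char *fname = BPF_CORE_READ(name, name);

	bpf_printk("unlink %s", fname);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```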
For more details, see Why libbpf and BPF CO-RE?
Performance optimization master Brendan Gregg used libbpf + BPF CO-RE to convert a BCC tool and compared their performance data. He said: “As my colleague Jason pointed out, the memory footprint of opensnoop as CO-RE is much lower than opensnoop.py. 9 Mbytes for CO-RE vs 80 Mbytes for Python.”
According to his research, the libbpf + BPF CO-RE version used roughly one-ninth the runtime memory of its BCC counterpart, which greatly benefits servers with scarce physical memory.
At PingCAP, we've been following BPF and its community development for a long time. In the past, every time we added a new machine, we had to install a set of BCC dependencies on it, which was troublesome. After Andrii Nakryiko (the libbpf + BPF CO-RE project's leader) added the first libbpf-tools to the BCC project, we did our research and switched from bcc-tools to libbpf-tools. Fortunately, during the switch, we got guidance from him, Brendan, and Yonghong Song (the BTF project's leader). We've converted 18 BCC or bpftrace tools to libbpf + BPF CO-RE, and we're using them in our company.
For example, when we analyzed the I/O performance of a specific workload, we used multiple performance analysis tools at the block layer:
| Task | Performance analysis tool |
| ---- | ------------------------- |
| Check I/O requests' latency distribution | `./biolatency -d nvme0n1` |
| Analyze the I/O pattern | `./biopattern -T 1 -d 259:0` |
| Check the distribution of request sizes when the task issued physical I/O requests | `./bitesize -c fio -T` |
| Trace each physical I/O request | `./biosnoop -d nvme0n1` |
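For a sense of how a tool like `./biolatency` gathers its latency distribution, here is a rough sketch of the kernel side: timestamp each request when it is issued, compute the delta on completion, and bucket it into a power-of-two histogram. It assumes a kernel (roughly v5.11+) where these block tracepoints take `struct request *` as the first argument; map names and sizes are illustrative, and the real tool handles many more cases:

```c
// biolat_sketch.bpf.c -- an illustrative sketch of the biolatency technique.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define MAX_SLOTS 27

// In-flight requests: request pointer -> issue timestamp (ns).
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 10240);
	__type(key, struct request *);
	__type(value, u64);
} start SEC(".maps");

// Power-of-two latency histogram, read and printed by user space.
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, MAX_SLOTS);
	__type(key, u32);
	__type(value, u64);
} hist SEC(".maps");

// Integer log2, bounded so the verifier can prove termination.
static __always_inline u32 log2l(u64 v)
{
	u32 slot = 0;

	while (v > 1 && slot < MAX_SLOTS - 1) {
		v >>= 1;
		slot++;
	}
	return slot;
}

SEC("tp_btf/block_rq_issue")
int BPF_PROG(block_rq_issue, struct request *rq)
{
	u64 ts = bpf_ktime_get_ns();

	// Remember when this request was handed to the device.
	bpf_map_update_elem(&start, &rq, &ts, BPF_ANY);
	return 0;
}

SEC("tp_btf/block_rq_complete")
int BPF_PROG(block_rq_complete, struct request *rq)
{
	u64 *tsp = bpf_map_lookup_elem(&start, &rq);
	u64 delta_us;
	u64 *cnt;
	u32 slot;

	if (!tsp)
		return 0;

	// Latency in microseconds, bucketed into its log2 slot.
	delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
	slot = log2l(delta_us);

	cnt = bpf_map_lookup_elem(&hist, &slot);
	if (cnt)
		__sync_fetch_and_add(cnt, 1);

	bpf_map_delete_elem(&start, &rq);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```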
The analysis results helped us optimize I/O performance. We're also exploring whether the scheduler-related libbpf-tools are helpful for tuning the TiDB database.
These tools are universal: feel free to give them a try. In the future, we'll implement more tools based on libbpf-tools. If you'd like to learn more about our experience with these tools, you can join the TiDB community on Slack.