Quick Start • Documentation • Demo • Contributing
Cephtrace is a suite of eBPF-based dynamic tracing tools that provide microsecond-level visibility into your Ceph storage cluster's performance. Identify bottlenecks, diagnose slow operations, and understand exactly where latency occurs - all without restarting services or modifying configurations.
- Per-IO Latency Breakdown - See exactly where each operation spends its time
- No Downtime Required - Attach to and detach from running processes dynamically
- No Configuration Needed - Start tracing on the fly; no service restarts or config changes
- Works with Containers - Full support for cephadm, Rook, Docker, LXD and MicroCeph
- Low Overhead in Production - Built on eBPF and kernel uprobe dynamic instrumentation
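Since all three tools attach via eBPF uprobes, it can be worth confirming the host kernel was built with those features before you start. A minimal check, assuming the Debian/Ubuntu config file location (adjust the path for other distributions):

```bash
# Check that the running kernel has eBPF and uprobe support compiled in.
# /boot/config-$(uname -r) is typical for Debian/Ubuntu; other distros may
# keep the kernel config elsewhere (e.g. /proc/config.gz).
grep -E 'CONFIG_BPF_SYSCALL=|CONFIG_UPROBES=|CONFIG_UPROBE_EVENTS=' /boot/config-$(uname -r)
```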
osdtrace - OSD Performance Deep Dive
Trace OSD operations with detailed latency breakdown across:
- Messenger layer: Network throttling, receive, dispatch
- OSD processing: Queue wait, request handling, replication coordination
- BlueStore backend: Transaction prep, I/O wait, commit latencies
Perfect for:
- Diagnosing "slow ops" warnings
- Understanding replication latency
- Identifying storage vs network bottlenecks
- Inspecting BlueStore low-level metrics
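As a sketch of the workflow, you attach to a running ceph-osd process with a DWARF JSON that matches its build. The flags below are an assumption modeled on the radostrace Quick Start later in this README; the authoritative options are in the osdtrace Guide.

```bash
# Hypothetical invocation -- flags assumed to mirror radostrace's -i/-p;
# see the osdtrace Guide for the exact options.
pidof ceph-osd                                   # list ceph-osd PIDs on this host
sudo ./osdtrace -i <ceph-osd_dwarf.json> -p <osd-pid>
```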
radostrace - Client-Side Operation Tracking
Monitor librados client operations in real-time:
- Track read/write/delete/omap-related operations
- Measure end-to-end latency from client perspective
- Identify slow requests before they time out
- Debug VM/application-level performance issues
Perfect for:
- VM/Application performance troubleshooting
- Pinpointing underperforming OSDs in large clusters
kfstrace - Kernel Client Tracing
Trace kernel-level CephFS and RBD operations:
- Monitor CephFS file operations
- Track RBD block I/O requests
- Measure kernel client latencies
- Debug mount and I/O issues
Perfect for:
- CephFS performance analysis
- Kernel RBD volume latency debugging
- Kernel client troubleshooting
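Because kfstrace instruments the in-kernel clients, a useful sanity check before tracing is simply whether the relevant kernel modules are loaded. The invocation below is a hypothetical sketch; the real options are documented in the kfstrace Guide.

```bash
# Confirm the kernel Ceph clients are present before attaching.
lsmod | grep -E '^(libceph|ceph|rbd)'

# Hypothetical invocation; consult the kfstrace Guide for the exact flags.
sudo ./kfstrace
```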
Example: Trace a VM's Ceph Operations from the Host
Get up and running in under 2 minutes - monitor VM I/O operations hitting your Ceph cluster:
```bash
# Download radostrace
wget https://github.com/taodd/cephtrace/releases/latest/download/radostrace
chmod +x radostrace

# Check your librados version on the host
dpkg -l | grep librados2

# Download matching DWARF file (example for Ubuntu 22.04, Ceph 17.2.6)
wget https://raw.githubusercontent.com/taodd/cephtrace/main/files/ubuntu/radostrace/17.2.6-0ubuntu0.22.04.2_dwarf.json

# Find the QEMU process for your VM
ps aux | grep qemu

# Start tracing the VM's RBD operations (replace <qemu-pid> with actual PID)
sudo ./radostrace -i 17.2.6-0ubuntu0.22.04.2_dwarf.json -p <qemu-pid>

# Or trace all VMs on the host
sudo ./radostrace -i 17.2.6-0ubuntu0.22.04.2_dwarf.json
```

Sample output:

```
pid    client  tid     pool  pg  acting      w/r  size  latency  object[ops][offset,length]
19015  34206   419357  2     1e  [1,11,121]  W    0     887      rbd_header.374de3730ad0[watch ]
19015  34206   419358  2     1e  [1,11,121]  W    0     8561     rbd_header.374de3730ad0[call ]
19015  34206   419359  2     39  [0,121,11]  R    4096  1240     rbd_data.374de3730ad0.0000000000000000[read ][0, 4096]
19015  34206   419360  2     39  [0,121,11]  R    4096  1705     rbd_data.374de3730ad0.0000000000000000[read ][4096, 4096]
19015  34206   419361  2     39  [0,121,11]  R    4096  1334     rbd_data.374de3730ad0.0000000000000000[read ][12288, 4096]
19015  34206   419362  2     2b  [77,11,1]   iR   4096  2180     rbd_data.374de3730ad0.00000000000000ff[read ][4128768, 4096]
```
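Because the output is plain columnar text, it is easy to post-process. A small sketch, with the field positions assumed from the sample above (field 6 is the acting OSD set, field 9 the latency; the 5000 threshold is an arbitrary cutoff in the same units as the latency column):

```bash
# Capture a trace window to a file, then rank the slowest operations and
# the OSD acting sets that served them.
sudo ./radostrace -i 17.2.6-0ubuntu0.22.04.2_dwarf.json -p <qemu-pid> | tee trace.log

# Keep only rows whose latency (field 9) exceeds the threshold, slowest first.
awk '$9 ~ /^[0-9]+$/ && $9 > 5000 {print $9, $6, $10}' trace.log | sort -rn | head
```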
Detailed guide: Getting Started
10-Minute Live Demo (the demo is explained in the Cephalocon talk below)
See cephtrace in action troubleshooting real performance issues.
Cephalocon 2025 Presentation
"Efficient Ceph Performance Troubleshooting in Production Using eBPF" - Learn the techniques and real-world use cases.
| Tool | Description | Link |
|---|---|---|
| osdtrace | OSD-side tracing with detailed latency breakdown | Guide |
| radostrace | Client-side librados operation tracing | Guide |
| kfstrace | Kernel client (CephFS/RBD) tracing | Guide |
| DWARF Files | Managing debug information for tracing | Guide |
- Analyzing Radostrace Logs - Extract insights from client traces
- Analyzing Osdtrace Logs - Deep-dive into OSD performance data
- Tracing Containerized Ceph - cephadm, Rook, Docker, LXD
- Tracing MicroCeph - Snap-based deployments
- Building from Source - Compilation and installation guide
- Kernel: Linux 5.8 or later
- Architecture: x86_64
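A quick way to verify both requirements on a host:

```bash
# Both must match the requirements above.
uname -r   # kernel release; needs to be 5.8 or newer
uname -m   # should print x86_64
```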
We welcome contributions! Here's how you can help:
Found a bug or have a feature request?
- Open an issue
- Provide Ceph version, OS, and reproduction steps
Help expand version support:
- Generate DWARF JSON for your Ceph version
- Submit a PR adding it to the files/ directory
- Help others with the same version
This project is licensed under the GNU General Public License v2.0 - see the LICENSE file for details.
Made with ❤️ for the Ceph community