
πŸ” Cephtrace

eBPF-Powered Dynamic Tracing for Ceph Distributed Storage


Quick Start • Documentation • Demo • Contributing


🎯 What is Cephtrace?

Cephtrace is a suite of eBPF-based dynamic tracing tools that provide microsecond-level visibility into your Ceph storage cluster's performance. Identify bottlenecks, diagnose slow operations, and understand exactly where latency occurs - all without restarting services or modifying configurations.

✨ Key Features

  • 📊 Per-IO Latency Breakdown - See exactly where each operation spends its time
  • 🔄 No Downtime Required - Attach to and detach from running processes dynamically
  • ⚙️ No Configuration Needed - Start tracing on the fly, with no service restarts or config changes
  • 📦 Works with Containers - Full support for cephadm, Rook, Docker, LXD, and MicroCeph
  • 🚀 Low Overhead in Production - eBPF-based dynamic instrumentation via kernel uprobes

πŸ› οΈ The Tools

🔹 osdtrace - OSD Performance Deep Dive

Trace OSD operations with a detailed latency breakdown across:

  • Messenger layer: Network throttling, receive, dispatch
  • OSD processing: Queue wait, request handling, replication coordination
  • BlueStore backend: Transaction prep, I/O wait, commit latencies

Perfect for:

  • Diagnosing "slow ops" warnings
  • Understanding replication latency
  • Identifying storage vs network bottlenecks
  • Inspecting BlueStore low-level metrics
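
A minimal sketch of attaching osdtrace, assuming it follows the same release-download and -i <dwarf.json> conventions as radostrace in the Quick Start below (the URL and flags here are assumptions; see the osdtrace guide for the exact options):

# Assumed download path, mirroring the radostrace release asset naming
wget https://github.com/taodd/cephtrace/releases/latest/download/osdtrace
chmod +x osdtrace

# Attach to the running OSDs using a DWARF JSON matching your ceph-osd build
sudo ./osdtrace -i <ceph-osd-dwarf.json>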

🔹 radostrace - Client-Side Operation Tracking

Monitor librados client operations in real-time:

  • Track read/write/delete/omap-related operations
  • Measure end-to-end latency from client perspective
  • Identify slow requests before they time out
  • Debug VM/application-level performance issues

Perfect for:

  • VM/Application performance troubleshooting
  • Pinpointing underperforming OSDs in large-scale clusters
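
For a quick client-side check without a VM, here is a sketch that traces a rados bench run; the pool name is a placeholder, and the -i/-p flags match those shown in the Quick Start below:

# Generate client I/O in the background (replace testpool with a real pool)
rados bench -p testpool 30 write &

# Attach radostrace to that client process with a matching DWARF JSON
sudo ./radostrace -i <dwarf.json> -p $(pgrep -f 'rados bench')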

🔹 kfstrace - Kernel Client Tracing

Trace kernel-level CephFS and RBD operations:

  • Monitor CephFS file operations
  • Track RBD block I/O requests
  • Measure kernel client latencies
  • Debug mount and I/O issues

Perfect for:

  • CephFS performance analysis
  • Kernel RBD volume latency debugging
  • Kernel client troubleshooting
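
A minimal sketch for locating kernel Ceph clients before attaching kfstrace; the final invocation is an assumption, since kfstrace's exact flags are documented in its guide:

# List kernel CephFS mounts and kernel-mapped RBD devices on this host
mount -t ceph
rbd showmapped

# Attach kfstrace (argument-less run is an assumption; see the kfstrace guide)
sudo ./kfstrace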

🚀 Quick Start

Example: Trace a VM's Ceph Operations from the Host

Get up and running in under 2 minutes - monitor VM I/O operations hitting your Ceph cluster:

# Download radostrace
wget https://github.com/taodd/cephtrace/releases/latest/download/radostrace
chmod +x radostrace

# Check your librados version on the host
dpkg -l | grep librados2

# Download matching DWARF file (example for Ubuntu 22.04, Ceph 17.2.6)
wget https://raw.githubusercontent.com/taodd/cephtrace/main/files/ubuntu/radostrace/17.2.6-0ubuntu0.22.04.2_dwarf.json

# Find the QEMU process for your VM
ps aux | grep qemu

# Start tracing the VM's RBD operations (replace <qemu-pid> with actual PID)
sudo ./radostrace -i 17.2.6-0ubuntu0.22.04.2_dwarf.json -p <qemu-pid>

# Or trace all VMs on the host
sudo ./radostrace -i 17.2.6-0ubuntu0.22.04.2_dwarf.json

Sample Output (latency is in microseconds; "acting" lists the OSDs in the PG's acting set):

     pid  client     tid  pool  pg     acting       w/r    size  latency     object[ops][offset,length]
   19015   34206  419357     2  1e     [1,11,121]     W        0     887     rbd_header.374de3730ad0[watch ]
   19015   34206  419358     2  1e     [1,11,121]     W        0    8561     rbd_header.374de3730ad0[call ]
   19015   34206  419359     2  39     [0,121,11]     R     4096    1240     rbd_data.374de3730ad0.0000000000000000[read ][0, 4096]
   19015   34206  419360     2  39     [0,121,11]     R     4096    1705     rbd_data.374de3730ad0.0000000000000000[read ][4096, 4096]
   19015   34206  419361     2  39     [0,121,11]     R     4096    1334     rbd_data.374de3730ad0.0000000000000000[read ][12288, 4096]
   19015   34206  419362     2  2b     [77,11,1]     iR     4096    2180     rbd_data.374de3730ad0.00000000000000ff[read ][4128768, 4096]

📖 Detailed guide: Getting Started

🎬 Demo

📺 10-Minute Live Demo (walked through in the Cephalocon talk below)

See cephtrace in action troubleshooting real performance issues.

🎤 Cephalocon 2025 Presentation

"Efficient Ceph Performance Troubleshooting in Production Using eBPF" - Learn the techniques and real-world use cases.

📚 Documentation

📘 User Guides

Tool          Description                                        Link
osdtrace      OSD-side tracing with detailed latency breakdown   Guide
radostrace    Client-side librados operation tracing             Guide
kfstrace      Kernel client (CephFS/RBD) tracing                 Guide
DWARF Files   Managing debug information for tracing             Guide

📊 Analysis & Tools

🐳 Deployment Scenarios

🔨 Building

Requirements

  • Kernel: Linux 5.8 or later
  • Architecture: x86_64
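
A quick pre-flight check; the BTF test reflects a common requirement of CO-RE-style eBPF tools and is an assumption for cephtrace specifically:

# Kernel must be 5.8 or later
uname -r

# Many CO-RE eBPF tools also need kernel BTF (assumption for cephtrace)
ls /sys/kernel/btf/vmlinux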

🤝 Contributing

We welcome contributions! Here's how you can help:

πŸ› Report Issues

Found a bug or have a feature request?

πŸ“ Submit DWARF Files

Help expand version support:

  1. Generate the DWARF JSON for your Ceph version
  2. Submit a PR adding it to the files/ directory
  3. Help others with the same version
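
On Ubuntu, the DWARF data comes from debug-symbol packages; here is a sketch of installing them before exporting the JSON (package names assume the standard ddebs archive):

# Enable the Ubuntu debug symbol archive
sudo apt-get install -y ubuntu-dbgsym-keyring
echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse" | \
  sudo tee /etc/apt/sources.list.d/ddebs.list
sudo apt-get update

# Install symbols matching your librados build, then export the DWARF JSON
# as described in the DWARF Files guide
sudo apt-get install -y librados2-dbgsym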

📄 License

This project is licensed under the GNU General Public License v2.0 - see the LICENSE file for details.

πŸ™ Acknowledgments


Made with ❤️ for the Ceph community
