Merged
8 changes: 5 additions & 3 deletions README.md
@@ -4,9 +4,9 @@ DeepXTrace is a lightweight diagnostic tool designed to efficiently and precisel

DeepXTrace supports diagnosis of various slowdown scenarios, including:

-* *Comp-Slow*: Slowdown caused by the destination rank (e.g., xPU compute latency).
-* *Mixed-Slow*: Slowdown caused by the source rank(e.g., uneven expert distribution or hotspot congestion).
-* *Comm-Slow*: Slowdown caused by the communication path between specific source and destination ranks(e.g., communication link issues).
+* *Comp-Slow*: Slowdown caused by sender-side issues, such as uneven computation (e.g., Attention/MoE) that delays `send` communication operators.
+* *Mixed-Slow*: Slowdown caused by receiver-side issues, such as uneven computation (e.g., Attention/MoE) that triggers early `recv` communication operators on GPUs, or hotspot experts that cause network Incast.
+* *Comm-Slow*: Slowdown caused by the communication path between the sender and receiver (e.g., communication link issues).


![slow](figures/slow.png)
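
As a rough illustration of how the three scenarios in the updated wording could be told apart, the sketch below classifies hot spots in a square latency matrix. It is not DeepXTrace's own algorithm: the function name `classify_slowdown`, the `latency[src][dst]` orientation, and the `factor` threshold are all assumptions made for this example. A row that is slow toward every receiver is attributed to the sender side, a column that is slow from every sender to the receiver side, and an isolated slow cell to the path between that pair.

```python
from statistics import median


def classify_slowdown(latency, factor=3.0):
    """Hypothetical helper, not part of DeepXTrace.

    latency[src][dst] is assumed to hold the observed latency from
    sender rank `src` to receiver rank `dst` (square matrix).
    A cell counts as "hot" when it exceeds factor * median of all cells.
    """
    n = len(latency)
    baseline = median(v for row in latency for v in row)
    hot = [[v > factor * baseline for v in row] for row in latency]

    # Sender-side: every cell in the sender's row is hot.
    slow_senders = [s for s in range(n) if all(hot[s])]
    # Receiver-side: every cell in the receiver's column is hot.
    slow_receivers = [d for d in range(n) if all(hot[s][d] for s in range(n))]
    # Path-specific: hot cells not already explained by a whole row or column.
    slow_paths = [
        (s, d)
        for s in range(n)
        for d in range(n)
        if hot[s][d] and s not in slow_senders and d not in slow_receivers
    ]
    # Labels follow the sender/receiver/path wording of the updated bullets.
    return {
        "Comp-Slow": slow_senders,
        "Mixed-Slow": slow_receivers,
        "Comm-Slow": slow_paths,
    }


if __name__ == "__main__":
    # Made-up numbers: rank 1 is slow as a sender, and path 2 -> 3 is slow.
    example = [
        [10, 10, 10, 10],
        [100, 100, 100, 100],
        [10, 10, 10, 90],
        [10, 10, 10, 10],
    ]
    print(classify_slowdown(example))
```

A median baseline is used here so that a single slow row or column does not inflate the threshold; a real diagnosis would likely need per-operator baselines and tolerance for partially slow rows.
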
@@ -21,6 +21,8 @@ The following figure shows the latency matrix for the Combine operator's token r

![combine](figures/combine.png)

+For performance analysis, use the **[DeepXTrace Heatmap Visualization Tool](tools/README.md)** to visualize communication bottlenecks.
+
## MoE-COMM-Metrics-Probe

A low-overhead module for measuring critical diagnostic indicators during MoE communication. Supported Implementations: