diff --git a/README.md b/README.md
index 8d9789c..b0ee48f 100644
--- a/README.md
+++ b/README.md
@@ -4,9 +4,9 @@ DeepXTrace is a lightweight diagnostic tool designed to efficiently and precisel
 
 DeepXTrace supports diagnosis of various slowdown scenarios, including:
 
-* *Comp-Slow*: Slowdown caused by the destination rank (e.g., xPU compute latency).
-* *Mixed-Slow*: Slowdown caused by the source rank(e.g., uneven expert distribution or hotspot congestion).
-* *Comm-Slow*: Slowdown caused by the communication path between specific source and destination ranks(e.g., communication link issues).
+* *Comp-Slow*: Slowdown caused by sender-side issues, such as uneven computation (e.g., Attention/MoE) that delays `send` communication operators.
+* *Mixed-Slow*: Slowdown caused by receiver-side issues, such as uneven computation (e.g., Attention/MoE) that triggers early `recv` communication operators on GPUs, or hotspot experts that cause network Incast.
+* *Comm-Slow*: Slowdown caused by the communication path between the sender and receiver (e.g., communication link issues).
 
 ![slow](figures/slow.png)
 
@@ -21,6 +21,8 @@ The following figure shows the latency matrix for the Combine operator's token r
 
 ![combine](figures/combine.png)
 
+For performance analysis, use the **[DeepXTrace Heatmap Visualization Tool](tools/README.md)** to visualize communication bottlenecks.
+
 ## MoE-COMM-Metrics-Probe
 
 A low-overhead module for measuring critical diagnostic indicators during MoE communication. Supported Implementations: