From 4b4e3a763002a36edaa625fcb503eceefe6b6171 Mon Sep 17 00:00:00 2001
From: wangfakang
Date: Fri, 19 Dec 2025 18:42:39 +0800
Subject: [PATCH 1/2] update readme.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: wangfakang
Co-authored-by: 毅松
---
 README.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 8d9789c..6240ea5 100644
--- a/README.md
+++ b/README.md
@@ -4,9 +4,9 @@ DeepXTrace is a lightweight diagnostic tool designed to efficiently and precisel
 
 DeepXTrace supports diagnosis of various slowdown scenarios, including:
 
-* *Comp-Slow*: Slowdown caused by the destination rank (e.g., xPU compute latency).
-* *Mixed-Slow*: Slowdown caused by the source rank(e.g., uneven expert distribution or hotspot congestion).
-* *Comm-Slow*: Slowdown caused by the communication path between specific source and destination ranks(e.g., communication link issues).
+* *Comp-Slow*: Slowdown caused by sender-side issues, such as uneven computation (e.g., Attention/MoE) delays send communication operators.
+* *Mixed-Slow*: Slowdown caused by receiver-side issues, such as uneven computation (e.g., Attention/MoE) triggers early recv communication operators on GPUs, or hotspot experts cause network Incast.
+* *Comm-Slow*: Slowdown caused by the communication path between the sender and receiver(e.g., communication link issues).
 
 ![slow](figures/slow.png)
 
@@ -21,6 +21,8 @@ The following figure shows the latency matrix for the Combine operator's token r
 
 ![combine](figures/combine.png)
 
+For performance analysis, use the **[DeepXTrace Heatmap Visualization Tool](tools/README.md)** to visualize communication bottlenecks.
+
 ## MoE-COMM-Metrics-Probe
 A low-overhead module for measuring critical diagnostic indicators during MoE communication.
 Supported Implementations:

From efc55d8f1258cf3428e48f7198c3047316131ef7 Mon Sep 17 00:00:00 2001
From: sky
Date: Fri, 19 Dec 2025 20:50:27 +0800
Subject: [PATCH 2/2] Update README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 6240ea5..b0ee48f 100644
--- a/README.md
+++ b/README.md
@@ -4,9 +4,9 @@ DeepXTrace is a lightweight diagnostic tool designed to efficiently and precisel
 
 DeepXTrace supports diagnosis of various slowdown scenarios, including:
 
-* *Comp-Slow*: Slowdown caused by sender-side issues, such as uneven computation (e.g., Attention/MoE) delays send communication operators.
-* *Mixed-Slow*: Slowdown caused by receiver-side issues, such as uneven computation (e.g., Attention/MoE) triggers early recv communication operators on GPUs, or hotspot experts cause network Incast.
-* *Comm-Slow*: Slowdown caused by the communication path between the sender and receiver(e.g., communication link issues).
+* *Comp-Slow*: Slowdown caused by sender-side issues, such as uneven computation (e.g., Attention/MoE) that delays `send` communication operators.
+* *Mixed-Slow*: Slowdown caused by receiver-side issues, such as uneven computation (e.g., Attention/MoE) that triggers early `recv` communication operators on GPUs, or hotspot experts that cause network Incast.
+* *Comm-Slow*: Slowdown caused by the communication path between the sender and receiver (e.g., communication link issues).
 
 ![slow](figures/slow.png)
 