Add a visualization tool for DeepXTrace heatmap generation #10

wangfakang · 2025-12-11T09:45:35Z

Summary by Sourcery

Add a command-line Python tool to generate DeepXTrace performance heatmaps from log file matrix data and document its usage in the tools README.

New Features:

Introduce a DeepXTrace heatmap generator that parses matrix data from logs and produces color-coded visualizations with a custom colormap.
Provide a CLI interface to configure heatmap generation parameters such as title, figure size, DPI, output format, and cell size scaling.

Documentation:

Add tools documentation describing installation, input format expectations, CLI options, and usage examples for the DeepXTrace heatmap generator.

sourcery-ai · 2025-12-11T09:45:41Z

Reviewer's Guide

Adds a standalone Python CLI tool to generate DeepXTrace heatmap visualizations from log files plus accompanying usage documentation.

Sequence diagram for DeepXTrace heatmap CLI execution flow

sequenceDiagram
    actor User
    participant CLI as DeepXTraceHeatmapCLI
    participant Argparse
    participant FileSystem
    participant Parser as MatrixParser
    participant Plotter as HeatmapPlotter
    participant Matplotlib

    User->>CLI: python deepxtrace_heatmap.py input_file --options

    CLI->>Argparse: parse_args()
    Argparse-->>CLI: args

    CLI->>FileSystem: open(input_file)
    FileSystem-->>CLI: log_content

    CLI->>Parser: parse_matrix_data(log_content)
    Parser-->>CLI: data_matrix

    CLI->>Plotter: plot_deepxtrace_heatmap(data_matrix, title, figsize, dpi, format, cell_ratio)

    Plotter->>Matplotlib: configure_figure_and_colormap()
    Matplotlib-->>Plotter: figure_axes

    Plotter->>Matplotlib: sns.heatmap(...)
    Matplotlib-->>Plotter: rendered_heatmap

    Plotter->>Matplotlib: savefig("deepxtrace.{format}")
    Matplotlib-->>Plotter: file_written

    Plotter-->>CLI: completion
    CLI-->>User: print saving_started_and_completed_timestamps

Class diagram for deepxtrace_heatmap Python tool structure

classDiagram
    class DeepxtraceHeatmapModule {
        <<module>>
        +create_optimized_ryg_cmap() LinearSegmentedColormap
        +parse_matrix_data(log_data string) ndarray
        +read_log_file(file_path string) string
        +plot_deepxtrace_heatmap(matrix ndarray, title string, figsize float[], dpi int, output_format string, cell_ratio float) void
        +main() void
    }

    class ArgparseModule {
        <<library>>
    }

    class NumpyModule {
        <<library>>
    }

    class MatplotlibPyplotModule {
        <<library>>
    }

    class SeabornModule {
        <<library>>
    }

    DeepxtraceHeatmapModule ..> ArgparseModule : uses
    DeepxtraceHeatmapModule ..> NumpyModule : uses
    DeepxtraceHeatmapModule ..> MatplotlibPyplotModule : uses
    DeepxtraceHeatmapModule ..> SeabornModule : uses

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Introduce a DeepXTrace heatmap generator script with CLI support for reading log files, parsing matrix data, and rendering configurable heatmaps to disk. | Create a custom optimized red-yellow-green LinearSegmentedColormap for visualizing token wait times; Implement matrix parsing from log content using regex to extract bracketed numeric sequences into a 2D NumPy array; Add robust file-reading helper with basic error handling and user-friendly messages; Generate a log-scaled seaborn heatmap with dynamic figure sizing, annotation font sizing, and customized colorbar/labels; Provide a command-line interface that wires together input parsing, log reading, matrix conversion, and heatmap generation with configurable output options | `tools/deepxtrace_heatmap.py` |
| Document how to install dependencies and use the DeepXTrace heatmap visualization tool from the command line, including expected input format and sample output. | Describe tool purpose and main capabilities in a new README for tools; Document installation requirements and basic/advanced CLI usage examples; Specify CLI parameters, defaults, and expected log input matrix format with a concrete example; Include a sample output heatmap reference image path for quick visual confirmation | `tools/README.md` |

gemini-code-assist · 2025-12-11T09:45:49Z

Summary of Changes

Hello @wangfakang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new utility to enhance the analysis of DeepXTrace performance data. It provides a dedicated Python tool that transforms raw numerical matrix data from log files into visually intuitive heatmaps. This aims to streamline the process of identifying performance bottlenecks and patterns by offering a clear, customizable graphical representation of the data.

Highlights

New Visualization Tool: Introduced a new Python-based tool for generating heatmap visualizations specifically designed for DeepXTrace performance data.
Data Parsing and Processing: The tool automatically parses matrix data from log files, supporting Dispatch or Combine matrix formats, and applies a log-scale transformation for better visualization of varied data.
Customizable Heatmaps: Heatmaps are generated with an optimized Red-Yellow-Green colormap, dynamic cell sizing, and configurable output options including title, figure dimensions, DPI, and format (PNG, SVG, PDF).
Command-Line Interface: A user-friendly command-line interface is provided, allowing easy configuration of heatmap generation parameters.
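
To illustrate why the log-scale transformation mentioned in the highlights helps with widely varying values, here is a minimal sketch. The review snippets in this thread show the tool using `np.log1p`; this example substitutes the standard-library `math.log1p` so it runs without NumPy, and the function name `log_scale` and the sample values are assumptions made for illustration only.

```python
import math


def log_scale(matrix):
    """Apply log(1 + v) to every cell (stdlib stand-in for np.log1p)."""
    return [[math.log1p(value) for value in row] for row in matrix]


# Hypothetical sample spanning six orders of magnitude.
raw = [[0, 10], [1_000, 1_000_000]]
scaled = log_scale(raw)

# Zero stays at zero, ordering is preserved, and the range is compressed
# enough that small and large cells can share one colormap.
flat_raw = [v for row in raw for v in row]
flat_scaled = [v for row in scaled for v in row]
print(min(flat_scaled), max(flat_scaled))
```

Because `log1p(0)` is exactly zero, cells with no recorded value map to the bottom of the color scale rather than to an undefined logarithm.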

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

Consider validating the parsed matrix (e.g., non-empty, consistent row lengths) and failing with a clear error message before calling np.array and plotting, to avoid confusing runtime errors on malformed logs.
The output filename is currently hard-coded to deepxtrace.<format>; you may want to allow the user to override this (e.g., via a --output argument) so multiple runs on different inputs don’t overwrite each other.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Consider validating the parsed matrix (e.g., non-empty, consistent row lengths) and failing with a clear error message before calling `np.array` and plotting, to avoid confusing runtime errors on malformed logs.
- The output filename is currently hard-coded to `deepxtrace.<format>`; you may want to allow the user to override this (e.g., via a `--output` argument) so multiple runs on different inputs don’t overwrite each other.

## Individual Comments

### Comment 1
<location> `tools/deepxtrace_heatmap.py:24-27` </location>
<code_context>
+    return LinearSegmentedColormap.from_list("optimized_ryg", colors)
+
+
+def parse_matrix_data(log_data):
+    """Parse matrix data from log string containing bracketed number sequences"""
+    number_sequences = re.findall(r'\[([\d\s]+)\]', log_data)
+    return np.array([list(map(int, seq.split())) for seq in number_sequences])
+
+
</code_context>

<issue_to_address>
**issue:** Handle empty or irregular matrix sequences more defensively before constructing the numpy array.

`re.findall` can return an empty list, and inconsistent row lengths will produce a ragged array. That can lead to unexpected shapes or downstream errors when accessing `matrix.shape`. Consider validating `number_sequences` (non-empty, consistent lengths) and raising a clear error on malformed input instead of relying on later failures.
</issue_to_address>

### Comment 2
<location> `tools/deepxtrace_heatmap.py:62-63` </location>
<code_context>
+        cell_ratio: Cell size scaling factor (default: 1.5)
+    """
+    # Calculate adjusted figure size based on matrix dimensions
+    rows, cols = matrix.shape
+    adjusted_figsize = (figsize[0] * cell_ratio * (cols / 10),
+                        figsize[1] * cell_ratio * (rows / 10))
+
</code_context>

<issue_to_address>
**suggestion (performance):** Consider clamping the auto-scaled figure size to avoid excessively large outputs for big matrices.

Because `adjusted_figsize` scales linearly with `rows` and `cols`, very large matrices can produce excessively large figures, risking high memory use and unusable plots. Consider capping the width/height (or switching to a nonlinear scaling) so sizes stay within reasonable bounds while still reflecting matrix dimensions.
</issue_to_address>

### Comment 3
<location> `tools/deepxtrace_heatmap.py:82-90` </location>
<code_context>
+    annot_size = max(8, min(20, 10 * cell_ratio))
+
+    # Generate heatmap
+    ax = sns.heatmap(
+        log_matrix,
+        cmap=cmap,
+        norm=norm,
+        annot=matrix,
+        fmt='.2e',
+        linewidths=0.5,
+        linecolor="white",
+        annot_kws={
+            "size": annot_size,
+            "color": "black"
</code_context>

<issue_to_address>
**suggestion (performance):** Automatically disabling annotations for large matrices would improve performance and readability.

`annot=matrix` doesn’t scale well for large matrices in terms of runtime and visual noise. Consider disabling annotations when `rows * cols` exceeds a threshold, or gate them behind a parameter/CLI flag so large traces stay responsive while smaller ones can still show detailed values.

Suggested implementation:

```python
    # Create colormap and normalize data
    cmap = create_optimized_ryg_cmap()
    log_matrix = np.log1p(matrix)
    norm = plt.Normalize(vmin=log_matrix.min(), vmax=log_matrix.max())

    # Dynamic annotation size based on cell size
    annot_size = max(8, min(20, 10 * cell_ratio))

    # Automatically disable annotations for large matrices (performance & readability)
    # Threshold can be tuned; 400 cells (~20x20) keeps small traces detailed and larger ones clean.
    show_annotations = matrix.size <= 400

    if show_annotations:
        annot_data = matrix
        annot_kws = {
            "size": annot_size,
            "color": "black",
        }
    else:
        annot_data = None
        annot_kws = None

    # Generate heatmap
    ax = sns.heatmap(
        log_matrix,
        cmap=cmap,
        norm=norm,
        annot=annot_data,
        fmt='.2e',
        linewidths=0.5,
        linecolor="white",
        annot_kws=annot_kws,
        cbar_kws={
            "label": "Log(Value + 1) Scale",
            "format": ScalarFormatter(),
            "shrink": 0.8
        }
    )

    # Configure labels and title
    ax.set_title(title, fontsize=16 * cell_ratio, pad=20, fontweight='bold')

```

If you want this behavior to be configurable (e.g., via a parameter or CLI flag), you can:
1. Add a function argument such as `enable_annotations: bool | None = None` and/or `annotation_threshold: int = 400`.
2. Replace the hardcoded `show_annotations = matrix.size <= 400` with logic that respects those parameters:
   - If `enable_annotations` is `True` / `False`, override the automatic behavior.
   - If `enable_annotations` is `None`, fall back to `matrix.size <= annotation_threshold`.
3. Thread the new argument(s) through any CLI parsing or caller functions that invoke this heatmap generator.
</issue_to_address>

### Comment 4
<location> `tools/deepxtrace_heatmap.py:116-117` </location>
<code_context>
+    )
+
+    # Save output
+    output_file = f"deepxtrace.{output_format}"
+    print(
+        f"Saving started at {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}")
+    plt.savefig(
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Derive the output filename from the input file (or allow configuration) instead of always using a fixed name.

Using a fixed `deepxtrace.{format}` filename risks overwriting previous runs and makes it hard to match outputs to their inputs. Consider deriving the name from `input_file` (e.g., its stem plus suffix/timestamp) to make repeated or batch runs safer and more traceable.

Suggested implementation:

```python
from datetime import datetime
from pathlib import Path

```

```python
    # Save output
    # Derive output filename from input file to avoid overwriting and improve traceability
    input_path = Path(input_file)  # assumes input_file is available in this scope
    timestamp = datetime.now().strftime('%Y%m%d-%H%M%S')
    output_file = f"{input_path.stem}.deepxtrace.{timestamp}.{output_format}"
    print(
        f"Saving to {output_file} started at {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}")

```

1. Ensure that `input_file` is available in this scope (e.g., passed into the function generating the plot). If it has a different name, update `Path(input_file)` accordingly.
2. If the file header does not already contain `from datetime import datetime`, adjust the import replacement block to match the actual imports at the top of the file.
3. If you want to make the output filename fully configurable, add an optional `output_file`/`output_dir` argument to the relevant function and CLI, and only fall back to the derived name when no explicit value is provided.
</issue_to_address>

sourcery-ai · 2025-12-11T09:47:01Z

tools/deepxtrace_heatmap.py

+def parse_matrix_data(log_data):
+    """Parse matrix data from log string containing bracketed number sequences"""
+    number_sequences = re.findall(r'\[([\d\s]+)\]', log_data)
+    return np.array([list(map(int, seq.split())) for seq in number_sequences])


issue: Handle empty or irregular matrix sequences more defensively before constructing the numpy array.

re.findall can return an empty list, and inconsistent row lengths will produce a ragged array. That can lead to unexpected shapes or downstream errors when accessing matrix.shape. Consider validating number_sequences (non-empty, consistent lengths) and raising a clear error on malformed input instead of relying on later failures.
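
One way the validation requested here could look is sketched below. It is not the merged implementation: the function name `parse_matrix_rows` and the error messages are hypothetical, and it returns plain lists so the example runs without NumPy (the caller would pass the result to `np.array` afterwards). Depending on the NumPy version, a ragged input either raises inside `np.array` or yields an object array, so checking beforehand gives one predictable error.

```python
import re


def parse_matrix_rows(log_data: str) -> list[list[int]]:
    """Extract bracketed integer rows and validate them before plotting."""
    # Same pattern as in the reviewed code: digits and whitespace in brackets.
    sequences = re.findall(r'\[([\d\s]+)\]', log_data)
    rows = [[int(token) for token in seq.split()] for seq in sequences]

    if not rows:
        raise ValueError("no bracketed numeric rows found in log data")

    # A rectangular matrix has exactly one row width, and it is non-zero.
    widths = {len(row) for row in rows}
    if len(widths) != 1 or 0 in widths:
        raise ValueError(f"inconsistent or empty rows, widths seen: {sorted(widths)}")

    return rows
```

With this check in place, a malformed log fails at parse time with a message naming the problem, rather than later when `matrix.shape` is unpacked.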

sourcery-ai · 2025-12-11T09:47:01Z

tools/deepxtrace_heatmap.py

+    rows, cols = matrix.shape
+    adjusted_figsize = (figsize[0] * cell_ratio * (cols / 10),


suggestion (performance): Consider clamping the auto-scaled figure size to avoid excessively large outputs for big matrices.

Because adjusted_figsize scales linearly with rows and cols, very large matrices can produce excessively large figures, risking high memory use and unusable plots. Consider capping the width/height (or switching to a nonlinear scaling) so sizes stay within reasonable bounds while still reflecting matrix dimensions.
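
A minimal sketch of the clamping idea follows. The scaling formula mirrors the two lines quoted above, and the `cell_ratio` default of 1.5 comes from the docstring shown in the review; the helper name `clamped_figsize`, the default `figsize`, and the 4 to 40 inch bounds are assumptions chosen for illustration.

```python
def clamped_figsize(rows, cols, figsize=(10.0, 8.0), cell_ratio=1.5,
                    min_inches=4.0, max_inches=40.0):
    """Scale the figure with the matrix shape, then cap each dimension."""
    width = figsize[0] * cell_ratio * (cols / 10)
    height = figsize[1] * cell_ratio * (rows / 10)

    def clamp(value):
        # Keep the dimension inside [min_inches, max_inches].
        return max(min_inches, min(max_inches, value))

    return clamp(width), clamp(height)
```

Small matrices keep the linear scaling, while a 1000 x 1000 matrix is capped at the upper bound instead of requesting a figure more than a thousand inches wide.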

sourcery-ai · 2025-12-11T09:47:01Z

tools/deepxtrace_heatmap.py

+    ax = sns.heatmap(
+        log_matrix,
+        cmap=cmap,
+        norm=norm,
+        annot=matrix,
+        fmt='.2e',
+        linewidths=0.5,
+        linecolor="white",
+        annot_kws={


suggestion (performance): Automatically disabling annotations for large matrices would improve performance and readability.

annot=matrix doesn’t scale well for large matrices in terms of runtime and visual noise. Consider disabling annotations when rows * cols exceeds a threshold, or gate them behind a parameter/CLI flag so large traces stay responsive while smaller ones can still show detailed values.

Suggested implementation:

```python
    # Create colormap and normalize data
    cmap = create_optimized_ryg_cmap()
    log_matrix = np.log1p(matrix)
    norm = plt.Normalize(vmin=log_matrix.min(), vmax=log_matrix.max())

    # Dynamic annotation size based on cell size
    annot_size = max(8, min(20, 10 * cell_ratio))

    # Automatically disable annotations for large matrices (performance & readability)
    # Threshold can be tuned; 400 cells (~20x20) keeps small traces detailed and larger ones clean.
    show_annotations = matrix.size <= 400

    if show_annotations:
        annot_data = matrix
        annot_kws = {
            "size": annot_size,
            "color": "black",
        }
    else:
        annot_data = None
        annot_kws = None

    # Generate heatmap
    ax = sns.heatmap(
        log_matrix,
        cmap=cmap,
        norm=norm,
        annot=annot_data,
        fmt='.2e',
        linewidths=0.5,
        linecolor="white",
        annot_kws=annot_kws,
        cbar_kws={
            "label": "Log(Value + 1) Scale",
            "format": ScalarFormatter(),
            "shrink": 0.8
        }
    )

    # Configure labels and title
    ax.set_title(title, fontsize=16 * cell_ratio, pad=20, fontweight='bold')
```

If you want this behavior to be configurable (e.g., via a parameter or CLI flag), you can:

1. Add a function argument such as enable_annotations: bool | None = None and/or annotation_threshold: int = 400.
2. Replace the hardcoded show_annotations = matrix.size <= 400 with logic that respects those parameters:
   - If enable_annotations is True / False, override the automatic behavior.
   - If enable_annotations is None, fall back to matrix.size <= annotation_threshold.
3. Thread the new argument(s) through any CLI parsing or caller functions that invoke this heatmap generator.
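
The configurable variant described in these steps could be sketched as follows. The parameter names `enable_annotations` and `annotation_threshold` come from the suggestion itself; the helper `should_annotate` and the `--annotations` / `--annotation-threshold` flags are hypothetical and are not options of the merged tool. `argparse.BooleanOptionalAction` requires Python 3.9 or later.

```python
import argparse
from typing import Optional


def should_annotate(matrix_size: int,
                    enable_annotations: Optional[bool] = None,
                    annotation_threshold: int = 400) -> bool:
    """Explicit True/False wins; None falls back to the size threshold."""
    if enable_annotations is not None:
        return enable_annotations
    return matrix_size <= annotation_threshold


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI wiring; these flag names are illustrative only."""
    parser = argparse.ArgumentParser(description="annotation flag sketch")
    # Generates --annotations and --no-annotations; default None means "auto".
    parser.add_argument("--annotations",
                        action=argparse.BooleanOptionalAction, default=None)
    parser.add_argument("--annotation-threshold", type=int, default=400)
    return parser
```

A caller would then pass `should_annotate(matrix.size, args.annotations, args.annotation_threshold)` to decide whether to hand `annot=matrix` or `annot=None` to `sns.heatmap`.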

sourcery-ai · 2025-12-11T09:47:01Z

tools/deepxtrace_heatmap.py

+    output_file = f"deepxtrace.{output_format}"
+    print(


suggestion (bug_risk): Derive the output filename from the input file (or allow configuration) instead of always using a fixed name.

Using a fixed deepxtrace.{format} filename risks overwriting previous runs and makes it hard to match outputs to their inputs. Consider deriving the name from input_file (e.g., its stem plus suffix/timestamp) to make repeated or batch runs safer and more traceable.

Suggested implementation:

```python
from datetime import datetime
from pathlib import Path
```

```python
    # Save output
    # Derive output filename from input file to avoid overwriting and improve traceability
    input_path = Path(input_file)  # assumes input_file is available in this scope
    timestamp = datetime.now().strftime('%Y%m%d-%H%M%S')
    output_file = f"{input_path.stem}.deepxtrace.{timestamp}.{output_format}"
    print(
        f"Saving to {output_file} started at {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}")
```

1. Ensure that input_file is available in this scope (e.g., passed into the function generating the plot). If it has a different name, update Path(input_file) accordingly.
2. If the file header does not already contain from datetime import datetime, adjust the import replacement block to match the actual imports at the top of the file.
3. If you want to make the output filename fully configurable, add an optional output_file/output_dir argument to the relevant function and CLI, and only fall back to the derived name when no explicit value is provided.
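
The fallback logic from the last point could be sketched roughly like this. The derived name follows the pattern in the suggested implementation; the helper `resolve_output_path` and its `now` parameter (added so the timestamp can be fixed in tests) are assumptions, not part of the reviewed code.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def resolve_output_path(input_file: str, output_format: str,
                        output_file: Optional[str] = None,
                        now: Optional[datetime] = None) -> Path:
    """Prefer an explicit output path; otherwise derive one from the input."""
    if output_file:
        return Path(output_file)
    # Input stem plus a timestamp, so repeated runs do not overwrite each other.
    stamp = (now or datetime.now()).strftime('%Y%m%d-%H%M%S')
    return Path(f"{Path(input_file).stem}.deepxtrace.{stamp}.{output_format}")
```

The derived file lands in the current working directory because only the input's stem is reused; a hypothetical `output_dir` argument could be joined in front if outputs should sit next to their logs.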

gemini-code-assist

Code Review

This pull request introduces a very useful command-line tool for generating heatmaps from DeepXTrace log data, along with corresponding documentation. The implementation is solid and covers a good range of features like custom colormaps and configurable output. My review focuses on improving the tool's robustness and usability. The key suggestions are to add a command-line option for the output file to avoid overwriting results, handle cases where no data is found in the log file to prevent crashes, and make the data parsing more resilient. I've also included a minor formatting suggestion to improve code readability.

tools/deepxtrace_heatmap.py

gemini-code-assist · 2025-12-11T09:47:21Z

tools/deepxtrace_heatmap.py

+def parse_matrix_data(log_data):
+    """Parse matrix data from log string containing bracketed number sequences"""
+    number_sequences = re.findall(r'\[([\d\s]+)\]', log_data)
+    return np.array([list(map(int, seq.split())) for seq in number_sequences])


The list comprehension for parsing numbers can be made more robust. If a log line contains brackets with only whitespace (e.g., [ ]), seq.split() will produce an empty list. This will result in np.array creating a jagged array, which will likely cause errors later on. You can prevent this by filtering out sequences that are empty or contain only whitespace.

Suggested change

-    return np.array([list(map(int, seq.split())) for seq in number_sequences])
+    return np.array([list(map(int, seq.split())) for seq in number_sequences if seq.strip()])
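
The effect of the added filter can be seen in a small standalone sketch. The regex is the one from the reviewed code; the sample log string is made up for illustration, and plain lists are used in place of `np.array` so the example has no dependencies.

```python
import re

PATTERN = r'\[([\d\s]+)\]'

# Hypothetical log excerpt with a whitespace-only bracket pair in the middle.
log_data = "[1 2 3]\n[ ]\n[4 5 6]"
sequences = re.findall(PATTERN, log_data)

# Without the filter, "[ ]" matches and contributes an empty row.
unfiltered = [list(map(int, seq.split())) for seq in sequences]

# With the filter, whitespace-only matches are skipped.
filtered = [list(map(int, seq.split())) for seq in sequences if seq.strip()]

print(unfiltered)
print(filtered)
```

Note that the filter only removes whitespace-only matches; rows of differing non-zero lengths still need a separate consistency check.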

tools/deepxtrace_heatmap.py

Signed-off-by: wangfakang <fakangwang@gmail.com> Co-authored-by: 毅松 <fakang.wfk@antgroup.com> Signed-off-by: wangfakang <fakangwang@gmail.com>

sourcery-ai bot reviewed Dec 11, 2025

gemini-code-assist bot reviewed Dec 11, 2025

wangfakang force-pushed the heatmap_tool branch from 3bdddfa to 35bf73c Compare December 11, 2025 10:00

add a tool for deepxtrace heatmap

f5fccad

Signed-off-by: wangfakang <fakangwang@gmail.com> Co-authored-by: 毅松 <fakang.wfk@antgroup.com> Signed-off-by: wangfakang <fakangwang@gmail.com>

wangfakang force-pushed the heatmap_tool branch from 35bf73c to f5fccad Compare December 11, 2025 10:10

wangfakang merged commit 30b57ca into antgroup:main Dec 11, 2025
3 checks passed
