Add CUDA profiling tools with roofline analysis support #9

Merged
Alessandro624 merged 7 commits into dev from profiling
Dec 30, 2025
@Alessandro624 (Owner)

Description

This PR introduces a CUDA profiling toolkit for performance analysis and optimization, centered on roofline modeling.

Key changes

  • Updated .gitignore and README to align with the new profiling workflow.

  • Added a new profiling_tools/ module, including:

    • gpu_info.cu to extract and report GPU hardware characteristics.
    • parse_metrics.py to process and aggregate profiling metrics.
    • Makefile to streamline build and execution.
    • Dedicated README documenting usage and workflow.
  • Integrated CUDA profiling scripts enabling automated metric collection.

  • Added gnuplot scripts to visualize results and support roofline analysis.
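For readers unfamiliar with roofline analysis, the bound that such plots visualize can be sketched in a few lines of Python. This is a minimal illustration of the standard roofline model, not code from this PR: the function names (`attainable_gflops`, `ridge_point`, `classify_kernel`) and the hardware numbers are hypothetical, and the actual behavior of `parse_metrics.py` and the gnuplot scripts may differ.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound in GFLOP/s.

    arithmetic_intensity: FLOPs performed per byte moved from memory.
    peak_gflops:          compute roof of the device (GFLOP/s).
    peak_bandwidth_gbs:   memory roof of the device (GB/s).
    """
    memory_roof = arithmetic_intensity * peak_bandwidth_gbs
    return min(peak_gflops, memory_roof)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) at which the two roofs meet."""
    return peak_gflops / peak_bandwidth_gbs


def classify_kernel(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Label a kernel by which roof limits it."""
    if arithmetic_intensity < ridge_point(peak_gflops, peak_bandwidth_gbs):
        return "memory-bound"
    return "compute-bound"


# Hypothetical device: 10 TFLOP/s peak compute, 500 GB/s peak bandwidth.
PEAK_GFLOPS = 10_000.0
PEAK_BW_GBS = 500.0

# Hypothetical kernel with 0.25 FLOP/byte, e.g. a streaming element-wise op.
kernel_ai = 0.25
print(attainable_gflops(kernel_ai, PEAK_GFLOPS, PEAK_BW_GBS))
print(classify_kernel(kernel_ai, PEAK_GFLOPS, PEAK_BW_GBS))
```

In a workflow like the one described here, the peak values would presumably come from a hardware query such as `gpu_info.cu`, and the arithmetic intensity from the collected profiling metrics.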

Impact

This enhancement provides a structured, reproducible way to analyze GPU performance: low-level CUDA metrics are collected, aggregated, and placed in a roofline model, which makes performance bottlenecks easier to identify and communicate.

Notes

The tooling is designed to be modular and easily extensible for future profiling metrics or visualization strategies.

Alessandro624 self-assigned this Dec 30, 2025
Alessandro624 added the documentation and enhancement labels Dec 30, 2025
Alessandro624 merged commit 99dea95 into dev Dec 30, 2025 (1 check passed)
Alessandro624 deleted the profiling branch December 30, 2025 15:35