@Alessandro624
Owner

Description

This PR fixes the formatting of the shared memory value in the device properties output and adds a complete CUDA matrix multiplication example with build, execution, and profiling support.

Key changes

  • Fixed the shared memory output format in the device properties display to improve correctness and readability.

  • Added a new matrix_multiplication/ module featuring:

    • matrixMul.cu: CUDA implementation of matrix multiplication.
    • Makefile to simplify compilation.
    • run.sh for easy execution.
    • profile_nvprof.sh to collect performance metrics via NVIDIA profiling tools.
    • README documenting usage, build steps, and profiling workflow.
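
For readers who have not opened matrixMul.cu, here is a minimal sketch of what a CUDA matrix multiplication can look like. It is not the code from this PR: the kernel name `matMulNaive`, the helper `runMatMulCheck`, the 16x16 block size, and the test dimensions are all assumptions chosen for illustration, and the merged implementation may use a different strategy (for example shared-memory tiling).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical naive kernel: one thread computes one element of C = A * B.
// A is M x K, B is K x N, C is M x N, all stored row-major.
__global__ void matMulNaive(const float* A, const float* B, float* C,
                            int M, int N, int K) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {  // guard threads that fall outside the matrix
        float acc = 0.0f;
        for (int k = 0; k < K; ++k) {
            acc += A[row * K + k] * B[k * N + col];
        }
        C[row * N + col] = acc;
    }
}

// Hypothetical helper: runs the kernel and compares against a CPU loop.
// Returns true only if every CUDA call succeeds and all elements match.
bool runMatMulCheck(int M, int N, int K) {
    std::vector<float> hA(M * K), hB(K * N), hC(M * N, 0.0f);
    // Small integer inputs keep every product and sum exact in float.
    for (int i = 0; i < M * K; ++i) hA[i] = static_cast<float>(i % 7);
    for (int i = 0; i < K * N; ++i) hB[i] = static_cast<float>(i % 5);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    bool ok = cudaMalloc(&dA, hA.size() * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&dB, hB.size() * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&dC, hC.size() * sizeof(float)) == cudaSuccess;
    if (ok) {
        ok = cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float),
                        cudaMemcpyHostToDevice) == cudaSuccess &&
             cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float),
                        cudaMemcpyHostToDevice) == cudaSuccess;
    }
    if (ok) {
        dim3 block(16, 16);  // assumed block size, not taken from the PR
        dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
        matMulNaive<<<grid, block>>>(dA, dB, dC, M, N, K);
        // The blocking cudaMemcpy also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);  // cudaFree on a null pointer performs no operation
    cudaFree(dB);
    cudaFree(dC);
    if (!ok) return false;

    // CPU reference check.
    for (int row = 0; row < M; ++row) {
        for (int col = 0; col < N; ++col) {
            float ref = 0.0f;
            for (int k = 0; k < K; ++k) ref += hA[row * K + k] * hB[k * N + col];
            if (hC[row * N + col] != ref) return false;
        }
    }
    return true;
}

int main() {
    // Dimensions that are not multiples of 16 exercise the bounds check.
    bool ok = runMatMulCheck(33, 17, 48);
    std::printf("%s\n", ok ? "GPU result matches CPU reference"
                           : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

Assuming the sketch is saved as a hypothetical file `matmul_sketch.cu`, it can be built with `nvcc -o matmul_sketch matmul_sketch.cu`. A naive kernel like this is a common baseline to profile before trying optimizations, which fits the benchmarking use described in this PR.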

Impact

This PR combines a small but important correctness fix with a practical CUDA example that can be used as a benchmark or learning reference. It strengthens the project’s focus on GPU performance analysis by pairing executable code with reproducible profiling scripts.

@Alessandro624 self-assigned this on Dec 30, 2025
@Alessandro624 added the documentation and enhancement labels on Dec 30, 2025
@Alessandro624 merged commit 05d73a3 into dev on Dec 30, 2025
1 check passed
@Alessandro624 deleted the matrix-mul branch on December 30, 2025, 15:34