@Alessandro624

Description

This PR adds a new CUDA parallel histogram module, showcasing a classic parallel pattern with a strong emphasis on performance analysis and profiling.

Key changes

  • Added a new parallel_histogram/ module including:

    • CUDA kernels implementing a parallel histogram strategy.
    • Makefile to handle compilation.
    • run.sh for execution.
    • profile_nvprof.sh for GPU performance profiling.
    • README describing the algorithm, usage, and profiling steps.
  • Updated .gitignore to ignore artifacts generated by the parallel histogram module.

Impact

The parallel histogram is a contention-heavy workload: many threads update the same bins, and which bin each thread touches depends on the input data. That makes it a useful case study for analyzing synchronization, atomics, and memory behavior on GPUs, and it extends the repository as a hands-on CUDA performance playground.
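
To make the contention concrete, here is a minimal sketch of one common histogram strategy: per-block privatization in shared memory. It is not taken from the kernels in this PR. The 256-bin byte histogram, the names histogram_shared and histogram_gpu, and the launch configuration are all assumptions chosen for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

constexpr int NUM_BINS = 256;  // assumption: one bin per possible byte value

// Illustrative kernel (hypothetical name): each block accumulates into a
// private shared-memory histogram, then merges it into the global one.
__global__ void histogram_shared(const unsigned char* data, size_t n,
                                 unsigned int* bins) {
    __shared__ unsigned int local[NUM_BINS];

    // Zero the block-private histogram cooperatively.
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x) local[i] = 0;
    __syncthreads();

    // Grid-stride loop: atomics contend only within the block here.
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += stride) {
        atomicAdd(&local[data[i]], 1u);
    }
    __syncthreads();

    // Merge: at most one global atomic per non-empty bin per block.
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x) {
        if (local[i] != 0) atomicAdd(&bins[i], local[i]);
    }
}

// Hypothetical host wrapper: copies input to the device, runs the kernel,
// and copies the NUM_BINS counters back. Returns false on any CUDA error.
bool histogram_gpu(const unsigned char* host_data, size_t n,
                   unsigned int* host_bins) {
    unsigned char* d_data = nullptr;
    unsigned int* d_bins = nullptr;
    const size_t bins_bytes = NUM_BINS * sizeof(unsigned int);

    bool ok = cudaMalloc(&d_data, n) == cudaSuccess &&
              cudaMalloc(&d_bins, bins_bytes) == cudaSuccess &&
              cudaMemcpy(d_data, host_data, n, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemset(d_bins, 0, bins_bytes) == cudaSuccess;
    if (ok) {
        // Launch shape is an arbitrary choice for this sketch.
        histogram_shared<<<64, 256>>>(d_data, n, d_bins);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host_bins, d_bins, bins_bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_data);  // cudaFree accepts nullptr
    cudaFree(d_bins);
    return ok;
}
```

A naive variant would call atomicAdd directly on the global bins for every input element. Comparing the two under a profiler is one way to observe how contention and memory traffic change, though the actual trade-off depends on the input distribution and the GPU.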

@Alessandro624 self-assigned this on Dec 30, 2025
@Alessandro624 added the documentation and enhancement labels on Dec 30, 2025
@Alessandro624 merged commit fb018dc into dev on Dec 30, 2025
1 check passed
@Alessandro624 deleted the parallel-histo branch on December 30, 2025 16:22