This repository is designed to simplify your introduction to CUDA kernel development by providing a ready-to-use VSCode setup. With it, you can both profile your kernels and debug them directly from the VSCode editor, so you can dive into online tutorials immediately without wrestling with your toolchain first.
- Build system: CMake (tested with version 3.28.3)
- Tested with CUDA 13.0 and Python 3.12.3
This repository contains several CUDA implementations of an SGEMM kernel, inspired by the tutorials by siboehm and leimao. The triton folder contains examples of how to tune a GEMM kernel with Triton.
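For orientation, the sketch below shows what a naive SGEMM kernel can look like in CUDA: one thread per output element, computing C = alpha * A * B + beta * C in row-major layout. This is only an illustrative baseline with assumed names, not one of the kernels shipped in this repository.

```cuda
// Illustrative naive SGEMM: C = alpha * A * B + beta * C (row-major).
// One thread computes one element of C; no tiling or shared memory.
__global__ void sgemm_naive(int M, int N, int K, float alpha,
                            const float *A, const float *B,
                            float beta, float *C) {
    const int row = blockIdx.y * blockDim.y + threadIdx.y;  // row of C
    const int col = blockIdx.x * blockDim.x + threadIdx.x;  // column of C
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k) {
            acc += A[row * K + k] * B[k * N + col];
        }
        C[row * N + col] = alpha * acc + beta * C[row * N + col];
    }
}
```

The optimized variants covered by those tutorials improve on such a baseline with techniques like shared-memory tiling, register blocking, and warptiling to reduce redundant global-memory traffic.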
The pybind folder contains an example of how to invoke a CUDA kernel written in C++ from Python.
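As a rough, hypothetical sketch of that pattern (module, function, and kernel names below are placeholders and not the ones used in the pybind folder), a pybind11 binding compiled with nvcc can look like this:

```cuda
// Hypothetical pybind11 binding around a CUDA kernel (compile with nvcc).
// Names are placeholders; see the pybind folder for the actual example.
#include <cstdint>
#include <cuda_runtime.h>
#include <pybind11/pybind11.h>

__global__ void scale_kernel(float *x, float a, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;  // scale one element per thread
}

// Host launcher callable from Python; takes a raw device pointer as an integer.
void scale(std::uintptr_t x_device_ptr, float a, int n) {
    float *x = reinterpret_cast<float *>(x_device_ptr);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(x, a, n);
    cudaDeviceSynchronize();
}

PYBIND11_MODULE(cuda_ext, m) {
    m.def("scale", &scale, "Scale a float array on the GPU in place");
}
```

From Python, such a compiled module is imported like any other extension (e.g., `import cuda_ext`), with the device pointer typically obtained from a framework tensor or a CUDA array library.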
- Make sure you have all necessary VSCode extensions:
  - C/C++ & C/C++ Extension Pack
  - CMake & CMake Tools
  - (Kernel profiling) Nsight Visual Studio Code Edition
  - (For Python examples only) Python & Python Extension Pack
  - Clang-format (by X. Hellauer) for C++ formatting
  - Yapf for Python formatting
- Adapt the paths in the settings.json file.
- Select the build variant (Release or Debug): (F1) -> (CMake: Select Variant)
- Configure + Build + Install the executable: (F1) -> (CMake: Install)
- You should now be able to see the binary called `sgemm` in the build or release folder, depending on the variant.
The run and debug configurations can be found in the launch.json file.
Adapt the paths in the launch.json file.
To just run the kernel (release version), select:
- (F1) -> (Debug: Select and Start Debugging) -> (Run kernel)
To set breakpoints in VSCode to debug the host code and/or the GPU code, select:
- (F1) -> (Debug: Select and Start Debugging) -> (Debug kernel)
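For reference, these two configurations typically follow the pattern sketched below: a plain C++ launch for running the release binary, and a CUDA-aware launch (debugger type cuda-gdb, provided by the Nsight Visual Studio Code Edition extension) for stepping through device code. This is an assumed sketch with placeholder program paths; the launch.json shipped in this repository is authoritative.

```jsonc
{
  "version": "0.2.0",
  "configurations": [
    {
      // Assumed sketch: run the release binary under the regular C++ debugger.
      "name": "Run kernel",
      "type": "cppdbg",
      "request": "launch",
      "program": "${workspaceFolder}/build/release/bin/sgemm",
      "cwd": "${workspaceFolder}"
    },
    {
      // Assumed sketch: debug host and device code with cuda-gdb (Nsight extension).
      "name": "Debug kernel",
      "type": "cuda-gdb",
      "request": "launch",
      "program": "${workspaceFolder}/build/debug/bin/sgemm"
    }
  ]
}
```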
To collect meaningful performance metrics, you should always profile the release version of your kernel.
By default, NVIDIA’s profiler (ncu) requires elevated (root) privileges to access GPU performance counters.
To allow all users to run ncu without invoking sudo, NVIDIA describes a permanent, non-root workaround here.
- Follow the steps on the website if you wish to continue without sudo.
- (F1) -> (Tasks: Run task) -> (Profile SGEMM with Nsight {sudo/ no sudo})
- Enter the kernel name, e.g., `sgemm_simple`.
- Select the section you want to profile.
- Enter the sudo password in the terminal in VSCode.
```bash
mkdir -p build/{debug/release}/build && cd build/{debug/release}/build
cmake \
  -DCMAKE_BUILD_TYPE={Debug/Release} \
  -DCMAKE_INSTALL_PREFIX=../ \
  -DCMAKE_CUDA_TOOLKIT_ROOT_DIR=<CUDA_PATH, e.g.: /usr/local/cuda-13> \
  ../../../
make
make install
```
```bash
# Execute in main directory
./build/{debug/release}/bin/sgemm
```
```bash
# Show instructions
${CUDA_PATH}/bin/cuobjdump --dump-ptx build/{debug/release}/bin/sgemm
```

Install requirements:
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Reload VSCode ((F1) -> Developer: Reload Window) to ensure that the venv is activated when you open a new terminal.
If not: (F1) -> Python: Create Environment... -> Venv -> Use Existing and reload the window again.
This repository also contains a Triton implementation of a GEMM kernel. You can find it in this folder. To run the file (without debugging):
- Open the Python file you want to execute.
- Press (F1) -> Python: Run Python File in Terminal.
To debug the Python file, use the corresponding configuration.
To debug the Triton kernel, TRITON_INTERPRET needs to be set to 1.
This activates the interpreter mode instead of executing the compiled kernel.
More information can be found here.
- Open the Python file you want to debug.
- Press (F1) -> Debug: Select and Start Debugging
- Choose: Debug Python File
The following sgemm implementations are included in this repository:
- Enable the collection of tracing information in the settings.json.
- Trace the kernel <my_kernel>, e.g., `sgemm_warptiling`:

  ```bash
  ${CUDA_PATH}/bin/ncu \
    --set full -f \
    --kernel-name <my_kernel> \
    --export sgemm.ncu-rep \
    ./build/release/bin/sgemm
  ```
- Open the file with Nsight:

  ```bash
  ${CUDA_PATH}/bin/ncu-ui sgemm.ncu-rep
  ```

- Profile additional metrics:

  ```bash
  # Show all metrics
  ${CUDA_PATH}/bin/ncu --query-metrics

  # Profile more metrics (m1, m2, and m3)
  ${CUDA_PATH}/bin/ncu [...] --metrics m1,m2,m3 [...]
  ```
  To print the results, use `--page raw`.