🚀 Optimizing CUDA Memory Layouts

Which Memory Layouts exist in CUDA?

In CUDA there are several ways to allocate memory that is shared between the CPU and the GPU (not to be confused with on-chip `__shared__` memory). We will mainly look at the following three:

| Memory Layout | Prefetching possible? | Pinning possible? | Oversubscribing VRAM possible? |
| --- | --- | --- | --- |
| Pinned Memory | ❌ | ✅ | ❌ |
| Mapped Memory | ❌ | ❌ | ✅ |
| Unified Memory | ✅ | ✅ | ✅ |
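As a rough illustration of the three layouts in the table, the sketch below allocates each one through the CUDA runtime API (`cudaMallocHost`, `cudaHostAlloc` with `cudaHostAllocMapped`, and `cudaMallocManaged`) and runs the same kernel on it. The kernel `add_one`, the helper names and the `CUDA_CHECK` macro are made up for this example and are not taken from the repo's `src` directory.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

// Hypothetical workload: adds 1.0f to every element.
__global__ void add_one(float* data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

static unsigned blocks_for(size_t n) { return (unsigned)((n + 255) / 256); }

// Pinned memory: page-locked host buffer plus explicit copies to a device buffer.
float pinned_roundtrip(size_t n) {
    assert(n > 0);
    const size_t bytes = n * sizeof(float);
    float *host = nullptr, *dev = nullptr;
    CUDA_CHECK(cudaMallocHost((void**)&host, bytes));
    CUDA_CHECK(cudaMalloc((void**)&dev, bytes));
    for (size_t i = 0; i < n; ++i) host[i] = 0.0f;
    CUDA_CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));
    add_one<<<blocks_for(n), 256>>>(dev, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost));
    const float last = host[n - 1];
    CUDA_CHECK(cudaFree(dev));
    CUDA_CHECK(cudaFreeHost(host));
    return last;
}

// Mapped memory: the kernel reads and writes the host buffer in place.
float mapped_in_place(size_t n) {
    assert(n > 0);
    const size_t bytes = n * sizeof(float);
    float *host = nullptr, *dev_view = nullptr;
    CUDA_CHECK(cudaHostAlloc((void**)&host, bytes, cudaHostAllocMapped));
    CUDA_CHECK(cudaHostGetDevicePointer((void**)&dev_view, host, 0));
    for (size_t i = 0; i < n; ++i) host[i] = 0.0f;
    add_one<<<blocks_for(n), 256>>>(dev_view, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());  // wait before the host reads the result
    const float last = host[n - 1];
    CUDA_CHECK(cudaFreeHost(host));
    return last;
}

// Unified Memory: one pointer, pages migrate between host and device on demand.
float unified_in_place(size_t n) {
    assert(n > 0);
    float* data = nullptr;
    CUDA_CHECK(cudaMallocManaged((void**)&data, n * sizeof(float)));
    for (size_t i = 0; i < n; ++i) data[i] = 0.0f;
    add_one<<<blocks_for(n), 256>>>(data, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());  // wait before the host reads the result
    const float last = data[n - 1];
    CUDA_CHECK(cudaFree(data));
    return last;
}
```

Each helper returns the last element after the kernel has run, so calling for example `unified_in_place(1 << 20)` from a `main` function and compiling with `nvcc` should yield `1.0f`. Only the pinned variant needs a separate device buffer and explicit `cudaMemcpy` calls. The other two hand the kernel a pointer to the same allocation the host uses.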

🔎 How should we work with Unified Memory?

When working in a heterogeneous environment where the CPU and the GPU can be used simultaneously, the question arises of how we should allocate memory that both can access. There are several ways to allocate this memory. In this repo we will mainly look at Unified Memory and Mapped Memory, since both allow us to oversubscribe VRAM without complex code changes.
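To make the "no complex code changes" point concrete, here is a minimal sketch of a Unified Memory buffer that may be larger than VRAM. It assumes a Linux system with a Pascal-or-newer GPU, where managed pages are migrated on demand. The names `run_managed`, `add_one_strided` and `prefetch_to_device` are hypothetical. The version guard around `cudaMemPrefetchAsync` reflects my understanding that CUDA 13 replaced the integer device argument with a `cudaMemLocation`, so please verify it against your toolkit.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

// Hypothetical workload: grid-stride loop that adds 1.0f to every element.
__global__ void add_one_strided(float* data, size_t n) {
    const size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] += 1.0f;
}

// Prefetch a managed range to `device`. Assumption: the signature differs
// between CUDA 12 and CUDA 13, hence the version guard.
static void prefetch_to_device(const void* ptr, size_t bytes, int device) {
#if CUDART_VERSION >= 13000
    cudaMemLocation loc{};
    loc.type = cudaMemLocationTypeDevice;
    loc.id = device;
    CUDA_CHECK(cudaMemPrefetchAsync(ptr, bytes, loc, 0, 0));
#else
    CUDA_CHECK(cudaMemPrefetchAsync(ptr, bytes, device, 0));
#endif
}

// Allocates n floats of Unified Memory, runs the kernel, and spot-checks
// the first and last element. n * sizeof(float) may exceed free VRAM.
bool run_managed(size_t n, bool prefetch) {
    assert(n > 0);
    int device = 0;
    CUDA_CHECK(cudaGetDevice(&device));

    // Non-zero on systems that support on-demand migration of managed pages.
    int concurrent = 0;
    CUDA_CHECK(cudaDeviceGetAttribute(&concurrent,
                                      cudaDevAttrConcurrentManagedAccess, device));

    size_t free_bytes = 0, total_bytes = 0;
    CUDA_CHECK(cudaMemGetInfo(&free_bytes, &total_bytes));

    const size_t bytes = n * sizeof(float);
    float* data = nullptr;
    CUDA_CHECK(cudaMallocManaged((void**)&data, bytes));
    for (size_t i = 0; i < n; ++i) data[i] = 0.0f;  // first touch on the host

    // Prefetch only when the buffer fits; otherwise rely on demand paging.
    if (prefetch && concurrent && bytes <= free_bytes)
        prefetch_to_device(data, bytes, device);

    add_one_strided<<<1024, 256>>>(data, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    const bool ok = (data[0] == 1.0f) && (data[n - 1] == 1.0f);
    CUDA_CHECK(cudaFree(data));
    return ok;
}
```

To try oversubscription, call `run_managed` with an `n` whose byte size exceeds the total VRAM reported by `cudaMemGetInfo`, keeping it below the available system RAM. The kernel launch is identical to the in-VRAM case. Whether the resulting page migration is fast enough is a separate question that this sketch does not measure.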

🔎 Why should we look at oversubscribing VRAM?

For memory-intensive workloads like sorting or training DNNs, VRAM comes under heavy pressure. 3D generation neural networks in particular (e.g. MVDream_threestudio) consume around 20 GiB of VRAM, which isn't feasible on a normal RTX 5090. Upgrading to a larger GPU incurs additional cost. Exploiting physical RAM would reduce costs in this case.


sl-badcoder/GPUOpt
