Description of the issue
When running containers in Docker Swarm mode using generic GPU resources, Docker Swarm correctly assigns distinct GPU UUIDs to each replica, but NVIDIA Container Runtime ignores this assignment and exposes all GPUs to every container. As a result, multiple containers scheduled on the same node end up using the same physical GPU (GPU 0), leading to memory oversubscription and OOM failures.
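To make the mismatch concrete, the following check can be run inside each replica. It assumes the nvidia-ml-py package (pynvml), which is not part of my setup above; it simply compares the UUID Swarm injects via DOCKER_RESOURCE_NVIDIA-GPU with the GPUs NVML actually reports:

# Sketch: run inside a replica to compare the Swarm-assigned GPU with what the
# container can actually see. Assumes nvidia-ml-py (pynvml) is installed; it is
# not part of the original setup described above.
import os
import pynvml

pynvml.nvmlInit()
assigned = os.environ.get("DOCKER_RESOURCE_NVIDIA-GPU", "<not set>")
visible = []
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    uuid = pynvml.nvmlDeviceGetUUID(handle)
    # Older pynvml releases return bytes; normalize to str.
    visible.append(uuid.decode() if isinstance(uuid, bytes) else uuid)
pynvml.nvmlShutdown()

print("Swarm-assigned GPU:        ", assigned)
print("GPUs visible in container: ", visible)
# Symptom: 'visible' lists both H100 UUIDs even though 'assigned' names only one.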
Steps to reproduce
- Configure NVIDIA Container Toolkit for Docker:
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker
In my environment this results in the following daemon.json (a sketch for generating the node-generic-resources entries follows the JSON):
{
  "default-runtime": "nvidia",
  "default-shm-size": "1G",
  "default-ulimits": {
    "memlock": {
      "hard": -1,
      "name": "memlock",
      "soft": -1
    },
    "stack": {
      "hard": 67108864,
      "name": "stack",
      "soft": 67108864
    }
  },
  "node-generic-resources": [
    "NVIDIA-GPU=GPU-f2ba9cd4-6f6b-860f-3c78-4a6639e4b5db",
    "NVIDIA-GPU=GPU-f12d79c4-d485-2fd1-ca2d-cd5eef76fe40"
  ],
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
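For reference, the UUIDs in node-generic-resources can be produced with a small helper like the sketch below. It is illustrative only (not part of my setup) and relies on nvidia-smi --query-gpu=uuid:

# Sketch: print daemon.json "node-generic-resources" entries for every GPU on
# the node. Illustrative helper, not part of the original report.
import subprocess

uuids = subprocess.run(
    ["nvidia-smi", "--query-gpu=uuid", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout.split()
for uuid in uuids:
    # Drop the trailing comma on the last line when pasting into daemon.json.
    print(f'"NVIDIA-GPU={uuid}",')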
- Configure nvidia-container-runtime via /etc/nvidia-container-runtime/config.toml as follows (note swarm-resource; a sketch of the env-var translation it is expected to enable follows the TOML):
#accept-nvidia-visible-devices-as-volume-mounts = false
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
swarm-resource = "DOCKER_RESOURCE_NVIDIA-GPU"
[nvidia-container-cli]
#debug = "/var/log/nvidia-container-toolkit.log"
environment = []
#ldcache = "/etc/ld.so.cache"
ldconfig = "@/sbin/ldconfig.real"
load-kmods = true
#no-cgroups = false
#path = "/usr/bin/nvidia-container-cli"
#root = "/run/nvidia/driver"
#user = "root:video"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime.modes.legacy]
cuda-compat-mode = "ldconfig"
[nvidia-container-runtime-hook]
path = "nvidia-container-runtime-hook"
skip-mode-detection = false
[nvidia-ctk]
path = "nvidia-ctk"
- Deploy a Swarm service requesting one GPU per replica (a possible in-container workaround sketch follows the compose snippet):
services:
  worker-service:
    image: <image>
    deploy:
      replicas: 2
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: NVIDIA-GPU
                value: 1
    command: python3 -m launch_bare_metal
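As a stopgap (not a fix for the runtime behavior), the workload could pin itself to the assigned GPU with a hypothetical wrapper entrypoint like the one below. launch_bare_metal is the module from the compose file; the wrapper itself is not something I currently use:

# Hypothetical wrapper entrypoint (workaround sketch, not a fix): pin the process
# to the Swarm-assigned GPU before launching the real workload.
import os
import sys

assigned = os.environ.get("DOCKER_RESOURCE_NVIDIA-GPU")
if assigned:
    # CUDA accepts GPU UUIDs in CUDA_VISIBLE_DEVICES, so the value can be passed through as-is.
    os.environ["CUDA_VISIBLE_DEVICES"] = assigned
# Replace the wrapper process with the workload from the compose file.
os.execvp(sys.executable, [sys.executable, "-m", "launch_bare_metal"])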
- Inspect the running containers (a sketch that automates this check across replicas follows the output):
docker inspect <container_id>
You will observe:
Container 1:
DOCKER_RESOURCE_NVIDIA-GPU=GPU-f2ba9cd4-6f6b-860f-3c78-4a6639e4b5db
Container 2:
DOCKER_RESOURCE_NVIDIA-GPU=GPU-f12d79c4-d485-2fd1-ca2d-cd5eef76fe40
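To automate this check across replicas, a small helper such as the sketch below (illustrative only, not part of my setup) can read the relevant env vars via docker inspect. Usage: python3 <script>.py <container_id_1> <container_id_2>

# Sketch: dump the GPU-related env vars of each replica via `docker inspect`.
# Container IDs are passed on the command line.
import json
import subprocess
import sys

for cid in sys.argv[1:]:
    env_json = subprocess.run(
        ["docker", "inspect", "--format", "{{json .Config.Env}}", cid],
        capture_output=True, text=True, check=True,
    ).stdout
    env = dict(item.split("=", 1) for item in json.loads(env_json))
    print(cid[:12],
          "DOCKER_RESOURCE_NVIDIA-GPU =", env.get("DOCKER_RESOURCE_NVIDIA-GPU"),
          "| NVIDIA_VISIBLE_DEVICES =", env.get("NVIDIA_VISIBLE_DEVICES"))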
- Run nvidia-smi on the host.
Wed Jan 21 16:46:38 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.195.03 Driver Version: 570.195.03 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 Off | 00000000:01:00.0 Off | 0 |
| N/A 29C P0 118W / 700W | 11951MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 Off | 00000000:02:00.0 Off | 0 |
| N/A 31C P0 71W / 700W | 4MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 215251 C python3 6000MiB |
| 0 N/A N/A 215859 C python3 5936MiB |
+-----------------------------------------------------------------------------------------+
Expected behavior
Each container should see and use only the GPU assigned by Docker Swarm, i.e.:
- Container 1 → id 0 (GPU-f2ba9cd4-6f6b-860f-3c78-4a6639e4b5db)
- Container 2 → id 1 (GPU-f12d79c4-d485-2fd1-ca2d-cd5eef76fe40)
Actual behavior
- Both containers see all GPUs
- CUDA defaults to GPU 0
- Both containers allocate memory on GPU 0
- GPU 1 remains unused
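The failure mode can be illustrated with a few lines of PyTorch (used here only as an example; launch_bare_metal is not necessarily PyTorch-based):

# Illustration of the failure mode: with all GPUs exposed and CUDA_VISIBLE_DEVICES
# unset, every replica picks device 0 by default.
import os
import torch

print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"))
print("visible CUDA devices:", torch.cuda.device_count())    # reports 2 in both replicas
print("default device index:", torch.cuda.current_device())  # 0 in both replicas
x = torch.zeros(1, device="cuda")  # both replicas allocate on GPU 0 -> contention / OOM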
Version
ii nvidia-container-toolkit 1.18.1-1 amd64 NVIDIA Container toolkit
ii nvidia-container-toolkit-base 1.18.1-1 amd64 NVIDIA Container Toolkit Base
Thank you for your help.