Description of the issue
When running containers in Docker Swarm mode using generic GPU resources, Docker Swarm correctly assigns distinct GPU UUIDs to each replica, but NVIDIA Container Runtime ignores this assignment and exposes all GPUs to every container. As a result, multiple containers scheduled on the same node end up using the same physical GPU (GPU 0), leading to memory oversubscription and OOM failures.
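To make the mismatch concrete, the following check can be run inside each replica. It assumes the nvidia-ml-py package (pynvml), which is not part of my setup above; it simply compares the UUID Swarm injects via DOCKER_RESOURCE_NVIDIA-GPU with the GPUs NVML actually reports:

# Sketch: run inside a replica to compare the Swarm-assigned GPU with what the
# container can actually see. Assumes nvidia-ml-py (pynvml) is installed; it is
# not part of the original setup described above.
import os
import pynvml

pynvml.nvmlInit()
assigned = os.environ.get("DOCKER_RESOURCE_NVIDIA-GPU", "<not set>")
visible = []
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    uuid = pynvml.nvmlDeviceGetUUID(handle)
    # Older pynvml releases return bytes; normalize to str.
    visible.append(uuid.decode() if isinstance(uuid, bytes) else uuid)
pynvml.nvmlShutdown()

print("Swarm-assigned GPU:        ", assigned)
print("GPUs visible in container: ", visible)
# Symptom: 'visible' lists both H100 UUIDs even though 'assigned' names only one.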
Steps to reproduce
- Configure NVIDIA Container Toolkit for Docker:
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker
In my environment this results in the following daemon.json (a sketch for generating the node-generic-resources entries follows the JSON):
{
  "default-runtime": "nvidia",
  "default-shm-size": "1G",
  "default-ulimits": {
    "memlock": {
      "hard": -1,
      "name": "memlock",
      "soft": -1
    },
    "stack": {
      "hard": 67108864,
      "name": "stack",
      "soft": 67108864
    }
  },
  "node-generic-resources": [
    "NVIDIA-GPU=GPU-f2ba9cd4-6f6b-860f-3c78-4a6639e4b5db",
    "NVIDIA-GPU=GPU-f12d79c4-d485-2fd1-ca2d-cd5eef76fe40"
  ],
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
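For reference, the UUIDs in node-generic-resources can be produced with a small helper like the sketch below. It is illustrative only (not part of my setup) and relies on nvidia-smi --query-gpu=uuid:

# Sketch: print daemon.json "node-generic-resources" entries for every GPU on
# the node. Illustrative helper, not part of the original report.
import subprocess

uuids = subprocess.run(
    ["nvidia-smi", "--query-gpu=uuid", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout.split()
for uuid in uuids:
    # Drop the trailing comma on the last line when pasting into daemon.json.
    print(f'"NVIDIA-GPU={uuid}",')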
- Configure nvidia-container-runtime via /etc/nvidia-container-runtime/config.toml as follows (note swarm-resource; a sketch of the env-var translation it is expected to enable follows the TOML):
#accept-nvidia-visible-devices-as-volume-mounts = false
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
swarm-resource = "DOCKER_RESOURCE_NVIDIA-GPU"
[nvidia-container-cli]
#debug = "/var/log/nvidia-container-toolkit.log"
environment = []
#ldcache = "/etc/ld.so.cache"
ldconfig = "@/sbin/ldconfig.real"
load-kmods = true
#no-cgroups = false
#path = "/usr/bin/nvidia-container-cli"
#root = "/run/nvidia/driver"
#user = "root:video"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime.modes.legacy]
cuda-compat-mode = "ldconfig"
[nvidia-container-runtime-hook]
path = "nvidia-container-runtime-hook"
skip-mode-detection = false
[nvidia-ctk]
path = "nvidia-ctk"
- Deploy a Swarm service requesting one GPU per replica (a possible in-container workaround sketch follows the compose snippet):
services:
  worker-service:
    image: <image>
    deploy:
      replicas: 2
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: NVIDIA-GPU
                value: 1
    command: python3 -m launch_bare_metal
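As a stopgap (not a fix for the runtime behavior), the workload could pin itself to the assigned GPU with a hypothetical wrapper entrypoint like the one below. launch_bare_metal is the module from the compose file; the wrapper itself is not something I currently use:

# Hypothetical wrapper entrypoint (workaround sketch, not a fix): pin the process
# to the Swarm-assigned GPU before launching the real workload.
import os
import sys

assigned = os.environ.get("DOCKER_RESOURCE_NVIDIA-GPU")
if assigned:
    # CUDA accepts GPU UUIDs in CUDA_VISIBLE_DEVICES, so the value can be passed through as-is.
    os.environ["CUDA_VISIBLE_DEVICES"] = assigned
# Replace the wrapper process with the workload from the compose file.
os.execvp(sys.executable, [sys.executable, "-m", "launch_bare_metal"])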
- Inspect the running containers (a sketch that automates this check across replicas follows the output):
docker inspect <container_id>
You will observe:
Container 1:
DOCKER_RESOURCE_NVIDIA-GPU=GPU-f2ba9cd4-6f6b-860f-3c78-4a6639e4b5db
Container 2:
DOCKER_RESOURCE_NVIDIA-GPU=GPU-f12d79c4-d485-2fd1-ca2d-cd5eef76fe40
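To automate this check across replicas, a small helper such as the sketch below (illustrative only, not part of my setup) can read the relevant env vars via docker inspect. Usage: python3 <script>.py <container_id_1> <container_id_2>

# Sketch: dump the GPU-related env vars of each replica via `docker inspect`.
# Container IDs are passed on the command line.
import json
import subprocess
import sys

for cid in sys.argv[1:]:
    env_json = subprocess.run(
        ["docker", "inspect", "--format", "{{json .Config.Env}}", cid],
        capture_output=True, text=True, check=True,
    ).stdout
    env = dict(item.split("=", 1) for item in json.loads(env_json))
    print(cid[:12],
          "DOCKER_RESOURCE_NVIDIA-GPU =", env.get("DOCKER_RESOURCE_NVIDIA-GPU"),
          "| NVIDIA_VISIBLE_DEVICES =", env.get("NVIDIA_VISIBLE_DEVICES"))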
- Run nvidia-smi on the host.
Wed Jan 21 16:46:38 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.195.03 Driver Version: 570.195.03 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 Off | 00000000:01:00.0 Off | 0 |
| N/A 29C P0 118W / 700W | 11951MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 Off | 00000000:02:00.0 Off | 0 |
| N/A 31C P0 71W / 700W | 4MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 215251 C python3 6000MiB |
| 0 N/A N/A 215859 C python3 5936MiB |
+-----------------------------------------------------------------------------------------+
Expected behavior
Each container should see and use only the GPU assigned by Docker Swarm, i.e.:
- Container 1 → id 0 (GPU-f2ba9cd4-6f6b-860f-3c78-4a6639e4b5db)
- Container 2 → id 1 (GPU-f12d79c4-d485-2fd1-ca2d-cd5eef76fe40)
Actual behavior
- Both containers see all GPUs
- CUDA defaults to GPU 0
- Both containers allocate memory on GPU 0
- GPU 1 remains unused
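The failure mode can be illustrated with a few lines of PyTorch (used here only as an example; launch_bare_metal is not necessarily PyTorch-based):

# Illustration of the failure mode: with all GPUs exposed and CUDA_VISIBLE_DEVICES
# unset, every replica picks device 0 by default.
import os
import torch

print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"))
print("visible CUDA devices:", torch.cuda.device_count())    # reports 2 in both replicas
print("default device index:", torch.cuda.current_device())  # 0 in both replicas
x = torch.zeros(1, device="cuda")  # both replicas allocate on GPU 0 -> contention / OOM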
Version
ii nvidia-container-toolkit 1.18.1-1 amd64 NVIDIA Container toolkit
ii nvidia-container-toolkit-base 1.18.1-1 amd64 NVIDIA Container Toolkit Base
Thank you for your help.