performance of device_local_copy does not meet expectations on NVIDIA H20-3e #48

@yoqiu-amd

Hi @esitaridi,

I recently ran the device_local_copy test on NVIDIA H20-3e GPUs. According to the official specifications, the GPU memory bandwidth should be 4.8 TB/s, but I measure only about 1.1 TB/s per GPU. Is something wrong with my setup? My test results and basic information about the server are below.

Thanks,
Henry

root@sglang-host-cuda:/workspace/yongjie/nvbandwidth# ./nvbandwidth -t device_local_copy
nvbandwidth Version: v0.8
Built from Git version: v0.8

CUDA Runtime Version: 12060
CUDA Driver Version: 12060
Driver Version: 550.127.08

sglang-host-cuda
Device 0: NVIDIA H20-3e (00000000:18:00)
Device 1: NVIDIA H20-3e (00000000:38:00)
Device 2: NVIDIA H20-3e (00000000:49:00)
Device 3: NVIDIA H20-3e (00000000:59:00)
Device 4: NVIDIA H20-3e (00000000:9b:00)
Device 5: NVIDIA H20-3e (00000000:bb:00)
Device 6: NVIDIA H20-3e (00000000:ca:00)
Device 7: NVIDIA H20-3e (00000000:da:00)

Running device_local_copy.
memcpy local GPU(column) bandwidth (GB/s)
           0         1         2         3         4         5         6         7
 0   1116.30   1116.30   1115.72   1116.16   1116.16   1116.88   1117.17   1116.30

SUM device_local_copy 8930.99

NOTE: The reported results may not reflect the full capabilities of the platform.
Performance can vary with software drivers, hardware clocks, and system topology.
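
To put the numbers above in proportion, here is a small Python sketch of the arithmetic, not part of nvbandwidth itself. The per-GPU values are copied from the table, and the 4.8 TB/s peak is the figure quoted in this report. The factor of two is an assumption: it supposes that the tool counts each copied byte once, while a device-local copy has to read the source buffer and write the destination buffer, so the memory traffic would be roughly twice the reported copy rate.

```python
# Per-GPU device_local_copy results (GB/s), copied from the nvbandwidth table above.
measured_gbps = [1116.30, 1116.30, 1115.72, 1116.16, 1116.16, 1116.88, 1117.17, 1116.30]

# Peak memory bandwidth quoted in this report (4.8 TB/s), expressed in GB/s.
QUOTED_PEAK_GBPS = 4800.0

total = sum(measured_gbps)          # should agree with the SUM line of the tool output
mean = total / len(measured_gbps)   # average copy bandwidth per GPU

# Assumption: each copied byte is counted once by the tool, but a local copy
# reads the source and writes the destination, so memory traffic is about 2x.
approx_traffic = 2 * mean

print(f"sum  = {total:.2f} GB/s")
print(f"mean = {mean:.2f} GB/s ({mean / QUOTED_PEAK_GBPS:.1%} of quoted peak)")
print(f"read+write traffic ~ {approx_traffic:.2f} GB/s "
      f"({approx_traffic / QUOTED_PEAK_GBPS:.1%} of quoted peak)")
```

The sum reproduces the 8930.99 GB/s reported by the tool. Even if the read-plus-write assumption holds, the result stays well below the quoted peak, so the question still stands.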

root@sglang-host-cuda:/workspace/yongjie/nvbandwidth# nvidia-smi
Wed Sep 17 02:03:25 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08             Driver Version: 550.127.08     CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H20-3e                  Off |   00000000:18:00.0 Off |                    0 |
| N/A   38C    P0            120W /  500W |  137557MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H20-3e                  Off |   00000000:38:00.0 Off |                    0 |
| N/A   33C    P0            114W /  500W |  137703MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H20-3e                  Off |   00000000:49:00.0 Off |                    0 |
| N/A   38C    P0            119W /  500W |  137683MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H20-3e                  Off |   00000000:59:00.0 Off |                    0 |
| N/A   33C    P0            117W /  500W |  137703MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H20-3e                  Off |   00000000:9B:00.0 Off |                    0 |
| N/A   33C    P0            117W /  500W |  137703MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H20-3e                  Off |   00000000:BB:00.0 Off |                    0 |
| N/A   39C    P0            118W /  500W |  137703MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H20-3e                  Off |   00000000:CA:00.0 Off |                    0 |
| N/A   33C    P0            118W /  500W |  137703MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H20-3e                  Off |   00000000:DA:00.0 Off |                    0 |
| N/A   39C    P0            122W /  500W |  136983MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

