Hi @esitaridi,
I recently ran the device_local_copy test on NVIDIA H20-3e GPUs. According to the official specifications, GPU memory bandwidth should be 4.8 TB/s.
However, I only get about 1.1 TB/s per GPU. Is there a problem somewhere? My test results and basic information about the server are below.
Thanks,
Henry
root@sglang-host-cuda:/workspace/yongjie/nvbandwidth# ./nvbandwidth -t device_local_copy
nvbandwidth Version: v0.8
Built from Git version: v0.8
CUDA Runtime Version: 12060
CUDA Driver Version: 12060
Driver Version: 550.127.08
sglang-host-cuda
Device 0: NVIDIA H20-3e (00000000:18:00)
Device 1: NVIDIA H20-3e (00000000:38:00)
Device 2: NVIDIA H20-3e (00000000:49:00)
Device 3: NVIDIA H20-3e (00000000:59:00)
Device 4: NVIDIA H20-3e (00000000:9b:00)
Device 5: NVIDIA H20-3e (00000000:bb:00)
Device 6: NVIDIA H20-3e (00000000:ca:00)
Device 7: NVIDIA H20-3e (00000000:da:00)
Running device_local_copy.
memcpy local GPU(column) bandwidth (GB/s)
0 1 2 3 4 5 6 7
0 1116.30 1116.30 1115.72 1116.16 1116.16 1116.88 1117.17 1116.30
SUM device_local_copy 8930.99
NOTE: The reported results may not reflect the full capabilities of the platform.
Performance can vary with software drivers, hardware clocks, and system topology.
root@sglang-host-cuda:/workspace/yongjie/nvbandwidth# nvidia-smi
Wed Sep 17 02:03:25 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08 Driver Version: 550.127.08 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H20-3e Off | 00000000:18:00.0 Off | 0 |
| N/A 38C P0 120W / 500W | 137557MiB / 143771MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H20-3e Off | 00000000:38:00.0 Off | 0 |
| N/A 33C P0 114W / 500W | 137703MiB / 143771MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA H20-3e Off | 00000000:49:00.0 Off | 0 |
| N/A 38C P0 119W / 500W | 137683MiB / 143771MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA H20-3e Off | 00000000:59:00.0 Off | 0 |
| N/A 33C P0 117W / 500W | 137703MiB / 143771MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA H20-3e Off | 00000000:9B:00.0 Off | 0 |
| N/A 33C P0 117W / 500W | 137703MiB / 143771MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA H20-3e Off | 00000000:BB:00.0 Off | 0 |
| N/A 39C P0 118W / 500W | 137703MiB / 143771MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA H20-3e Off | 00000000:CA:00.0 Off | 0 |
| N/A 33C P0 118W / 500W | 137703MiB / 143771MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA H20-3e Off | 00000000:DA:00.0 Off | 0 |
| N/A 39C P0 122W / 500W | 136983MiB / 143771MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
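For reference, the gap between the measured figures and the quoted 4.8 TB/s can be put into numbers with a quick calculation. The following is only a rough sketch: the per-GPU values are copied from the device_local_copy row above, and the function name `summarize` is made up for illustration. The last figure rests on an assumption that is not confirmed anywhere in this report, namely that the tool counts each copied byte once even though a local copy both reads and writes it, so the memory traffic could be up to twice the reported number.

```python
# Per-GPU results copied from the device_local_copy row above, in GB/s.
measured_gbps = [1116.30, 1116.30, 1115.72, 1116.16,
                 1116.16, 1116.88, 1117.17, 1116.30]

# 4.8 TB/s as quoted in this post, expressed in GB/s.
SPEC_GBPS = 4800.0


def summarize(measured, spec):
    """Compare the mean measured copy bandwidth against a spec figure."""
    mean = sum(measured) / len(measured)
    return {
        "mean_gbps": mean,
        # Fraction of spec if the reported number is taken at face value.
        "fraction_of_spec": mean / spec,
        # ASSUMPTION: a local copy reads and writes every byte, so actual
        # memory traffic may be about 2x the reported copy bandwidth.
        "fraction_if_read_plus_write": 2 * mean / spec,
    }


result = summarize(measured_gbps, SPEC_GBPS)
print(f"mean per-GPU copy bandwidth: {result['mean_gbps']:.2f} GB/s")
print(f"fraction of spec (as reported): {result['fraction_of_spec']:.1%}")
print(f"fraction of spec (read + write assumed): "
      f"{result['fraction_if_read_plus_write']:.1%}")
```

Even under the read-plus-write assumption, the result stays well below the quoted peak, so the question above still stands.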