Description
I am benchmarking an NVIDIA A100 GPU on Ubuntu 22.04 with the nvbandwidth tool. The reported host-to-device throughput is the same at 512 MiB and 1 GiB, but it decreases at larger buffer sizes:
Test case: host_to_device_memcpy_sm
- 512 MiB: 25.13 GB/s
- 1 GiB: 25.13 GB/s
- 10 GiB: 18.05 GB/s
- 20 GiB: 16.50 GB/s
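To put the drop in relative terms, here is a minimal sketch that computes the percentage decrease against the 512 MiB result, using only the figures listed above. The `results` dictionary and the `relative_drop` helper are illustrative names of my own, not part of nvbandwidth.

```python
# Reported host_to_device_memcpy_sm bandwidth in GB/s, keyed by buffer size in MiB
# (the values passed to -b in the runs from this report).
results = {512: 25.13, 1024: 25.13, 10240: 18.05, 20480: 16.50}

# Use the smallest buffer size as the reference point.
baseline = results[512]


def relative_drop(bandwidth, base):
    """Return how far `bandwidth` is below `base`, as a percentage of `base`."""
    return (base - bandwidth) / base * 100.0


for size_mib, bandwidth in results.items():
    drop = relative_drop(bandwidth, baseline)
    print(f"{size_mib:>6} MiB: {bandwidth:.2f} GB/s ({drop:.1f}% below the 512 MiB result)")
```

This works out to roughly 28% lower at 10 GiB and roughly 34% lower at 20 GiB, compared with the 512 MiB run.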
Below is the output:
$ ./nvbandwidth -t host_to_device_memcpy_sm -b 512
nvbandwidth Version: v0.8
Built from Git version: v0.8
CUDA Runtime Version: 12060
CUDA Driver Version: 12060
Driver Version: 560.35.05
Device 0: NVIDIA A100-SXM4-40GB (00000000:99:00)
Running host_to_device_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
0
0 25.13
SUM host_to_device_memcpy_sm 25.13
NOTE: The reported results may not reflect the full capabilities of the platform.
Performance can vary with software drivers, hardware clocks, and system topology.
$ ./nvbandwidth -t host_to_device_memcpy_sm -b 1024
nvbandwidth Version: v0.8
Built from Git version: v0.8
CUDA Runtime Version: 12060
CUDA Driver Version: 12060
Driver Version: 560.35.05
Device 0: NVIDIA A100-SXM4-40GB (00000000:99:00)
Running host_to_device_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
0
0 25.13
SUM host_to_device_memcpy_sm 25.13
NOTE: The reported results may not reflect the full capabilities of the platform.
Performance can vary with software drivers, hardware clocks, and system topology.
$ ./nvbandwidth -t host_to_device_memcpy_sm -b 10240
nvbandwidth Version: v0.8
Built from Git version: v0.8
CUDA Runtime Version: 12060
CUDA Driver Version: 12060
Driver Version: 560.35.05
Device 0: NVIDIA A100-SXM4-40GB (00000000:99:00)
Running host_to_device_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
0
0 18.05
SUM host_to_device_memcpy_sm 18.05
NOTE: The reported results may not reflect the full capabilities of the platform.
Performance can vary with software drivers, hardware clocks, and system topology.
$ ./nvbandwidth -t host_to_device_memcpy_sm -b 20480
nvbandwidth Version: v0.8
Built from Git version: v0.8
CUDA Runtime Version: 12060
CUDA Driver Version: 12060
Driver Version: 560.35.05
Device 0: NVIDIA A100-SXM4-40GB (00000000:99:00)
Running host_to_device_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
0
0 16.50
SUM host_to_device_memcpy_sm 16.50
NOTE: The reported results may not reflect the full capabilities of the platform.
Performance can vary with software drivers, hardware clocks, and system topology.