Benchmark: Micro benchmark - Add float datatype support and other refinements to GPU Stream #769

WenqingLan1 · 2025-12-19T20:05:13Z

Refinements:

Use 128-bit aligned vector types (double2/float4) for optimal memory bandwidth.
Add support for float execution.
Add --data_type <float|double> CLI option for runtime type selection.
Move template kernel implementations to header file (required for CUDA template instantiation across compilation units).
Rename entry point file from gpu_stream_test.cpp to gpu_stream_main.cpp.
Replace the hard-coded GPU iteration with a single-node run so the benchmark works with SuperBench's distributed execution in config.yaml.
Replace the hard-coded numa_alloc_onnode NUMA assignment with numa_alloc_local to optimize memory allocation.
Update the micro-benchmark doc to reflect the new metric names, which no longer include gpu_id.
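
The 128-bit vector choice in the first refinement comes down to simple arithmetic: a 16-byte vector holds four 4-byte floats (float4) or two 8-byte doubles (double2). The sketch below only illustrates that arithmetic in Python; the lookup table and the helper name `lanes_per_vector` are hypothetical and are not taken from the PR's CUDA sources.

```python
import ctypes

VECTOR_BYTES = 16  # 128 bits, the vector width named in the PR description

# Hypothetical lookup table for illustration; only the names float4/double2
# and the float/double data types come from the PR description.
VECTOR_TYPES = {
    "float": ("float4", ctypes.c_float),
    "double": ("double2", ctypes.c_double),
}


def lanes_per_vector(data_type: str) -> int:
    """Return how many scalar elements fit in one 128-bit vector."""
    _, scalar = VECTOR_TYPES[data_type]
    return VECTOR_BYTES // ctypes.sizeof(scalar)


for data_type, (vector_name, _) in VECTOR_TYPES.items():
    lanes = lanes_per_vector(data_type)
    print(f"{data_type}: {vector_name} packs {lanes} elements into {VECTOR_BYTES} bytes")
```

Either way, each vector load or store moves the same 16 bytes, which is presumably why the vector type changes with the data type.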

New config:

    gpu-stream:fp64:
      <<: *default_local_mode
      timeout: 600
      parameters:
        num_warm_up: 10
        num_loops: 40
        size: 1308622848
        data_type: double
    gpu-stream:fp64-correctness:
      <<: *default_local_mode
      timeout: 600
      parameters:
        num_warm_up: 0
        num_loops: 1
        size: 1048576
        data_type: double
        check_data: true
    gpu-stream:fp32:
      <<: *default_local_mode
      timeout: 600
      parameters:
        num_warm_up: 10
        num_loops: 40
        size: 2617245696
        data_type: float
    gpu-stream:fp32-correctness:
      <<: *default_local_mode
      timeout: 600
      parameters:
        num_warm_up: 0
        num_loops: 1
        size: 1048576
        data_type: float
        check_data: true
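
The two performance configs use sizes that differ by exactly a factor of two. The PR text does not say whether `size` is an element count or a byte count, so the sketch below computes both readings. If `size` counts elements, the fp64 and fp32 runs use buffers of identical byte size. The helper names are hypothetical.

```python
ELEMENT_BYTES = {"float": 4, "double": 8}

# (size, data_type) pairs copied from the config above.
CONFIGS = {
    "gpu-stream:fp64": (1308622848, "double"),
    "gpu-stream:fp32": (2617245696, "float"),
}


def buffer_gib_if_elements(size: int, data_type: str) -> float:
    """Buffer size in GiB, ASSUMING `size` is a number of elements."""
    return size * ELEMENT_BYTES[data_type] / 2**30


def element_count_if_bytes(size: int, data_type: str) -> int:
    """Element count, ASSUMING `size` is a number of bytes."""
    return size // ELEMENT_BYTES[data_type]


for name, (size, data_type) in CONFIGS.items():
    print(
        f"{name}: {buffer_gib_if_elements(size, data_type)} GiB if size is elements, "
        f"{element_count_if_bytes(size, data_type)} elements if size is bytes"
    )
```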

New rule:

    gpu-stream:
      statistics:
        - mean
      categories: GPU-STREAM
      aggregate: True
      metrics:
        - gpu-stream:fp(?:32|64)/STREAM_.*_(?:bw|ratio):(\d+)
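
To see which metric names the rule's pattern selects, it can be tried against a few names with Python's `re` module. This is a minimal sketch that assumes the pattern is applied as a full match; how SuperBench applies it internally is not shown in this PR. Only the first name comes from the example results, and the other two are hypothetical.

```python
import re

# Pattern copied from the rule above.
RULE = r"gpu-stream:fp(?:32|64)/STREAM_.*_(?:bw|ratio):(\d+)"

NAMES = [
    # From the example results in this PR.
    "gpu-stream:fp32/STREAM_COPY_float_buffer_2617245696_block_256_bw:0",
    # Hypothetical names, made up to exercise the pattern.
    "gpu-stream:fp64/STREAM_TRIAD_double_buffer_1308622848_block_256_ratio:3",
    "gpu-stream:fp32-correctness/STREAM_COPY_float_buffer_1048576_block_256_bw:0",
]


def rank_of(name: str):
    """Return the captured trailing index as an int, or None if the rule does not match."""
    match = re.fullmatch(RULE, name)
    return int(match.group(1)) if match else None


for name in NAMES:
    print(name, "->", rank_of(name))
```

Because the pattern requires `/` directly after `fp32` or `fp64`, a benchmark named with a `-correctness` suffix would not be selected by this rule.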

Example results:

"gpu-stream:fp32/STREAM_COPY_float_buffer_2617245696_block_256_bw:0": 1234, 
"gpu-stream:fp32/STREAM_COPY_float_buffer_2617245696_block_256_bw:1": 1234, 
"gpu-stream:fp32/STREAM_COPY_float_buffer_2617245696_block_256_bw:2": 1234, 
"gpu-stream:fp32/STREAM_COPY_float_buffer_2617245696_block_256_bw:3": 1234

Processed by rules:

| gpu-stream:fp32/STREAM_COPY_float_buffer_2617245696_block_256_bw | mean | 1234 |
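
The step from the four per-index results to the single processed row can be reproduced by dropping the trailing `:<index>` from each name and averaging the values that share a base name. This is a sketch of the apparent behavior with `aggregate: True` and the `mean` statistic, not SuperBench's implementation; `aggregate_mean` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Values copied from the example results above.
RESULTS = {
    "gpu-stream:fp32/STREAM_COPY_float_buffer_2617245696_block_256_bw:0": 1234,
    "gpu-stream:fp32/STREAM_COPY_float_buffer_2617245696_block_256_bw:1": 1234,
    "gpu-stream:fp32/STREAM_COPY_float_buffer_2617245696_block_256_bw:2": 1234,
    "gpu-stream:fp32/STREAM_COPY_float_buffer_2617245696_block_256_bw:3": 1234,
}


def aggregate_mean(results: dict) -> dict:
    """Group metrics by name without the trailing ':<index>' and average each group."""
    grouped = defaultdict(list)
    for name, value in results.items():
        base, sep, index = name.rpartition(":")
        if sep and index.isdigit():
            grouped[base].append(value)
    return {base: mean(values) for base, values in grouped.items()}


for base, value in aggregate_mean(RESULTS).items():
    print(f"| {base} | mean | {value} |")
```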

codecov · 2025-12-19T20:14:11Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.70%. Comparing base (c99380b) to head (3c359a3).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #769   +/-   ##
=======================================
  Coverage   85.69%   85.70%           
=======================================
  Files         102      102           
  Lines        7699     7700    +1     
=======================================
+ Hits         6598     6599    +1     
  Misses       1101     1101

| Flag | Coverage Δ |
|---|---|
| cpu-python3.10-unit-test | 70.97% <50.00%> (+<0.01%) ⬆️ |
| cpu-python3.7-unit-test | 70.42% <50.00%> (+<0.01%) ⬆️ |
| cuda-unit-test | 83.61% <100.00%> (+<0.01%) ⬆️ |

WenqingLan1 added 3 commits December 18, 2025 01:16

remove fixed gpu id & numa id assignment

7f23c75

use 128bit alignment, add float support, cleanup

d63fe8c

add data_type arg

242714e

WenqingLan1 requested a review from a team as a code owner December 19, 2025 20:05

WenqingLan1 added the micro-benchmarks Micro Benchmark Test for SuperBench Benchmarks label Dec 19, 2025

guoshzhao self-assigned this Dec 19, 2025

guoshzhao requested review from guoshzhao and polarG December 19, 2025 20:32

WenqingLan1 and others added 4 commits December 19, 2025 23:31

fix lint

e8d0282

fix clang lint

5a18946

update doc

fddf56e

Merge branch 'main' into wenqinglan/refine-gpu-stream

3c359a3
