feat: MLPerf v3 Compliance by hazemawadalla · Pull Request #231 · mlcommons/storage

hazemawadalla · 2026-01-27T23:57:37Z

Summary

This PR brings the KV Cache Benchmark into alignment with MLPerf Storage v3.0 requirements, adds a comprehensive configuration system, and includes a full test suite.

Core Benchmark Engine (kv-cache.py)

Add ConfigLoader class with YAML config file support and schema validation
Add cfg() helper function for config-driven parameter access
Add validate_args() with safety limits for protected system paths
Rename all nvme_* metrics to storage_* for MLPerf terminology compliance
Add extended QoS percentiles: P99.9 and P99.99 latency tracking
Add per-tier bandwidth metrics (read/write GB/s per tier)
Add per-tier KV bytes tracking for detailed storage analysis
Fix GPU metadata desync bug via on_eviction_callback pattern
Change eviction from a single-shot pass to an iterative loop that runs until enough space is freed
Replace print statements with Python logging module
Add waterfall LRU eviction with configurable high/low watermarks
Add storage_health section with PASS/FAIL criteria
Add storage_throughput_tokens_per_sec as primary MLPerf metric
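
To make the eviction items above concrete, here is a minimal sketch of a watermark-driven LRU tier with an eviction callback. It is not the code in kv-cache.py: the class name `LRUTier`, its parameters, and the callback signature are all hypothetical and chosen only for illustration. The idea is that crossing the high watermark triggers a loop that evicts least-recently-used entries until usage falls to the low watermark, and each eviction notifies the caller so its own metadata cannot drift out of sync.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LRUTier:
    """Toy single-tier cache; every name here is illustrative, not from kv-cache.py."""

    def __init__(self, capacity_bytes: int,
                 high_watermark: float = 0.9,
                 low_watermark: float = 0.7,
                 on_eviction: Optional[Callable[[str, int], None]] = None):
        self.capacity_bytes = capacity_bytes
        self.high_watermark = high_watermark
        self.low_watermark = low_watermark
        self.on_eviction = on_eviction
        # key -> size in bytes, ordered from least to most recently used
        self.entries: "OrderedDict[str, int]" = OrderedDict()
        self.used_bytes = 0

    def put(self, key: str, size: int) -> None:
        # Re-inserting a key replaces its size and marks it most recently used.
        if key in self.entries:
            self.used_bytes -= self.entries.pop(key)
        self.entries[key] = size
        self.used_bytes += size
        if self.used_bytes > self.high_watermark * self.capacity_bytes:
            self._evict_to_low_watermark()

    def _evict_to_low_watermark(self) -> None:
        target = self.low_watermark * self.capacity_bytes
        # Iterative loop: a single eviction may not free enough space,
        # so keep evicting LRU entries until usage is at or below the target.
        while self.used_bytes > target and self.entries:
            key, size = self.entries.popitem(last=False)
            self.used_bytes -= size
            if self.on_eviction is not None:
                # The caller can drop its metadata here or demote the entry
                # to the next tier (assumed meaning of "waterfall").
                self.on_eviction(key, size)


evicted = []
tier = LRUTier(capacity_bytes=100, on_eviction=lambda key, size: evicted.append(key))
for i in range(5):
    tier.put(f"seq{i}", 20)
```

In this sketch the callback is the only place the owner of the tier learns about evictions, which is one way to avoid the kind of metadata desync the PR describes.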

Wrapper Script (kv-cache-wrapper.sh)

Add -c DIR option for custom config directory
Generate and pass config.yaml to Python script via --config flag
Add --xlsx-output support for Excel export
Update jq queries for new storage_* metric names
Add mlperf_submission workload with required trial parameters
Enhance system detection for thread counts and memory limits
Update metric parsing for storage_throughput primary metric

Test Suite (test_kv_cache.py)

Add 170+ tests covering all new functionality
Add ConfigLoader tests: schema validation, defaults, file loading
Add cfg() helper tests for config-driven parameters
Add validate_args() tests for path safety and input validation
Add extended QoS tests for P99.9 and P99.99 percentiles
Add GPU eviction callback tests for metadata sync
Add per-tier bandwidth and KV bytes metric tests
Add storage_* metric naming tests for MLPerf compliance
Add waterfall eviction tests with high/low watermarks
Add storage_health PASS/FAIL criteria tests
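
As an illustration of what the extended QoS tests exercise, the sketch below computes tail percentiles with the nearest-rank method. The function name `latency_percentiles` and the output key format are assumptions, and the benchmark itself may use a different interpolation rule.

```python
import math
from typing import Dict, Sequence


def latency_percentiles(samples_ms: Sequence[float],
                        points: Sequence[float] = (50.0, 95.0, 99.0, 99.9, 99.99),
                        ) -> Dict[str, float]:
    """Nearest-rank percentiles (hypothetical helper, not the benchmark's code)."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    result: Dict[str, float] = {}
    for p in points:
        # Round before ceil so float noise (e.g. 9990.000000000002) does not
        # push the rank up by one.
        rank = math.ceil(round(p * n / 100.0, 9))
        rank = min(max(rank, 1), n)  # clamp to a valid 1-based rank
        result[f"p{p:g}"] = ordered[rank - 1]
    return result
```

With nearest-rank, P99.99 only differs from the maximum once there are at least 10,000 samples, so short runs will report the worst observed latency for the highest percentiles.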

Excel Export (json_to_xlsx.py)

Add P99.9 and P99.99 latency columns
Add per-tier KV bytes columns (GPU, CPU, Storage)
Add per-tier bandwidth columns (read/write GB/s)
Add storage tier device vs host latency breakdown
Rename nvme_entries to storage_entries for MLPerf compliance
Add storage_throughput_tokens_per_sec as primary metric
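
Because of the nvme_* to storage_* rename, JSON results produced before this PR use different keys. A small migration helper along these lines could bring old files in line; `rename_nvme_keys` is a hypothetical name and is not part of json_to_xlsx.py.

```python
from typing import Any

OLD_PREFIX = "nvme_"
NEW_PREFIX = "storage_"


def rename_nvme_keys(obj: Any) -> Any:
    """Recursively rename dict keys starting with nvme_ to storage_ (illustrative only)."""
    if isinstance(obj, dict):
        renamed = {}
        for key, value in obj.items():
            if isinstance(key, str) and key.startswith(OLD_PREFIX):
                key = NEW_PREFIX + key[len(OLD_PREFIX):]
            renamed[key] = rename_nvme_keys(value)
        return renamed
    if isinstance(obj, list):
        return [rename_nvme_keys(item) for item in obj]
    return obj
```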

Configuration (config.yaml)

Add user_templates section with conversation patterns
Add qos_profiles with latency thresholds per tier
Add eviction settings with waterfall LRU parameters
Add storage_health criteria for PASS/FAIL determination
Add cache_sizing defaults for GPU/CPU/Storage tiers
Provide validated defaults for all tunable parameters
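
A rough sketch of how defaults and a config-driven lookup could fit together is shown below. The section names `eviction` and `qos_profiles` come from the list above, but every key and value inside them, the `deep_merge` helper, and this `cfg()` signature are assumptions rather than the actual contents of config.yaml or kv-cache.py.

```python
from typing import Any, Mapping

# Hypothetical defaults; the real config.yaml keys and values may differ.
DEFAULTS = {
    "eviction": {"high_watermark": 0.9, "low_watermark": 0.7},
    "qos_profiles": {"interactive": {"p99_ms": 200}},
}


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict:
    """Return base with override applied, merging nested mappings key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def cfg(config: Mapping[str, Any], dotted_key: str, default: Any = None) -> Any:
    """Look up 'section.sub.key' in a nested mapping, returning default if absent."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return default
        node = node[part]
    return node


# In practice the user settings would be parsed from a YAML file with
# yaml.safe_load; a plain dict stands in here to keep the sketch dependency-free.
user_settings = {"eviction": {"high_watermark": 0.85}}
config = deep_merge(DEFAULTS, user_settings)
high = cfg(config, "eviction.high_watermark")
low = cfg(config, "eviction.low_watermark")
```

Merging user settings over a complete default tree means a partial config file still yields a value for every tunable parameter.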

Documentation (README.md)

Add Configuration section with YAML parameter reference
Add MLPerf Submission Guidelines with validated commands
Add Excel metrics reference table with all output columns
Add installation instructions including pyyaml dependency
Add CLI arguments vs config file precedence documentation
Add workload definitions and tier configuration examples
Add troubleshooting section for common issues
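
The precedence rule documented in the README is not restated in this PR description. One common convention, shown here purely as an assumption, is that an explicitly passed CLI flag overrides the config file, which in turn overrides a built-in fallback. The flags `--num-users` and `--duration` and the helper names are hypothetical.

```python
import argparse
from typing import Any, Mapping, Optional, Sequence


def resolve(cli_value: Optional[Any], config: Mapping[str, Any],
            key: str, fallback: Any) -> Any:
    """Assumed precedence: CLI flag, then config file, then built-in fallback."""
    if cli_value is not None:
        return cli_value
    return config.get(key, fallback)


def parse(argv: Sequence[str], config: Mapping[str, Any]) -> dict:
    parser = argparse.ArgumentParser()
    # default=None lets us tell "flag not given" apart from any real value.
    parser.add_argument("--num-users", type=int, default=None)
    parser.add_argument("--duration", type=int, default=None)
    args = parser.parse_args(argv)
    return {
        "num_users": resolve(args.num_users, config, "num_users", 1),
        "duration": resolve(args.duration, config, "duration", 60),
    }


settings = parse(["--num-users", "8"], {"num_users": 4, "duration": 120})
```

Here `num_users` comes from the command line and `duration` from the config mapping; with neither present, the fallbacks apply.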

Dependencies (requirements.txt)

Add pyyaml>=6.0 for YAML configuration file parsing

Test Plan

[x] Run pytest on test_kv_cache.py (170+ tests passing)
[x] Verify config.yaml loads correctly with default values
[x] Run benchmark with --config flag and verify output metrics
[x] Confirm storage_* metric naming in JSON output
[x] Test Excel export with new columns


github-actions · 2026-01-27T23:57:46Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

hazemawadalla added 8 commits January 27, 2026 15:42

test(results): add pytest HTML test report

166f2b2

- Add kv-cache-test-report.html with full test execution results
- All 170+ tests passing for v3.0 features
- Create unit_test_results directory for test artifacts

deps(requirements): add pyyaml for config support

1bfe885

- Add pyyaml>=6.0 for YAML configuration file parsing
- Required for ConfigLoader and --config CLI argument

hazemawadalla requested a review from a team January 27, 2026 23:57

hazemawadalla requested a review from a team as a code owner January 27, 2026 23:57

hazemawadalla wants to merge 8 commits into mlcommons:main from hazemawadalla:main
