
Revert "[AMD][BugFix] Fix omission of wvSplitK kernel for small batch sizes (1-4) due to torch.compile (vllm-project#21350)" Test #18

Open
MitchLewis930 wants to merge 1 commit into bench/PR_004_base from bench/PR_004_bug__CodeRabbit

Conversation

@MitchLewis930 (Collaborator) commented Jan 21, 2026

This reverts commit b361f14 ("[AMD][BugFix] Fix omission of wvSplitK kernel for small batch sizes (1-4) due to torch.compile (vllm-project#21350)").

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

Summary by CodeRabbit

  • Refactor
    • Restructured internal matrix multiplication operations for improved code organization and maintainability.


@coderabbitai bot commented Jan 21, 2026

📝 Walkthrough

This pull request refactors GEMM utility functions by removing custom operation registration infrastructure and adding a layer parameter to unquantized GEMM functions. The changes consolidate platform-specific implementations while simplifying the dispatch mechanism.

Changes

Cohort / File(s): GEMM Function Signature Updates (vllm/model_executor/layers/utils.py)
Summary: Added a layer: torch.nn.Module parameter to rocm_unquantized_gemm(), default_unquantized_gemm(), and cpu_unquantized_gemm(). Removed rocm_unquantized_gemm_impl() and its fake variant, along with the direct_register_custom_op import and the custom op registration block. The dispatch function was updated to match the new signatures.
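The uniform-signature pattern described in the summary can be sketched in plain Python. This is illustrative only: the function names come from the walkthrough above, but the (layer, x, weight, bias) parameter order, the platform-lookup dispatcher body, and the packed_weight attribute are assumptions, not vLLM's actual code.

```python
# Sketch (not vLLM source): all GEMM variants share one signature, including
# a `layer` argument that some of them ignore, so the dispatcher can return
# any of them interchangeably. Lists stand in for torch tensors.

def default_unquantized_gemm(layer, x, weight, bias=None):
    # `layer` is unused here; it exists only to keep the signature uniform.
    out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*weight)]
           for row in x]
    if bias is not None:
        out = [[v + b for v, b in zip(row, bias)] for row in out]
    return out

def cpu_unquantized_gemm(layer, x, weight, bias=None):
    # A variant that *does* consult `layer` (hypothetical packed-weight cache).
    packed = getattr(layer, "packed_weight", weight)
    return default_unquantized_gemm(layer, x, packed, bias)

def dispatch_unquantized_gemm(platform):
    # Because every variant has the same signature, dispatch reduces to a
    # plain lookup with no per-call branching.
    table = {"cpu": cpu_unquantized_gemm, "default": default_unquantized_gemm}
    return table.get(platform, default_unquantized_gemm)
```

This is why the Ruff ARG001 finding flagged later in the review is expected: the unused parameter is the price of a signature every variant can share.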

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A hop through the code, a cleanup so neat,
Custom ops retired, their final defeat,
Layers now passed where functions align,
Simpler paths forward, just right on the line! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title indicates a revert of a previous commit about a ROCm/AMD GPU optimization, which aligns with the changes removing custom op registration and modifying GEMM function signatures.


✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR, between commits b361f14 and df3c55e.

📒 Files selected for processing (1)
  • vllm/model_executor/layers/utils.py
🧰 Additional context used
🪛 Ruff (0.14.13)
vllm/model_executor/layers/utils.py

73-73: Unused function argument: layer

(ARG001)

🔇 Additional comments (1)
vllm/model_executor/layers/utils.py (1)

73-97: Unused layer parameter is intentional for API consistency.

The static analysis correctly identifies that layer is unused here, but this is by design: all three GEMM functions share the same signature so that dispatch_unquantized_gemm can return any of them interchangeably. The cpu_unquantized_gemm variant (line 104) actually uses layer.

Potential regression risk with torch.compile.

This PR reverts the custom op registration (direct_register_custom_op) that was specifically introduced to fix wvSplitK kernel omission for small batch sizes (1-4) when using torch.compile on ROCm. Without the custom op registration, torch.compile may not properly trace through this function, potentially bypassing the skinny GEMM optimizations for those batch sizes.

Please verify that this revert does not re-introduce the original issue when running with torch.compile on AMD GPUs with batch sizes 1-4.
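The shape-dependent branch at issue can be pictured with a small sketch. The 1-4 batch-size threshold and the kernel names come from the review comment above; the function name and fallback kernel name are hypothetical stand-ins, not vLLM's implementation.

```python
# Illustrative sketch of a small-batch kernel heuristic like the one the
# reverted fix protected. `pick_rocm_gemm_kernel` and "rocblas" are assumed
# names for illustration only.

def pick_rocm_gemm_kernel(m: int, n: int, k: int) -> str:
    """Choose a kernel for an (m x k) @ (k x n) unquantized GEMM."""
    if 1 <= m <= 4:
        # Skinny-GEMM path: wvSplitK is profitable only for tiny batches.
        return "wvSplitK"
    return "rocblas"
```

The regression risk is that a compiler tracing this Python-level branch may specialize on the shapes it first sees and bake one outcome into the compiled graph, so small-batch calls silently miss the wvSplitK path. Registering the function as an opaque custom op (what the reverted commit did via direct_register_custom_op) keeps the branch evaluated at run time on every call.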



Comment @coderabbitai help to get the list of available commands and usage tips.
