
PR_001 - CodeRabbit #25

Open

MitchLewis930 wants to merge 1 commit into bench/PR_001_base from bench/PR_001_bug

Conversation

MitchLewis930 (Collaborator) commented Jan 21, 2026

PR_001

Summary by CodeRabbit

Release Notes

  • Refactor
    • Improved request cancellation handling to better detect and respond to client disconnections during streaming operations across all API endpoints (chat completions, completions, embeddings, and scoring).

✏️ Tip: You can customize this high-level summary in your review settings.

coderabbitai bot commented Jan 21, 2026

📝 Walkthrough

This PR refactors request cancellation handling by replacing a decorator-based approach (@with_cancellation) with utility functions (iterate_with_cancellation and enhanced merge_async_iterators). The with_cancellation decorator and related utilities are removed from vllm/entrypoints/utils.py, while new cancellation-aware utilities are added to vllm/utils.py. Multiple API endpoint files are updated to use the new utilities with client disconnect callbacks.
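The replacement pattern can be sketched as follows. This is a hedged, self-contained illustration only: the helper body, the fake disconnect callback, and the token generator are stand-ins, not the actual vllm/utils.py code.

```python
import asyncio
from typing import AsyncGenerator, Awaitable, Callable

async def iterate_with_cancellation(
    iterator: AsyncGenerator,
    is_cancelled: Callable[[], Awaitable[bool]],
) -> AsyncGenerator:
    """Yield from `iterator`, aborting once `is_cancelled()` reports True.

    Simplified sketch: the utility described in this PR polls with a
    timeout; here we just check the callback before each item.
    """
    async for item in iterator:
        if await is_cancelled():
            await iterator.aclose()
            raise asyncio.CancelledError("client cancelled")
        yield item

async def fake_is_disconnected() -> bool:
    # Stand-in for Starlette's raw_request.is_disconnected
    return False

async def token_stream():
    for tok in ("hello", "world"):
        yield tok

async def handler() -> list:
    # Endpoint-style usage: wrap the engine's result generator so a client
    # disconnect cancels the request instead of streaming into the void.
    gen = iterate_with_cancellation(token_stream(), is_cancelled=fake_is_disconnected)
    return [tok async for tok in gen]
```

With the callback reporting False throughout, the wrapper is transparent and simply relays every item.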

Changes

Cancellation utilities refactoring (vllm/entrypoints/utils.py, vllm/utils.py):
Removed decorator-based cancellation helpers (with_cancellation, listen_for_disconnect); added an iterate_with_cancellation utility and enhanced merge_async_iterators with an optional is_cancelled callback for polling-based cancellation checks.

API server endpoints (vllm/entrypoints/api_server.py, vllm/entrypoints/openai/api_server.py):
Replaced @with_cancellation decorator usage with direct calls to iterate_with_cancellation (api_server) or removed the decorator entirely (openai/api_server).

OpenAI serving modules (vllm/entrypoints/openai/serving_*.py: chat, completion, embedding, score):
Updated to pass is_cancelled=raw_request.is_disconnected to merge_async_iterators, or wrapped generators with iterate_with_cancellation, for client-disconnect-aware cancellation during request processing.

Async engine (vllm/engine/async_llm_engine.py):
Removed try/except blocks catching asyncio.CancelledError in the generate and encode methods; cancellation is now handled via the utility layer instead of method-level abort logic.

Tests (tests/entrypoints/openai/test_basic.py, tests/test_utils.py, tests/utils.py):
Removed test_request_cancellation; added an is_cancelled parameter test in test_utils.py; removed **kwargs from RemoteOpenAIServer.get_async_client.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Cancellation takes a simpler hop,
From decorators that guard each top,
Now utilities poll with grace,
When clients disconnect their trace,
A cleaner path through async space! 🌙

🚥 Pre-merge checks: ✅ 1 passed, ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

Docstring Coverage (⚠️ Warning): Docstring coverage is 55.56%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

Title check (❓ Inconclusive): The title 'PR_001 - CodeRabbit' is vague and generic, providing no meaningful information about the changeset's actual purpose or main changes. Resolution: replace the generic title with a descriptive summary of the main change, such as 'Revert request cancellation changes' or 'Remove iterate_with_cancellation implementation'.

✅ Passed checks (1 passed)

Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d1b9ba and 2533886.

📒 Files selected for processing (12)
  • tests/entrypoints/openai/test_basic.py
  • tests/test_utils.py
  • tests/utils.py
  • vllm/engine/async_llm_engine.py
  • vllm/entrypoints/api_server.py
  • vllm/entrypoints/openai/api_server.py
  • vllm/entrypoints/openai/serving_chat.py
  • vllm/entrypoints/openai/serving_completion.py
  • vllm/entrypoints/openai/serving_embedding.py
  • vllm/entrypoints/openai/serving_score.py
  • vllm/entrypoints/utils.py
  • vllm/utils.py
💤 Files with no reviewable changes (3)
  • tests/entrypoints/openai/test_basic.py
  • vllm/entrypoints/openai/api_server.py
  • vllm/entrypoints/utils.py
🧰 Additional context used
🧬 Code graph analysis (6)
vllm/entrypoints/openai/serving_score.py (1)
vllm/utils.py (1)
  • merge_async_iterators (408-453)
tests/test_utils.py (1)
vllm/utils.py (1)
  • merge_async_iterators (408-453)
vllm/entrypoints/openai/serving_completion.py (1)
vllm/utils.py (1)
  • merge_async_iterators (408-453)
vllm/entrypoints/openai/serving_chat.py (1)
vllm/utils.py (1)
  • iterate_with_cancellation (373-405)
vllm/entrypoints/api_server.py (1)
vllm/utils.py (2)
  • iterate_with_cancellation (373-405)
  • random_uuid (344-345)
vllm/entrypoints/openai/serving_embedding.py (1)
vllm/utils.py (1)
  • merge_async_iterators (408-453)
🔇 Additional comments (13)
vllm/utils.py (2)

373-406: Approve the new iterate_with_cancellation utility with a minor observation.

The implementation correctly polls for cancellation while iterating. The 1.5s timeout with 1s cancellation check intervals provides reasonable responsiveness to client disconnects.

One observation: when done is empty (timeout with no items ready), the loop correctly continues and rechecks cancellation without yielding, which is the expected behavior.
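The polling behavior this comment describes might look like the following sketch. The interval values and control flow are illustrative, reconstructed from the review text rather than copied from vllm/utils.py.

```python
import asyncio
from typing import Awaitable, Callable

async def iterate_with_polling(iterator, is_cancelled: Callable[[], Awaitable[bool]]):
    # Race the next item against a timeout so disconnects are noticed even
    # while the upstream iterator is idle. When the wait times out with no
    # item ready, the loop rechecks cancellation without yielding anything.
    pending = {asyncio.ensure_future(iterator.__anext__())}
    while True:
        done, pending = await asyncio.wait(pending, timeout=1.5)
        if await is_cancelled():
            for task in done | pending:
                task.cancel()
            raise asyncio.CancelledError("client cancelled")
        if done:
            try:
                item = done.pop().result()
            except StopAsyncIteration:
                return  # upstream iterator exhausted
            yield item
            pending = {asyncio.ensure_future(iterator.__anext__())}
```

Because `asyncio.wait` returns as soon as the pending `__anext__` task completes, the timeout only bounds how long a disconnect can go unnoticed; it does not slow down a healthy stream.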


408-453: LGTM - backward-compatible cancellation support.

The is_cancelled parameter integration is well-designed:

  • Defaults to None for backward compatibility
  • Only sets timeout when cancellation checking is needed
  • Proper cleanup of remaining iterators in the finally block
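A backward-compatible signature of this shape could be sketched as below. This is a reconstruction for illustration only; the exact behavior and line numbers belong to vllm/utils.py, which is not reproduced here.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional

async def merge_async_iterators(
    *iterators: AsyncIterator,
    is_cancelled: Optional[Callable[[], Awaitable[bool]]] = None,
):
    """Interleave several async iterators, yielding (index, item) pairs."""
    # Poll with a timeout only when a cancellation callback is supplied,
    # so the default call signature stays backward compatible.
    timeout = 1 if is_cancelled is not None else None
    pending = {asyncio.ensure_future(it.__anext__()): (i, it)
               for i, it in enumerate(iterators)}
    try:
        while pending:
            done, _ = await asyncio.wait(
                pending.keys(), return_when=asyncio.FIRST_COMPLETED,
                timeout=timeout)
            if is_cancelled is not None and await is_cancelled():
                raise asyncio.CancelledError("client cancelled")
            for task in done:
                index, it = pending.pop(task)
                try:
                    item = task.result()
                except StopAsyncIteration:
                    continue  # this iterator is exhausted
                yield index, item
                pending[asyncio.ensure_future(it.__anext__())] = (index, it)
    finally:
        # Clean up any in-flight __anext__ calls on early exit.
        for task in pending:
            task.cancel()
```

When `is_cancelled` is omitted, `timeout` stays `None` and the wait simply blocks until an item arrives, matching the pre-existing behavior.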
tests/utils.py (1)

166-171: LGTM - simplified test client construction.

The removal of **kwargs forwarding and explicit max_retries=0 setting is appropriate for test utilities where fail-fast behavior is preferred. This aligns with the PR's broader cancellation handling changes.

tests/test_utils.py (1)

29-32: Test exercises cancellation infrastructure without triggering cancellation.

The is_cancelled=partial(asyncio.sleep, 0, result=False) creates a callable that returns an awaitable yielding False, which means the is_cancelled path in merge_async_iterators is exercised but never triggers cancellation. The test still validates cleanup behavior via the external task.cancel().

Consider adding a separate test case that exercises the actual is_cancelled returning True path to ensure asyncio.CancelledError("client cancelled") is raised correctly.
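Such a test might be sketched as follows; `cancellable_iteration` is an inline stand-in for the utility under test so that the example stays self-contained, and it reuses the review's `partial(asyncio.sleep, 0, result=...)` trick with `result=True`.

```python
import asyncio
from functools import partial

async def cancellable_iteration(iterator, is_cancelled):
    # Inline stand-in for the utility under test: checks the callback
    # before each item and raises on a positive result.
    async for item in iterator:
        if await is_cancelled():
            raise asyncio.CancelledError("client cancelled")
        yield item

def test_is_cancelled_true_path():
    # is_cancelled immediately reports True, so the first poll aborts
    # the stream before any item is yielded.
    always_cancelled = partial(asyncio.sleep, 0, result=True)

    async def stream():
        for i in range(5):
            yield i

    async def consume():
        return [x async for x in cancellable_iteration(stream(), always_cancelled)]

    try:
        asyncio.run(consume())
        raised = False
    except asyncio.CancelledError:
        raised = True
    assert raised
```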

vllm/engine/async_llm_engine.py (2)

1150-1158: Consistent simplification with generate() method.

The same pattern change is applied here, removing explicit CancelledError handling and relying on the AsyncStream and upstream cancellation utilities for cleanup.


1068-1077: Cancellation cleanup has been refactored, not removed—verify this architectural change is intentional.

The try/except asyncio.CancelledError blocks were removed from generate() and encode(), but cancellation cleanup is still occurring through a different mechanism. When the async generator is closed (garbage collected or explicitly closed), GeneratorExit is caught in AsyncStream.generator() (line 118), which triggers self._cancel(request_id). This _cancel callable is actually RequestTracker.abort_request(), which puts the request in the abort queue and finishes the stream.

This is a refactoring toward centralized cancellation handling rather than a removal of cleanup logic. The docstring example (lines 1058–1066) correctly documents the expected pattern: clients should manually call await engine.abort(request_id) for explicit disconnection handling.

Verify this architectural shift—moving from scattered try/except blocks to centralized GeneratorExit handling in AsyncStream—is intentional and properly handles all cancellation scenarios (client disconnect, early iteration exit, request timeout).
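The centralized cleanup path described here can be illustrated with a simplified stand-in. `MiniStream` is not the real AsyncStream; it only mimics the GeneratorExit-to-abort routing the comment describes.

```python
import asyncio

class MiniStream:
    # Simplified stand-in for the AsyncStream pattern: closing the generator
    # (client gone, early exit from iteration) fires GeneratorExit, which
    # routes into a single abort callback instead of per-method
    # try/except asyncio.CancelledError blocks.
    def __init__(self, request_id, cancel):
        self._request_id = request_id
        self._cancel = cancel
        self._queue = asyncio.Queue()

    def put(self, item):
        self._queue.put_nowait(item)

    async def generator(self):
        try:
            while True:
                item = await self._queue.get()
                if item is None:  # sentinel: stream finished normally
                    return
                yield item
        except GeneratorExit:
            # Centralized cleanup: abort the request exactly once, here.
            self._cancel(self._request_id)
            raise

aborted = []

async def demo():
    stream = MiniStream("req-0", aborted.append)
    for tok in ("a", "b", None):
        stream.put(tok)
    gen = stream.generator()
    first = await gen.__anext__()
    await gen.aclose()  # early exit -> GeneratorExit -> abort callback
    return first

# After asyncio.run(demo()), aborted == ["req-0"]
```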

vllm/entrypoints/openai/serving_score.py (1)

189-192: LGTM - proper cancellation integration.

The is_cancelled callback is correctly passed to merge_async_iterators, with appropriate handling for when raw_request is None. The asyncio.CancelledError catch block (lines 211-212) properly converts cancellation to a user-friendly error response.

vllm/entrypoints/openai/serving_completion.py (1)

162-163: LGTM - cancellation integration consistent with other entrypoints.

Since raw_request is a required parameter in create_completion() (line 62), directly passing raw_request.is_disconnected without a null check is correct. The asyncio.CancelledError is properly caught downstream (lines 214-215).

vllm/entrypoints/openai/serving_embedding.py (1)

205-208: Cancellation hook wired into merge.

Passing raw_request.is_disconnected into merge_async_iterators correctly enables client disconnect handling for embeddings.

vllm/entrypoints/openai/serving_chat.py (2)

35-35: Import aligns with new cancellation utility.


237-239: Verify streaming path handles CancelledError cleanly.

iterate_with_cancellation raises asyncio.CancelledError on disconnect; chat_completion_stream_generator only catches ValueError. Please confirm this doesn’t surface as noisy errors or unintended 500s, and consider explicitly swallowing CancelledError to end the stream quietly.

✅ Optional defensive tweak:

```diff
-        except ValueError as e:
+        except asyncio.CancelledError:
+            return
+        except ValueError as e:
             # TODO: Use a vllm-specific Validation Error
             logger.exception("Error in chat completion stream generator.")
             data = self.create_streaming_error_response(str(e))
             yield f"data: {data}\n\n"
```
vllm/entrypoints/api_server.py (2)

23-24: Import updated for cancellation helper.


57-58: Cancellation-aware generator wiring looks good.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Comment @coderabbitai help to get the list of available commands and usage tips.
