
PR_001 - CodeRabbit #25

Open

MitchLewis930 wants to merge 1 commit into bench/PR_001_base from bench/PR_001_bug

Conversation

MitchLewis930 (Collaborator) commented Jan 21, 2026

PR_001

Summary by CodeRabbit

Release Notes

  • Refactor
    • Improved request cancellation handling to better detect and respond to client disconnections during streaming operations across all API endpoints (chat completions, completions, embeddings, and scoring).

✏️ Tip: You can customize this high-level summary in your review settings.

coderabbitai bot commented Jan 21, 2026

📝 Walkthrough

This PR refactors request cancellation handling by replacing a decorator-based approach (@with_cancellation) with utility functions (iterate_with_cancellation and enhanced merge_async_iterators). The with_cancellation decorator and related utilities are removed from vllm/entrypoints/utils.py, while new cancellation-aware utilities are added to vllm/utils.py. Multiple API endpoint files are updated to use the new utilities with client disconnect callbacks.
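The replacement pattern can be sketched as follows. This is a hedged, self-contained illustration only: the helper body, the fake disconnect callback, and the token generator are stand-ins, not the actual vllm/utils.py code.

```python
import asyncio
from typing import AsyncGenerator, Awaitable, Callable

async def iterate_with_cancellation(
    iterator: AsyncGenerator,
    is_cancelled: Callable[[], Awaitable[bool]],
) -> AsyncGenerator:
    """Yield from `iterator`, aborting once `is_cancelled()` reports True.

    Simplified sketch: the utility described in this PR polls with a
    timeout; here we just check the callback before each item.
    """
    async for item in iterator:
        if await is_cancelled():
            await iterator.aclose()
            raise asyncio.CancelledError("client cancelled")
        yield item

async def fake_is_disconnected() -> bool:
    # Stand-in for Starlette's raw_request.is_disconnected
    return False

async def token_stream():
    for tok in ("hello", "world"):
        yield tok

async def handler() -> list:
    # Endpoint-style usage: wrap the engine's result generator so a client
    # disconnect cancels the request instead of streaming into the void.
    gen = iterate_with_cancellation(token_stream(), is_cancelled=fake_is_disconnected)
    return [tok async for tok in gen]
```

With the callback reporting False throughout, the wrapper is transparent and simply relays every item.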

Changes

Cancellation utilities refactoring (vllm/entrypoints/utils.py, vllm/utils.py):
Removed decorator-based cancellation helpers (with_cancellation, listen_for_disconnect); added an iterate_with_cancellation utility and enhanced merge_async_iterators with an optional is_cancelled callback for polling-based cancellation checks.

API server endpoints (vllm/entrypoints/api_server.py, vllm/entrypoints/openai/api_server.py):
Replaced @with_cancellation decorator usage with direct calls to iterate_with_cancellation (api_server) or removed the decorator entirely (openai/api_server).

OpenAI serving modules (vllm/entrypoints/openai/serving_*.py: chat, completion, embedding, score):
Updated to pass is_cancelled=raw_request.is_disconnected to merge_async_iterators, or wrapped generators with iterate_with_cancellation, for client-disconnect-aware cancellation during request processing.

Async engine (vllm/engine/async_llm_engine.py):
Removed try/except blocks catching asyncio.CancelledError in the generate and encode methods; cancellation is now handled via the utility layer instead of method-level abort logic.

Tests (tests/entrypoints/openai/test_basic.py, tests/test_utils.py, tests/utils.py):
Removed test_request_cancellation; added an is_cancelled parameter test in test_utils.py; removed **kwargs from RemoteOpenAIServer.get_async_client.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Cancellation takes a simpler hop,
From decorators that guard each top,
Now utilities poll with grace,
When clients disconnect their trace,
A cleaner path through async space! 🌙

🚥 Pre-merge checks: ✅ 1 passed, ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

Docstring Coverage (⚠️ Warning): Docstring coverage is 55.56%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

Title check (❓ Inconclusive): The title 'PR_001 - CodeRabbit' is vague and generic, providing no meaningful information about the changeset's actual purpose or main changes. Resolution: replace the generic title with a descriptive summary of the main change, such as 'Revert request cancellation changes' or 'Remove iterate_with_cancellation implementation'.

✅ Passed checks (1 passed)

Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d1b9ba and 2533886.

📒 Files selected for processing (12)
  • tests/entrypoints/openai/test_basic.py
  • tests/test_utils.py
  • tests/utils.py
  • vllm/engine/async_llm_engine.py
  • vllm/entrypoints/api_server.py
  • vllm/entrypoints/openai/api_server.py
  • vllm/entrypoints/openai/serving_chat.py
  • vllm/entrypoints/openai/serving_completion.py
  • vllm/entrypoints/openai/serving_embedding.py
  • vllm/entrypoints/openai/serving_score.py
  • vllm/entrypoints/utils.py
  • vllm/utils.py
💤 Files with no reviewable changes (3)
  • tests/entrypoints/openai/test_basic.py
  • vllm/entrypoints/openai/api_server.py
  • vllm/entrypoints/utils.py
🧰 Additional context used
🧬 Code graph analysis (6)
vllm/entrypoints/openai/serving_score.py (1)
vllm/utils.py (1)
  • merge_async_iterators (408-453)
tests/test_utils.py (1)
vllm/utils.py (1)
  • merge_async_iterators (408-453)
vllm/entrypoints/openai/serving_completion.py (1)
vllm/utils.py (1)
  • merge_async_iterators (408-453)
vllm/entrypoints/openai/serving_chat.py (1)
vllm/utils.py (1)
  • iterate_with_cancellation (373-405)
vllm/entrypoints/api_server.py (1)
vllm/utils.py (2)
  • iterate_with_cancellation (373-405)
  • random_uuid (344-345)
vllm/entrypoints/openai/serving_embedding.py (1)
vllm/utils.py (1)
  • merge_async_iterators (408-453)
🔇 Additional comments (13)
vllm/utils.py (2)

373-406: Approve the new iterate_with_cancellation utility with a minor observation.

The implementation correctly polls for cancellation while iterating. The 1.5s timeout with 1s cancellation check intervals provides reasonable responsiveness to client disconnects.

One observation: when done is empty (timeout with no items ready), the loop correctly continues and rechecks cancellation without yielding, which is the expected behavior.
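The polling behavior this comment describes might look like the following sketch. The interval values and control flow are illustrative, reconstructed from the review text rather than copied from vllm/utils.py.

```python
import asyncio
from typing import Awaitable, Callable

async def iterate_with_polling(iterator, is_cancelled: Callable[[], Awaitable[bool]]):
    # Race the next item against a timeout so disconnects are noticed even
    # while the upstream iterator is idle. When the wait times out with no
    # item ready, the loop rechecks cancellation without yielding anything.
    pending = {asyncio.ensure_future(iterator.__anext__())}
    while True:
        done, pending = await asyncio.wait(pending, timeout=1.5)
        if await is_cancelled():
            for task in done | pending:
                task.cancel()
            raise asyncio.CancelledError("client cancelled")
        if done:
            try:
                item = done.pop().result()
            except StopAsyncIteration:
                return  # upstream iterator exhausted
            yield item
            pending = {asyncio.ensure_future(iterator.__anext__())}
```

Because `asyncio.wait` returns as soon as the pending `__anext__` task completes, the timeout only bounds how long a disconnect can go unnoticed; it does not slow down a healthy stream.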


408-453: LGTM - backward-compatible cancellation support.

The is_cancelled parameter integration is well-designed:

  • Defaults to None for backward compatibility
  • Only sets timeout when cancellation checking is needed
  • Proper cleanup of remaining iterators in the finally block
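A backward-compatible signature of this shape could be sketched as below. This is a reconstruction for illustration only; the exact behavior and line numbers belong to vllm/utils.py, which is not reproduced here.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional

async def merge_async_iterators(
    *iterators: AsyncIterator,
    is_cancelled: Optional[Callable[[], Awaitable[bool]]] = None,
):
    """Interleave several async iterators, yielding (index, item) pairs."""
    # Poll with a timeout only when a cancellation callback is supplied,
    # so the default call signature stays backward compatible.
    timeout = 1 if is_cancelled is not None else None
    pending = {asyncio.ensure_future(it.__anext__()): (i, it)
               for i, it in enumerate(iterators)}
    try:
        while pending:
            done, _ = await asyncio.wait(
                pending.keys(), return_when=asyncio.FIRST_COMPLETED,
                timeout=timeout)
            if is_cancelled is not None and await is_cancelled():
                raise asyncio.CancelledError("client cancelled")
            for task in done:
                index, it = pending.pop(task)
                try:
                    item = task.result()
                except StopAsyncIteration:
                    continue  # this iterator is exhausted
                yield index, item
                pending[asyncio.ensure_future(it.__anext__())] = (index, it)
    finally:
        # Clean up any in-flight __anext__ calls on early exit.
        for task in pending:
            task.cancel()
```

When `is_cancelled` is omitted, `timeout` stays `None` and the wait simply blocks until an item arrives, matching the pre-existing behavior.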
tests/utils.py (1)

166-171: LGTM - simplified test client construction.

The removal of **kwargs forwarding and explicit max_retries=0 setting is appropriate for test utilities where fail-fast behavior is preferred. This aligns with the PR's broader cancellation handling changes.

tests/test_utils.py (1)

29-32: Test exercises cancellation infrastructure without triggering cancellation.

The is_cancelled=partial(asyncio.sleep, 0, result=False) creates a callable that returns an awaitable yielding False, which means the is_cancelled path in merge_async_iterators is exercised but never triggers cancellation. The test still validates cleanup behavior via the external task.cancel().

Consider adding a separate test case that exercises the actual is_cancelled returning True path to ensure asyncio.CancelledError("client cancelled") is raised correctly.
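Such a test might be sketched as follows; `cancellable_iteration` is an inline stand-in for the utility under test so that the example stays self-contained, and it reuses the review's `partial(asyncio.sleep, 0, result=...)` trick with `result=True`.

```python
import asyncio
from functools import partial

async def cancellable_iteration(iterator, is_cancelled):
    # Inline stand-in for the utility under test: checks the callback
    # before each item and raises on a positive result.
    async for item in iterator:
        if await is_cancelled():
            raise asyncio.CancelledError("client cancelled")
        yield item

def test_is_cancelled_true_path():
    # is_cancelled immediately reports True, so the first poll aborts
    # the stream before any item is yielded.
    always_cancelled = partial(asyncio.sleep, 0, result=True)

    async def stream():
        for i in range(5):
            yield i

    async def consume():
        return [x async for x in cancellable_iteration(stream(), always_cancelled)]

    try:
        asyncio.run(consume())
        raised = False
    except asyncio.CancelledError:
        raised = True
    assert raised
```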

vllm/engine/async_llm_engine.py (2)

1150-1158: Consistent simplification with generate() method.

The same pattern change is applied here, removing explicit CancelledError handling and relying on the AsyncStream and upstream cancellation utilities for cleanup.


1068-1077: Cancellation cleanup has been refactored, not removed—verify this architectural change is intentional.

The try/except asyncio.CancelledError blocks were removed from generate() and encode(), but cancellation cleanup is still occurring through a different mechanism. When the async generator is closed (garbage collected or explicitly closed), GeneratorExit is caught in AsyncStream.generator() (line 118), which triggers self._cancel(request_id). This _cancel callable is actually RequestTracker.abort_request(), which puts the request in the abort queue and finishes the stream.

This is a refactoring toward centralized cancellation handling rather than a removal of cleanup logic. The docstring example (lines 1058–1066) correctly documents the expected pattern: clients should manually call await engine.abort(request_id) for explicit disconnection handling.

Verify this architectural shift—moving from scattered try/except blocks to centralized GeneratorExit handling in AsyncStream—is intentional and properly handles all cancellation scenarios (client disconnect, early iteration exit, request timeout).
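The centralized cleanup path described here can be illustrated with a simplified stand-in. `MiniStream` is not the real AsyncStream; it only mimics the GeneratorExit-to-abort routing the comment describes.

```python
import asyncio

class MiniStream:
    # Simplified stand-in for the AsyncStream pattern: closing the generator
    # (client gone, early exit from iteration) fires GeneratorExit, which
    # routes into a single abort callback instead of per-method
    # try/except asyncio.CancelledError blocks.
    def __init__(self, request_id, cancel):
        self._request_id = request_id
        self._cancel = cancel
        self._queue = asyncio.Queue()

    def put(self, item):
        self._queue.put_nowait(item)

    async def generator(self):
        try:
            while True:
                item = await self._queue.get()
                if item is None:  # sentinel: stream finished normally
                    return
                yield item
        except GeneratorExit:
            # Centralized cleanup: abort the request exactly once, here.
            self._cancel(self._request_id)
            raise

aborted = []

async def demo():
    stream = MiniStream("req-0", aborted.append)
    for tok in ("a", "b", None):
        stream.put(tok)
    gen = stream.generator()
    first = await gen.__anext__()
    await gen.aclose()  # early exit -> GeneratorExit -> abort callback
    return first

# After asyncio.run(demo()), aborted == ["req-0"]
```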

vllm/entrypoints/openai/serving_score.py (1)

189-192: LGTM - proper cancellation integration.

The is_cancelled callback is correctly passed to merge_async_iterators, with appropriate handling for when raw_request is None. The asyncio.CancelledError catch block (lines 211-212) properly converts cancellation to a user-friendly error response.

vllm/entrypoints/openai/serving_completion.py (1)

162-163: LGTM - cancellation integration consistent with other entrypoints.

Since raw_request is a required parameter in create_completion() (line 62), directly passing raw_request.is_disconnected without a null check is correct. The asyncio.CancelledError is properly caught downstream (lines 214-215).

vllm/entrypoints/openai/serving_embedding.py (1)

205-208: Cancellation hook wired into merge.

Passing raw_request.is_disconnected into merge_async_iterators correctly enables client disconnect handling for embeddings.

vllm/entrypoints/openai/serving_chat.py (2)

35-35: Import aligns with new cancellation utility.


237-239: Verify streaming path handles CancelledError cleanly.

iterate_with_cancellation raises asyncio.CancelledError on disconnect; chat_completion_stream_generator only catches ValueError. Please confirm this doesn’t surface as noisy errors or unintended 500s, and consider explicitly swallowing CancelledError to end the stream quietly.

✅ Optional defensive tweak:

```diff
-        except ValueError as e:
+        except asyncio.CancelledError:
+            return
+        except ValueError as e:
             # TODO: Use a vllm-specific Validation Error
             logger.exception("Error in chat completion stream generator.")
             data = self.create_streaming_error_response(str(e))
             yield f"data: {data}\n\n"
```
vllm/entrypoints/api_server.py (2)

23-24: Import updated for cancellation helper.


57-58: Cancellation-aware generator wiring looks good.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Comment @coderabbitai help to get the list of available commands and usage tips.
