Revert "[Bugfix] Fix request cancellation without polling (#11190)" Test#15

MitchLewis930 wants to merge 1 commit into bench/PR_001_base from
Conversation
📝 Walkthrough

The PR refactors request cancellation handling across the vllm codebase, replacing a decorator-based cancellation pattern with explicit wrapping of result generators via `iterate_with_cancellation()`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant APIEndpoint
    participant IterateWithCancel
    participant Generator
    participant Engine
    Note over Client,Engine: NEW: Explicit Wrapping Pattern
    Client->>APIEndpoint: send request
    APIEndpoint->>Generator: create result_generator
    APIEndpoint->>IterateWithCancel: wrap with iterate_with_cancellation()
    IterateWithCancel->>Generator: start iteration
    loop Every ~1 second
        IterateWithCancel->>Client: poll is_disconnected
        alt Client disconnected
            IterateWithCancel->>Generator: cancel task
            IterateWithCancel->>IterateWithCancel: raise CancelledError
        else Still connected
            IterateWithCancel->>Generator: yield item
            IterateWithCancel->>APIEndpoint: forward to client
        end
    end
```
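Seen from the endpoint side, the "wrap with iterate_with_cancellation()" step in the diagram might look like the sketch below. This is a hedged illustration only: `FakeRequest` and the simplified passthrough `iterate_with_cancellation` are stand-ins invented for the example (the real helper also polls on a timer while waiting, and the real request object is Starlette's), not vllm's actual API.

```python
import asyncio
from typing import Any, AsyncGenerator, Awaitable, Callable

async def iterate_with_cancellation(
    iterator: AsyncGenerator[Any, None],
    is_cancelled: Callable[[], Awaitable[bool]],
) -> AsyncGenerator[Any, None]:
    # Simplified stand-in: check for disconnect before forwarding each item.
    async for item in iterator:
        if await is_cancelled():
            await iterator.aclose()
            raise asyncio.CancelledError("client cancelled")
        yield item

class FakeRequest:
    # Stand-in for a Starlette request exposing is_disconnected().
    def __init__(self) -> None:
        self.disconnected = False

    async def is_disconnected(self) -> bool:
        return self.disconnected

async def endpoint(raw_request: FakeRequest) -> list:
    async def result_generator():
        for tok in ["a", "b", "c", "d"]:
            yield tok

    # The wiring shown in the diagram: wrap the generator, then stream.
    gen = iterate_with_cancellation(result_generator(),
                                    raw_request.is_disconnected)
    sent = []
    try:
        async for item in gen:
            sent.append(item)
            if item == "b":
                raw_request.disconnected = True  # client goes away mid-stream
    except asyncio.CancelledError:
        pass  # stop streaming once the client is gone
    return sent

print(asyncio.run(endpoint(FakeRequest())))  # ['a', 'b']
```

The key point is that the endpoint opts into cancellation explicitly at the call site instead of relying on a decorator around the whole handler.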
```mermaid
sequenceDiagram
    participant Client
    participant APIEndpoint as OpenAI<br/>Endpoint
    participant Handler as Handler<br/>Function
    participant Engine as AsyncEngine
    Note over Client,Engine: OLD: Decorator Pattern (REMOVED)
    Client->>APIEndpoint: send request
    APIEndpoint->>Handler: `@with_cancellation` handler()
    Handler->>Engine: generate()
    par Handler Path
        Handler->>Engine: await result
    and Listen Path
        APIEndpoint->>APIEndpoint: listen_for_disconnect()
        APIEndpoint->>Client: poll http.disconnect
    end
    alt Disconnect detected
        APIEndpoint->>Handler: cancel task
        APIEndpoint->>APIEndpoint: return None
    else Handler completes
        Handler->>APIEndpoint: return result
    end
```
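For reference, the removed decorator pattern from the diagram above can be reconstructed roughly as follows. This is a hypothetical sketch inferred from the diagram — `listen_for_disconnect`, the handler signature, and `FakeRequest` are assumptions for illustration, not vllm's exact implementation.

```python
import asyncio
from functools import wraps

async def listen_for_disconnect(request) -> None:
    # Poll the ASGI receive channel until an http.disconnect arrives.
    while True:
        message = await request.receive()
        if message["type"] == "http.disconnect":
            return

def with_cancellation(handler):
    @wraps(handler)
    async def wrapper(request, *args, **kwargs):
        handler_task = asyncio.ensure_future(handler(request, *args, **kwargs))
        cancel_task = asyncio.ensure_future(listen_for_disconnect(request))
        done, pending = await asyncio.wait(
            {handler_task, cancel_task}, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
        # Diagram semantics: return the result if the handler finished,
        # or None if the client disconnected first.
        return handler_task.result() if handler_task in done else None
    return wrapper

class FakeRequest:
    # Minimal stand-in for a Starlette request's receive channel.
    def __init__(self, messages):
        self._messages = list(messages)

    async def receive(self):
        if self._messages:
            return self._messages.pop(0)
        await asyncio.Event().wait()  # no more messages: block forever

@with_cancellation
async def fast_handler(request):
    await asyncio.sleep(0.01)
    return "ok"

@with_cancellation
async def slow_handler(request):
    await asyncio.sleep(10)
    return "never reached"

print(asyncio.run(fast_handler(FakeRequest([]))))  # ok
print(asyncio.run(slow_handler(
    FakeRequest([{"type": "http.disconnect"}]))))  # None
```

The drawback this PR's target fixed — and which the revert restores — is that every request needs a second task purely to listen for disconnects.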
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

The PR involves heterogeneous changes across 11 files: removal of an entire utility module, introduction of new async generator logic with polling mechanisms, signature updates across multiple serving modules, and simplifications to engine methods. While individual changes are relatively straightforward, the scope, pattern diversity (removals, additions, and refactoring), and the architectural shift from decorators to explicit wrapping require careful verification across multiple integration points.
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
Comment
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@vllm/utils.py`:
- Around line 373-406: The asyncio.wait timeout in iterate_with_cancellation is
set to 1.5 which can delay cancellation polling beyond the documented "at least
once per second"; update the timeout passed to asyncio.wait to 1.0 in
iterate_with_cancellation (and likewise in merge_async_iterators) so the
is_cancelled() check runs at least once per second; ensure the change targets
the asyncio.wait(...) calls inside the iterate_with_cancellation and
merge_async_iterators functions and nothing else.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- tests/entrypoints/openai/test_basic.py
- tests/test_utils.py
- tests/utils.py
- vllm/engine/async_llm_engine.py
- vllm/entrypoints/api_server.py
- vllm/entrypoints/openai/api_server.py
- vllm/entrypoints/openai/serving_chat.py
- vllm/entrypoints/openai/serving_completion.py
- vllm/entrypoints/openai/serving_embedding.py
- vllm/entrypoints/openai/serving_score.py
- vllm/entrypoints/utils.py
- vllm/utils.py
💤 Files with no reviewable changes (3)
- vllm/entrypoints/utils.py
- vllm/entrypoints/openai/api_server.py
- tests/entrypoints/openai/test_basic.py
🧰 Additional context used
🧬 Code graph analysis (7)
vllm/entrypoints/openai/serving_completion.py (1)
vllm/utils.py (1)
merge_async_iterators(408-453)
vllm/entrypoints/openai/serving_chat.py (1)
vllm/utils.py (1)
iterate_with_cancellation(373-405)
vllm/engine/async_llm_engine.py (2)
vllm/engine/llm_engine.py (5)
add_request (681-692), add_request (696-708), add_request (714-814), LLMEngine (137-2046), validate_output (184-196)
vllm/outputs.py (2)
RequestOutput (83-329), PoolingRequestOutput (335-378)
vllm/entrypoints/openai/serving_score.py (1)
vllm/utils.py (1)
merge_async_iterators(408-453)
vllm/entrypoints/api_server.py (1)
vllm/utils.py (2)
iterate_with_cancellation (373-405), random_uuid (344-345)
vllm/entrypoints/openai/serving_embedding.py (1)
vllm/utils.py (1)
merge_async_iterators(408-453)
tests/test_utils.py (1)
vllm/utils.py (1)
merge_async_iterators(408-453)
🔇 Additional comments (8)
tests/utils.py (1)
166-171: No breaking change: all call sites already invoke `get_async_client()` without arguments. Examined all 30+ call sites across the test suite; none pass kwargs or custom AsyncOpenAI options. The hardcoding of `base_url`, `api_key`, and `max_retries=0` is safe and aligns with actual usage.

vllm/engine/async_llm_engine.py (1)

1068-1077: Confirm cancellation propagation is acceptable for public async APIs. With the removal of the `CancelledError` handling here, any cancellation raised by the underlying async stream (e.g., on early termination) will now bubble to callers. Please verify that downstream callers are prepared for this behavior change (or that it is explicitly documented/tested).

Also applies to: 1150-1158
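To make the behavior change concrete: with the `CancelledError` handling removed, a caller consuming the stream would need its own guard. A minimal sketch, using a hypothetical stand-in generator in place of `AsyncLLMEngine.generate()`:

```python
import asyncio

async def fake_generate():
    # Stand-in for the engine's async result stream: yields a couple of
    # outputs, then simulates the stream being cancelled early.
    yield "token-1"
    yield "token-2"
    raise asyncio.CancelledError("stream cancelled early")

async def consume() -> list:
    received = []
    try:
        async for output in fake_generate():
            received.append(output)
    except asyncio.CancelledError:
        # After this revert, callers must expect CancelledError to escape
        # the stream instead of being swallowed inside the engine.
        received.append("<cancelled>")
    return received

print(asyncio.run(consume()))  # ['token-1', 'token-2', '<cancelled>']
```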
tests/test_utils.py (1)
29-32: LGTM: test now exercises the new `is_cancelled` signature.
57-59: Nice: cancellation-aware wrapper applied to the results generator.vllm/entrypoints/openai/serving_embedding.py (1)
205-208: LGTM: propagates client-disconnect cancellation into the merged stream.vllm/entrypoints/openai/serving_completion.py (1)
162-163: LGTM: cancellation signal wired into merged iterator.vllm/entrypoints/openai/serving_chat.py (1)
237-239: Cancellation-aware wrapping is clean and aligned. This integrates the new helper in the right spot and keeps disconnect handling centralized before streaming/full generation.
vllm/entrypoints/openai/serving_score.py (1)
189-192: Good integration of cancellation callback.Passing
Passing `raw_request.is_disconnected` into `merge_async_iterators` correctly opts into the new cancellation pathway.
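The call pattern described in these comments can be sketched in isolation. Below is a simplified standalone variant of a cancellation-aware `merge_async_iterators` — the `(index, item)` tagging and the optional `is_cancelled` keyword mirror what the review describes, but the body is an assumption for illustration, not vllm's implementation:

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Optional, Tuple

async def merge_async_iterators(
    *iterators: AsyncIterator[Any],
    is_cancelled: Optional[Callable[[], Awaitable[bool]]] = None,
) -> AsyncIterator[Tuple[int, Any]]:
    # Interleave items as (source_index, item) pairs, polling is_cancelled
    # between waits when a callback is supplied.
    tasks = {asyncio.ensure_future(it.__anext__()): (i, it)
             for i, it in enumerate(iterators)}
    timeout = None if is_cancelled is None else 1.0
    while tasks:
        done, _ = await asyncio.wait(
            set(tasks), timeout=timeout, return_when=asyncio.FIRST_COMPLETED)
        if is_cancelled is not None and await is_cancelled():
            for task in tasks:
                task.cancel()
            raise asyncio.CancelledError("client cancelled")
        for task in done:
            i, it = tasks.pop(task)
            try:
                item = task.result()
            except StopAsyncIteration:
                continue  # this source is exhausted
            tasks[asyncio.ensure_future(it.__anext__())] = (i, it)
            yield i, item

async def demo() -> list:
    async def gen(base: int):
        for k in range(2):
            yield base + k

    pairs = []
    async for i, item in merge_async_iterators(gen(0), gen(10)):
        pairs.append((i, item))
    return sorted(pairs)

print(asyncio.run(demo()))  # [(0, 0), (0, 1), (1, 10), (1, 11)]
```

An endpoint would then pass something like `is_cancelled=raw_request.is_disconnected` to opt into the cancellation pathway.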
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
```python
async def iterate_with_cancellation(
    iterator: AsyncGenerator[T, None],
    is_cancelled: Callable[[], Awaitable[bool]],
) -> AsyncGenerator[T, None]:
    """Convert async iterator into one that polls the provided function
    at least once per second to check for client cancellation.
    """

    loop = asyncio.get_running_loop()

    awaits: List[Future[T]] = [_next_task(iterator, loop)]
    next_cancel_check: float = 0
    while True:
        done, pending = await asyncio.wait(awaits, timeout=1.5)

        # Check for cancellation at most once per second
        time_now = time.time()
        if time_now >= next_cancel_check:
            if await is_cancelled():
                with contextlib.suppress(BaseException):
                    awaits[0].cancel()
                    await iterator.aclose()
                raise asyncio.CancelledError("client cancelled")
            next_cancel_check = time_now + 1

        if done:
            try:
                item = await awaits[0]
                awaits[0] = _next_task(iterator, loop)
                yield item
            except StopAsyncIteration:
                # we are done
                return
```
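The polling behavior of the function above can be exercised with a small self-contained demo. The variant below is a simplified standalone reimplementation (no `_next_task` helper, no once-per-second rate limit on the check) written so the example runs without vllm; it keeps the same shape: race each `__anext__` against a timeout and poll `is_cancelled()` between waits.

```python
import asyncio
import contextlib
from typing import Any, AsyncGenerator, Awaitable, Callable

async def iterate_with_cancellation(
    iterator: AsyncGenerator[Any, None],
    is_cancelled: Callable[[], Awaitable[bool]],
) -> AsyncGenerator[Any, None]:
    task = asyncio.ensure_future(iterator.__anext__())
    while True:
        done, _ = await asyncio.wait({task}, timeout=1.0)
        if await is_cancelled():
            with contextlib.suppress(BaseException):
                task.cancel()
                await iterator.aclose()
            raise asyncio.CancelledError("client cancelled")
        if done:
            try:
                item = task.result()
            except StopAsyncIteration:
                return  # source exhausted
            task = asyncio.ensure_future(iterator.__anext__())
            yield item

async def main() -> list:
    state = {"cancelled": False}

    async def source():
        for i in range(5):
            await asyncio.sleep(0.01)
            yield i

    async def is_cancelled() -> bool:
        return state["cancelled"]

    out = []
    try:
        async for item in iterate_with_cancellation(source(), is_cancelled):
            out.append(item)
            if item == 2:
                state["cancelled"] = True  # simulate a client disconnect
    except asyncio.CancelledError:
        out.append("cancelled")
    return out

print(asyncio.run(main()))  # [0, 1, 2, 'cancelled']
```

Once the flag flips, the wrapper cancels the pending `__anext__`, closes the source generator, and surfaces `CancelledError` to the consumer — the behavior the sequence diagram labels "NEW".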
🧩 Analysis chain

🏁 Script executed:

```shell
# First, find vllm/utils.py and check its size
wc -l vllm/utils.py
```

Repository: Signal65/vllm-code-review
Length of output: 87

🏁 Script executed:

```shell
# Read the relevant sections
sed -n '373,406p' vllm/utils.py
echo "---"
sed -n '425,438p' vllm/utils.py
```

Repository: Signal65/vllm-code-review
Length of output: 1933

🏁 Script executed:

```shell
# Also check the docstrings and context around these functions
sed -n '365,445p' vllm/utils.py
```

Repository: Signal65/vllm-code-review
Length of output: 3071
Align cancellation polling interval with the stated "once per second" contract.
Both `iterate_with_cancellation` (line 385) and `merge_async_iterators` (line 432) use `timeout=1.5` with `asyncio.wait()`, which can delay cancellation checks beyond 1 second and violates the docstring promise of "at least once per second" checks. Change both timeouts to 1.0 to guarantee the stated contract.
Proposed fix

In `iterate_with_cancellation` (line 385):

```diff
- done, pending = await asyncio.wait(awaits, timeout=1.5)
+ done, pending = await asyncio.wait(awaits, timeout=1.0)
```

In `merge_async_iterators` (line 432):

```diff
- timeout = None if is_cancelled is None else 1.5
+ timeout = None if is_cancelled is None else 1.0
```
This reverts commit 2d1b9ba.
Purpose
Test Plan
Test Result
Summary by CodeRabbit
Release Notes
Bug Fixes
Refactor
✏️ Tip: You can customize this high-level summary in your review settings.