Revert "[Bugfix] Fix request cancellation without polling (#11190)" Test#15

Open
MitchLewis930 wants to merge 1 commit into bench/PR_001_base from bench/PR_001_bug__coderabbit

Conversation


@MitchLewis930 MitchLewis930 commented Jan 21, 2026

This reverts commit 2d1b9ba.

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved handling of client disconnections during streaming operations across chat completions, embeddings, tokenization, and scoring endpoints. The server now more reliably stops processing when clients disconnect mid-request, reducing unnecessary resource consumption.
  • Refactor

    • Internal refactoring of request cancellation logic for cleaner, more maintainable code architecture.


@MitchLewis930 MitchLewis930 marked this pull request as draft January 21, 2026 18:18
@MitchLewis930 MitchLewis930 marked this pull request as ready for review January 21, 2026 18:18

coderabbitai bot commented Jan 21, 2026

📝 Walkthrough

The PR refactors request cancellation handling across the vllm codebase, replacing the decorator-based with_cancellation pattern with explicit wrapping via iterate_with_cancellation and an enhanced merge_async_iterators that accepts an is_cancelled callback. Correspondingly, explicit cancellation exception handling is removed from the async engine.

Changes

Cohort / File(s) Summary
Test Cleanup
tests/entrypoints/openai/test_basic.py
Removed entire test_request_cancellation test and related asyncio/openai imports; retained test_show_version and test_check_health
Test Utilities
tests/test_utils.py, tests/utils.py
Updated test setup to pass an is_cancelled argument to merge_async_iterators via partial(asyncio.sleep, ...); refactored RemoteOpenAIServer.get_async_client() to remove **kwargs and hard-code max_retries=0
Cancellation Utilities Removal
vllm/entrypoints/utils.py
Removed entire module including listen_for_disconnect() and with_cancellation() decorator functions
Core Cancellation Utilities
vllm/utils.py
Added iterate_with_cancellation() async generator; extended merge_async_iterators() signature to accept optional is_cancelled: Callable[[], Awaitable[bool]] parameter with periodic polling
API Server Refactoring
vllm/entrypoints/api_server.py
Removed _generate() wrapper and with_cancellation decorator; now wraps generator inline using iterate_with_cancellation() with request.is_disconnected
OpenAI Endpoints
vllm/entrypoints/openai/api_server.py
Removed @with_cancellation decorator from 7 route handlers (tokenize, detokenize, chat/completions, completions, embeddings, score endpoints); removed corresponding import
Serving Modules
vllm/entrypoints/openai/serving_*.py (4 files)
Updated cancellation handling: serving_chat.py and serving_score.py now wrap generators with iterate_with_cancellation(); serving_completion.py and serving_embedding.py pass is_cancelled to merge_async_iterators() using raw_request.is_disconnected
Async Engine
vllm/engine/async_llm_engine.py
Removed try/except asyncio.CancelledError blocks from generate() and encode() methods; functions now directly iterate and yield results without explicit cancellation handling
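The serving-module changes above share one shape: the endpoint passes the request's disconnect check into a cancellation-aware wrapper instead of relying on the removed @with_cancellation decorator. A minimal, self-contained sketch of that pattern follows; the wrapper is a simplified stand-in for the vllm/utils.py helper (it checks per item rather than on the real helper's one-second polling schedule), and the engine stream and handler names are illustrative, not the actual vllm code.

```python
import asyncio
from typing import AsyncGenerator, Awaitable, Callable, TypeVar

T = TypeVar("T")

async def iterate_with_cancellation(
    iterator: AsyncGenerator[T, None],
    is_cancelled: Callable[[], Awaitable[bool]],
) -> AsyncGenerator[T, None]:
    # Simplified stand-in for the vllm/utils.py helper: checks before each
    # item instead of the real helper's timed once-per-second polling.
    async for item in iterator:
        if await is_cancelled():
            await iterator.aclose()
            raise asyncio.CancelledError("client cancelled")
        yield item

async def engine_stream() -> AsyncGenerator[str, None]:
    # Illustrative stand-in for the engine's result generator.
    for tok in ("Hello", " ", "world"):
        yield tok

async def handler(is_disconnected: Callable[[], Awaitable[bool]]) -> str:
    # The endpoint wires the request's disconnect check into the wrapper,
    # replacing the removed @with_cancellation decorator.
    results = iterate_with_cancellation(engine_stream(), is_disconnected)
    return "".join([tok async for tok in results])

async def never_disconnects() -> bool:
    return False

result = asyncio.run(handler(never_disconnects))
print(result)  # -> Hello world
```

In the actual diff, the callback comes from raw_request.is_disconnected on the incoming request object.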

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant APIEndpoint
    participant IterateWithCancel
    participant Generator
    participant Engine

    Note over Client,Engine: NEW: Explicit Wrapping Pattern
    Client->>APIEndpoint: send request
    APIEndpoint->>Generator: create result_generator
    APIEndpoint->>IterateWithCancel: wrap with iterate_with_cancellation()
    IterateWithCancel->>Generator: start iteration
    loop Every ~1 second
        IterateWithCancel->>Client: poll is_disconnected
        alt Client disconnected
            IterateWithCancel->>Generator: cancel task
            IterateWithCancel->>IterateWithCancel: raise CancelledError
        else Still connected
            IterateWithCancel->>Generator: yield item
            IterateWithCancel->>APIEndpoint: forward to client
        end
    end
sequenceDiagram
    participant Client
    participant APIEndpoint as OpenAI<br/>Endpoint
    participant Handler as Handler<br/>Function
    participant Engine as AsyncEngine

    Note over Client,Engine: OLD: Decorator Pattern (REMOVED)
    Client->>APIEndpoint: send request
    APIEndpoint->>Handler: `@with_cancellation` handler()
    Handler->>Engine: generate()
    par Handler Path
        Handler->>Engine: await result
    and Listen Path
        APIEndpoint->>APIEndpoint: listen_for_disconnect()
        APIEndpoint->>Client: poll http.disconnect
    end
    alt Disconnect detected
        APIEndpoint->>Handler: cancel task
        APIEndpoint->>APIEndpoint: return None
    else Handler completes
        Handler->>APIEndpoint: return result
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

The PR involves heterogeneous changes across 11 files: removal of an entire utility module, introduction of new async generator logic with polling mechanisms, signature updates across multiple serving modules, and simplifications to engine methods. While individual changes are relatively straightforward, the scope, pattern diversity (removals, additions, and refactoring), and the architectural shift from decorators to explicit wrapping require careful verification across multiple integration points.

Poem

🐰 A decorator's tale, now wrapped up tight,
No more listening shadows in the night,
Cancellation flows with explicit care,
As generators dance through the air,
Simpler patterns, cleaner and bright! ✨

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 55.56%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the PR title clearly describes the main action, reverting a specific bugfix commit (#11190) related to request cancellation without polling.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `vllm/utils.py`:
- Around line 373-406: The asyncio.wait timeout in iterate_with_cancellation is
set to 1.5 which can delay cancellation polling beyond the documented "at least
once per second"; update the timeout passed to asyncio.wait to 1.0 in
iterate_with_cancellation (and likewise in merge_async_iterators) so the
is_cancelled() check runs at least once per second; ensure the change targets
the asyncio.wait(...) calls inside the iterate_with_cancellation and
merge_async_iterators functions and nothing else.
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d1b9ba and 2533886.

📒 Files selected for processing (12)
  • tests/entrypoints/openai/test_basic.py
  • tests/test_utils.py
  • tests/utils.py
  • vllm/engine/async_llm_engine.py
  • vllm/entrypoints/api_server.py
  • vllm/entrypoints/openai/api_server.py
  • vllm/entrypoints/openai/serving_chat.py
  • vllm/entrypoints/openai/serving_completion.py
  • vllm/entrypoints/openai/serving_embedding.py
  • vllm/entrypoints/openai/serving_score.py
  • vllm/entrypoints/utils.py
  • vllm/utils.py
💤 Files with no reviewable changes (3)
  • vllm/entrypoints/utils.py
  • vllm/entrypoints/openai/api_server.py
  • tests/entrypoints/openai/test_basic.py
🧰 Additional context used
🧬 Code graph analysis (7)
vllm/entrypoints/openai/serving_completion.py (1)
vllm/utils.py (1)
  • merge_async_iterators (408-453)
vllm/entrypoints/openai/serving_chat.py (1)
vllm/utils.py (1)
  • iterate_with_cancellation (373-405)
vllm/engine/async_llm_engine.py (2)
vllm/engine/llm_engine.py (5)
  • add_request (681-692)
  • add_request (696-708)
  • add_request (714-814)
  • LLMEngine (137-2046)
  • validate_output (184-196)
vllm/outputs.py (2)
  • RequestOutput (83-329)
  • PoolingRequestOutput (335-378)
vllm/entrypoints/openai/serving_score.py (1)
vllm/utils.py (1)
  • merge_async_iterators (408-453)
vllm/entrypoints/api_server.py (1)
vllm/utils.py (2)
  • iterate_with_cancellation (373-405)
  • random_uuid (344-345)
vllm/entrypoints/openai/serving_embedding.py (1)
vllm/utils.py (1)
  • merge_async_iterators (408-453)
tests/test_utils.py (1)
vllm/utils.py (1)
  • merge_async_iterators (408-453)
🔇 Additional comments (8)
tests/utils.py (1)

166-171: No breaking change: all call sites already invoke get_async_client() without arguments.

Examined all 30+ call sites across the test suite—none pass kwargs or custom AsyncOpenAI options. The hardcoding of base_url, api_key, and max_retries=0 is safe and aligns with actual usage.

vllm/engine/async_llm_engine.py (1)

1068-1077: Confirm cancellation propagation is acceptable for public async APIs.

With the removal of the CancelledError handling here, any cancellation raised by the underlying async stream (e.g., on early termination) will now bubble to callers. Please verify that downstream callers are prepared for this behavior change (or that it is explicitly documented/tested).

Also applies to: 1150-1158
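The concern above can be made concrete with a small sketch: with the engine's internal try/except removed, a CancelledError raised mid-stream now reaches the caller. The generate() stub below is an illustrative stand-in for the engine's method, not vllm code.

```python
import asyncio

async def generate():
    # Illustrative stand-in for the engine's output stream: with the
    # internal try/except removed, cancellation propagates to the caller.
    yield "partial output"
    raise asyncio.CancelledError("client cancelled")

async def caller():
    received = []
    try:
        async for out in generate():
            received.append(out)
    except asyncio.CancelledError:
        return received, True  # the caller now sees the cancellation itself
    return received, False

outputs, saw_cancel = asyncio.run(caller())
print(outputs, saw_cancel)  # -> ['partial output'] True
```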

tests/test_utils.py (1)

29-32: LGTM: test now exercises the new is_cancelled signature.

vllm/entrypoints/api_server.py (1)

57-59: Nice: cancellation-aware wrapper applied to the results generator.

vllm/entrypoints/openai/serving_embedding.py (1)

205-208: LGTM: propagates client-disconnect cancellation into the merged stream.

vllm/entrypoints/openai/serving_completion.py (1)

162-163: LGTM: cancellation signal wired into merged iterator.

vllm/entrypoints/openai/serving_chat.py (1)

237-239: Cancellation-aware wrapping is clean and aligned.

This integrates the new helper in the right spot and keeps disconnect handling centralized before streaming/full generation.

vllm/entrypoints/openai/serving_score.py (1)

189-192: Good integration of cancellation callback.

Passing raw_request.is_disconnected into merge_async_iterators correctly opts into the new cancellation pathway.


Comment on lines +373 to +406
async def iterate_with_cancellation(
    iterator: AsyncGenerator[T, None],
    is_cancelled: Callable[[], Awaitable[bool]],
) -> AsyncGenerator[T, None]:
    """Convert async iterator into one that polls the provided function
    at least once per second to check for client cancellation.
    """

    loop = asyncio.get_running_loop()

    awaits: List[Future[T]] = [_next_task(iterator, loop)]
    next_cancel_check: float = 0
    while True:
        done, pending = await asyncio.wait(awaits, timeout=1.5)

        # Check for cancellation at most once per second
        time_now = time.time()
        if time_now >= next_cancel_check:
            if await is_cancelled():
                with contextlib.suppress(BaseException):
                    awaits[0].cancel()
                    await iterator.aclose()
                raise asyncio.CancelledError("client cancelled")
            next_cancel_check = time_now + 1

        if done:
            try:
                item = await awaits[0]
                awaits[0] = _next_task(iterator, loop)
                yield item
            except StopAsyncIteration:
                # we are done
                return


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find vllm/utils.py and check its size
wc -l vllm/utils.py

Repository: Signal65/vllm-code-review

Length of output: 87


🏁 Script executed:

# Read the relevant sections
sed -n '373,406p' vllm/utils.py
echo "---"
sed -n '425,438p' vllm/utils.py

Repository: Signal65/vllm-code-review

Length of output: 1933


🏁 Script executed:

# Also check the docstrings and context around these functions
sed -n '365,445p' vllm/utils.py

Repository: Signal65/vllm-code-review

Length of output: 3071


Align cancellation polling interval with the stated "once per second" contract.

Both iterate_with_cancellation (line 385) and merge_async_iterators (line 432) use timeout=1.5 with asyncio.wait(), which can delay cancellation checks beyond 1 second and violates the docstring promise of "at least once per second" checks. Change both timeouts to 1.0 to guarantee the stated contract.

Proposed fix

In iterate_with_cancellation (line 385):

-    done, pending = await asyncio.wait(awaits, timeout=1.5)
+    done, pending = await asyncio.wait(awaits, timeout=1.0)

In merge_async_iterators (line 432):

-    timeout = None if is_cancelled is None else 1.5
+    timeout = None if is_cancelled is None else 1.0
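A scaled-down sketch shows why the timeout matters (0.15 s standing in for the PR's 1.5 s wait timeout and 0.10 s for the 1-second contract; poll_intervals is a hypothetical harness, not vllm code): when no item arrives, consecutive cancellation checks are spaced by the full wait timeout, so a timeout above the contract interval violates it.

```python
import asyncio
import time

async def poll_intervals(wait_timeout: float, rounds: int) -> list:
    # A pending "next item" task that never completes, so every
    # asyncio.wait() call runs to its full timeout.
    pending = asyncio.ensure_future(asyncio.sleep(10))
    checks = []
    try:
        for _ in range(rounds):
            await asyncio.wait({pending}, timeout=wait_timeout)
            checks.append(time.monotonic())  # when is_cancelled() would run
    finally:
        pending.cancel()
    return [b - a for a, b in zip(checks, checks[1:])]

# 0.15 stands in for the PR's 1.5 s timeout; 0.10 for the 1 s contract.
gaps = asyncio.run(poll_intervals(wait_timeout=0.15, rounds=4))
print(max(gaps) > 0.10)  # -> True: checks are spaced by the full timeout
```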
