Revert "[Bugfix] Use heartbeats instead of health checks (#8583)" Test#16
Revert "[Bugfix] Use heartbeats instead of health checks (#8583)" Test#16MitchLewis930 wants to merge 1 commit intobench/PR_002_basefrom
Conversation
📝 WalkthroughWalkthroughThe changes replace a heartbeat-based health signaling mechanism with a dedicated RPC-driven health-check system. A new Changes
Sequence DiagramsequenceDiagram
participant Client
participant Engine
participant HealthChannel
Client->>HealthChannel: Poll health_socket (idle)
alt Health Check Needed
Client->>Engine: Send RPCHealthRequest
Engine->>Engine: Receive & process health request
Engine->>Engine: Call engine.check_health()
alt Engine Healthy
Engine->>HealthChannel: Send HEALTHY via health_socket
else Engine Unhealthy
Engine->>HealthChannel: Send UNHEALTHY via health_socket
end
HealthChannel->>Client: Receive health status
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@vllm/engine/multiprocessing/engine.py`:
- Around line 282-288: In _handle_health_request, when self._errored_with is set
you currently call self._send_unhealthy(self._errored_with) but then continue to
run self.engine.check_health() and self._send_healthy(), which can mask the
error; fix by returning immediately after calling
self._send_unhealthy(self._errored_with) so that _handle_health_request does not
invoke engine.check_health() or _send_healthy in the errored path (update the
_handle_health_request method to short-circuit on self._errored_with).
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
tests/mq_llm_engine/test_error_handling.pyvllm/engine/multiprocessing/__init__.pyvllm/engine/multiprocessing/client.pyvllm/engine/multiprocessing/engine.py
🧰 Additional context used
🧬 Code graph analysis (1)
vllm/engine/multiprocessing/client.py (1)
vllm/engine/multiprocessing/__init__.py (5)
RPCError(35-38)RPCHealthRequest(46-47)RPCProcessRequest(25-31)RPCStartupRequest(50-51)RPCStartupResponse(55-56)
🔇 Additional comments (7)
tests/mq_llm_engine/test_error_handling.py (1)
156-175: Delayed abort synchronization improves test determinism.Scheduling the abort after a short delay and awaiting the task keeps the abort within the generation window and avoids races with the subsequent health check; the larger
max_tokenssupports that.vllm/engine/multiprocessing/__init__.py (1)
46-60: RPCHealthRequest integration looks consistent.Adding the type and extending
RPC_REQUEST_Taligns the public protocol with the new health-check RPC flow.vllm/engine/multiprocessing/client.py (1)
19-25: Health-check loop wiring looks coherent.The new
health_socket, proactiveRPCHealthRequestprobes, and setup launch of the check loop fit together cleanly.Also applies to: 98-101, 127-155, 236-237
vllm/engine/multiprocessing/engine.py (4)
14-20: Health socket wiring aligns with the new health channel.Importing
RPCHealthRequestand binding a dedicated health socket match the client-side health path.Also applies to: 94-96
201-214: RPCError emission on engine_step exceptions is solid.Sending
RPCErrorbefore re-raising ensures clients see the root failure.
232-235: Health requests are now handled in the input loop.Routing
RPCHealthRequestthrough_handle_health_requestkeeps health checks responsive.
297-303: Health responses on the health socket look correct.Using the dedicated health socket for healthy/unhealthy responses matches the client’s poll loop.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| def _handle_health_request(self): | ||
| if self._errored_with is not None: | ||
| self._send_unhealthy(self._errored_with) | ||
|
|
||
| # Check for life of the main loop | ||
| elif time.time() - self._last_alive_time > self.last_alive_threshold: | ||
| self._send_unhealthy(RuntimeError("Engine loop has died")) | ||
|
|
||
| else: | ||
| # Otherwise- check health of the engine | ||
| # self.engine.check_health() raises on unhealthy | ||
| try: | ||
| self.engine.check_health() | ||
| self._send_healthy() | ||
| except Exception as e: | ||
| self._set_errored(e) | ||
| self._send_unhealthy(e) | ||
| # Raises error if unhealthy. | ||
| self.engine.check_health() | ||
| self._send_healthy() |
There was a problem hiding this comment.
Return after reporting unhealthy state.
When _errored_with is set, the method sends UNHEALTHY but then still runs check_health() and can send HEALTHY, which can mask failures.
🔧 Proposed fix
def _handle_health_request(self):
if self._errored_with is not None:
self._send_unhealthy(self._errored_with)
+ return
# Raises error if unhealthy.
self.engine.check_health()
self._send_healthy()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _handle_health_request(self): | |
| if self._errored_with is not None: | |
| self._send_unhealthy(self._errored_with) | |
| # Check for life of the main loop | |
| elif time.time() - self._last_alive_time > self.last_alive_threshold: | |
| self._send_unhealthy(RuntimeError("Engine loop has died")) | |
| else: | |
| # Otherwise- check health of the engine | |
| # self.engine.check_health() raises on unhealthy | |
| try: | |
| self.engine.check_health() | |
| self._send_healthy() | |
| except Exception as e: | |
| self._set_errored(e) | |
| self._send_unhealthy(e) | |
| # Raises error if unhealthy. | |
| self.engine.check_health() | |
| self._send_healthy() | |
| def _handle_health_request(self): | |
| if self._errored_with is not None: | |
| self._send_unhealthy(self._errored_with) | |
| return | |
| # Raises error if unhealthy. | |
| self.engine.check_health() | |
| self._send_healthy() |
🤖 Prompt for AI Agents
In `@vllm/engine/multiprocessing/engine.py` around lines 282 - 288, In
_handle_health_request, when self._errored_with is set you currently call
self._send_unhealthy(self._errored_with) but then continue to run
self.engine.check_health() and self._send_healthy(), which can mask the error;
fix by returning immediately after calling
self._send_unhealthy(self._errored_with) so that _handle_health_request does not
invoke engine.check_health() or _send_healthy in the errored path (update the
_handle_health_request method to short-circuit on self._errored_with).
This reverts commit 6e0c9d6.
PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)
Summary by CodeRabbit
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.