[Bugfix] Improve GPU validation logging in Ray fallback scenarios (#2…#29
MitchLewis930 wants to merge 1 commit into gpu_ray_before from
Conversation
…lm-project#25775) Signed-off-by: Sairam Pillai <sairam.pillai61@gmail.com>
📝 Walkthrough

These changes enhance error handling and resource validation in Ray-based parallel execution. A Ray-availability check is replaced with explicit GPU validation in configuration, while Ray cluster initialization gains improved error messaging and GPU availability pre-checks to provide users with clearer diagnostics.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@vllm/v1/executor/ray_utils.py`:
- Around lines 323-338: Update the incorrect import and clarify the warning. Change the import of `cuda_device_count_stateless` to come from `vllm.utils.torch_utils` (replace `from vllm.utils import cuda_device_count_stateless` with the correct module import), and update the `logger.warning` message so it says "world_size (tensor_parallel_size × pipeline_parallel_size)" instead of "Tensor parallel size", accurately describing `parallel_config.world_size` when logging `available_gpus` vs `parallel_config.world_size` in the prevalidation block.
🧹 Nitpick comments (1)
vllm/config/parallel.py (1)
521-528: Redundant call to `cuda_device_count_stateless()` and misleading error message.

Two issues:

- `cuda_device_count_stateless()` is called on line 522 for the condition check and again on line 524. While the function is cached, this is still redundant.
- The error message refers to "Tensor parallel size" but `self.world_size` is TP × PP (tensor × pipeline parallel), which could mislead users using pipeline parallelism.

Suggested fix

```diff
-        elif (
-            current_platform.is_cuda()
-            and cuda_device_count_stateless() < self.world_size
-        ):
-            gpu_count = cuda_device_count_stateless()
+        elif current_platform.is_cuda():
+            gpu_count = cuda_device_count_stateless()
+            if gpu_count < self.world_size:
+                raise ValueError(
+                    f"world_size ({self.world_size}) cannot be larger than "
+                    f"the number of available GPUs ({gpu_count}). "
+                    f"Reduce tensor_parallel_size or pipeline_parallel_size."
+                )
+        elif self.data_parallel_backend == "ray":
-            raise ValueError(
-                f"Tensor parallel size ({self.world_size}) cannot be "
-                f"larger than the number of available GPUs ({gpu_count})."
-            )
```

Note: the diff above shows the conceptual change. The actual fix needs to preserve the subsequent `elif` branches properly.
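The control flow this suggestion describes can be sketched standalone. The helper below is a hypothetical, simplified stand-in (it takes the GPU count as a parameter instead of calling `cuda_device_count_stateless()`, and it is not the actual `ParallelConfig` code):

```python
# Hypothetical, simplified sketch of the suggested validation: query the GPU
# count once, reuse the stored value, and name both TP and PP in the error,
# since world_size is their product.

def validate_world_size(world_size: int, gpu_count: int) -> None:
    """Raise if the requested world size exceeds the GPUs actually present."""
    if gpu_count < world_size:
        raise ValueError(
            f"world_size ({world_size}) cannot be larger than the number of "
            f"available GPUs ({gpu_count}). "
            f"Reduce tensor_parallel_size or pipeline_parallel_size."
        )


if __name__ == "__main__":
    validate_world_size(world_size=2 * 2, gpu_count=8)  # TP=2, PP=2 on 8 GPUs: OK
    try:
        validate_world_size(world_size=4 * 2, gpu_count=4)  # TP=4, PP=2 on 4 GPUs
    except ValueError as exc:
        print(exc)
```

Structuring it this way makes the single `gpu_count` read explicit and keeps the error message accurate for pipeline-parallel users.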
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- vllm/config/parallel.py
- vllm/v1/executor/ray_utils.py
🧰 Additional context used
🧬 Code graph analysis (2)
vllm/config/parallel.py (1)
vllm/utils/torch_utils.py (1)
`cuda_device_count_stateless` (lines 414-424)
vllm/v1/executor/ray_utils.py (1)
vllm/utils/torch_utils.py (1)
`cuda_device_count_stateless` (lines 414-424)
🪛 Ruff (0.14.13)
vllm/config/parallel.py
525-528: Avoid specifying long messages outside the exception class
(TRY003)
vllm/v1/executor/ray_utils.py
266-276: Avoid specifying long messages outside the exception class
(TRY003)
278-284: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (1)
vllm/v1/executor/ray_utils.py (1)
258-284: LGTM! Well-structured error messaging for GPU resource failures.

The conditional logic to provide more detailed error messages for multi-GPU scenarios (> 1 GPU required) is a good UX improvement. The guidance about tensor parallel size and cluster resources will help users diagnose placement group allocation failures more quickly.
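The pattern being praised — a short message for the single-GPU case and a diagnostic one when a multi-GPU placement group fails — can be sketched as follows. The function name and message text here are illustrative, not vLLM's actual wording:

```python
# Illustrative sketch of tiered error messages: terse for one GPU, detailed
# (with remediation hints) when multiple GPUs were requested.

def placement_group_error(required_gpus: int, cluster_gpus: int) -> str:
    if required_gpus > 1:
        return (
            f"Could not allocate a placement group for {required_gpus} GPUs "
            f"(cluster reports {cluster_gpus}). Check tensor_parallel_size "
            f"and your Ray cluster's GPU resources."
        )
    return "Could not allocate a GPU for the Ray worker."


if __name__ == "__main__":
    print(placement_group_error(4, 2))
    print(placement_group_error(1, 0))
```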
```python
# Prevalidate GPU requirements before Ray processing
if current_platform.is_cuda() and parallel_config.world_size > 1:
    from vllm.utils import cuda_device_count_stateless

    available_gpus = cuda_device_count_stateless()
    if parallel_config.world_size > available_gpus:
        logger.warning(
            "Tensor parallel size (%d) exceeds available GPUs (%d). "
            "This may result in Ray placement group allocation failures. "
            "Consider reducing tensor_parallel_size to %d or less, "
            "or ensure your Ray cluster has %d GPUs available.",
            parallel_config.world_size,
            available_gpus,
            available_gpus,
            parallel_config.world_size,
        )
```
🧩 Analysis chain
🏁 Script executed:

```shell
# Check if cuda_device_count_stateless is exported from vllm.utils
rg -n "cuda_device_count_stateless" --type py vllm/utils/ | head -20
```

Repository: Signal65/vllm-code-review
Length of output: 349

🏁 Script executed:

```shell
# Check the parallel_config definition to understand world_size
fd -e py -p "parallel_config" vllm/ | head -5
```

Repository: Signal65/vllm-code-review
Length of output: 51

🏁 Script executed:

```shell
# Search for world_size definition in parallel_config
rg -n "world_size" --type py vllm/ -A 2 | grep -E "(def|world_size|=)" | head -30
```

Repository: Signal65/vllm-code-review
Length of output: 2699

🏁 Script executed:

```shell
# Check the specific file mentioned in the review
cat -n vllm/v1/executor/ray_utils.py | sed -n '320,340p'
```

Repository: Signal65/vllm-code-review
Length of output: 1108

🏁 Script executed:

```shell
# Check if cuda_device_count_stateless is re-exported from vllm/utils/__init__.py
rg -n "cuda_device_count_stateless" vllm/utils/__init__.py
```

Repository: Signal65/vllm-code-review
Length of output: 51

🏁 Script executed:

```shell
# Find the ParallelConfig class definition
fd -e py "parallel_config" vllm/ -x grep -l "class.*ParallelConfig"
```

Repository: Signal65/vllm-code-review
Length of output: 51

🏁 Script executed:

```shell
# Search for world_size property or attribute definition in parallel config
rg -n "def world_size|world_size\s*=" --type py vllm/ -B 2 -A 2 | head -50
```

Repository: Signal65/vllm-code-review
Length of output: 3976

🏁 Script executed:

```shell
# Search for ParallelConfig class definition more broadly
rg -n "class ParallelConfig" --type py vllm/
```

Repository: Signal65/vllm-code-review
Length of output: 117

🏁 Script executed:

```shell
# Check if cuda_device_count_stateless is available via direct import from torch_utils
cat vllm/utils/torch_utils.py | grep -n "def cuda_device_count_stateless" -A 10
```

Repository: Signal65/vllm-code-review
Length of output: 628

🏁 Script executed:

```shell
# Search vllm/utils/__init__.py to see what is exported
head -100 vllm/utils/__init__.py
```

Repository: Signal65/vllm-code-review
Length of output: 3202

🏁 Script executed:

```shell
# Check how parallel_config.world_size is used/defined
rg -n "parallel_config\.world_size" --type py vllm/ -B 1 -A 1 | head -40
```

Repository: Signal65/vllm-code-review
Length of output: 2649

🏁 Script executed:

```shell
# Check the ParallelConfig class definition to see world_size
sed -n '68,200p' vllm/config/parallel.py
```

Repository: Signal65/vllm-code-review
Length of output: 6774

🏁 Script executed:

```shell
# Verify the import path - try to find if cuda_device_count_stateless is actually accessible via vllm.utils
rg -n "from vllm\.utils import cuda_device_count_stateless" --type py vllm/
```

Repository: Signal65/vllm-code-review
Length of output: 161

🏁 Script executed:

```shell
# Check what's actually exported from vllm/utils or torch_utils
rg -n "__all__" vllm/utils/torch_utils.py
```

Repository: Signal65/vllm-code-review
Length of output: 51

🏁 Script executed:

```shell
# Search for world_size as a property or method in ParallelConfig
rg -n "def world_size|@property" vllm/config/parallel.py -A 3 | head -50
```

Repository: Signal65/vllm-code-review
Length of output: 637

🏁 Script executed:

```shell
# Check if there's a __getattr__ or similar in ParallelConfig
rg -n "__getattr__|world_size" vllm/config/parallel.py
```

Repository: Signal65/vllm-code-review
Length of output: 982

🏁 Script executed:

```shell
# Look at the full ParallelConfig class to find world_size
cat vllm/config/parallel.py | grep -n "world_size" -B 2 -A 2
```

Repository: Signal65/vllm-code-review
Length of output: 2298

🏁 Script executed:

```shell
# Check if cuda_device_count_stateless might be re-exported dynamically or if the import works
python3 << 'EOF'
import sys
try:
    from vllm.utils import cuda_device_count_stateless
    print("✓ Import SUCCESS: cuda_device_count_stateless is accessible from vllm.utils")
except ImportError as e:
    print(f"✗ Import FAILED: {e}")
except Exception as e:
    print(f"✗ Unexpected error: {e}")
EOF
```

Repository: Signal65/vllm-code-review
Length of output: 155

🏁 Script executed:

```shell
# Check the imports at the top of vllm/config/parallel.py to see how cuda_device_count_stateless is imported
head -50 vllm/config/parallel.py | grep -E "^from|^import"
```

Repository: Signal65/vllm-code-review
Length of output: 647
Fix incorrect import path and misleading warning message.

Two issues:

- Import path is incorrect (line 325): the import `from vllm.utils import cuda_device_count_stateless` will fail. It should be `from vllm.utils.torch_utils import cuda_device_count_stateless` to match how it's imported elsewhere (e.g., vllm/config/parallel.py).
- Warning message is misleading: `parallel_config.world_size` is defined as tensor_parallel_size × pipeline_parallel_size, not just tensor parallel. The warning says "Tensor parallel size" but should clarify it's the world_size.
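The world_size point can be shown numerically; the values below are hypothetical and only mimic how `ParallelConfig` derives `world_size`:

```python
# Illustrates why "Tensor parallel size" is misleading in the warning:
# world_size is the product of tensor and pipeline parallel sizes, so a
# modest TP can still demand more GPUs than are present once PP > 1.
tensor_parallel_size = 2
pipeline_parallel_size = 2
world_size = tensor_parallel_size * pipeline_parallel_size

available_gpus = 2  # enough for TP alone, but not for TP x PP
print(world_size)                   # 4
print(world_size > available_gpus)  # True: the warning would fire
```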
Suggested fix

```diff
 if current_platform.is_cuda() and parallel_config.world_size > 1:
-    from vllm.utils import cuda_device_count_stateless
+    from vllm.utils.torch_utils import cuda_device_count_stateless
     available_gpus = cuda_device_count_stateless()
     if parallel_config.world_size > available_gpus:
         logger.warning(
-            "Tensor parallel size (%d) exceeds available GPUs (%d). "
+            "Required GPU count (world_size=%d) exceeds available GPUs (%d). "
             "This may result in Ray placement group allocation failures. "
-            "Consider reducing tensor_parallel_size to %d or less, "
+            "Consider reducing tensor_parallel_size or pipeline_parallel_size "
+            "so that their product is %d or less, "
             "or ensure your Ray cluster has %d GPUs available.",
             parallel_config.world_size,
             available_gpus,
             available_gpus,
             parallel_config.world_size,
         )
```

🤖 Prompt for AI Agents
In `@vllm/v1/executor/ray_utils.py` around lines 323-338: update the incorrect import and clarify the warning. Change the import of `cuda_device_count_stateless` to come from `vllm.utils.torch_utils` (replace `from vllm.utils import cuda_device_count_stateless` with the correct module import), and update the `logger.warning` message so it says "world_size (tensor_parallel_size × pipeline_parallel_size)" instead of "Tensor parallel size", accurately describing `parallel_config.world_size` when logging `available_gpus` vs `parallel_config.world_size` in the prevalidation block.
test
Summary by CodeRabbit