[Bugfix] Improve GPU validation logging in Ray fallback scenarios (#2…#29
MitchLewis930 wants to merge 1 commit into gpu_ray_before from
Conversation
…lm-project#25775) Signed-off-by: Sairam Pillai <sairam.pillai61@gmail.com>
📝 Walkthrough

These changes enhance error handling and resource validation in Ray-based parallel execution. A Ray-availability check is replaced with explicit GPU validation in configuration, while Ray cluster initialization gains improved error messaging and GPU availability pre-checks to provide users with clearer diagnostics.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@vllm/v1/executor/ray_utils.py`:
- Around lines 323-338: Update the incorrect import and clarify the warning. Change the import of `cuda_device_count_stateless` to come from `vllm.utils.torch_utils` (replace `from vllm.utils import cuda_device_count_stateless` with the correct module import), and update the `logger.warning` message so it says "world_size (tensor_parallel_size × pipeline_parallel_size)" instead of "Tensor parallel size", accurately describing `parallel_config.world_size` when logging `available_gpus` vs `parallel_config.world_size` in the prevalidation block.
🧹 Nitpick comments (1)
vllm/config/parallel.py (1)
521-528: Redundant call to `cuda_device_count_stateless()` and misleading error message.

Two issues:

- `cuda_device_count_stateless()` is called on line 522 for the condition check and again on line 524. While the function is cached, this is still redundant.
- The error message refers to "Tensor parallel size" but `self.world_size` is TP × PP (tensor × pipeline parallel), which could mislead users using pipeline parallelism.

Suggested fix

```diff
-        elif (
-            current_platform.is_cuda()
-            and cuda_device_count_stateless() < self.world_size
-        ):
-            gpu_count = cuda_device_count_stateless()
+        elif current_platform.is_cuda():
+            gpu_count = cuda_device_count_stateless()
+            if gpu_count < self.world_size:
+                raise ValueError(
+                    f"world_size ({self.world_size}) cannot be larger than "
+                    f"the number of available GPUs ({gpu_count}). "
+                    f"Reduce tensor_parallel_size or pipeline_parallel_size."
+                )
+        elif self.data_parallel_backend == "ray":
-            raise ValueError(
-                f"Tensor parallel size ({self.world_size}) cannot be "
-                f"larger than the number of available GPUs ({gpu_count})."
-            )
```

Note: the diff above shows the conceptual change. The actual fix needs to preserve the subsequent `elif` branches properly.
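The control flow this suggestion describes can be sketched standalone. The helper below is a hypothetical, simplified stand-in (it takes the GPU count as a parameter instead of calling `cuda_device_count_stateless()`, and it is not the actual `ParallelConfig` code):

```python
# Hypothetical, simplified sketch of the suggested validation: query the GPU
# count once, reuse the stored value, and name both TP and PP in the error,
# since world_size is their product.

def validate_world_size(world_size: int, gpu_count: int) -> None:
    """Raise if the requested world size exceeds the GPUs actually present."""
    if gpu_count < world_size:
        raise ValueError(
            f"world_size ({world_size}) cannot be larger than the number of "
            f"available GPUs ({gpu_count}). "
            f"Reduce tensor_parallel_size or pipeline_parallel_size."
        )


if __name__ == "__main__":
    validate_world_size(world_size=2 * 2, gpu_count=8)  # TP=2, PP=2 on 8 GPUs: OK
    try:
        validate_world_size(world_size=4 * 2, gpu_count=4)  # TP=4, PP=2 on 4 GPUs
    except ValueError as exc:
        print(exc)
```

Structuring it this way makes the single `gpu_count` read explicit and keeps the error message accurate for pipeline-parallel users.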
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- vllm/config/parallel.py
- vllm/v1/executor/ray_utils.py
🧰 Additional context used
🧬 Code graph analysis (2)
vllm/config/parallel.py (1)
vllm/utils/torch_utils.py (1)
`cuda_device_count_stateless` (lines 414-424)
vllm/v1/executor/ray_utils.py (1)
vllm/utils/torch_utils.py (1)
`cuda_device_count_stateless` (lines 414-424)
🪛 Ruff (0.14.13)
vllm/config/parallel.py
525-528: Avoid specifying long messages outside the exception class
(TRY003)
vllm/v1/executor/ray_utils.py
266-276: Avoid specifying long messages outside the exception class
(TRY003)
278-284: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (1)
vllm/v1/executor/ray_utils.py (1)
258-284: LGTM! Well-structured error messaging for GPU resource failures.

The conditional logic to provide more detailed error messages for multi-GPU scenarios (> 1 GPU required) is a good UX improvement. The guidance about tensor parallel size and cluster resources will help users diagnose placement group allocation failures more quickly.
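The pattern being praised — a short message for the single-GPU case and a diagnostic one when a multi-GPU placement group fails — can be sketched as follows. The function name and message text here are illustrative, not vLLM's actual wording:

```python
# Illustrative sketch of tiered error messages: terse for one GPU, detailed
# (with remediation hints) when multiple GPUs were requested.

def placement_group_error(required_gpus: int, cluster_gpus: int) -> str:
    if required_gpus > 1:
        return (
            f"Could not allocate a placement group for {required_gpus} GPUs "
            f"(cluster reports {cluster_gpus}). Check tensor_parallel_size "
            f"and your Ray cluster's GPU resources."
        )
    return "Could not allocate a GPU for the Ray worker."


if __name__ == "__main__":
    print(placement_group_error(4, 2))
    print(placement_group_error(1, 0))
```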
```python
# Prevalidate GPU requirements before Ray processing
if current_platform.is_cuda() and parallel_config.world_size > 1:
    from vllm.utils import cuda_device_count_stateless

    available_gpus = cuda_device_count_stateless()
    if parallel_config.world_size > available_gpus:
        logger.warning(
            "Tensor parallel size (%d) exceeds available GPUs (%d). "
            "This may result in Ray placement group allocation failures. "
            "Consider reducing tensor_parallel_size to %d or less, "
            "or ensure your Ray cluster has %d GPUs available.",
            parallel_config.world_size,
            available_gpus,
            available_gpus,
            parallel_config.world_size,
        )
```
🧩 Analysis chain
🏁 Script executed:

```shell
# Check if cuda_device_count_stateless is exported from vllm.utils
rg -n "cuda_device_count_stateless" --type py vllm/utils/ | head -20
```

Repository: Signal65/vllm-code-review
Length of output: 349

🏁 Script executed:

```shell
# Check the parallel_config definition to understand world_size
fd -e py -p "parallel_config" vllm/ | head -5
```

Repository: Signal65/vllm-code-review
Length of output: 51

🏁 Script executed:

```shell
# Search for world_size definition in parallel_config
rg -n "world_size" --type py vllm/ -A 2 | grep -E "(def|world_size|=)" | head -30
```

Repository: Signal65/vllm-code-review
Length of output: 2699

🏁 Script executed:

```shell
# Check the specific file mentioned in the review
cat -n vllm/v1/executor/ray_utils.py | sed -n '320,340p'
```

Repository: Signal65/vllm-code-review
Length of output: 1108

🏁 Script executed:

```shell
# Check if cuda_device_count_stateless is re-exported from vllm/utils/__init__.py
rg -n "cuda_device_count_stateless" vllm/utils/__init__.py
```

Repository: Signal65/vllm-code-review
Length of output: 51

🏁 Script executed:

```shell
# Find the ParallelConfig class definition
fd -e py "parallel_config" vllm/ -x grep -l "class.*ParallelConfig"
```

Repository: Signal65/vllm-code-review
Length of output: 51

🏁 Script executed:

```shell
# Search for world_size property or attribute definition in parallel config
rg -n "def world_size|world_size\s*=" --type py vllm/ -B 2 -A 2 | head -50
```

Repository: Signal65/vllm-code-review
Length of output: 3976

🏁 Script executed:

```shell
# Search for ParallelConfig class definition more broadly
rg -n "class ParallelConfig" --type py vllm/
```

Repository: Signal65/vllm-code-review
Length of output: 117

🏁 Script executed:

```shell
# Check if cuda_device_count_stateless is available via direct import from torch_utils
cat vllm/utils/torch_utils.py | grep -n "def cuda_device_count_stateless" -A 10
```

Repository: Signal65/vllm-code-review
Length of output: 628

🏁 Script executed:

```shell
# Search vllm/utils/__init__.py to see what is exported
head -100 vllm/utils/__init__.py
```

Repository: Signal65/vllm-code-review
Length of output: 3202

🏁 Script executed:

```shell
# Check how parallel_config.world_size is used/defined
rg -n "parallel_config\.world_size" --type py vllm/ -B 1 -A 1 | head -40
```

Repository: Signal65/vllm-code-review
Length of output: 2649

🏁 Script executed:

```shell
# Check the ParallelConfig class definition to see world_size
sed -n '68,200p' vllm/config/parallel.py
```

Repository: Signal65/vllm-code-review
Length of output: 6774

🏁 Script executed:

```shell
# Verify the import path - try to find if cuda_device_count_stateless is actually accessible via vllm.utils
rg -n "from vllm\.utils import cuda_device_count_stateless" --type py vllm/
```

Repository: Signal65/vllm-code-review
Length of output: 161

🏁 Script executed:

```shell
# Check what's actually exported from vllm/utils or torch_utils
rg -n "__all__" vllm/utils/torch_utils.py
```

Repository: Signal65/vllm-code-review
Length of output: 51

🏁 Script executed:

```shell
# Search for world_size as a property or method in ParallelConfig
rg -n "def world_size|@property" vllm/config/parallel.py -A 3 | head -50
```

Repository: Signal65/vllm-code-review
Length of output: 637

🏁 Script executed:

```shell
# Check if there's a __getattr__ or similar in ParallelConfig
rg -n "__getattr__|world_size" vllm/config/parallel.py
```

Repository: Signal65/vllm-code-review
Length of output: 982

🏁 Script executed:

```shell
# Look at the full ParallelConfig class to find world_size
cat vllm/config/parallel.py | grep -n "world_size" -B 2 -A 2
```

Repository: Signal65/vllm-code-review
Length of output: 2298

🏁 Script executed:

```shell
# Check if cuda_device_count_stateless might be re-exported dynamically or if the import works
python3 << 'EOF'
import sys
try:
    from vllm.utils import cuda_device_count_stateless
    print("✓ Import SUCCESS: cuda_device_count_stateless is accessible from vllm.utils")
except ImportError as e:
    print(f"✗ Import FAILED: {e}")
except Exception as e:
    print(f"✗ Unexpected error: {e}")
EOF
```

Repository: Signal65/vllm-code-review
Length of output: 155

🏁 Script executed:

```shell
# Check the imports at the top of vllm/config/parallel.py to see how cuda_device_count_stateless is imported
head -50 vllm/config/parallel.py | grep -E "^from|^import"
```

Repository: Signal65/vllm-code-review
Length of output: 647
Fix incorrect import path and misleading warning message.

Two issues:

- Import path is incorrect (line 325): the import `from vllm.utils import cuda_device_count_stateless` will fail. It should be `from vllm.utils.torch_utils import cuda_device_count_stateless` to match how it's imported elsewhere (e.g., vllm/config/parallel.py).
- Warning message is misleading: `parallel_config.world_size` is defined as tensor_parallel_size × pipeline_parallel_size, not just tensor parallel. The warning says "Tensor parallel size" but should clarify it's the world_size.
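The world_size point can be shown numerically; the values below are hypothetical and only mimic how `ParallelConfig` derives `world_size`:

```python
# Illustrates why "Tensor parallel size" is misleading in the warning:
# world_size is the product of tensor and pipeline parallel sizes, so a
# modest TP can still demand more GPUs than are present once PP > 1.
tensor_parallel_size = 2
pipeline_parallel_size = 2
world_size = tensor_parallel_size * pipeline_parallel_size

available_gpus = 2  # enough for TP alone, but not for TP x PP
print(world_size)                   # 4
print(world_size > available_gpus)  # True: the warning would fire
```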
Suggested fix

```diff
 if current_platform.is_cuda() and parallel_config.world_size > 1:
-    from vllm.utils import cuda_device_count_stateless
+    from vllm.utils.torch_utils import cuda_device_count_stateless
     available_gpus = cuda_device_count_stateless()
     if parallel_config.world_size > available_gpus:
         logger.warning(
-            "Tensor parallel size (%d) exceeds available GPUs (%d). "
+            "Required GPU count (world_size=%d) exceeds available GPUs (%d). "
             "This may result in Ray placement group allocation failures. "
-            "Consider reducing tensor_parallel_size to %d or less, "
+            "Consider reducing tensor_parallel_size or pipeline_parallel_size "
+            "so that their product is %d or less, "
             "or ensure your Ray cluster has %d GPUs available.",
             parallel_config.world_size,
             available_gpus,
             available_gpus,
             parallel_config.world_size,
         )
```

🤖 Prompt for AI Agents
In `@vllm/v1/executor/ray_utils.py` around lines 323-338: update the incorrect import and clarify the warning. Change the import of `cuda_device_count_stateless` to come from `vllm.utils.torch_utils` (replace `from vllm.utils import cuda_device_count_stateless` with the correct module import), and update the `logger.warning` message so it says "world_size (tensor_parallel_size × pipeline_parallel_size)" instead of "Tensor parallel size", accurately describing `parallel_config.world_size` when logging `available_gpus` vs `parallel_config.world_size` in the prevalidation block.
test
Summary by CodeRabbit