Description
Running the GPTQ model https://huggingface.co/JunHowie/Qwen3-8B-GPTQ-Int4 with the latest vLLM and quantization="gptq_bitblas" fails during engine startup.
The error is:
(EngineCore_DP0 pid=506800) torch._dynamo.exc.Unsupported: Attempted to call function marked as skipped
(EngineCore_DP0 pid=506800) Explanation: Dynamo does not know how to trace the builtin `<unknown module>._SimpleCData.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
... (omitted)
(APIServer pid=506583) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
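For context on the `_SimpleCData.__new__` name in the message: the user-code frame in the complete log below points at a `ctypes.c_void_p(arr.data_ptr())` call inside BitBLAS. The following is a minimal stdlib-only sketch (it touches neither vLLM nor BitBLAS, and `wrap_pointer` is a hypothetical helper name) that checks that `ctypes.c_void_p` derives from the C-implemented `_SimpleCData` base, which would explain why constructing one surfaces under that name.

```python
import ctypes


def wrap_pointer(address: int) -> ctypes.c_void_p:
    """Wrap a raw integer address the way the BitBLAS frame in the log does.

    Hypothetical helper for illustration only; it is not part of BitBLAS.
    """
    return ctypes.c_void_p(address)


ptr = wrap_pointer(0x1000)

# c_void_p is a ctypes scalar type; its base class lives in the C extension
# module behind ctypes, so there is no Python bytecode for a tracer to follow.
print(issubclass(ctypes.c_void_p, ctypes._SimpleCData))
print(ptr.value)
```

Under this reading, the failure is about Dynamo reaching a ctypes constructor while tracing, rather than about the quantized weights themselves.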
Complete log (after tuning):
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:00<00:00, 1.38it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:01<00:00, 1.58it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:01<00:00, 1.54it/s]
(EngineCore_DP0 pid=506800)
(EngineCore_DP0 pid=506800) INFO 10-14 21:31:21 [default_loader.py:314] Loading weights took 1.41 seconds
(EngineCore_DP0 pid=506800) INFO 10-14 21:31:50 [gpu_model_runner.py:2910] Model loading took 5.6824 GiB and 489.310448 seconds
(EngineCore_DP0 pid=506800) /home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py:1481: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>._SimpleCData.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
(EngineCore_DP0 pid=506800) If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
(EngineCore_DP0 pid=506800) If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
(EngineCore_DP0 pid=506800) torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] EngineCore failed to start.
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] Traceback (most recent call last):
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 783, in run_engine_core
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 555, in __init__
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] super().__init__(
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 223, in _initialize_kv_caches
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/executor/abstract.py", line 88, in determine_available_memory
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/executor/uniproc_executor.py", line 74, in collective_rpc
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/utils/__init__.py", line 2977, in run_method
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] return func(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] return func(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/worker/gpu_worker.py", line 280, in determine_available_memory
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] self.model_runner.profile_run()
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/worker/gpu_model_runner.py", line 3722, in profile_run
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] hidden_states, last_hidden_states = self._dummy_run(
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] return func(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/worker/gpu_model_runner.py", line 3455, in _dummy_run
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] outputs = self.model(
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/compilation/cuda_graph.py", line 126, in __call__
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen3.py", line 321, in forward
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] hidden_states = self.model(
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/compilation/decorators.py", line 407, in __call__
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] output = self.compiled_callable(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 745, in compile_wrapper
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] raise e.with_traceback(None) from e.__cause__ # User compiler error
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] torch._dynamo.exc.Unsupported: Attempted to call function marked as skipped
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] Explanation: Dynamo does not know how to trace the builtin `<unknown module>._SimpleCData.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] Developer debug context: module: <unknown module>, qualname: _SimpleCData.__new__, skip reason: <missing reason>
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] from user code:
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen2.py", line 385, in forward
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] hidden_states, residual = layer(positions, hidden_states, residual)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen3.py", line 225, in forward
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] hidden_states = self.self_attn(
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen3.py", line 144, in forward
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] qkv, _ = self.qkv_proj(hidden_states)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/layers/linear.py", line 582, in forward
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] output_parallel = self.quant_method.apply(self, input_, bias)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/layers/quantization/gptq_bitblas.py", line 479, in apply
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] out = self.kernel.apply_gptq_bitblas_linear(layer, x)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/layers/quantization/kernels/mixed_precision/bitblas.py", line 315, in apply_gptq_bitblas_linear
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] output = self.bitblas_matmul(*args) # type: ignore[operator]
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/bitblas/ops/general_matmul/__init__.py", line 756, in __call__
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] return self.forward(*args, **kwds)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/bitblas/ops/general_matmul/__init__.py", line 751, in forward
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] self._forward_from_prebuild_lib(*args, stream=stream.cuda_stream)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/bitblas/ops/operator.py", line 459, in _forward_from_prebuild_lib
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] ctypes.c_void_p(arr.data_ptr()) if not isinstance(arr, int) else arr for arr in args
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/polyfills/__init__.py", line 204, in instantiate_user_defined_class_object
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] obj = cls.__new__(cls, *args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]
(EngineCore_DP0 pid=506800) Process EngineCore_DP0:
(EngineCore_DP0 pid=506800) Traceback (most recent call last):
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=506800) self.run()
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=506800) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 796, in run_engine_core
(EngineCore_DP0 pid=506800) raise e
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 783, in run_engine_core
(EngineCore_DP0 pid=506800) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 555, in __init__
(EngineCore_DP0 pid=506800) super().__init__(
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore_DP0 pid=506800) num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 223, in _initialize_kv_caches
(EngineCore_DP0 pid=506800) available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/executor/abstract.py", line 88, in determine_available_memory
(EngineCore_DP0 pid=506800) return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/executor/uniproc_executor.py", line 74, in collective_rpc
(EngineCore_DP0 pid=506800) return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/utils/__init__.py", line 2977, in run_method
(EngineCore_DP0 pid=506800) return func(*args, **kwargs)
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=506800) return func(*args, **kwargs)
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/worker/gpu_worker.py", line 280, in determine_available_memory
(EngineCore_DP0 pid=506800) self.model_runner.profile_run()
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/worker/gpu_model_runner.py", line 3722, in profile_run
(EngineCore_DP0 pid=506800) hidden_states, last_hidden_states = self._dummy_run(
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=506800) return func(*args, **kwargs)
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/worker/gpu_model_runner.py", line 3455, in _dummy_run
(EngineCore_DP0 pid=506800) outputs = self.model(
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/compilation/cuda_graph.py", line 126, in __call__
(EngineCore_DP0 pid=506800) return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(EngineCore_DP0 pid=506800) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(EngineCore_DP0 pid=506800) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen3.py", line 321, in forward
(EngineCore_DP0 pid=506800) hidden_states = self.model(
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/compilation/decorators.py", line 407, in __call__
(EngineCore_DP0 pid=506800) output = self.compiled_callable(*args, **kwargs)
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 745, in compile_wrapper
(EngineCore_DP0 pid=506800) raise e.with_traceback(None) from e.__cause__ # User compiler error
(EngineCore_DP0 pid=506800) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) torch._dynamo.exc.Unsupported: Attempted to call function marked as skipped
(EngineCore_DP0 pid=506800) Explanation: Dynamo does not know how to trace the builtin `<unknown module>._SimpleCData.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
(EngineCore_DP0 pid=506800) Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
(EngineCore_DP0 pid=506800) Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
(EngineCore_DP0 pid=506800)
(EngineCore_DP0 pid=506800) Developer debug context: module: <unknown module>, qualname: _SimpleCData.__new__, skip reason: <missing reason>
(EngineCore_DP0 pid=506800)
(EngineCore_DP0 pid=506800)
(EngineCore_DP0 pid=506800) from user code:
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen2.py", line 385, in forward
(EngineCore_DP0 pid=506800) hidden_states, residual = layer(positions, hidden_states, residual)
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen3.py", line 225, in forward
(EngineCore_DP0 pid=506800) hidden_states = self.self_attn(
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen3.py", line 144, in forward
(EngineCore_DP0 pid=506800) qkv, _ = self.qkv_proj(hidden_states)
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/layers/linear.py", line 582, in forward
(EngineCore_DP0 pid=506800) output_parallel = self.quant_method.apply(self, input_, bias)
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/layers/quantization/gptq_bitblas.py", line 479, in apply
(EngineCore_DP0 pid=506800) out = self.kernel.apply_gptq_bitblas_linear(layer, x)
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/layers/quantization/kernels/mixed_precision/bitblas.py", line 315, in apply_gptq_bitblas_linear
(EngineCore_DP0 pid=506800) output = self.bitblas_matmul(*args) # type: ignore[operator]
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/bitblas/ops/general_matmul/__init__.py", line 756, in __call__
(EngineCore_DP0 pid=506800) return self.forward(*args, **kwds)
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/bitblas/ops/general_matmul/__init__.py", line 751, in forward
(EngineCore_DP0 pid=506800) self._forward_from_prebuild_lib(*args, stream=stream.cuda_stream)
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/bitblas/ops/operator.py", line 459, in _forward_from_prebuild_lib
(EngineCore_DP0 pid=506800) ctypes.c_void_p(arr.data_ptr()) if not isinstance(arr, int) else arr for arr in args
(EngineCore_DP0 pid=506800) File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/polyfills/__init__.py", line 204, in instantiate_user_defined_class_object
(EngineCore_DP0 pid=506800) obj = cls.__new__(cls, *args, **kwargs)
(EngineCore_DP0 pid=506800)
(EngineCore_DP0 pid=506800) Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
(EngineCore_DP0 pid=506800)
[rank0]:[W1014 21:31:51.724849712 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=506583) Traceback (most recent call last):
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/bin/vllm", line 10, in <module>
(APIServer pid=506583) sys.exit(main())
(APIServer pid=506583) ^^^^^^
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=506583) args.dispatch_function(args)
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/entrypoints/cli/serve.py", line 62, in cmd
(APIServer pid=506583) uvloop.run(run_server(args))
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=506583) return __asyncio.run(
(APIServer pid=506583) ^^^^^^^^^^^^^^
(APIServer pid=506583) File "/home/lixiangyu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=506583) return runner.run(main)
(APIServer pid=506583) ^^^^^^^^^^^^^^^^
(APIServer pid=506583) File "/home/lixiangyu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=506583) return self._loop.run_until_complete(task)
(APIServer pid=506583) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=506583) return await main
(APIServer pid=506583) ^^^^^^^^^^
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/entrypoints/openai/api_server.py", line 1917, in run_server
(APIServer pid=506583) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/entrypoints/openai/api_server.py", line 1933, in run_server_worker
(APIServer pid=506583) async with build_async_engine_client(
(APIServer pid=506583) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583) File "/home/lixiangyu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=506583) return await anext(self.gen)
(APIServer pid=506583) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/entrypoints/openai/api_server.py", line 191, in build_async_engine_client
(APIServer pid=506583) async with build_async_engine_client_from_engine_args(
(APIServer pid=506583) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583) File "/home/lixiangyu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=506583) return await anext(self.gen)
(APIServer pid=506583) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/entrypoints/openai/api_server.py", line 238, in build_async_engine_client_from_engine_args
(APIServer pid=506583) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=506583) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/utils/__init__.py", line 1336, in inner
(APIServer pid=506583) return fn(*args, **kwargs)
(APIServer pid=506583) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/async_llm.py", line 208, in from_vllm_config
(APIServer pid=506583) return cls(
(APIServer pid=506583) ^^^^
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/async_llm.py", line 130, in __init__
(APIServer pid=506583) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=506583) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client
(APIServer pid=506583) return AsyncMPClient(*client_args)
(APIServer pid=506583) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core_client.py", line 807, in __init__
(APIServer pid=506583) super().__init__(
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core_client.py", line 468, in __init__
(APIServer pid=506583) with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=506583) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583) File "/home/lixiangyu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=506583) next(self.gen)
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/utils.py", line 816, in launch_core_engines
(APIServer pid=506583) wait_for_engine_startup(
(APIServer pid=506583) File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/utils.py", line 873, in wait_for_engine_startup
(APIServer pid=506583) raise RuntimeError(
(APIServer pid=506583) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
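With enforce-eager=True the error does not occur. The traceback shows Dynamo tracing into BitBLAS's _forward_from_prebuild_lib and failing on ctypes.c_void_p(arr.data_ptr()), so it seems BitBLAS's ctypes-based kernel launch conflicts with vLLM's torch.compile integration?
If so, what is the best practice for running vLLM + BitBLAS?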
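For anyone reproducing this from Python rather than the CLI, here is a minimal sketch of the eager-mode workaround described above. It assumes the `vllm.LLM` constructor accepts `quantization` and `enforce_eager` keyword arguments; `build_engine_kwargs` and `launch` are hypothetical helper names, and this is only the workaround from this report, not a recommended configuration.

```python
def build_engine_kwargs(model: str, enforce_eager: bool = True) -> dict:
    """Collect the engine settings used in this report (hypothetical helper)."""
    return {
        "model": model,
        "quantization": "gptq_bitblas",
        # Eager mode skips the compiled path in which the error above occurs.
        "enforce_eager": enforce_eager,
    }


def launch(model: str = "JunHowie/Qwen3-8B-GPTQ-Int4"):
    """Start an engine with the settings above (hypothetical helper).

    vLLM is imported lazily so the kwargs helper can be used without it;
    actually calling this needs vLLM, BitBLAS and a supported GPU.
    """
    from vllm import LLM

    return LLM(**build_engine_kwargs(model))
```

Eager mode avoids the crash here but presumably gives up whatever speedup the compiled path would provide, which is why the question about best practice remains open.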