Draft

Changes from all commits (57 commits)
46972a0
Initial POC for VLLM Python
jaredoconnell Dec 11, 2025
cfcb09a
Set proper process limit for VLLM python
jaredoconnell Dec 11, 2025
165477f
Convert to using AsyncLLMEngine for VLLM Python
jaredoconnell Dec 12, 2025
2f6c47a
Added option to log errors from backends
jaredoconnell Dec 12, 2025
c2cc6d3
Refactor VLLM Py Backend to use same pattern as OpenAI backend
jaredoconnell Dec 12, 2025
6f818a0
Add way of specifying if target should be specified
jaredoconnell Dec 13, 2025
ef60f15
Add code for backend model requirements
jaredoconnell Dec 15, 2025
6707b1d
Initial support for audio requests
jaredoconnell Dec 15, 2025
afa992a
Start switching over to VLLM's image as base image
jaredoconnell Dec 17, 2025
f82bf3b
Use base image's components
jaredoconnell Dec 17, 2025
5a945c3
Fix missing line in Containerfile
jaredoconnell Dec 17, 2025
218fbb5
Remove conflicting packages
jaredoconnell Dec 17, 2025
69aa524
Try fix for container error
jaredoconnell Dec 18, 2025
48048e5
Try fix for container error
jaredoconnell Dec 18, 2025
6275e14
Try fix for container error
jaredoconnell Dec 18, 2025
0c393c1
Attempt to fix issues with copying venv
jaredoconnell Dec 19, 2025
40552d7
Use separate Containerfile for vllm
jaredoconnell Dec 22, 2025
45dbafe
Fixes to containerfiles
jaredoconnell Dec 22, 2025
5526048
Fix torchvision
jaredoconnell Dec 22, 2025
2d69656
Revert "Fixes to containerfiles"
jaredoconnell Dec 22, 2025
ea7b81b
Support CUDA in the containerfile
jaredoconnell Dec 22, 2025
2f39e9e
Fix errors
jaredoconnell Dec 22, 2025
5223f3d
Update script to include extras
jaredoconnell Dec 22, 2025
1838ef5
Revert Containerfile to main's version
jaredoconnell Jan 13, 2026
c0319b3
Merge branch 'main' into feat/vllm-python
jaredoconnell Feb 4, 2026
0f86fac
Updated vLLM python backend to use new backend design
jaredoconnell Feb 7, 2026
e5aa345
Default to streaming and fix token counts
jaredoconnell Feb 10, 2026
8a35197
Attempt to fix token stats for vllm backend
jaredoconnell Feb 10, 2026
ccecff3
Remove log message
jaredoconnell Feb 10, 2026
dbcb13f
Improve timing data for request stats
jaredoconnell Feb 10, 2026
f6e277c
Fix counting of tokens
jaredoconnell Feb 11, 2026
f7d908c
Fix max token count logic
jaredoconnell Feb 11, 2026
d8aa359
Help vllm python behavior match http backend
jaredoconnell Feb 11, 2026
f62abc9
Help vllm python behavior match http backend
jaredoconnell Feb 11, 2026
14b049c
Help vllm python behavior match http backend
jaredoconnell Feb 11, 2026
d4ad846
Revert "Help vllm python behavior match http backend"
jaredoconnell Feb 11, 2026
c21e21d
Work towards getting vllm-python backend working
jaredoconnell Feb 11, 2026
8a3e43e
Try to have vllm backend match http
jaredoconnell Feb 11, 2026
6493e8e
Fix vllm python token counting
jaredoconnell Feb 11, 2026
f2b0533
Simplify stream vs non-stream mode
jaredoconnell Feb 12, 2026
c123b77
Improve error validation and messages
jaredoconnell Feb 12, 2026
1c2f7cd
Remove unhelpful error message
jaredoconnell Feb 12, 2026
056ef9f
Remove default value for max model length in vllm python backend
jaredoconnell Feb 12, 2026
60fa86f
Cleanup to prepare branch for review
jaredoconnell Feb 12, 2026
8c58beb
Remove specified defaults for vllm inputs
jaredoconnell Feb 12, 2026
8e7f8e4
Update the vllm backend to use more shared code
jaredoconnell Feb 12, 2026
084dea6
Refactor code to fix linter errors
jaredoconnell Feb 16, 2026
774bcd8
Added tests for vllm_pthon backend.
jaredoconnell Feb 16, 2026
1c8cba8
Added containerfile for vLLM Python
jaredoconnell Feb 16, 2026
1d69eac
Fix vllm container and make audio dependency optional for vllm-python…
jaredoconnell Feb 16, 2026
ade6c3e
Merge branch 'main' into feat/vllm-python
jaredoconnell Feb 17, 2026
c94235f
Fix permission issues in home dir
jaredoconnell Feb 17, 2026
7cbff80
Handle CPU-only case more gracefully
jaredoconnell Feb 17, 2026
7cef453
Added documentation for vllm python backend
jaredoconnell Feb 18, 2026
92fc979
Revert merge error
jaredoconnell Feb 19, 2026
b14c4d7
Fix linter errors
jaredoconnell Feb 19, 2026
32e4af3
Simplify code and address review comments
jaredoconnell Feb 20, 2026
53 changes: 53 additions & 0 deletions Containerfile.vllm
@@ -0,0 +1,53 @@
# Base image: vllm/vllm-openai (has vLLM pre-installed). Override with build arg.
ARG BASE_IMAGE=vllm/vllm-openai

FROM $BASE_IMAGE

# release: take the last version and add a post if build iteration
# candidate: increment to next minor, add 'rc' with build iteration
# nightly: increment to next minor, add 'a' with build iteration
# alpha: increment to next minor, add 'a' with build iteration
# dev: increment to next minor, add 'dev' with build iteration
ARG GUIDELLM_BUILD_TYPE=dev
Collaborator: Do we really want dev default instead of release??


# Extra dependencies to install (e.g. recommended, all)
ARG GUIDELLM_BUILD_EXTRAS=recommended,audio

# Switch to root for installing system deps and pip install
USER root

# Install git for setuptools-git-versioning (version discovery)
RUN apt-get update && apt-get install -y --no-install-recommends git \
&& rm -rf /var/lib/apt/lists/*

ENV GUIDELLM_BUILD_TYPE=$GUIDELLM_BUILD_TYPE

# Copy repository and install GuideLLM from source with pip (no uv, to avoid
Collaborator: Hmm. "no uv" sounds stilted ... "not [with] uv" would sound a bit better.

# conflicting with the pre-installed vLLM in the base image)
COPY / /src
WORKDIR /src
RUN pip install --no-cache-dir ".[${GUIDELLM_BUILD_EXTRAS}]"

# Metadata
LABEL io.k8s.display-name="GuideLLM" \
org.opencontainers.image.description="GuideLLM Performance Benchmarking Container" \
org.opencontainers.image.source="https://github.com/vllm-project/guidellm" \
org.opencontainers.image.documentation="https://blog.vllm.ai/guidellm/stable" \
org.opencontainers.image.license="Apache-2.0"

ENV HOME="/home/guidellm" \
GUIDELLM_OUTPUT_DIR="/results"

WORKDIR $HOME

# Ensure that the user home dir can be used by any user
# (OpenShift Pods can't use the cache otherwise)
Comment on lines +43 to +44
Collaborator: Similar comment that I made on Kevin's PR ... granted, he responded that this is essentially a quote from the OpenShift documentation, it still bugs me. It can be used by any user *in gid 0*. That includes both the random uid in OpenShift and the root (0) uid in standalone podman, both in gid 0; but it's still not "any user".

RUN chgrp -R 0 "$HOME" && chmod -R g=u "$HOME"

VOLUME /results

# Root group for k8s
USER 1001:0
Collaborator: A constant uid? Is that ... wise?


ENTRYPOINT [ "guidellm" ]
CMD [ "benchmark", "run" ]
2 changes: 1 addition & 1 deletion docs/getting-started/benchmark.md
@@ -4,7 +4,7 @@ weight: -6

# Run a Benchmark

After [installing GuideLLM](install.md) and [starting a server](server.md), you're ready to run benchmarks to evaluate your LLM deployment's performance.
After [installing GuideLLM](install.md) and [starting a server](server.md), you're ready to run benchmarks to evaluate your LLM deployment's performance. Alternatively, you can run benchmarks with the vLLM Python backend (`--backend vllm_python`) without a separate server; see [vLLM Python backend](../guides/vllm-python-backend.md).
Collaborator: I'd break this up: "after installing and starting a server [...] or run the Python backend" obscures the latter case. E.g.,

  1. Install GuideLLM
  2. You can run GuideLLM two ways:
    a. targeting a running OpenAI-compatible LLM server or
    b. with an installed vLLM package using the vLLM Python backend (...)


Running a GuideLLM benchmark is straightforward. The basic command structure is:

2 changes: 2 additions & 0 deletions docs/getting-started/install.md
@@ -86,6 +86,8 @@ guidellm --help

This should display the installed version of GuideLLM.

To use the vLLM Python backend (in-process inference), see [vLLM Python backend](../guides/vllm-python-backend.md) for recommended installation (container or existing vLLM environment) and pip installation notes.

## Troubleshooting

If you encounter any issues during installation, ensure that your Python and pip versions meet the prerequisites. For further assistance, please refer to the [GitHub Issues](https://github.com/vllm-project/guidellm/issues) page or consult the [Documentation](https://github.com/vllm-project/guidellm/tree/main/docs).
4 changes: 4 additions & 0 deletions docs/guides/backends.md
@@ -8,6 +8,10 @@ GuideLLM is designed to work with OpenAI-compatible HTTP servers, enabling seaml

GuideLLM supports OpenAI-compatible HTTP servers, which provide a standardized API for interacting with LLMs. This includes popular implementations such as [vLLM](https://github.com/vllm-project/vllm) and [Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference). These servers allow GuideLLM to perform evaluations, benchmarks, and optimizations with minimal setup.

### vLLM Python backend

GuideLLM can also run inference in the same process via the **vLLM Python backend** (`vllm_python`), which drives vLLM's Python API (`AsyncLLMEngine`) directly, with no HTTP server involved. For setup, installation options (container, existing vLLM, pip), and examples, see [vLLM Python backend](vllm-python-backend.md).

## Examples for Spinning Up Compatible Servers

### 1. vLLM
4 changes: 2 additions & 2 deletions docs/guides/multimodal/audio.md
@@ -106,10 +106,10 @@ For example, to specify French as the target language for an audio translation r

#### "stream"

Turn streaming responses on or off (if supported by the server) using a boolean value. By default, streaming is enabled.
Turn streaming responses on or off (if supported by the backend) using a boolean value. By default, streaming is enabled. Use `--backend-kwargs`:

```bash
--request-formatter-kwargs '{"stream": false}'
--backend-kwargs '{"stream": false}'
```

## Expected Results
4 changes: 2 additions & 2 deletions docs/guides/multimodal/image.md
@@ -97,10 +97,10 @@ For example, to specify a specific system prompt or other body parameter:

#### "stream"

Turn streaming responses on or off (if supported by the server) using a boolean value. By default, streaming is enabled.
Turn streaming responses on or off (if supported by the backend) using a boolean value. By default, streaming is enabled. Use `--backend-kwargs`:

```bash
--request-formatter-kwargs '{"stream": false}'
--backend-kwargs '{"stream": false}'
```

## Expected Results
4 changes: 2 additions & 2 deletions docs/guides/multimodal/video.md
@@ -92,10 +92,10 @@ For example, to specify a specific system prompt or other body parameter:

#### "stream"

Turn streaming responses on or off (if supported by the server) using a boolean value. By default, streaming is enabled.
Turn streaming responses on or off (if supported by the backend) using a boolean value. By default, streaming is enabled. Use `--backend-kwargs`:

```bash
--request-formatter-kwargs '{"stream": false}'
--backend-kwargs '{"stream": false}'
```

## Expected Results
68 changes: 68 additions & 0 deletions docs/guides/vllm-python-backend.md
@@ -0,0 +1,68 @@
# vLLM Python Backend

The **vLLM Python backend** (`vllm_python`) runs inference in the same process as GuideLLM using vLLM's [AsyncLLMEngine](https://docs.vllm.ai/). No HTTP server is involved, reducing overhead and variability. This is useful for isolating performance bottlenecks or simplifying your benchmark setup. You do **not** pass `--target`; you **must** pass `--model`.
Collaborator: The bolding on "same process" feels like overkill here.

Also, I think you want "overhead", not "overheat"; plus I think the "and variables" doesn't mean much in this context. Maybe "variability" would be better... or "variability due to network bandwidth and latency" or whatever else you have in mind here.


For all engine options and supported models, see vLLM's [Engine Arguments](https://docs.vllm.ai/en/stable/configuration/engine_args/) and the [vLLM documentation](https://docs.vllm.ai/).

## Installation

### Recommended methods

- **Official GuideLLM + vLLM image**
Build and run the image that uses the vLLM base image (e.g. [Containerfile.vllm](https://github.com/vllm-project/guidellm/blob/main/Containerfile.vllm)). It is based on `vllm/vllm-openai` and installs GuideLLM on top, giving a known-good vLLM + GuideLLM stack with hardware support as provided by the base image.

**Note:** This method favors vLLM's requirements over GuideLLM's. Since vLLM is the more complex project, this is the recommended configuration, but it may pin an older Python or older dependency versions, resulting in sub-optimal GuideLLM performance and behavior in some scenarios.

- **Existing vLLM installation**
Install vLLM first for your environment (GPU/CPU, CUDA, etc.), then install GuideLLM in the same environment (e.g. `pip install guidellm` or with extras). You avoid a duplicate vLLM install and reuse your existing acceleration setup.

**Note:** Using [uv](https://github.com/astral-sh/uv) is not recommended for the vLLM Python backend because of potentially incompatible requirements between the two projects. Prefer pip or the container / existing vLLM environment.


It is also possible to install GuideLLM and vLLM via pip using `pip install guidellm[vllm]`. This method may make **hardware acceleration** (e.g. CUDA) harder to get working. See [vLLM installation](https://docs.vllm.ai/en/latest/getting_started/installation) and GPU/hardware-specific docs there. For production or GPU use, the container or existing-install path is recommended.


## Basic example

Run a benchmark with the vLLM Python backend:
Collaborator: This wording might suggest that it's the lack of --target that triggers the Python backend, when it's really the --backend. You've already said that we don't use --target, and you're not showing it below, so I'm not sure that the explicit reminder that "we don't use --target here" is really necessary.


```bash
guidellm benchmark run \
--backend vllm_python \
--model "Qwen/Qwen3-0.6B" \
--data "prompt_tokens=256,output_tokens=128" \
--max-seconds 20 \
--rate 3
```

Engine behavior (device, memory, etc.) follows vLLM defaults unless you override it via `--backend-kwargs` (e.g. `vllm_config`). When running without a GPU (e.g. the GuideLLM + vLLM container without GPU access), the backend automatically uses the CPU device unless you set `device` in `vllm_config`. For engine configuration options, see vLLM's [Engine Arguments](https://docs.vllm.ai/en/stable/configuration/engine_args/).
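Because `--backend-kwargs` takes a JSON object, it can be convenient to assemble the string programmatically rather than hand-escaping quotes. A minimal sketch; the `device` and `gpu_memory_utilization` keys follow the engine-argument naming discussed in this guide:

```python
import json

# Engine overrides to forward to vLLM via --backend-kwargs.
# Keys use vLLM's Python parameter names, not CLI flag names.
backend_kwargs = {"vllm_config": {"device": "cpu", "gpu_memory_utilization": 0.8}}

# Serialize to the single-line JSON string expected on the command line.
arg = json.dumps(backend_kwargs)
print(arg)  # {"vllm_config": {"device": "cpu", "gpu_memory_utilization": 0.8}}
```

The resulting string can then be passed verbatim as the value of `--backend-kwargs`.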

## Request format and backend options

- **`--request-format`**
Controls how chat prompts are built. Options: `plain` (no chat template; message content is concatenated as plain text), `default-template` (use the tokenizer’s default chat template), or a file path / single-line template string per vLLM’s supported options. The value is passed through to vLLM's chat template handling. For details, see vLLM's [Chat templates](https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/chat_templates/) documentation.

- **`--backend-kwargs`**
Backend-specific options are passed here as a JSON object: pass a `vllm_config` key whose value is a dict of engine option names and values. You can also pass `request_format` here as an alternative to `--request-format`.

**Using Engine Arguments in `vllm_config`:** The [Engine Arguments](https://docs.vllm.ai/en/stable/configuration/engine_args/) documentation describes options in **CLI form** (e.g. `--gpu-memory-utilization`, `--max-model-len`). For `vllm_config` you must use the **Python parameter names** instead: strip the leading `--` and replace dashes with underscores (e.g. `gpu_memory_utilization`, `max_model_len`). The keys are the same as the field names on vLLM's `EngineArgs` and `AsyncEngineArgs` dataclasses; for the exact list of allowed keys and types, see the [vLLM source: `vllm/engine/arg_utils.py`](https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py) (search for `class EngineArgs`).

Example — limit GPU memory use and context length:

```bash
--backend-kwargs '{"vllm_config": {"gpu_memory_utilization": 0.8, "max_model_len": 4096}}'
```

For the full list of options and their types, see vLLM's [Engine Arguments](https://docs.vllm.ai/en/stable/configuration/engine_args/) (CLI form) and the [EngineArgs source](https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py) (Python field names for `vllm_config`).
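The flag-to-field renaming rule described above is mechanical, and can be expressed as a tiny helper (hypothetical, not part of GuideLLM or vLLM):

```python
def cli_flag_to_engine_arg(flag: str) -> str:
    """Convert a vLLM CLI flag (e.g. --gpu-memory-utilization) to the
    Python field name used in vllm_config (gpu_memory_utilization)."""
    return flag.lstrip("-").replace("-", "_")

# Build a vllm_config dict from CLI-style option names:
cli_options = {"--gpu-memory-utilization": 0.8, "--max-model-len": 4096}
vllm_config = {cli_flag_to_engine_arg(k): v for k, v in cli_options.items()}
print(vllm_config)  # {'gpu_memory_utilization': 0.8, 'max_model_len': 4096}
```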

## See also

- [Backends](backends.md) — Overview of supported backends.
- [Run a benchmark](../getting-started/benchmark.md) — General benchmark options.
- [vLLM Engine Arguments](https://docs.vllm.ai/en/stable/configuration/engine_args/) — CLI-oriented docs; use Python names (e.g. `gpu_memory_utilization`) in `vllm_config`.
- [vLLM source: `vllm/engine/arg_utils.py`](https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py) — `EngineArgs` / `AsyncEngineArgs` field names and types for `vllm_config` keys.
- [vLLM AsyncEngineArgs API](https://docs.vllm.ai/en/stable/api/vllm/engine/arg_utils/#vllm.engine.arg_utils.AsyncEngineArgs) — API reference for the class that receives these options.
- [vLLM Chat templates](https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/chat_templates/) — For `--request-format` behavior.
- [vLLM documentation](https://docs.vllm.ai/)
- [vLLM installation](https://docs.vllm.ai/en/latest/getting_started/installation)
- [vLLM OpenAI-compatible server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html) — When using the HTTP server instead of the Python backend.
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -91,6 +91,9 @@ vision = [
"datasets[vision]",
"pillow",
]
vllm = [
"vllm",
]
# Dev Tooling
dev = [
# Install all optional dependencies
45 changes: 44 additions & 1 deletion src/guidellm/__main__.py
@@ -46,6 +46,10 @@
get_builtin_scenarios,
reimport_benchmarks_report,
)
from guidellm.benchmark.schemas.generative.entrypoints import (
backend_requires_model,
backend_requires_target,
)
from guidellm.mock_server import MockServer, MockServerConfig
from guidellm.scheduler import StrategyType
from guidellm.settings import print_config
@@ -186,7 +190,12 @@ def benchmark():
default=BenchmarkGenerativeTextArgs.get_default("request_format"),
help=(
"Format to use for requests. Options depend on backend. "
"If not provided, uses backend default."
"For vLLM backend: plain (no chat template, text appending only), "
"default-template (use tokenizer default), or a file path / single-line "
"template per vLLM docs. Default: default-template. "
"For openai backend: http endpoint path (/v1/chat/completions, "
"/v1/completions, /v1/audio/transcriptions, /v1/audio/translations) or "
"alias (e.g. chat_completions); default /v1/chat/completions."
),
)
@click.option(
@@ -460,6 +469,40 @@ def run(**kwargs): # noqa: C901
status="warning",
)

# Early validation: check target and model parameters based on backend requirements
backend = kwargs.get("backend", BenchmarkGenerativeTextArgs.get_default("backend"))
target = kwargs.get("target", None)
model = kwargs.get("model", None)
requires_target = backend_requires_target(backend)
requires_model = backend_requires_model(backend)
backend_type = backend.type_ if hasattr(backend, "type_") else backend

# Validate target parameter
if requires_target and target is None:
raise click.BadParameter(
f"Backend '{backend_type}' requires a target parameter. "
"Please provide --target with a valid endpoint URL.",
ctx=click.get_current_context(),
param_hint="--target",
)

if not requires_target and target is not None:
raise click.BadParameter(
f"Backend '{backend_type}' does not support a target parameter. "
"Please remove --target as this backend runs locally.",
ctx=click.get_current_context(),
param_hint="--target",
)

# Validate model parameter
if requires_model and model is None:
raise click.BadParameter(
f"Backend '{backend_type}' requires a model parameter. "
"Please provide --model with a valid model identifier.",
ctx=click.get_current_context(),
param_hint="--model",
)

Comment on lines +472 to +505
Collaborator: Duplicate of the BenchmarkGenerativeTextArgs validation. Should not validate here.

try:
args = BenchmarkGenerativeTextArgs.create(
scenario=kwargs.pop("scenario", None), **kwargs
15 changes: 15 additions & 0 deletions src/guidellm/backends/__init__.py
@@ -19,6 +19,17 @@
TextCompletionsRequestHandler,
)

# Conditionally import VLLM backend if available
try:
from .vllm_python.vllm import VLLMPythonBackend
from .vllm_python.vllm_response import VLLMResponseHandler
Comment on lines +24 to +25
Collaborator: Should import from next level. E.g. `from .vllm_python import VLLMPythonBackend, VLLMResponseHandler`.


HAS_VLLM_BACKEND = True
except ImportError:
VLLMPythonBackend = None # type: ignore[assignment, misc]
VLLMResponseHandler = None # type: ignore[assignment, misc]
HAS_VLLM_BACKEND = False

Comment on lines +22 to +32
Collaborator: Actually the above comment on vLLM extras is better here:

Suggested change:

```diff
-# Conditionally import VLLM backend if available
-try:
-    from .vllm_python.vllm import VLLMPythonBackend
-    from .vllm_python.vllm_response import VLLMResponseHandler
-
-    HAS_VLLM_BACKEND = True
-except ImportError:
-    VLLMPythonBackend = None  # type: ignore[assignment, misc]
-    VLLMResponseHandler = None  # type: ignore[assignment, misc]
-    HAS_VLLM_BACKEND = False
+from guidellm.extras.vllm import HAS_VLLM
+
+# Conditionally import VLLM backend if available
+if HAS_VLLM:
+    from .vllm_python import VLLMPythonBackend
```

__all__ = [
"AudioRequestHandler",
"Backend",
@@ -29,3 +40,7 @@
"OpenAIRequestHandlerFactory",
"TextCompletionsRequestHandler",
]

# Conditionally add VLLM backend and handler to exports
if HAS_VLLM_BACKEND:
__all__.extend(["VLLMPythonBackend", "VLLMResponseHandler"])
20 changes: 19 additions & 1 deletion src/guidellm/backends/backend.py
@@ -21,7 +21,7 @@
]


BackendType = Literal["openai_http"]
BackendType = Literal["openai_http", "vllm_python"]


class Backend(
@@ -101,6 +101,24 @@ def requests_limit(self) -> int | None:
"""
return None

@classmethod
def requires_target(cls) -> bool:
"""
Indicate whether this backend requires a target parameter.

:return: True if the backend requires a target parameter, False otherwise
"""
return True # Default to True for safety (most backends need a target)

@classmethod
def requires_model(cls) -> bool:
"""
Indicate whether this backend requires a model parameter.

:return: True if the backend requires a model parameter, False otherwise
"""
return False # Default to False (model is optional by default)

@abstractmethod
async def default_model(self) -> str:
"""
19 changes: 19 additions & 0 deletions src/guidellm/backends/openai/http.py
@@ -78,6 +78,25 @@ class OpenAIHTTPBackend(Backend):
await backend.process_shutdown()
"""

@classmethod
def requires_target(cls) -> bool:
"""
OpenAI HTTP backend requires a target URL.

:return: True, as this backend requires a target endpoint URL
"""
return True

@classmethod
def requires_model(cls) -> bool:
"""
OpenAI HTTP backend does not require a model parameter.
The model can be optional as we can use the server's default model.

:return: False, as this backend does not require a model parameter
"""
return False

def __init__(
self,
target: str,
10 changes: 10 additions & 0 deletions src/guidellm/backends/vllm_python/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
"""
VLLM Python API backend package.

Provides the VLLM Python backend and response handler for compiling
OpenAI-style response dicts into GenerationResponse.
"""

from .vllm_response import VLLMResponseHandler

__all__ = ["VLLMResponseHandler"]