
vLLM Python Backend #596 (Draft)

jaredoconnell wants to merge 57 commits into vllm-project:main from jaredoconnell:feat/vllm-python

Conversation

@jaredoconnell (Collaborator) commented Feb 12, 2026

Summary

This is an alternative backend that starts a complete vLLM instance in-process and uses vLLM's Python API rather than its HTTP API.

This PR will remain a draft until tests and documentation are added.
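For reviewers less familiar with vLLM's Python API, the core pattern this backend wraps looks roughly like the following. This is a sketch only, not the code in this PR; the model name and sampling values are illustrative.

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine


async def main() -> None:
    # Build an in-process engine; no HTTP server is involved.
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="Qwen/Qwen2.5-1.5B-Instruct")  # illustrative model
    )
    params = SamplingParams(temperature=0.0, max_tokens=64)

    # generate() yields RequestOutput snapshots as tokens stream in.
    final = None
    async for output in engine.generate("Hello, world!", params, request_id="req-0"):
        final = output
    if final is not None:
        print(final.outputs[0].text)


asyncio.run(main())
```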

TODO

Details

  • [ ]

Test Plan

  • TODO

Related Issues

  • Resolves #

  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

@sjmonson (Collaborator) left a comment

Core handling of vLLM seems good; the rest is mostly cleanup and minor fixes. There are a lot of dead code paths that likely exist because this PR was started before #478. A lot of the converting to/from the /v1/chat/completions format can be removed. Multimodal probably needs a bit more work: audio should be easy to fix, but image/video don't seem implemented; that can be a later follow-up PR. Both the plain and chat template formats need fixes (see the respective comments); neither should be difficult.

Comment on lines 23 to 35
try:
    from vllm import SamplingParams
    from vllm.engine.async_llm_engine import AsyncLLMEngine
    from vllm.engine.arg_utils import AsyncEngineArgs
    from vllm.outputs import RequestOutput

    HAS_VLLM = True
except ImportError:
    AsyncLLMEngine = None  # type: ignore[assignment, misc]
    AsyncEngineArgs = None  # type: ignore[assignment, misc]
    SamplingParams = None  # type: ignore[assignment, misc]
    RequestOutput = None  # type: ignore[assignment, misc]
    HAS_VLLM = False

Move this under `src/guidellm/extras/vllm.py` and `from guidellm.extras.vllm import SamplingParams, AsyncLLMEngine, ..., HAS_VLLM` here.
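A minimal sketch of what that extras module could look like, mirroring the imports in this file; this is the reviewer's suggestion, not code that exists in the PR yet:

```python
# src/guidellm/extras/vllm.py (sketch)
# Single home for the optional vLLM import so callers can write
# `from guidellm.extras.vllm import SamplingParams, AsyncLLMEngine, ..., HAS_VLLM`.
try:
    from vllm import SamplingParams
    from vllm.engine.arg_utils import AsyncEngineArgs
    from vllm.engine.async_llm_engine import AsyncLLMEngine
    from vllm.outputs import RequestOutput

    HAS_VLLM = True
except ImportError:
    SamplingParams = None  # type: ignore[assignment, misc]
    AsyncEngineArgs = None  # type: ignore[assignment, misc]
    AsyncLLMEngine = None  # type: ignore[assignment, misc]
    RequestOutput = None  # type: ignore[assignment, misc]
    HAS_VLLM = False

__all__ = ["AsyncEngineArgs", "AsyncLLMEngine", "HAS_VLLM", "RequestOutput", "SamplingParams"]
```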

Comment on lines +314 to +321
arguments = getattr(request, "arguments", None)

if arguments is not None:
    body = getattr(arguments, "body", None) or {}
    stream_override = getattr(arguments, "stream", None)
    stream = self.stream if stream_override is None else bool(stream_override)
    files = getattr(arguments, "files", None) or {}
else:

A GenerationRequest will never have an arguments field as of #478; drop this.

Comment on lines +22 to +32
# Conditionally import VLLM backend if available
try:
    from .vllm_python.vllm import VLLMPythonBackend
    from .vllm_python.vllm_response import VLLMResponseHandler

    HAS_VLLM_BACKEND = True
except ImportError:
    VLLMPythonBackend = None  # type: ignore[assignment, misc]
    VLLMResponseHandler = None  # type: ignore[assignment, misc]
    HAS_VLLM_BACKEND = False


Actually the above comment on vLLM extras is better here:

Suggested change:
-# Conditionally import VLLM backend if available
-try:
-    from .vllm_python.vllm import VLLMPythonBackend
-    from .vllm_python.vllm_response import VLLMResponseHandler
-    HAS_VLLM_BACKEND = True
-except ImportError:
-    VLLMPythonBackend = None  # type: ignore[assignment, misc]
-    VLLMResponseHandler = None  # type: ignore[assignment, misc]
-    HAS_VLLM_BACKEND = False
+from guidellm.extras.vllm import HAS_VLLM
+# Conditionally import VLLM backend if available
+if HAS_VLLM:
+    from .vllm_python import VLLMPythonBackend

Comment on lines +24 to +25
from .vllm_python.vllm import VLLMPythonBackend
from .vllm_python.vllm_response import VLLMResponseHandler

Should import from the next level up, e.g. `from .vllm_python import VLLMPythonBackend, VLLMResponseHandler`.

Comment on lines +413 to +417
if not requires_target and self.target is not None:
    raise ValueError(
        f"Backend '{backend_type}' does not support a target parameter. "
        "Please remove --target as this backend runs locally."
    )

Don't do this; just ignore target if it's not needed.
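A sketch of that alternative, assuming the value can simply be dropped; the `logger` call is illustrative, not an existing helper in this file:

```python
# Accept --target for CLI consistency but ignore it for local backends;
# the vLLM Python backend has nothing to connect to.
if not requires_target and self.target is not None:
    logger.debug(
        "Backend %r runs locally; ignoring target=%r", backend_type, self.target
    )
```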

Comment on lines 955 to 956
if token_delta > 0:
    state["total_output_tokens"] += token_delta

Have you observed token_delta being less than 0 or is this just catching the 0 case?


RE: "less than 0" -- the test is > 0, and I could argue the test probably isn't even worthwhile for 0 if there's no possibility of it being less than 0. (Though, if it can be, > 0 is more efficient than checking >= 0 and adding the 0. 🙂)

Comment on lines +832 to +837
if hasattr(audio_samples.data, "numpy"):
    audio_array = audio_samples.data.numpy()
elif hasattr(audio_samples.data, "cpu"):
    audio_array = audio_samples.data.cpu().numpy()
else:
    audio_array = np.asarray(audio_samples.data)

Be careful with hasattr; it will hide invalid code paths. The only valid option is the first one: audio_samples.data is a Tensor, and all tensors have a numpy() function.
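In other words, something like the following should suffice (assuming, as the comment above implies, that the tensor already lives on the CPU):

```python
# audio_samples.data is a Tensor; a CPU tensor always exposes .numpy().
audio_array = audio_samples.data.numpy()
```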

Comment on lines +752 to +758
max_tokens = (
    body.get("max_tokens")
    if body.get("max_tokens") is not None
    else (max_tokens_override if max_tokens_override is not None else 16)
)
if max_tokens == 0:
    max_tokens = 16

Why set max_tokens to 16 by default? It should be uncapped so we let the model decide when it's done.
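A sketch of the uncapped behavior, assuming vLLM treats `max_tokens=None` as "generate until EOS or the model's context limit"; variable names mirror the snippet above:

```python
# Prefer an explicit request value, then any override; otherwise pass None so
# the engine decides when generation ends instead of capping at 16 tokens.
max_tokens = body.get("max_tokens")
if max_tokens is None:
    max_tokens = max_tokens_override
params["max_tokens"] = max_tokens  # None => no artificial cap
```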


return SamplingParams(**params) # type: ignore[misc]

def _convert_vllm_output_to_openai_format(

Shouldn't the response handler be written to take RequestOutput natively? Seems like a lot of extra code/work to convert to this format first and then parse.

Comment on lines +652 to +663
def _extract_prompt_chat_plain(
    self, formatted_messages: list[dict[str, Any]]
) -> str:
    """Format messages as plain 'Role: content' lines."""
    parts = []
    for msg in formatted_messages:
        role = msg.get("role", "user")
        content = msg.get("content", "")
        if role and content:
            parts.append(f"{role.capitalize()}: {content}")
    parts.append("Assistant: ")
    return "\n".join(parts)

Plain should be equivalent to /v1/completions. I need to read the vLLM code and double-check, but there should not be any "role" here. Prompts should be concatenated with `" ".join(...)`.
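A sketch of that /v1/completions-style behavior, with no roles and simple whitespace joining; the function name is illustrative and string contents are assumed:

```python
from typing import Any


def extract_prompt_plain(formatted_messages: list[dict[str, Any]]) -> str:
    """Join message contents into a single completions-style prompt (no roles)."""
    return " ".join(
        str(msg["content"]) for msg in formatted_messages if msg.get("content")
    )
```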

@dbutenhof (Collaborator) left a comment

My eyes are starting to glaze over, in the middle of vllm.py, so I need a break. Might as well post what I've got so far and start fresh later...

# Run a Benchmark

After [installing GuideLLM](install.md) and [starting a server](server.md), you're ready to run benchmarks to evaluate your LLM deployment's performance.
After [installing GuideLLM](install.md) and [starting a server](server.md), you're ready to run benchmarks to evaluate your LLM deployment's performance. Alternatively, you can run benchmarks with the vLLM Python backend (`--backend vllm_python`) without a separate server; see [vLLM Python backend](../guides/vllm-python-backend.md).

I'd break this up: "after installing and starting a server [...] or run the Python backend" obscures the latter case. E.g.,

  1. Install GuideLLM
  2. You can run GuideLLM two ways:
    a. targeting a running OpenAI-compatible LLM server or
    b. with an installed vLLM package using the vLLM Python backend (...)

@@ -0,0 +1,68 @@
# vLLM Python Backend

The **vLLM Python backend** (`vllm_python`) runs inference in the **same process** as GuideLLM using vLLM's [AsyncLLMEngine](https://docs.vllm.ai/). No HTTP server is involved, reducing overheat and variables. This is useful for isolating performance bottlenecks or simplifying your benchmark setup. You do **not** pass `--target`; you **must** pass `--model`.

The bolding on "same process" feels like overkill here.

Also, I think you want "overhead", not "overheat"; plus I think the "and variables" doesn't mean much in this context. Maybe "variability" would be better... or "variability due to network bandwidth and latency" or whatever else you have in mind here.


## Basic example

Run a benchmark with the vLLM Python backend (no `--target`):

This wording might suggest that it's the lack of --target that triggers the Python backend, when it's really the --backend. You've already said that we don't use --target, and you're not showing it below, so I'm not sure that the explicit reminder that "we don't use --target here" is really necessary.

Comment on lines +77 to +80
raise ImportError(
    "vllm is not installed. Please install it using "
    "'pip install guidellm[vllm]' or 'pip install vllm>=0.6.0'"
)

Except that, since the dependency morass requires installing vLLM first if not together with GuideLLM, suggesting at this point that they just pip install vllm seems potentially unhelpful ...

"""
config = dict(user_config)

# Ensure model is set in config (required; overrides user if they passed it)

This should be documented; there's a lot about discovering vLLM config options and how to pass them as backend args, so we should be careful to document options that can't be passed that way.

I get the necessity, although it seems fairly artificial (but probably too messy to fix now) ... target and model aren't actually required by Backend.create, even though resolve_backend always passes them ... arguably they should be demoted to backend_kwargs especially as we now have concrete examples which conflict in their requirements for those two parameters!

if not self._in_process:
    raise RuntimeError("Backend not started up for process.")

# Shutdown the async engine if it has a shutdown method

Misleading comment ... there's no conditional here for "if" it has a shutdown method -- if there's an _engine we just call it. If it's really an "if" then the call should be conditional or in a captured try block... if not, can we remove that comment phrase?

:param iter_time: Current iteration time
"""
if request_info.timings.first_request_iteration is None:
    request_info.timings.first_request_iteration = iter_time

Without checking other references ... below in token timing, the is None dynamic initialization case sets token_iterations = 0 along with storing the first timestamp. Should this logic be following that same pattern?

@dbutenhof (Collaborator) left a comment

More comments / questions / rambling

self._engine = None
self._in_process = False

async def validate(self):

Apparently the HTTP backend used to do a real request to validate but was "downgraded" to do a /health check. So the question here is: do we want to depart from that convention by doing a real request here, or should we use the check_health() API to be equivalent to the HTTP behavior?
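If parity with the HTTP backend is the goal, a sketch along these lines would keep validate() lightweight; it assumes the engine exposes `check_health()`, as recent vLLM versions do:

```python
async def validate(self):
    """Mirror the HTTP backend's /health-style check rather than a real request."""
    if self._engine is None:
        raise RuntimeError("Backend not started up for process.")
    # AsyncLLMEngine.check_health() raises if the engine is unhealthy.
    await self._engine.check_health()
```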

Comment on lines +581 to +583
block_type = block.get("type", "")
if block_type == "text":
    text = block.get("text", "")

In both cases, it seems unnecessarily "wasteful" to add the get fallback to "" since the default None is perfectly (and more straightforwardly) falsey already. The fallback suggests that the result will be used as a str in a context where None would present a problem ... which represents an unnecessary cognitive load (i.e., it makes me scan the code to try to find such a reference ... which doesn't exist. 😁)

This obviously isn't a big deal, and probably isn't significantly slower -- it just feels like it makes the code slightly less easy to read, and that always bugs me.
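A sketch of the simplification being described (behavior unchanged, since None is already falsey):

```python
# None is already falsey, so the "" fallback adds nothing here.
block_type = block.get("type")
if block_type == "text":
    text = block.get("text")
```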

Comment on lines 869 to 871
json_str_decoded = (
    json_str.decode("utf-8") if isinstance(json_str, bytes) else json_str
)

Ouch ... this is entirely nonsensical unless you take the long journey to figure out that we try to `import orjson as json`. Not thrilled by that obfuscation, which makes this weird phrase necessary; but there you go. 😦

Comment on lines +925 to +926
if engine is None:
    raise RuntimeError("Backend not started up.")

There are several similar checks, some with slightly different exception text. Maybe one of the different patterns is meaningful, but I suspect it'd be better to give a single consistent message if the engine isn't started, probably by consistently calling _validate_backend_initialized rather than checking locally.
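A sketch of that consolidation; the helper name follows the one mentioned above, while the exact internal check and message are assumptions:

```python
def _validate_backend_initialized(self) -> None:
    """Single, consistent guard used wherever the engine is required."""
    if self._engine is None or not self._in_process:
        raise RuntimeError("Backend not started up for process.")
```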

if line == "data: [DONE]":
return None
line = (line or "").strip()
if not line or not line.startswith("data:"):

If line is "" (which is the least it can be after the previous line), then not line.startswith(<anything>). Yeah, checking for an empty line is probably slightly more efficient, but I'd be more tempted to simplify the statement here by dropping the first phrase.

And if we're uninterested in a line that doesn't start with "data:", I'm a bit curious why we're special-casing that by returning {} instead of None?
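Concretely, something like this would drop the redundant empty-line check while keeping the current return values; whether non-data lines should return None instead is the open question above:

```python
line = (line or "").strip()
if line == "data: [DONE]":
    return None
# An empty string fails startswith("data:") anyway, so no separate check is needed.
if not line.startswith("data:"):
    return {}
```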

# nightly: increment to next minor, add 'a' with build iteration
# alpha: increment to next minor, add 'a' with build iteration
# dev: increment to next minor, add 'dev' with build iteration
ARG GUIDELLM_BUILD_TYPE=dev

Do we really want dev default instead of release??


ENV GUIDELLM_BUILD_TYPE=$GUIDELLM_BUILD_TYPE

# Copy repository and install GuideLLM from source with pip (no uv, to avoid

Hmm. "no uv" sounds stilted ... "not [with] uv" would sound a bit better.

Comment on lines +43 to +44
# Ensure that the user home dir can be used by any user
# (OpenShift Pods can't use the cache otherwise)

Similar comment to the one I made on Kevin's PR ... granted, he responded that this is essentially a quote from the OpenShift documentation, but it still bugs me. It can be used by any user *in gid 0*. That includes both the random uid in OpenShift and the root (0) uid in standalone podman, both in gid 0; but it's still not "any user".

VOLUME /results

# Root group for k8s
USER 1001:0

A constant uid? Is that ... wise?

