Integration tests vs upstream lls. #51
Conversation
…precated remote evaluation tests.
Hey - I've found 5 issues, and left some high level feedback:
- In `conftest.py`, the option name is `--no-mock-inference` but several docstrings and help text still reference `--mock-inference`; update these references so the documentation matches the actual CLI flag.
- In `tests/e2e/start-kind-cluster.sh`, the script prints `PORTFORWARD_PID` and refers to background port forwarding, but no `kubectl port-forward` is started and `PORTFORWARD_PID` is never set; either add the port-forward command and PID capture or remove these messages.
- The unconditional `alias docker="podman"` in `tests/e2e/start-kind-cluster.sh` may break environments that use Docker without Podman; consider detecting which runtime is available or making this alias optional/configurable.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `conftest.py`, the option name is `--no-mock-inference` but several docstrings and help text still reference `--mock-inference`; update these references so the documentation matches the actual CLI flag.
- In `tests/e2e/start-kind-cluster.sh`, the script prints `PORTFORWARD_PID` and refers to background port forwarding, but no `kubectl port-forward` is started and `PORTFORWARD_PID` is never set; either add the port-forward command and PID capture or remove these messages.
- The unconditional `alias docker="podman"` in `tests/e2e/start-kind-cluster.sh` may break environments that use Docker without Podman; consider detecting which runtime is available or making this alias optional/configurable.
## Individual Comments
### Comment 1
<location> `distribution/run.yaml:97-105` </location>
<code_context>
model_type: llm
+ - metadata: {}
+ model_id: llama-3.1-8b-instruct
+ provider_id: vllm
+ provider_model_id: meta-llama/Llama-3.1-8B-Instruct
+ model_type: llm
+ - metadata:
+ embedding_dimension: 1024
+ model_id: nomic-embed-text
+ provider_id: vllm
+ provider_model_id: nomic-ai/nomic-embed-text-v1.5
+ model_type: embedding
shields: []
vector_dbs: []
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard `vllm` models so they don’t reference a provider that may not be configured.
The `vllm` provider is only created when `VLLM_URL` is set, but these `registered_resources` entries are always added. In setups without `VLLM_URL`, this can leave models referencing a non-existent provider. Please either register these models only when `VLLM_URL` is configured, or define a safe fallback to avoid configuration-time failures.
</issue_to_address>
### Comment 2
<location> `distribution/run.yaml:43-48` </location>
<code_context>
backend: kv_default
inference:
- - provider_id: ollama
+ - provider_id: ${env.VLLM_URL:+vllm}
+ provider_type: remote::vllm
+ config:
+ url: ${env.VLLM_URL}
+ max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
+ api_token: ${env.VLLM_API_TOKEN:=fake}
+ tls_verify: ${env.VLLM_TLS_VERIFY:=true}
+ - provider_id: ${env.OLLAMA_URL:+ollama}
</code_context>
<issue_to_address>
**question:** Double-check how an empty `provider_id` entry is handled when `VLLM_URL` is unset.
Using `${env.VLLM_URL:+vllm}` disables this provider when `VLLM_URL` is unset, which is likely intended. However, depending on how the config is rendered, this could still leave a list entry with an empty `provider_id` or an otherwise invalid provider block. Please verify that the templating engine omits the entire list item when `VLLM_URL` is unset, or update the template so the whole provider entry is conditional rather than just the `provider_id` field.
</issue_to_address>
### Comment 3
<location> `tests/test_inline_evaluation.py:274-280` </location>
<code_context>
- assert hasattr(job, "job_id")
- assert hasattr(job, "status")
assert job.job_id is not None
+ assert job.status == "in_progress"
+
+ job = library_client.alpha.eval.jobs.status(
+ benchmark_id=benchmark_id, job_id=job.job_id
+ )
+ assert job.status == "completed"
</code_context>
<issue_to_address>
**issue (testing):** Poll for job completion with timeout instead of assuming a single status call transitions from in_progress to completed
Right after `run_eval`, the job is correctly `in_progress`, but the test assumes the very next `jobs.status` call will already be `completed`. In a real (non-mocked) stack this will be flaky. Please poll `jobs.status` with a short sleep and an overall timeout, and assert that it reaches a terminal state (e.g. `completed`) within that window, so the integration test remains reliable with `--no-mock-inference`.
</issue_to_address>
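A minimal polling sketch along the lines the reviewer suggests, assuming the `library_client` fixture, `benchmark_id`, and the initial `job` object from the surrounding test; the 120-second budget, 2-second interval, and `failed` as the other terminal status are illustrative assumptions, not project conventions:

```python
import time

# Poll jobs.status until the eval job reaches a terminal state or the time budget runs out.
deadline = time.monotonic() + 120  # overall timeout in seconds (illustrative value)
while True:
    job = library_client.alpha.eval.jobs.status(
        benchmark_id=benchmark_id, job_id=job.job_id
    )
    if job.status in ("completed", "failed"):  # assumed terminal states
        break
    assert time.monotonic() < deadline, f"eval job still {job.status} after timeout"
    time.sleep(2)  # short pause between polls (illustrative value)

assert job.status == "completed"
```

This keeps the test deterministic with mocked inference (the first poll already returns `completed`) while tolerating the slower, real transition under `--no-mock-inference`.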
### Comment 4
<location> `tests/test_inline_evaluation.py:166-172` </location>
<code_context>
+):
+ completion_text = mocked_llm_response
+
+ async def _fake_openai_embeddings(req): # noqa: ANN001
+ embedding_input = getattr(req, "input", None)
+ n = len(embedding_input) if isinstance(embedding_input, list) else 1
+ data = [
+ SimpleNamespace(embedding=[0.1, 0.2, 0.3], index=i, object="embedding")
+ for i in range(n)
+ ]
</code_context>
<issue_to_address>
**suggestion (testing):** Mocked embedding vectors should respect the configured embedding dimension to avoid hiding dimension-related issues
The `_fake_openai_embeddings` mock always returns 3-dimensional vectors, while the configured model uses `embedding_dimension: 384`. This mismatch can mask dimension-related issues or cause misleading test failures. Please derive the embedding length from the configured model metadata, or at least return vectors of length 384 so tests more accurately reflect real usage.
```suggestion
async def _fake_openai_embeddings(req): # noqa: ANN001
embedding_input = getattr(req, "input", None)
n = len(embedding_input) if isinstance(embedding_input, list) else 1
embedding_dim = 384 # match configured embedding dimension
data = [
SimpleNamespace(
embedding=[0.1] * embedding_dim,
index=i,
object="embedding",
)
for i in range(n)
]
```
</issue_to_address>
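If deriving the length from model metadata is preferred over hard-coding 384, a sketch along these lines could work. It assumes the llama-stack client exposes `models.list()` returning objects with `identifier` and `metadata` fields (worth verifying against the client version in use), and `EMBEDDING_MODEL_ID` is a hypothetical name for whichever embedding model the test registers:

```python
# Hypothetical helper: look up the configured embedding dimension so the fake
# vectors match whatever run.yaml registers; falls back to a default if the
# model is not found or carries no embedding_dimension metadata.
EMBEDDING_MODEL_ID = "nomic-embed-text"  # assumption: the registered embedding model

def configured_embedding_dim(client, model_id: str = EMBEDDING_MODEL_ID, default: int = 384) -> int:
    for model in client.models.list():
        if getattr(model, "identifier", None) == model_id:
            return int(model.metadata.get("embedding_dimension", default))
    return default
```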
### Comment 5
<location> `tests/test_remote_wrappers.py:133-135` </location>
<code_context>
- assert isinstance(result, EvaluationResult)
- pandas_result = result.to_pandas()
- logger.info(render_dataframe_as_table(pandas_result))
- assert metric_to_test.name in pandas_result.columns
- assert len(pandas_result) == len(evaluation_dataset)
- assert pandas_result[metric_to_test.name].dtype == float
-
- # Use small tolerance for floating point comparisons
</code_context>
<issue_to_address>
**issue (testing):** Data type assertion on pandas column is too strict and may fail even when the series is a float dtype
`pandas_result[metric_to_test.name].dtype == float` will usually be false because pandas uses NumPy dtypes (e.g. `float64`), which don’t equal the built-in `float`. To assert the column is floating-point, use `pandas.api.types.is_float_dtype(pandas_result[metric_to_test.name])` or check `pandas_result[metric_to_test.name].dtype.kind == "f"` instead, to avoid spurious test failures.
</issue_to_address>
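A small sketch of the relaxed assertion, assuming `pandas_result` and `metric_to_test` from the surrounding test:

```python
import pandas as pd

# Either form accepts NumPy float dtypes (float32/float64) instead of requiring
# the column dtype to compare equal to the builtin `float`.
series = pandas_result[metric_to_test.name]
assert pd.api.types.is_float_dtype(series)
assert series.dtype.kind == "f"  # equivalent lower-level check
```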
…_pipeline import to the test function scope.
Integration tests vs multiple versions of LLS
Add configurable mocking for Llama Stack inference and a new CI job that runs against multiple versions of LLS, including main.
New Features:
Provide e2e Kubernetes testing assets using Kind, including cluster lifecycle scripts and manifests for LlamaStackDistribution and a vLLM emulator.
Enhancements: