Add embeddings endpoint support #501
Draft
maryamtahhan wants to merge 3 commits into vllm-project:main
Force-pushed from 375eab2 to 02ad329.
Collaborator
Sorry for the delay on review. I am hung up on some performance regression work, so this will probably be waiting a bit longer, possibly into next year. One note: we will be merging #478 first, which will affect this PR.
Contributor
Author
No problem.
Contributor
Author
I will update this PR now that #478 has been merged.
Force-pushed from 190d856 to 8292ba8.
Contributor
Author
Still working on this. I will post an update soon.
Force-pushed from 2dd66fc to 3cf0ffd.
Implements full embeddings benchmarking capability, including schemas, quality validation (cosine similarity, MTEB), output formatters (CSV, HTML, JSON, console), a mock server handler, CLI integration, and a comprehensive test suite.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
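The commit above names cosine similarity as one of the quality checks. As a rough illustration only (the PR's own validator is not shown on this page), a cosine similarity helper could look like the sketch below. The function name and the choice to return 0.0 for a zero vector are assumptions made for this example, not the PR's confirmed behavior.

```python
import math
from collections.abc import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length vectors.

    Assumption: a zero vector yields 0.0 instead of raising, since its
    direction is undefined. The PR may handle this case differently.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A quality check of this kind would typically compare an embedding returned by the server against a reference embedding for the same text and flag scores below some threshold.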
Force-pushed from 3cf0ffd to 267342f.
- Auto-fix ruff violations (unused imports, import sorting, unused variables)
- Remove tests/remote directory (manual testing code not suitable for CI)
- Sync uv.lock with embeddings feature dependencies

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Remove unused variables and imports
- Fix line length violations (E501)
- Add noqa comments for acceptable complexity
- Update isinstance calls to use modern X | Y syntax
- Fix import sorting and __all__ ordering
- Add None checks for division operations
- Fix union type handling with proper type guards
- Add explicit type annotations where needed
- Fix incompatible return types with proper narrowing
- Update type: ignore comments with correct error codes
- Add pytest-httpx and respx to tox test dependencies
- Skip audio tests when torchcodec unavailable
- Skip MTEB/quality validator tests when sentence-transformers unavailable
- Fix MockServerConfig import path
- Fix test expectations for schema fields
- Fix zero vector handling test to match implementation
- Convert E2E test data strings to lists for proper deserialization
- Add HTML templates to package data
- Create __init__.py for html_outputs directory
- Ensure embeddings template included in distribution

All quality checks (ruff, mypy, pre-commit) and unit tests now pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Adds support for benchmarking the /v1/embeddings endpoint, enabling performance testing of text embedding models.
- Add embeddings request type to schemas
- Implement EmbeddingsResponseHandler for processing embedding responses
- Add EmbeddingsRequestFormatter for request preparation
- Implement mock server handler with synthetic embedding generation
- Add e2e and unit tests for embeddings benchmarking
- Add embeddings guide documentation
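To make "synthetic embedding generation" concrete, here is a minimal sketch of how a mock handler might produce a response in the shape of the OpenAI-style embeddings API. It is not the PR's code: the function names, the default dimension of 384, the hash-based seeding, and the whitespace token count are all assumptions chosen for illustration.

```python
import hashlib
import math
import random


def synthetic_embedding(text: str, dimensions: int = 384) -> list[float]:
    """Return a deterministic pseudo-random unit vector for the given text."""
    # Seed from a hash of the text so the same input always maps to the
    # same vector (an assumption; a mock could also use pure randomness).
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    raw = [rng.gauss(0.0, 1.0) for _ in range(dimensions)]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0
    return [v / norm for v in raw]


def mock_embeddings_response(
    inputs: list[str], model: str, dimensions: int = 384
) -> dict:
    """Build an OpenAI-style embeddings response body for the given inputs."""
    data = [
        {
            "object": "embedding",
            "index": index,
            "embedding": synthetic_embedding(text, dimensions),
        }
        for index, text in enumerate(inputs)
    ]
    # Whitespace word count is a crude stand-in for real tokenization.
    prompt_tokens = sum(len(text.split()) for text in inputs)
    return {
        "object": "list",
        "data": data,
        "model": model,
        "usage": {"prompt_tokens": prompt_tokens, "total_tokens": prompt_tokens},
    }
```

Deterministic vectors keep mock-server tests reproducible, and unit-length output mirrors what many embedding models return after normalization.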
Summary
This PR adds support for benchmarking the `/v1/embeddings` endpoint, enabling performance testing of text embedding models. The implementation includes full integration with guidellm's existing infrastructure: request formatting, response handling, metrics collection, synthetic data generation, and the mock server for testing.
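The request formatting and response handling steps can be pictured with the small sketch below. It assumes the OpenAI-style embeddings payload (`model` and `input` in the request; `data` and `usage` in the response). The names `format_embeddings_request`, `handle_embeddings_response`, and `EmbeddingsResult` are hypothetical and are not the classes added by this PR.

```python
import json
from dataclasses import dataclass


@dataclass
class EmbeddingsResult:
    """Hypothetical summary of one embeddings response."""

    num_embeddings: int
    dimensions: int
    prompt_tokens: int


def format_embeddings_request(model: str, texts: list[str]) -> bytes:
    """Serialize an embeddings request body as UTF-8 JSON."""
    return json.dumps({"model": model, "input": texts}).encode("utf-8")


def handle_embeddings_response(body: bytes) -> EmbeddingsResult:
    """Extract the counts a benchmark would record from a response body."""
    parsed = json.loads(body)
    data = parsed.get("data", [])
    usage = parsed.get("usage") or {}
    # Embeddings endpoints generate no output tokens, so only the
    # prompt token count is meaningful here.
    dimensions = len(data[0]["embedding"]) if data else 0
    return EmbeddingsResult(
        num_embeddings=len(data),
        dimensions=dimensions,
        prompt_tokens=int(usage.get("prompt_tokens", 0)),
    )
```

Because an embeddings response carries vectors instead of generated text, a benchmark would mostly track request latency and input token throughput for this request type.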
Details
Core Implementation:
- Add `embeddings` to the `GenerativeRequestType` literal in schemas
- Implement `EmbeddingsResponseHandler` for processing embedding API responses
- Add `EmbeddingsRequestFormatter` for preparing embedding requests
- [...] (`/v1/embeddings`)
- [...] `EmbeddingsResponseHandler` from backends module

Mock Server Support:
- `EmbeddingsHandler` for mock server
- `EmbeddingsRequest`, `EmbeddingObject`, and `EmbeddingsResponse` models
- `/v1/embeddings` endpoint in mock server

Testing:
- `EmbeddingsResponseHandler`
- `test_embeddings_max_requests_benchmark`, `test_embeddings_max_seconds_benchmark`, `test_embeddings_rate_benchmark`

Documentation:
- [...] (`docs/guides/embeddings.md`, 284 lines)

Code Quality:
- [...] (`tox -e quality`)
- [...] (`tox -e types`)

Test Plan
Automated Tests:
- `tox -e test-unit`
- `TestEmbeddingsResponseHandler` tests pass
- `tox -e test-integration`
- `tox -e test-e2e`

Manual Testing:
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input":"Test sentence","model":"test"}'
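For scripted checks, the same manual request can be issued from Python using only the standard library. This is a sketch under the assumption that a server is listening on `http://localhost:8000`, as in the curl command; the helper names are made up for this example.

```python
import json
import urllib.request


def build_embeddings_request(
    base_url: str, text: str, model: str
) -> urllib.request.Request:
    """Build a POST request equivalent to the curl command above."""
    body = json.dumps({"input": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (needs a live server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

Calling `send(build_embeddings_request("http://localhost:8000", "Test sentence", "test"))` should return the decoded response body when the mock server or a real server is running.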
guidellm benchmark run \
  --target http://localhost:8000 \
  --request-type embeddings \
  --rate 5 \
  --max-requests 20 \
  --data '{"type":"synthetic","prompt_tokens":128,"output_tokens":1}'
vllm serve "BAAI/bge-small-en-v1.5"

guidellm benchmark run \
  --target http://localhost:8000 \
  --request-type embeddings \
  --rate 10 \
  --max-requests 100 \
  --data '{"type":"synthetic","prompt_tokens":256,"output_tokens":1}'
Related Issues
Use of AI
## WRITTEN BY AI ##)