Add embeddings endpoint support by maryamtahhan · Pull Request #501 · vllm-project/guidellm

maryamtahhan · 2025-12-05T14:34:24Z

Adds support for benchmarking the /v1/embeddings endpoint, enabling performance testing of text embedding models.

Add embeddings request type to schemas
Implement EmbeddingsResponseHandler for processing embedding responses
Add EmbeddingsRequestFormatter for request preparation
Implement mock server handler with synthetic embedding generation
Add e2e and unit tests for embeddings benchmarking
Add embeddings guide documentation

Summary

This PR adds support for benchmarking the /v1/embeddings endpoint, enabling performance testing of text embedding models.

The implementation includes full integration with guidellm's existing infrastructure: request formatting, response handling, metrics collection, synthetic data generation, and the mock server for testing.

Details

Core Implementation:

Add embeddings to GenerativeRequestType literal in schemas
Implement EmbeddingsResponseHandler for processing embedding API responses
Implement EmbeddingsRequestFormatter for preparing embedding requests
Add embeddings route to OpenAI backend (/v1/embeddings)
Export EmbeddingsResponseHandler from backends module
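
For readers unfamiliar with the formatter/handler split, here is a minimal sketch of what preparing an embeddings request and summarizing its response could involve. It assumes the OpenAI-style /v1/embeddings payload shape (`input`, `model`, `data[].embedding`, `usage.prompt_tokens`). The function names `format_embeddings_request` and `summarize_embeddings_response` are hypothetical and are not the classes added by this PR.

```python
from typing import Any


def format_embeddings_request(texts: list[str], model: str) -> dict[str, Any]:
    """Build an OpenAI-style /v1/embeddings request body (hypothetical helper)."""
    if not texts:
        raise ValueError("at least one input text is required")
    # The API accepts a single string or a list of strings; always sending
    # a list keeps the response handling uniform.
    return {"model": model, "input": list(texts)}


def summarize_embeddings_response(body: dict[str, Any]) -> dict[str, int]:
    """Extract the counts a benchmark might record from a response body."""
    items = body.get("data", [])
    dims = {len(item["embedding"]) for item in items}
    if len(dims) > 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")
    usage = body.get("usage") or {}
    return {
        "num_embeddings": len(items),
        "dimensions": dims.pop() if dims else 0,
        # Embedding responses carry no generated tokens, only prompt tokens.
        "prompt_tokens": int(usage.get("prompt_tokens", 0)),
    }
```

Because an embeddings response has no output tokens, a summary like this only tracks input-side counts; how the PR's handler maps these onto guidellm's metrics is not shown here.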

Mock Server Support:

Implement EmbeddingsHandler for mock server
Add EmbeddingsRequest, EmbeddingObject, and EmbeddingsResponse models
Register /v1/embeddings endpoint in mock server
Generate synthetic normalized embedding vectors (configurable dimensions)
Apply realistic timing delays based on token count
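
As an illustration of the last two items, the sketch below generates a unit-length random vector and computes a token-based delay. It is not the mock server's code: the helper names, the Gaussian sampling, and the latency constants are all assumptions chosen for the example. Only the 1536-dimension default comes from this PR's test plan.

```python
import math
import random
from typing import Optional


def synthetic_embedding(dimensions: int = 1536, seed: Optional[int] = None) -> list[float]:
    """Return a random unit-length vector (hypothetical helper)."""
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    rng = random.Random(seed)
    # Independent Gaussian components give a direction that is uniformly
    # distributed on the unit sphere once normalized.
    raw = [rng.gauss(0.0, 1.0) for _ in range(dimensions)]
    norm = math.sqrt(sum(x * x for x in raw))
    if norm == 0.0:  # practically unreachable, but avoids dividing by zero
        return [1.0] + [0.0] * (dimensions - 1)
    return [x / norm for x in raw]


def simulated_delay_seconds(
    token_count: int, base: float = 0.005, per_token: float = 0.0001
) -> float:
    """Linear latency model: fixed overhead plus a per-token cost (assumed values)."""
    return base + per_token * max(token_count, 0)
```

Normalizing to unit length matters for a mock because many embedding models return normalized vectors, so downstream cosine-similarity code sees realistic input.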

Testing:

Add unit tests for EmbeddingsResponseHandler
Add e2e tests: test_embeddings_max_requests_benchmark, test_embeddings_max_seconds_benchmark, test_embeddings_rate_benchmark
Update existing tests to account for new request type
All tests pass (1717 unit tests, 2 integration tests)

Documentation:

Create comprehensive embeddings guide (docs/guides/embeddings.md, 284 lines)
Update backends guide with embeddings endpoint information
Update datasets guide with embeddings-specific synthetic data examples
Add docstrings to all new classes and methods

Code Quality:

All linting checks pass (tox -e quality)
All type checks pass (tox -e types)
Pre-commit hooks pass
Code properly formatted with ruff and mdformat

Test Plan

Automated Tests:

Run unit tests: tox -e test-unit
- Verify TestEmbeddingsResponseHandler tests pass
Run integration tests: tox -e test-integration
Run e2e tests: tox -e test-e2e
- Tests will pass in CI with vllm-sim binary

Manual Testing:

Start the mock server:
```
guidellm mock-server --port 8000
```
Test the embeddings endpoint directly:
```
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input":"Test sentence","model":"test"}'
```
Expected: Returns JSON with embedding vectors (1536 dimensions by default)
Run an embeddings benchmark:
```
guidellm benchmark run \
  --target http://localhost:8000 \
  --request-type embeddings \
  --rate 5 \
  --max-requests 20 \
  --data '{"type":"synthetic","prompt_tokens":128,"output_tokens":1}'
```
Expected: Benchmark completes successfully with metrics for 20 requests
Test with vLLM serving an embedding model:
```
vllm serve "BAAI/bge-small-en-v1.5"

guidellm benchmark run \
  --target http://localhost:8000 \
  --request-type embeddings \
  --rate 10 \
  --max-requests 100 \
  --data '{"type":"synthetic","prompt_tokens":256,"output_tokens":1}'
```
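
The direct endpoint check in the manual steps can also be scripted. The following is a rough Python equivalent of the curl call, using only the standard library. The helper names are hypothetical, and the response parsing assumes the OpenAI-style shape (`data[0].embedding`), which is an assumption about the server rather than something this description spells out.

```python
import json
import urllib.request


def build_embeddings_request(
    base_url: str, text: str, model: str = "test"
) -> urllib.request.Request:
    """Build the POST request that the curl command sends (hypothetical helper)."""
    payload = json.dumps({"input": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding_dimensions(base_url: str, text: str, timeout: float = 10.0) -> int:
    """Send the request and return the length of the first embedding vector."""
    request = build_embeddings_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes an OpenAI-style body: {"data": [{"embedding": [...], ...}], ...}
    return len(body["data"][0]["embedding"])
```

With a server listening on port 8000, calling `fetch_embedding_dimensions("http://localhost:8000", "Test sentence")` should return the vector length reported by that server.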

Related Issues

Addresses need for embedding model benchmarking support
Complements existing text completions, chat completions, and audio benchmarking

"I certify that all code in this PR is my own, except as noted below."

Use of AI

Includes AI-assisted code completion
Includes code generated by an AI application
Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

sjmonson · 2025-12-15T19:35:55Z

Sorry for the delay on review; I am hung up on some performance regression work, so this will probably be waiting a bit longer, possibly into next year. One note: we will be merging #478 first, which will affect this PR.

maryamtahhan · 2025-12-16T09:05:33Z

> Sorry for the delay on review; I am hung up on some performance regression work, so this will probably be waiting a bit longer, possibly into next year. One note: we will be merging #478 first, which will affect this PR.

No problem.

maryamtahhan · 2026-02-05T12:15:41Z

I will update this PR since #478 has been merged

maryamtahhan · 2026-02-10T14:14:42Z

Still working on this - will post an update soon

Implements full embeddings benchmarking capability including schemas, quality validation (cosine similarity, MTEB), output formatters (CSV, HTML, JSON, console), mock server handler, CLI integration, and comprehensive test suite.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

- Auto-fix ruff violations (unused imports, import sorting, unused variables)
- Remove tests/remote directory (manual testing code not suitable for CI)
- Sync uv.lock with embeddings feature dependencies

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Remove unused variables and imports
- Fix line length violations (E501)
- Add noqa comments for acceptable complexity
- Update isinstance calls to use modern X | Y syntax
- Fix import sorting and __all__ ordering
- Add None checks for division operations
- Fix union type handling with proper type guards
- Add explicit type annotations where needed
- Fix incompatible return types with proper narrowing
- Update type: ignore comments with correct error codes
- Add pytest-httpx and respx to tox test dependencies
- Skip audio tests when torchcodec unavailable
- Skip MTEB/quality validator tests when sentence-transformers unavailable
- Fix MockServerConfig import path
- Fix test expectations for schema fields
- Fix zero vector handling test to match implementation
- Convert E2E test data strings to lists for proper deserialization
- Add HTML templates to package data
- Create __init__.py for html_outputs directory
- Ensure embeddings template included in distribution

All quality checks (ruff, mypy, pre-commit) and unit tests now pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
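
The commits above mention cosine-similarity quality validation and a zero-vector handling fix. For reference, a minimal pure-Python sketch of cosine similarity is given below. Returning 0.0 when either vector has zero length is an assumption made for this example; the commit only says the test was changed "to match implementation", so the PR's actual behavior may differ.

```python
import math
from collections.abc import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (illustrative helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector has no direction, so report no similarity
        # instead of dividing by zero.
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)
```

A quality check could compare an embedding returned by the server under test against a reference embedding for the same text and flag pairs whose similarity falls below a chosen threshold.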

maryamtahhan force-pushed the feat-embedding-testing branch 2 times, most recently from 375eab2 to 02ad329, on December 5, 2025 14:38

dbutenhof mentioned this pull request in "Support embeddings API" #562 (open) on Jan 28, 2026

dbutenhof added this to the v0.6.0 milestone Jan 28, 2026

maryamtahhan force-pushed the feat-embedding-testing branch 4 times, most recently from 190d856 to 8292ba8, on February 10, 2026 11:11

maryamtahhan marked this pull request as draft February 11, 2026 09:30

maryamtahhan force-pushed the feat-embedding-testing branch 17 times, most recently from 2dd66fc to 3cf0ffd, on February 13, 2026 12:57

maryamtahhan force-pushed the feat-embedding-testing branch from 3cf0ffd to 267342f on February 13, 2026 18:41

maryamtahhan and others added 2 commits February 16, 2026 16:14

maryamtahhan wants to merge 3 commits into vllm-project:main from maryamtahhan:feat-embedding-testing

3 participants
