Add embeddings endpoint support #501

Draft

maryamtahhan wants to merge 3 commits into vllm-project:main from maryamtahhan:feat-embedding-testing

@maryamtahhan (Contributor) commented Dec 5, 2025

Adds support for benchmarking the /v1/embeddings endpoint, enabling performance testing of text embedding models.

  • Add embeddings request type to schemas

  • Implement EmbeddingsResponseHandler for processing embedding responses

  • Add EmbeddingsRequestFormatter for request preparation

  • Implement mock server handler with synthetic embedding generation

  • Add e2e and unit tests for embeddings benchmarking

  • Add embeddings guide documentation

Summary

This PR adds support for benchmarking the /v1/embeddings endpoint, enabling performance testing of text embedding models.

The implementation includes full integration with guidellm's existing infrastructure: request formatting, response handling, metrics collection, synthetic data generation, and the mock server for testing.

Details

Core Implementation:

  • Add embeddings to GenerativeRequestType literal in schemas
  • Implement EmbeddingsResponseHandler for processing embedding API responses
  • Implement EmbeddingsRequestFormatter for preparing embedding requests
  • Add embeddings route to OpenAI backend (/v1/embeddings)
  • Export EmbeddingsResponseHandler from backends module
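For context on what a response handler for this route has to process, here is a minimal sketch assuming the standard OpenAI-compatible /v1/embeddings response layout (a "data" list of objects carrying "index" and "embedding", plus a "usage" block). The names parse_embeddings_response and ParsedEmbeddings are hypothetical and are not guidellm's EmbeddingsResponseHandler API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ParsedEmbeddings:
    """Hypothetical container for the fields a benchmark needs from a response."""

    vectors: list[list[float]]
    prompt_tokens: int | None
    model: str | None


def parse_embeddings_response(payload: dict[str, Any]) -> ParsedEmbeddings:
    """Extract vectors and token usage from an OpenAI-style embeddings response.

    Assumed layout:
    {"object": "list",
     "data": [{"object": "embedding", "index": 0, "embedding": [...]}],
     "model": "...",
     "usage": {"prompt_tokens": N, "total_tokens": N}}
    """
    items = payload.get("data") or []
    # Sort by "index" so vectors line up with the order of the request inputs.
    ordered = sorted(items, key=lambda item: item.get("index", 0))
    vectors = [[float(x) for x in item["embedding"]] for item in ordered]
    usage = payload.get("usage") or {}
    # Embeddings produce no generated tokens; only prompt tokens are reported.
    prompt_tokens = usage.get("prompt_tokens")
    return ParsedEmbeddings(
        vectors=vectors,
        prompt_tokens=int(prompt_tokens) if prompt_tokens is not None else None,
        model=payload.get("model"),
    )


example = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0]},
        {"object": "embedding", "index": 0, "embedding": [1.0, 0.0]},
    ],
    "model": "test",
    "usage": {"prompt_tokens": 4, "total_tokens": 4},
}
parsed = parse_embeddings_response(example)
print(parsed.vectors)  # [[1.0, 0.0], [0.0, 1.0]]
```

Sorting by "index" matters for batched inputs, since the spec ties each vector to its input position rather than to list order.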

Mock Server Support:

  • Implement EmbeddingsHandler for mock server
  • Add EmbeddingsRequest, EmbeddingObject, and EmbeddingsResponse models
  • Register /v1/embeddings endpoint in mock server
  • Generate synthetic normalized embedding vectors (configurable dimensions)
  • Apply realistic timing delays based on token count
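A minimal sketch of what "synthetic normalized embedding vectors" and "timing delays based on token count" could look like in an async mock handler. Seeding the generator from a hash of the input text, the linear latency model, and the names synthetic_embedding and simulate_embedding_latency are all illustrative assumptions rather than the PR's EmbeddingsHandler code; only the 1536-dimension default is taken from the test plan below.

```python
import asyncio
import hashlib
import math
import random


def synthetic_embedding(text: str, dimensions: int = 1536) -> list[float]:
    """Return a deterministic, unit-length (L2-normalized) pseudo-random vector.

    Seeding from a hash of the input (an assumption for this sketch) makes the
    same text always map to the same vector, which keeps mock responses stable.
    """
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(dimensions)]
    norm = math.sqrt(sum(x * x for x in raw)) or 1.0  # guard against a zero vector
    return [x / norm for x in raw]


async def simulate_embedding_latency(
    prompt_tokens: int, base_s: float = 0.005, per_token_s: float = 0.0001
) -> float:
    """Sleep for a delay that grows linearly with prompt length (hypothetical model)."""
    delay = base_s + per_token_s * prompt_tokens
    await asyncio.sleep(delay)
    return delay


vector = synthetic_embedding("Test sentence")
print(len(vector), round(sum(x * x for x in vector), 6))  # 1536 1.0
```

Drawing components from a Gaussian before normalizing spreads the vectors uniformly over the unit sphere, so cosine similarities between unrelated mock inputs stay near zero, much like real embeddings of unrelated text.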

Testing:

  • Add unit tests for EmbeddingsResponseHandler
  • Add e2e tests: test_embeddings_max_requests_benchmark, test_embeddings_max_seconds_benchmark, test_embeddings_rate_benchmark
  • Update existing tests to account for new request type
  • All tests pass (1717 unit tests, 2 integration tests)

Documentation:

  • Create comprehensive embeddings guide (docs/guides/embeddings.md, 284 lines)
  • Update backends guide with embeddings endpoint information
  • Update datasets guide with embeddings-specific synthetic data examples
  • Add docstrings to all new classes and methods

Code Quality:

  • All linting checks pass (tox -e quality)
  • All type checks pass (tox -e types)
  • Pre-commit hooks pass
  • Code properly formatted with ruff and mdformat

Test Plan

Automated Tests:

  1. Run unit tests: tox -e test-unit
    • Verify TestEmbeddingsResponseHandler tests pass
  2. Run integration tests: tox -e test-integration
  3. Run e2e tests: tox -e test-e2e
    • Requires the vllm-sim binary; expected to pass in CI

Manual Testing:

  1. Start the mock server:
    guidellm mock-server --port 8000
  2. Test the embeddings endpoint directly:
    curl -X POST http://localhost:8000/v1/embeddings \
      -H "Content-Type: application/json" \
      -d '{"input":"Test sentence","model":"test"}'
  3. Expected: returns JSON with embedding vectors (1536 dimensions by default)
  4. Run an embeddings benchmark:
    guidellm benchmark run \
      --target http://localhost:8000 \
      --request-type embeddings \
      --rate 5 \
      --max-requests 20 \
      --data '{"type":"synthetic","prompt_tokens":128,"output_tokens":1}'
  5. Expected: the benchmark completes successfully with metrics for 20 requests
  6. Test with vLLM serving an embedding model (start vllm serve, then run guidellm from a second terminal):
    vllm serve "BAAI/bge-small-en-v1.5"
    guidellm benchmark run \
      --target http://localhost:8000 \
      --request-type embeddings \
      --rate 10 \
      --max-requests 100 \
      --data '{"type":"synthetic","prompt_tokens":256,"output_tokens":1}'
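The curl request in step 2 can also be issued from Python using only the standard library. This sketch builds the same POST request; the helper names build_embeddings_request and fetch_embedding_dimensions are hypothetical, and the model name "test" and port 8000 are simply the values used in the manual steps.

```python
import json
import urllib.request


def build_embeddings_request(
    base_url: str, text: str, model: str = "test"
) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible /v1/embeddings route."""
    body = json.dumps({"input": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding_dimensions(base_url: str, text: str) -> int:
    """Send the request and return the length of the first embedding vector.

    Requires a running server, for example the mock server from step 1.
    """
    request = build_embeddings_request(base_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return len(payload["data"][0]["embedding"])
```

With the mock server from step 1 running, calling fetch_embedding_dimensions("http://localhost:8000", "Test sentence") should return the server's configured vector size (1536 by default, per step 3).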

Related Issues

  • Addresses need for embedding model benchmarking support
  • Complements existing text completions, chat completions, and audio benchmarking

  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI-written tests should have a docstring that includes ## WRITTEN BY AI ##)
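To illustrate the docstring convention in the last bullet, here is a hypothetical, self-contained AI-written test carrying the marker. It exercises a small L2-normalization helper defined inline and is not one of the tests in this PR; placing the marker on its own line at the end of the docstring is an assumption about formatting.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def test_l2_normalize_produces_unit_vectors():
    """Normalized vectors have an L2 norm of 1.

    ## WRITTEN BY AI ##
    """
    result = l2_normalize([3.0, 4.0])
    assert math.isclose(result[0], 0.6)
    assert math.isclose(result[1], 0.8)
    assert math.isclose(math.sqrt(sum(x * x for x in result)), 1.0)
```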

maryamtahhan force-pushed the feat-embedding-testing branch 2 times, most recently from 375eab2 to 02ad329 on December 5, 2025 14:38
@sjmonson (Collaborator)

Sorry for the delay on review; I am hung up on some performance regression work, so this will probably be waiting a bit longer, possibly into next year. One note: we will be merging #478 first, which will affect this PR.

@maryamtahhan (Contributor, Author)


No problem.

dbutenhof added this to the v0.6.0 milestone on Jan 28, 2026
@maryamtahhan (Contributor, Author)

I will update this PR now that #478 has been merged.

maryamtahhan force-pushed the feat-embedding-testing branch 4 times, most recently from 190d856 to 8292ba8 on February 10, 2026 11:11
@maryamtahhan (Contributor, Author)

Still working on this; will post an update soon.

maryamtahhan marked this pull request as draft on February 11, 2026 09:30
maryamtahhan force-pushed the feat-embedding-testing branch 17 times, most recently from 2dd66fc to 3cf0ffd on February 13, 2026 12:57
Implements full embeddings benchmarking capability
including schemas, quality validation (cosine similarity,
MTEB), output formatters (CSV, HTML, JSON, console),
mock server handler, CLI integration, and comprehensive
test suite.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
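The commit message above mentions quality validation via cosine similarity. Below is a minimal sketch of such a check, assuming the idea is to compare the endpoint's embedding for a text against a reference embedding of the same text (for example one computed locally with sentence-transformers) and flag low similarity. The function names and the 0.99 threshold are hypothetical, not taken from the PR.

```python
import math
from collections.abc import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return (a . b) / (|a| |b|); raise ValueError on mismatched or zero vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def passes_quality_check(
    candidate: Sequence[float], reference: Sequence[float], threshold: float = 0.99
) -> bool:
    """Hypothetical pass/fail rule: the candidate must closely match the reference."""
    return cosine_similarity(candidate, reference) >= threshold
```

Cosine similarity ignores vector magnitude, so a server that returns unnormalized vectors still passes as long as the direction matches the reference.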
maryamtahhan force-pushed the feat-embedding-testing branch from 3cf0ffd to 267342f on February 13, 2026 18:41
maryamtahhan and others added 2 commits February 16, 2026 16:14
- Auto-fix ruff violations (unused imports, import sorting, unused variables)
- Remove tests/remote directory (manual testing code not suitable for CI)
- Sync uv.lock with embeddings feature dependencies

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Remove unused variables and imports
- Fix line length violations (E501)
- Add noqa comments for acceptable complexity
- Update isinstance calls to use modern X | Y syntax
- Fix import sorting and __all__ ordering

- Add None checks for division operations
- Fix union type handling with proper type guards
- Add explicit type annotations where needed
- Fix incompatible return types with proper narrowing
- Update type: ignore comments with correct error codes
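The first two bullets above describe a common typing pattern: guarding a division against missing values so mypy can narrow an optional type. A minimal sketch, where the helper name safe_divide and the requests-per-second example are hypothetical rather than code from this PR:

```python
from __future__ import annotations


def safe_divide(numerator: float | None, denominator: float | None) -> float | None:
    """Return numerator / denominator, or None if either is missing or denominator is 0."""
    # After these checks mypy narrows both operands from "float | None" to "float".
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


# e.g. requests per second when the benchmark duration may not be known yet
print(safe_divide(20, 4.0))  # 5.0
print(safe_divide(20, None))  # None
```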

- Add pytest-httpx and respx to tox test dependencies
- Skip audio tests when torchcodec unavailable
- Skip MTEB/quality validator tests when sentence-transformers unavailable
- Fix MockServerConfig import path
- Fix test expectations for schema fields
- Fix zero vector handling test to match implementation
- Convert E2E test data strings to lists for proper deserialization

- Add HTML templates to package data
- Create __init__.py for html_outputs directory
- Ensure embeddings template included in distribution

All quality checks (ruff, mypy, pre-commit) and unit tests now pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>