# FastEmbed Service


A fast, local text embedding service built with Rust using FastEmbed and ONNX Runtime. Perfect for semantic search, recommendation systems, and RAG applications.

## ✨ Features

- **Fast & Local** - No external API calls, runs entirely on your machine
- **ONNX Runtime** - Optimized inference with ONNX for best performance
- **Multiple Models** - Support for BGE, MiniLM, and other popular embedding models
- **REST API** - Simple HTTP API with OpenAPI/Swagger documentation
- **Batch Processing** - Efficient batch embedding for multiple texts
- **Docker Ready** - Easy deployment with Docker and Docker Compose
- **Zero Config** - Works out of the box with sensible defaults

## 🚀 Quick Start

### Using Cargo

```bash
# Clone the repository
git clone https://github.com/101t/embedding_service.git
cd embedding_service

# Run with default settings
cargo run --release

# Or with a custom model
EMBEDDING_MODEL="BAAI/bge-base-en-v1.5" cargo run --release
```

### Using Docker

```bash
# Build and run with Docker Compose
docker compose up -d

# Or build manually
docker build -t embedding_service .
docker run -p 8001:8001 embedding_service
```

### Using Make

```bash
make run          # Run locally
make docker-run   # Run with Docker
make help         # See all commands
```

## 📖 API Usage

### Embed Single Text

```bash
curl -X POST http://localhost:8001/api/v1/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!"}'
```

Response:

```json
{
  "embedding": [0.123, -0.456, ...],
  "dimension": 384
}
```

### Batch Embed

```bash
curl -X POST http://localhost:8001/api/v1/embed/batch \
  -H "Content-Type: application/json" \
  -d '{"texts": ["First text", "Second text"]}'
```

Response:

```json
{
  "embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]],
  "dimension": 384,
  "count": 2
}
```
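
The service is pitched at semantic search, and the batch endpoint makes that a single-request job. Below is a minimal Python sketch (not part of this repo) that embeds a query together with a few documents in one batch call, then ranks the documents by cosine similarity; it assumes only the request/response shape shown above.

```python
import math
import requests

def embed_batch(texts: list[str]) -> list[list[float]]:
    # One HTTP call for all texts, using the batch endpoint shown above.
    r = requests.post(
        "http://localhost:8001/api/v1/embed/batch",
        json={"texts": texts},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["embeddings"]

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

docs = ["Rust is a systems programming language", "Paris is the capital of France"]
query = "What language is good for low-level programming?"
query_vec, *doc_vecs = embed_batch([query] + docs)
ranked = sorted(zip(docs, doc_vecs), key=lambda p: cosine(query_vec, p[1]), reverse=True)
print(ranked[0][0])  # the document most similar to the query
```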

### Health Check

```bash
curl http://localhost:8001/api/v1/health
```
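
Model loading can take a moment on first start, so scripts may want to wait for the service before sending embedding requests. A small Python poll loop, assuming only that the health endpoint returns HTTP 200 once the service is ready:

```python
import time
import requests

def wait_until_healthy(url: str = "http://localhost:8001/api/v1/health",
                       timeout: float = 60.0) -> None:
    # Poll the health endpoint until it answers 200, or give up after `timeout`.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=2).status_code == 200:
                return
        except requests.ConnectionError:
            pass  # service not accepting connections yet
        time.sleep(1.0)
    raise TimeoutError(f"service not healthy after {timeout}s")

wait_until_healthy()
```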

### Model Info

```bash
curl http://localhost:8001/api/v1/model
```

### List Available Models

```bash
curl http://localhost:8001/api/v1/models
```

Response:

```json
{
  "models": [
    {"id": "BAAI/bge-small-en-v1.5", "dimension": 384, "description": "Small, fast English model (default)"},
    ...
  ],
  "count": 22
}
```
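
Since the models differ mainly in embedding dimension, a client can pick one programmatically. A sketch that filters the catalog by dimension, relying only on the `models` and `dimension` fields shown in the response above:

```python
import requests

def models_with_dimension(dim: int) -> list[str]:
    # Fetch the model catalog and keep models whose vectors have `dim` entries.
    r = requests.get("http://localhost:8001/api/v1/models", timeout=10)
    r.raise_for_status()
    return [m["id"] for m in r.json()["models"] if m["dimension"] == dim]

print(models_with_dimension(384))  # e.g. ["BAAI/bge-small-en-v1.5", ...]
```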

## 📚 API Documentation

Interactive Swagger UI available at: http://localhost:8001/docs/

## 🤖 Available Models

You can also get the full list via the API: `GET /api/v1/models`

### BGE Models (English)

| Model | Dimension | Description |
|-------|-----------|-------------|
| `BAAI/bge-small-en-v1.5` | 384 | Small, fast model (default) |
| `BAAI/bge-base-en-v1.5` | 768 | Balanced performance |
| `BAAI/bge-large-en-v1.5` | 1024 | Best quality |

### BGE Models (Chinese)

| Model | Dimension | Description |
|-------|-----------|-------------|
| `Xenova/bge-small-zh-v1.5` | 512 | Small Chinese model |
| `Xenova/bge-large-zh-v1.5` | 1024 | Large Chinese model |

### MiniLM & MPNet Models

| Model | Dimension | Description |
|-------|-----------|-------------|
| `sentence-transformers/all-MiniLM-L6-v2` | 384 | Fast, lightweight model |
| `sentence-transformers/all-MiniLM-L12-v2` | 384 | Slightly larger MiniLM |
| `sentence-transformers/all-mpnet-base-v2` | 768 | MPNet base model |

### Multilingual Models

| Model | Dimension | Description |
|-------|-----------|-------------|
| `Xenova/paraphrase-multilingual-MiniLM-L12-v2` | 384 | Multilingual paraphrase |
| `Xenova/paraphrase-multilingual-mpnet-base-v2` | 768 | Multilingual MPNet |
| `intfloat/multilingual-e5-small` | 384 | E5 small multilingual |
| `intfloat/multilingual-e5-base` | 768 | E5 base multilingual |
| `intfloat/multilingual-e5-large` | 1024 | E5 large multilingual |

### Nomic Models

| Model | Dimension | Description |
|-------|-----------|-------------|
| `nomic-ai/nomic-embed-text-v1` | 768 | 8192 context length |
| `nomic-ai/nomic-embed-text-v1.5` | 768 | v1.5, 8192 context length |

### Other Models

| Model | Dimension | Description |
|-------|-----------|-------------|
| `mixedbread-ai/mxbai-embed-large-v1` | 1024 | MxBai large English |
| `Alibaba-NLP/gte-base-en-v1.5` | 768 | GTE base English |
| `Alibaba-NLP/gte-large-en-v1.5` | 1024 | GTE large English |
| `lightonai/modernbert-embed-large` | 1024 | ModernBERT large |
| `Qdrant/clip-ViT-B-32-text` | 512 | CLIP text encoder |
| `jinaai/jina-embeddings-v2-base-code` | 768 | Code embeddings |
| `onnx-community/embeddinggemma-300m-ONNX` | 768 | Google EmbeddingGemma |

βš™οΈ Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `HOST` | `0.0.0.0` | Host to bind to |
| `PORT` | `8001` | Port to listen on |
| `EMBEDDING_MODEL` | `BAAI/bge-small-en-v1.5` | Embedding model to use |
| `RUST_LOG` | `info` | Log level (`debug`, `info`, `warn`, `error`) |

Copy `.env.example` to `.env` and customize as needed:

```bash
cp .env.example .env
```
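
To confirm a configuration change took effect (for example, after setting `EMBEDDING_MODEL`), query the model-info endpoint from the API section. The exact response fields aren't documented here, so this sketch just prints whatever comes back:

```python
import requests

# Print the model-info response to verify the EMBEDDING_MODEL override.
info = requests.get("http://localhost:8001/api/v1/model", timeout=10)
info.raise_for_status()
print(info.json())
```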

πŸ› οΈ Development

```bash
# Install Rust (if not installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Build
make build

# Run tests
make test

# Run linter
make lint

# Format code
make fmt

# Run all checks
make check
```

## 🐳 Docker

### Docker Compose (Recommended)

```bash
# Start service
docker compose up -d

# View logs
docker compose logs -f

# Stop service
docker compose down
```

### Environment Variables with Docker

```bash
# Using Docker Compose with a custom model
EMBEDDING_MODEL="BAAI/bge-base-en-v1.5" docker compose up -d
```

## 📊 Performance

The service uses ONNX Runtime for optimized inference. Performance varies by model:

| Model | Latency (single text) | Latency (batch of 32) |
|-------|-----------------------|-----------------------|
| BGE-small | ~5ms | ~50ms |
| BGE-base | ~10ms | ~100ms |
| BGE-large | ~20ms | ~200ms |

*Benchmarked on an Intel i7-12700K; actual performance may vary.*
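
To get comparable numbers on your own hardware, you can time the endpoints directly. A rough Python sketch; it measures wall-clock time over HTTP, so it includes network and JSON overhead on top of pure inference:

```python
import time
import requests

BASE = "http://localhost:8001/api/v1"

def avg_ms(path: str, payload: dict, runs: int = 20) -> float:
    # Average wall-clock latency over `runs` requests, after one warm-up call.
    requests.post(f"{BASE}/{path}", json=payload, timeout=60).raise_for_status()
    start = time.perf_counter()
    for _ in range(runs):
        requests.post(f"{BASE}/{path}", json=payload, timeout=60).raise_for_status()
    return (time.perf_counter() - start) / runs * 1000.0

print(f"single text: {avg_ms('embed', {'text': 'hello world'}):.1f} ms")
print(f"batch of 32: {avg_ms('embed/batch', {'texts': ['hello world'] * 32}):.1f} ms")
```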

## 🔗 Integration Examples

### Python

```python
import requests

def get_embedding(text: str) -> list[float]:
    # Call the single-text endpoint; fail loudly on HTTP errors.
    response = requests.post(
        "http://localhost:8001/api/v1/embed",
        json={"text": text},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["embedding"]

embedding = get_embedding("Hello, world!")
```

### JavaScript/TypeScript

```typescript
async function getEmbedding(text: string): Promise<number[]> {
  const response = await fetch("http://localhost:8001/api/v1/embed", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!response.ok) throw new Error(`embed request failed: ${response.status}`);
  const data = await response.json();
  return data.embedding;
}
```

## 🤝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md for guidelines.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Run checks (`make check`)
4. Commit your changes (`git commit -m 'feat: add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments
