A fast, local text embedding service built with Rust using FastEmbed and ONNX Runtime. Perfect for semantic search, recommendation systems, and RAG applications.
- Fast & Local - No external API calls, runs entirely on your machine
- ONNX Runtime - Optimized inference with ONNX for best performance
- Multiple Models - Support for BGE, MiniLM, and other popular embedding models
- REST API - Simple HTTP API with OpenAPI/Swagger documentation
- Batch Processing - Efficient batch embedding for multiple texts
- Docker Ready - Easy deployment with Docker and Docker Compose
- Zero Config - Works out of the box with sensible defaults
```bash
# Clone the repository
git clone https://github.com/101t/embedding_service.git
cd embedding_service

# Run with default settings
cargo run --release

# Or with custom model
EMBEDDING_MODEL="BAAI/bge-base-en-v1.5" cargo run --release
```

```bash
# Build and run with Docker Compose
docker compose up -d

# Or build manually
docker build -t embedding_service .
docker run -p 8001:8001 embedding_service
```

```bash
make run         # Run locally
make docker-run  # Run with Docker
make help        # See all commands
```

```bash
curl -X POST http://localhost:8001/api/v1/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!"}'
```

```json
{
  "embedding": [0.123, -0.456, ...],
  "dimension": 384
}
```

```bash
curl -X POST http://localhost:8001/api/v1/embed/batch \
  -H "Content-Type: application/json" \
  -d '{"texts": ["First text", "Second text"]}'
```

```json
{
  "embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]],
  "dimension": 384,
  "count": 2
}
```

```bash
curl http://localhost:8001/api/v1/health

curl http://localhost:8001/api/v1/model

curl http://localhost:8001/api/v1/models
```

```json
{
  "models": [
    {"id": "BAAI/bge-small-en-v1.5", "dimension": 384, "description": "Small, fast English model (default)"},
    ...
  ],
  "count": 22
}
```

Interactive Swagger UI available at: http://localhost:8001/docs/
You can also get the full list via API: GET /api/v1/models
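For example, a quick Python sketch that walks the `/api/v1/models` listing shown above (assumes the `requests` package is installed):

```python
import requests

# Fetch the supported-models listing from a locally running service.
resp = requests.get("http://localhost:8001/api/v1/models")
resp.raise_for_status()
data = resp.json()

# Each entry carries an id, an output dimension, and a short description.
for model in data["models"]:
    print(f'{model["id"]}: {model["dimension"]}d - {model["description"]}')
print(f'{data["count"]} models supported')
```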
| Model | Dimension | Description |
|---|---|---|
| BAAI/bge-small-en-v1.5 | 384 | Small, fast model (default) |
| BAAI/bge-base-en-v1.5 | 768 | Balanced performance |
| BAAI/bge-large-en-v1.5 | 1024 | Best quality |

| Model | Dimension | Description |
|---|---|---|
| Xenova/bge-small-zh-v1.5 | 512 | Small Chinese model |
| Xenova/bge-large-zh-v1.5 | 1024 | Large Chinese model |

| Model | Dimension | Description |
|---|---|---|
| sentence-transformers/all-MiniLM-L6-v2 | 384 | Fast, lightweight model |
| sentence-transformers/all-MiniLM-L12-v2 | 384 | Slightly larger MiniLM |
| sentence-transformers/all-mpnet-base-v2 | 768 | MPNet base model |

| Model | Dimension | Description |
|---|---|---|
| Xenova/paraphrase-multilingual-MiniLM-L12-v2 | 384 | Multilingual paraphrase |
| Xenova/paraphrase-multilingual-mpnet-base-v2 | 768 | Multilingual MPNet |
| intfloat/multilingual-e5-small | 384 | E5 small multilingual |
| intfloat/multilingual-e5-base | 768 | E5 base multilingual |
| intfloat/multilingual-e5-large | 1024 | E5 large multilingual |

| Model | Dimension | Description |
|---|---|---|
| nomic-ai/nomic-embed-text-v1 | 768 | 8192 context length |
| nomic-ai/nomic-embed-text-v1.5 | 768 | v1.5, 8192 context length |

| Model | Dimension | Description |
|---|---|---|
| mixedbread-ai/mxbai-embed-large-v1 | 1024 | MxBai large English |
| Alibaba-NLP/gte-base-en-v1.5 | 768 | GTE base English |
| Alibaba-NLP/gte-large-en-v1.5 | 1024 | GTE large English |
| lightonai/modernbert-embed-large | 1024 | ModernBERT large |
| Qdrant/clip-ViT-B-32-text | 512 | CLIP text encoder |
| jinaai/jina-embeddings-v2-base-code | 768 | Code embeddings |
| onnx-community/embeddinggemma-300m-ONNX | 768 | Google EmbeddingGemma |
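To confirm which model a running instance actually loaded, you can query the `/api/v1/model` endpoint shown earlier. Its exact response shape isn't documented in this README, so this sketch just prints the raw JSON:

```python
import requests

# Print whatever the running service reports about its active model.
print(requests.get("http://localhost:8001/api/v1/model").json())
```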
| Variable | Default | Description |
|---|---|---|
| `HOST` | `0.0.0.0` | Host to bind to |
| `PORT` | `8001` | Port to listen on |
| `EMBEDDING_MODEL` | `BAAI/bge-small-en-v1.5` | Embedding model to use |
| `RUST_LOG` | `info` | Log level (debug, info, warn, error) |
Copy `.env.example` to `.env` and customize as needed:

```bash
cp .env.example .env
```
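For reference, a minimal `.env` sketch using the documented variables and defaults (assuming `.env.example` follows the same layout):

```bash
# .env - values shown are the documented defaults
HOST=0.0.0.0
PORT=8001
EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
RUST_LOG=info
```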
```bash
# Install Rust (if not installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Build
make build

# Run tests
make test

# Run linter
make lint

# Format code
make fmt

# Run all checks
make check
```

```bash
# Start service
docker compose up -d

# View logs
docker compose logs -f

# Stop service
docker compose down
```

```bash
# Using Docker Compose with a custom model
EMBEDDING_MODEL="BAAI/bge-base-en-v1.5" docker compose up -d
```

The service uses ONNX Runtime for optimized inference. Performance varies by model:
| Model | Latency (single) | Throughput (batch of 32) |
|---|---|---|
| BGE-small | ~5ms | ~50ms |
| BGE-base | ~10ms | ~100ms |
| BGE-large | ~20ms | ~200ms |
Benchmarked on an Intel i7-12700K; actual performance may vary.
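To sanity-check these numbers on your own hardware, here is a rough client-side timing sketch (illustrative only: it measures end-to-end HTTP latency through the API, not raw inference time, and assumes Python with `requests`):

```python
import time
import requests

BASE = "http://localhost:8001/api/v1"

def time_request(path: str, payload: dict, runs: int = 10) -> float:
    # Average wall-clock latency over several runs, after one warm-up call.
    requests.post(f"{BASE}{path}", json=payload)
    start = time.perf_counter()
    for _ in range(runs):
        requests.post(f"{BASE}{path}", json=payload)
    return (time.perf_counter() - start) / runs * 1000  # milliseconds

single_ms = time_request("/embed", {"text": "Hello, world!"})
batch_ms = time_request("/embed/batch", {"texts": ["Hello"] * 32})
print(f"single: {single_ms:.1f} ms, batch of 32: {batch_ms:.1f} ms")
```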
```python
import requests

def get_embedding(text: str) -> list[float]:
    response = requests.post(
        "http://localhost:8001/api/v1/embed",
        json={"text": text}
    )
    return response.json()["embedding"]

embedding = get_embedding("Hello, world!")
```

```typescript
async function getEmbedding(text: string): Promise<number[]> {
  const response = await fetch("http://localhost:8001/api/v1/embed", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  const data = await response.json();
  return data.embedding;
}
```
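Since the service targets semantic search, here is a small end-to-end sketch that ranks documents against a query by cosine similarity, built on the batch endpoint documented above (illustrative; `requests` assumed):

```python
import math
import requests

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

docs = [
    "The cat sat on the mat",
    "Rust is a systems programming language",
    "Embedding vectors power semantic search",
]
query = "programming languages"

# Embed all documents plus the query in a single batch call.
resp = requests.post(
    "http://localhost:8001/api/v1/embed/batch",
    json={"texts": docs + [query]},
)
*doc_vecs, query_vec = resp.json()["embeddings"]

# Rank documents by similarity to the query, best match first.
ranked = sorted(zip(docs, doc_vecs), key=lambda pair: cosine(pair[1], query_vec), reverse=True)
for doc, vec in ranked:
    print(f"{cosine(vec, query_vec):.3f}  {doc}")
```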
Contributions are welcome! Please read CONTRIBUTING.md for guidelines.

- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Run checks (`make check`)
- Commit your changes (`git commit -m 'feat: add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- FastEmbed-rs - Rust bindings for FastEmbed
- ONNX Runtime - High-performance inference engine
- Actix-web - Powerful web framework for Rust