A fast, local text embedding service built with Rust using FastEmbed and ONNX Runtime. Perfect for semantic search, recommendation systems, and RAG applications.
- Fast & Local - No external API calls, runs entirely on your machine
- ONNX Runtime - Optimized inference with ONNX for best performance
- Multiple Models - Support for BGE, MiniLM, and other popular embedding models
- REST API - Simple HTTP API with OpenAPI/Swagger documentation
- Batch Processing - Efficient batch embedding for multiple texts
- Docker Ready - Easy deployment with Docker and Docker Compose
- Zero Config - Works out of the box with sensible defaults
```bash
# Clone the repository
git clone https://github.com/101t/embedding_service.git
cd embedding_service

# Run with default settings
cargo run --release

# Or with custom model
EMBEDDING_MODEL="BAAI/bge-base-en-v1.5" cargo run --release
```

```bash
# Build and run with Docker Compose
docker compose up -d

# Or build manually
docker build -t embedding_service .
docker run -p 8001:8001 embedding_service
```

```bash
make run         # Run locally
make docker-run  # Run with Docker
make help        # See all commands
```

```bash
curl -X POST http://localhost:8001/api/v1/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!"}'
```

```json
{
  "embedding": [0.123, -0.456, ...],
  "dimension": 384
}
```

```bash
curl -X POST http://localhost:8001/api/v1/embed/batch \
  -H "Content-Type: application/json" \
  -d '{"texts": ["First text", "Second text"]}'
```

```json
{
  "embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]],
  "dimension": 384,
  "count": 2
}
```

```bash
curl http://localhost:8001/api/v1/health

curl http://localhost:8001/api/v1/model

curl http://localhost:8001/api/v1/models
```

```json
{
  "models": [
    {"id": "BAAI/bge-small-en-v1.5", "dimension": 384, "description": "Small, fast English model (default)"},
    ...
  ],
  "count": 22
}
```

Interactive Swagger UI available at: http://localhost:8001/docs/
You can also get the full list via API: GET /api/v1/models
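For example, a quick Python sketch that walks the `/api/v1/models` listing shown above (assumes the `requests` package is installed):

```python
import requests

# Fetch the supported-models listing from a locally running service.
resp = requests.get("http://localhost:8001/api/v1/models")
resp.raise_for_status()
data = resp.json()

# Each entry carries an id, an output dimension, and a short description.
for model in data["models"]:
    print(f'{model["id"]}: {model["dimension"]}d - {model["description"]}')
print(f'{data["count"]} models supported')
```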
| Model | Dimension | Description |
|---|---|---|
| BAAI/bge-small-en-v1.5 | 384 | Small, fast model (default) |
| BAAI/bge-base-en-v1.5 | 768 | Balanced performance |
| BAAI/bge-large-en-v1.5 | 1024 | Best quality |

| Model | Dimension | Description |
|---|---|---|
| Xenova/bge-small-zh-v1.5 | 512 | Small Chinese model |
| Xenova/bge-large-zh-v1.5 | 1024 | Large Chinese model |

| Model | Dimension | Description |
|---|---|---|
| sentence-transformers/all-MiniLM-L6-v2 | 384 | Fast, lightweight model |
| sentence-transformers/all-MiniLM-L12-v2 | 384 | Slightly larger MiniLM |
| sentence-transformers/all-mpnet-base-v2 | 768 | MPNet base model |

| Model | Dimension | Description |
|---|---|---|
| Xenova/paraphrase-multilingual-MiniLM-L12-v2 | 384 | Multilingual paraphrase |
| Xenova/paraphrase-multilingual-mpnet-base-v2 | 768 | Multilingual MPNet |
| intfloat/multilingual-e5-small | 384 | E5 small multilingual |
| intfloat/multilingual-e5-base | 768 | E5 base multilingual |
| intfloat/multilingual-e5-large | 1024 | E5 large multilingual |

| Model | Dimension | Description |
|---|---|---|
| nomic-ai/nomic-embed-text-v1 | 768 | 8192 context length |
| nomic-ai/nomic-embed-text-v1.5 | 768 | v1.5, 8192 context length |

| Model | Dimension | Description |
|---|---|---|
| mixedbread-ai/mxbai-embed-large-v1 | 1024 | MxBai large English |
| Alibaba-NLP/gte-base-en-v1.5 | 768 | GTE base English |
| Alibaba-NLP/gte-large-en-v1.5 | 1024 | GTE large English |
| lightonai/modernbert-embed-large | 1024 | ModernBERT large |
| Qdrant/clip-ViT-B-32-text | 512 | CLIP text encoder |
| jinaai/jina-embeddings-v2-base-code | 768 | Code embeddings |
| onnx-community/embeddinggemma-300m-ONNX | 768 | Google EmbeddingGemma |
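To confirm which model a running instance actually loaded, you can query the `/api/v1/model` endpoint shown earlier. Its exact response shape isn't documented in this README, so this sketch just prints the raw JSON:

```python
import requests

# Print whatever the running service reports about its active model.
print(requests.get("http://localhost:8001/api/v1/model").json())
```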
| Variable | Default | Description |
|---|---|---|
| `HOST` | `0.0.0.0` | Host to bind to |
| `PORT` | `8001` | Port to listen on |
| `EMBEDDING_MODEL` | `BAAI/bge-small-en-v1.5` | Embedding model to use |
| `RUST_LOG` | `info` | Log level (debug, info, warn, error) |
Copy `.env.example` to `.env` and customize as needed:

```bash
cp .env.example .env
```
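For reference, a minimal `.env` sketch using the documented variables and defaults (assuming `.env.example` follows the same layout):

```bash
# .env - values shown are the documented defaults
HOST=0.0.0.0
PORT=8001
EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
RUST_LOG=info
```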
```bash
# Install Rust (if not installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Build
make build

# Run tests
make test

# Run linter
make lint

# Format code
make fmt

# Run all checks
make check
```

```bash
# Start service
docker compose up -d

# View logs
docker compose logs -f

# Stop service
docker compose down
```

```bash
# Using Docker Compose with a custom model
EMBEDDING_MODEL="BAAI/bge-base-en-v1.5" docker compose up -d
```

The service uses ONNX Runtime for optimized inference. Performance varies by model:
| Model | Latency (single) | Throughput (batch of 32) |
|---|---|---|
| BGE-small | ~5ms | ~50ms |
| BGE-base | ~10ms | ~100ms |
| BGE-large | ~20ms | ~200ms |
Benchmarked on an Intel i7-12700K; actual performance may vary.
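To sanity-check these numbers on your own hardware, here is a rough client-side timing sketch (illustrative only: it measures end-to-end HTTP latency through the API, not raw inference time, and assumes Python with `requests`):

```python
import time
import requests

BASE = "http://localhost:8001/api/v1"

def time_request(path: str, payload: dict, runs: int = 10) -> float:
    # Average wall-clock latency over several runs, after one warm-up call.
    requests.post(f"{BASE}{path}", json=payload)
    start = time.perf_counter()
    for _ in range(runs):
        requests.post(f"{BASE}{path}", json=payload)
    return (time.perf_counter() - start) / runs * 1000  # milliseconds

single_ms = time_request("/embed", {"text": "Hello, world!"})
batch_ms = time_request("/embed/batch", {"texts": ["Hello"] * 32})
print(f"single: {single_ms:.1f} ms, batch of 32: {batch_ms:.1f} ms")
```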
```python
import requests

def get_embedding(text: str) -> list[float]:
    response = requests.post(
        "http://localhost:8001/api/v1/embed",
        json={"text": text}
    )
    return response.json()["embedding"]

embedding = get_embedding("Hello, world!")
```

```typescript
async function getEmbedding(text: string): Promise<number[]> {
  const response = await fetch("http://localhost:8001/api/v1/embed", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  const data = await response.json();
  return data.embedding;
}
```
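Since the service targets semantic search, here is a small end-to-end sketch that ranks documents against a query by cosine similarity, built on the batch endpoint documented above (illustrative; `requests` assumed):

```python
import math
import requests

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

docs = [
    "The cat sat on the mat",
    "Rust is a systems programming language",
    "Embedding vectors power semantic search",
]
query = "programming languages"

# Embed all documents plus the query in a single batch call.
resp = requests.post(
    "http://localhost:8001/api/v1/embed/batch",
    json={"texts": docs + [query]},
)
*doc_vecs, query_vec = resp.json()["embeddings"]

# Rank documents by similarity to the query, best match first.
ranked = sorted(zip(docs, doc_vecs), key=lambda pair: cosine(pair[1], query_vec), reverse=True)
for doc, vec in ranked:
    print(f"{cosine(vec, query_vec):.3f}  {doc}")
```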
Contributions are welcome! Please read CONTRIBUTING.md for guidelines.

- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Run checks (`make check`)
- Commit your changes (`git commit -m 'feat: add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- FastEmbed-rs - Rust bindings for FastEmbed
- ONNX Runtime - High-performance inference engine
- Actix-web - Powerful web framework for Rust