LLMRouter is an intelligent routing system that optimizes LLM inference by dynamically selecting the most suitable model for each query. Key features:
- 🚀 Smart Routing: Automatically routes queries to the optimal LLM based on task complexity, cost, and performance requirements.
- 📊 Multiple Router Models: Over 15 routing models, including KNN, SVM, MLP, Matrix Factorization, Elo Rating, graph-based, BERT-based, hybrid probabilistic, transformed-score, and multi-round routers.
- 🛠️ Unified CLI: Complete command-line interface for training, inference, and interactive chat with a Gradio-based UI.
| Router | Training | Inference | Description |
|---|---|---|---|
| knnrouter | ✅ | ✅ | K-Nearest Neighbors based routing |
| svmrouter | ✅ | ✅ | Support Vector Machine based routing |
| mlprouter | ✅ | ✅ | Multi-Layer Perceptron based routing |
| mfrouter | ✅ | ✅ | Matrix Factorization based routing |
| elorouter | ❌ | ✅ | Elo Rating based routing |
| routerdc | ✅ | ✅ | Dual Contrastive learning based routing |
| automix | ❌ | ✅ | Automatic model mixing |
| hybrid_llm | ✅ | ✅ | Hybrid LLM routing strategy |
| graphrouter | ✅ | ✅ | Graph-based routing |
| causallm_router | ✅ | ✅ | Causal Language Model router |
| smallest_llm | ❌ | ✅ | Always routes to the smallest model |
| largest_llm | ❌ | ✅ | Always routes to the largest model |
Pre-trained routers:

| Router | Training | Inference | Description |
|---|---|---|---|
| router_r1 | ❌ | ✅ | Pre-trained Router-R1 model for multi-turn conversations |
Multi-round (agentic) routers:

| Router | Training | Inference | Description |
|---|---|---|---|
| knnmultiroundrouter | ✅ | ✅ | KNN-based agentic router for complex tasks |
| llmmultiroundrouter | ❌ | ✅ | LLM-based agentic router for complex tasks |
Clone the repository and install from source using a virtual environment (e.g., with anaconda3):
```bash
# Clone the repository
git clone https://github.com/ulab-uiuc/LLMRouter.git
cd LLMRouter

# Create and activate virtual environment
conda create -n llmrouter python=3.10
conda activate llmrouter

# Install the package
pip install -e .
```

Note: PyPI package coming soon! Once published, you'll be able to install directly with `pip install llmrouter`.
Train various router models with your configuration:
```bash
# Train KNN router
llmrouter train --router knnrouter --config configs/model_config_train/knnrouter.yaml

# Train MLP router with GPU
llmrouter train --router mlprouter --config configs/model_config_train/mlprouter.yaml --device cuda

# Train MF router quietly
llmrouter train --router mfrouter --config configs/model_config_train/mfrouter.yaml --quiet
```

Perform inference with trained routers:
```bash
# Single query inference
llmrouter infer --router knnrouter --config config.yaml --query "What is machine learning?"

# Batch inference from file
llmrouter infer --router knnrouter --config config.yaml --input queries.txt --output results.json

# Route only (without calling the LLM API)
llmrouter infer --router knnrouter --config config.yaml --query "Hello" --route-only

# Custom generation parameters
llmrouter infer --router knnrouter --config config.yaml --query "Explain AI" --temp 0.7 --max-tokens 2048 --verbose
```

Supported input file formats: `.txt` (one query per line), `.json` (a list of strings or of objects with a `"query"` field), and `.jsonl` (one JSON object per line).
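Any of the three formats can be prepared trivially. A minimal sketch in Python (file names and queries here are illustrative):

```python
# Prepare example batch-input files for `llmrouter infer --input`.
import json

queries = ["What is machine learning?", "Explain transformers briefly."]

# .txt: one query per line
with open("queries.txt", "w") as f:
    f.write("\n".join(queries))

# .json: a list of strings (objects with a "query" field also work)
with open("queries.json", "w") as f:
    json.dump(queries, f)

# .jsonl: one JSON object per line
with open("queries.jsonl", "w") as f:
    for q in queries:
        f.write(json.dumps({"query": q}) + "\n")
```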
📱 Quick Preview: Animated overview of the LLMRouter chat interface showing real-time routing and model selection.
🎥 Full Demo: Complete walkthrough demonstrating the interactive chat interface, including query routing, model selection, and response generation.
Launch a Gradio-based chat interface:
```bash
# Basic chat interface
llmrouter chat --router knnrouter --config config.yaml

# Custom host and port
llmrouter chat --router knnrouter --config config.yaml --host 0.0.0.0 --port 7860

# With public sharing link
llmrouter chat --router knnrouter --config config.yaml --share

# Specify query mode
llmrouter chat --router knnrouter --config config.yaml --mode full_context --top_k 5
```

Query modes:
- `current_only`: Routes based on the current query only (default)
- `full_context`: Combines all chat history with the current query
- `retrieval`: Retrieves the top-k most similar historical queries for context
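Conceptually, the three modes differ only in what text is handed to the router. A sketch of the idea (illustrative names, not the actual implementation):

```python
# Illustrative sketch of how each query mode could assemble the routed text.
def build_routing_text(mode, history, current, top_k=5, retrieve_fn=None):
    if mode == "current_only":
        return current                         # ignore history entirely
    if mode == "full_context":
        return "\n".join(history + [current])  # all prior turns + current query
    if mode == "retrieval":
        # Prepend the top-k most similar past queries as context
        similar = retrieve_fn(current, history, top_k) if retrieve_fn else []
        return "\n".join(list(similar) + [current])
    raise ValueError(f"Unknown query mode: {mode}")
```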
You can also run the CLI scripts directly:
```bash
# Training
python -m llmrouter.cli.router_train --router knnrouter --config config.yaml

# Inference
python -m llmrouter.cli.router_inference --router knnrouter --config config.yaml --query "Hello"

# Chat
python -m llmrouter.cli.router_chat --router knnrouter --config config.yaml
```

LLMRouter supports a plugin system that lets you add custom router implementations without modifying the core codebase. This makes it easy to experiment with new routing strategies or domain-specific routers.
1. Create your router directory:
```bash
mkdir -p custom_routers/my_router
```

2. Implement your router (`custom_routers/my_router/router.py`):

```python
from llmrouter.models.meta_router import MetaRouter
import torch.nn as nn


class MyRouter(MetaRouter):
    """Your custom router implementation."""

    def __init__(self, yaml_path: str):
        # Initialize with a model (can be nn.Identity() for simple routers)
        model = nn.Identity()
        super().__init__(model=model, yaml_path=yaml_path)
        # Get available LLM names from config
        self.llm_names = list(self.llm_data.keys())

    def route_single(self, query_input: dict) -> dict:
        """Route a single query to the best LLM."""
        query = query_input['query']
        # Your custom routing logic here
        # Example: route based on query length
        selected_llm = (self.llm_names[0] if len(query) < 50
                        else self.llm_names[-1])
        return {
            "query": query,
            "model_name": selected_llm,
            "predicted_llm": selected_llm,
        }

    def route_batch(self, batch: list) -> list:
        """Route multiple queries."""
        return [self.route_single(q) for q in batch]
```

3. Create configuration (`custom_routers/my_router/config.yaml`):
```yaml
data_path:
  llm_data: 'data/example_data/llm_candidates/default_llm.json'

hparam:
  # Your hyperparameters here
  api_endpoint: 'https://integrate.api.nvidia.com/v1'
```

4. Use your custom router (same as built-in routers!):
```bash
# Inference
llmrouter infer --router my_router \
    --config custom_routers/my_router/config.yaml \
    --query "What is machine learning?"

# List all routers (including custom ones)
llmrouter list-routers
```

Custom routers are automatically discovered from:
- `./custom_routers/` (recommended; project directory)
- `~/.llmrouter/plugins/` (user home directory)
- `$LLMROUTER_PLUGINS` environment variable (colon-separated paths)
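You can also drive a custom router directly from Python instead of the CLI. A minimal sketch, assuming the `MyRouter` class and config from the steps above, and that `custom_routers` is importable from the project root:

```python
# Direct use of the custom router from Python (illustrative paths).
from custom_routers.my_router.router import MyRouter

router = MyRouter(yaml_path="custom_routers/my_router/config.yaml")
result = router.route_single({"query": "What is machine learning?"})
print(result["model_name"])  # the LLM the router selected
```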
LLMRouter includes example custom routers you can learn from:
RandomRouter - Simple baseline that randomly selects an LLM:

```bash
llmrouter infer --router randomrouter \
    --config custom_routers/randomrouter/config.yaml \
    --query "Hello world"
```

ThresholdRouter - Advanced trainable router with difficulty estimation:

```bash
# Train the router
llmrouter train --router thresholdrouter \
    --config custom_routers/thresholdrouter/config.yaml

# Use for inference
llmrouter infer --router thresholdrouter \
    --config custom_routers/thresholdrouter/config.yaml \
    --query "Explain quantum computing"
```

For detailed guides on creating custom routers:
- 📖 Quick Start: custom_routers/README.md
- 📖 Detailed Tutorial: docs/CUSTOM_ROUTERS.md
- 📖 Implementation Summary: CUSTOM_ROUTER_SUMMARY.md
Rule-based routing:

```python
def route_single(self, query_input):
    query = query_input['query'].lower()
    if 'code' in query:
        return {"model_name": "code-specialist"}
    elif len(query) < 50:
        return {"model_name": "small-fast-model"}
    else:
        return {"model_name": "large-capable-model"}
```
Embedding-based routing:

```python
from llmrouter.utils import get_longformer_embedding

def route_single(self, query_input):
    embedding = get_longformer_embedding(query_input['query'])
    # Use embedding similarity to select the best model
    selected = self._find_best_model(embedding)
    return {"model_name": selected}
```
Cost-optimized routing:

```python
def route_single(self, query_input):
    difficulty = self._estimate_difficulty(query_input)
    # Select the cheapest model that can handle the difficulty
    for model_name, info in sorted(self.llm_data.items(),
                                   key=lambda x: x[1]['cost']):
        if info['capability'] >= difficulty:
            return {"model_name": model_name}
```
