Semantic Code Context For Your LLM
Indexter indexes your local git repositories, parses them semantically using tree-sitter, and provides a hybrid search interface for AI agents via the Model Context Protocol (MCP).
- Features
- Supported Languages
- Prerequisites
- Installation
- Quickstart
- Configuration
- CLI Usage
- MCP Usage
- Programmatic Usage
- Contributing
- 🌳 Semantic parsing using tree-sitter for:
- Python, JavaScript, TypeScript (including JSX/TSX), Rust
- HTML, CSS, JSON, YAML, TOML, Markdown
- Generic chunking fallback for other file types
- 📁 Respects .gitignore and configurable ignore patterns
- 🔄 Incremental updates sync changed files via content hash comparison
- 🔍 Hybrid search combining dense semantic vectors and sparse keyword vectors with reciprocal rank fusion (RRF)
- ⚡ Powered by Qdrant vector database with automatic embedding generation via FastEmbed
- ⌨️ CLI for indexing repositories, searching code and inspecting configuration from your terminal
- 🤖 MCP server for seamless AI agent integration via FastMCP
- 📦 Multi-repo support with separate collections per repository
- ⚙️ XDG-compliant configuration and data storage
Indexter uses hybrid search to combine the strengths of both semantic and keyword-based retrieval:
-
Dense Vectors: Semantic embeddings (default:
sentence-transformers/all-MiniLM-L6-v2) capture the meaning and context of code, enabling natural language queries like "authentication handler" to find relevant code even without exact keyword matches. -
Sparse Vectors: BM25 keyword embeddings (default:
Qdrant/bm25) provide traditional keyword-based search, ensuring exact matches for function names, variable names, and technical terms. -
Reciprocal Rank Fusion (RRF): Results from both dense and sparse searches are combined and re-ranked using RRF, which:
- Merges rankings from multiple retrieval methods
- Reduces the impact of outliers from any single method
- Provides more robust and relevant results than either approach alone
This hybrid approach ensures you get the best of both worlds: semantic understanding for conceptual queries and precision matching for specific identifiers.
Indexter uses tree-sitter for semantic parsing. Each parser extracts meaningful code units along with their documentation (docstrings, JSDoc, TSDoc, Rust doc comments, etc.):
| Language | Extensions | Semantic Units Extracted |
|---|---|---|
| Python | .py |
Functions (sync/async), classes, decorated definitions, module-level constants + docstrings |
| JavaScript | .js, .jsx |
Function declarations, generators, arrow functions, classes, methods + JSDoc comments |
| TypeScript | .ts, .tsx |
Functions, generators, arrow functions, classes, interfaces, type aliases + TSDoc comments |
| Rust | .rs |
Functions (sync/async/unsafe), structs, enums, traits, impl blocks + doc comments (///, //!) |
| HTML | .html |
Semantic elements: tables, lists, headers (<h1>–<h6>) |
| CSS | .css |
Rule sets, media queries, keyframes, imports, at-rules |
| JSON | .json |
Objects, arrays |
| YAML | .yaml, .yml |
Block mappings, block sequences |
| TOML | .toml |
Tables, array tables, top-level pairs |
| Markdown | .md, .mkd, .markdown |
ATX headings with section content |
| Fallback | * |
Fixed-size overlapping chunks (for unsupported file types) |
To install the full application (CLI + MCP server):
uv tool install "indexter[full]"Indexter is modular. You can install only the components you need:
- Full Application (CLI + MCP):
uv tool install "indexter[full]" - CLI Only:
uv tool install indexter[cli] - MCP Server Only:
uv tool install indexter[mcp] - Core Library Only:
uv add indexter[core](preferred: explicit > implicit) oruv add indexter- Useful for programmatic usage or building custom integrations.
pipx install "indexter[full]"git clone https://github.com/jdbadger/indexter.git
cd indexter
uv sync --all-extras# Initialize the Qdrant vector store (pulls Docker image and starts container)
indexter store init
# Initialize and index a repository (indexes automatically by default)
indexter init --path /path/to/your/repo/root
# Or initialize current directory
indexter init
# Search the indexed code
indexter search "function that handles authentication" your-repo-name
# Check status of all indexed repositories
indexter statusIndexter uses XDG-compliant paths for configuration and data storage:
| Type | Path |
|---|---|
| Config | ~/.config/indexter/indexter.toml |
| Data | ~/.local/share/indexter/ |
The global config controls embedding model, file processing settings, vector store, and MCP server:
# Show current configuration
indexter config show
# Get config file path
indexter config path
# Edit config manually
$EDITOR $(indexter config path)# ~/.config/indexter/indexter.toml
# Dense embedding model for semantic search (default: 384-dim sentence-transformers model)
embedding_model = "sentence-transformers/all-MiniLM-L6-v2"
# Sparse embedding model for keyword-based search (BM25 algorithm)
sparse_embedding_model = "Qdrant/bm25"
# File patterns to exclude from indexing (gitignore-style syntax)
# These are in addition to patterns from .gitignore files
ignore_patterns = [
".git/",
"__pycache__/",
"*.pyc",
".DS_Store",
"node_modules/",
".venv/",
"*.lock",
# etc...
]
# Maximum file size (in bytes) to process
max_file_size = 1048576 # 1 MB
# Maximum number of files to process in a repository
max_files = 1000
# Number of top similar documents to retrieve for queries
top_k = 10
# Number of documents to upsert in a single batch operation
upsert_batch_size = 100
[store]
# Vector Store connection mode: 'server' or 'memory'
mode = "server"
# Docker image for the Qdrant container (used by 'indexter store init')
image = "qdrant/qdrant:latest"
# Timeout in seconds for API operations (increase if experiencing timeouts)
timeout = 120
# Server mode settings (used when mode = "server"):
# host = "localhost" # Hostname of the Qdrant server
# port = 6333 # HTTP API port
# grpc_port = 6334 # gRPC port
# prefer_grpc = false # Whether to prefer gRPC over HTTP
# api_key = "" # API key for authentication
[mcp]
# MCP transport mode: 'stdio' or 'http'
transport = "stdio"
# HTTP mode settings (only used when transport = "http"):
# host = "localhost" # Hostname for the MCP HTTP server
# port = 8765 # Port for the MCP HTTP serverStore Modes:
server: Docker-managed Qdrant container (default, managed viaindexter storecommands)memory: In-RAM, ephemeral — useful for testing
MCP Transports:
stdio: Standard input/output streams (default for MCP server integrations)http: HTTP server mode — configurehostandport
Settings can also be overridden via environment variables:
| Variable | Default | Description |
|---|---|---|
INDEXTER_EMBEDDING_MODEL |
sentence-transformers/all-MiniLM-L6-v2 |
Dense embedding model for semantic search |
INDEXTER_SPARSE_EMBEDDING_MODEL |
Qdrant/bm25 |
Sparse embedding model for keyword search |
INDEXTER_MAX_FILE_SIZE |
1048576 |
Maximum file size in bytes |
INDEXTER_MAX_FILES |
1000 |
Maximum files per repository |
INDEXTER_TOP_K |
10 |
Number of search results |
INDEXTER_UPSERT_BATCH_SIZE |
100 |
Batch size for vector operations |
INDEXTER_STORE_MODE |
server |
Storage mode: server or memory |
INDEXTER_STORE_IMAGE |
qdrant/qdrant:latest |
Docker image for Qdrant container |
INDEXTER_STORE_HOST |
localhost |
Qdrant server host |
INDEXTER_STORE_PORT |
6333 |
Qdrant HTTP API port |
INDEXTER_STORE_GRPC_PORT |
6334 |
Qdrant gRPC port |
INDEXTER_STORE_PREFER_GRPC |
false |
Prefer gRPC over HTTP |
INDEXTER_STORE_API_KEY |
None |
Qdrant API key |
INDEXTER_STORE_TIMEOUT |
120 |
API operation timeout (seconds) |
INDEXTER_MCP_TRANSPORT |
stdio |
MCP transport: stdio or http |
INDEXTER_MCP_HOST |
localhost |
MCP HTTP server host |
INDEXTER_MCP_PORT |
8765 |
MCP HTTP server port |
Create an indexter.toml in your repository root, or add a [tool.indexter] section to pyproject.toml:
# indexter.toml (or [tool.indexter] in pyproject.toml)
# Dense embedding model for semantic search
# embedding_model = "sentence-transformers/all-MiniLM-L6-v2"
# Sparse embedding model for keyword-based search
# sparse_embedding_model = "Qdrant/bm25"
# Additional patterns to ignore (combined with .gitignore and global patterns)
ignore_patterns = [
"*.generated.*",
"vendor/",
]
# Maximum file size (in bytes) to process. Default: 1048576 (1 MB)
# max_file_size = 1048576
# Maximum number of files to process in this repository. Default: 1000
# max_files = 1000
# Number of top similar documents to retrieve for queries. Default: 10
# top_k = 10
# Number of documents to batch when upserting to vector store. Default: 50
# upsert_batch_size = 100indexter - Enhanced codebase context for AI agents via RAG.
Commands:
init Initialize a git repository for indexing
--path, -p Path to the git repository (defaults to current directory)
--no-index, -n Skip indexing after initialization
index <name> Sync a repository to the vector store
search <query> <name> Search indexed nodes in a repository
status Show status of indexed repositories
forget <name> Remove a repository from indexter
config View Indexter global settings
show Show global settings with syntax highlighting
path Print path to the settings config file
store Manage the Qdrant vector store container
init Initialize the store (pull image and start container)
start Start the store container
status Show store container status
stop Stop the store container
remove Remove the store container
--volumes, -v Also remove data volumes
Options:
--verbose, -v Enable verbose output
--version Show version
--help Show help
Indexter uses Qdrant as its vector database, running in a Docker container. The indexter store commands manage this container lifecycle:
# Initialize the store (pulls image, creates and starts container)
indexter store init
# Check container status
indexter store status
# Stop the container (data is preserved)
indexter store stop
# Start a stopped container
indexter store start
# Remove the container (data is preserved in ~/.local/share/indexter/qdrant)
indexter store remove
# Remove the container and all stored data
indexter store remove --volumesThe store data is persisted in ~/.local/share/indexter/qdrant via a bind mount, so your indexed data survives container restarts and removals (unless --volumes is used).
# Initialize and index a repository
indexter init --path ~/projects/my-repo
# Initialize without indexing (index later)
indexter init --path ~/projects/my-repo --no-index
indexter index my-repo
# Initialize current directory
indexter init
# Force full re-index (ignores incremental sync)
indexter index my-repo --full
# Search with result limit
indexter search "error handling" my-repo --limit 5
# Forget a repository (removes from indexter and deletes indexed data)
indexter forget my-repoIndexter provides an MCP server for AI agent integration. The server exposes:
| Type | Name | Description |
|---|---|---|
| Tool | list_repositories |
List all configured repositories with their indexing status |
| Tool | get_repository |
Get metadata for a specific repository |
| Tool | search_repository |
Semantic search across indexed code with filtering options |
| Prompt | search_workflow |
Guide for effectively searching code repositories |
Add to your claude_desktop_config.json (located at ~/Library/Application Support/Claude/ on macOS or %APPDATA%\Claude\ on Windows):
{
"mcpServers": {
"indexter": {
"command": "indexter-mcp"
}
}
}If installed with uv:
{
"mcpServers": {
"indexter": {
"command": "uv",
"args": ["tool", "run", "indexter-mcp"]
}
}
}Add to your VS Code settings (.vscode/settings.json in your workspace or user settings):
{
"github.copilot.chat.mcp.servers": {
"indexter": {
"command": "indexter-mcp"
}
}
}If installed with uv:
{
"github.copilot.chat.mcp.servers": {
"indexter": {
"command": "uv",
"args": ["tool", "run", "indexter-mcp"]
}
}
}Add to your Cursor MCP settings:
{
"mcpServers": {
"indexter": {
"command": "indexter-mcp"
}
}
}If installed with uv:
{
"mcpServers": {
"indexter": {
"command": "uv",
"args": ["tool", "run", "indexter-mcp"]
}
}
}For custom integrations, use the Repo class with a VectorStore context manager:
import asyncio
from pathlib import Path
from indexter import Repo
from indexter.store import VectorStore
async def main():
async with VectorStore() as store:
# Initialize a new repository (name derived from directory)
repo = await Repo.init(Path("/path/to/your/repo"), store)
# Index the repository
result = await repo.index(store)
print(f"Indexed {result.nodes_added} nodes")
# Search indexed code
results = await repo.search("authentication handler", store, limit=5)
for r in results.results:
print(f"{r.score:.3f}: {r.metadata['node_name']}")
# Retrieve an existing repository with status metadata
repo = await Repo.get_one("my-repo", store, with_metadata=True)
print(f"Stale: {repo.metadata.is_stale}")
# List all configured repositories
all_repos = await Repo.get_all(store)
# Remove a repository and its indexed data
await Repo.remove_one("my-repo", store)
asyncio.run(main())Key properties: repo.name, repo.path, repo.collection_name, repo.settings.
Contributions are welcome! Please fork the repository, create a feature branch, and submit a pull request.
# Clone your fork
git clone https://github.com/YOUR_USERNAME/indexter.git
cd indexter
# Install dependencies with all extras and test dependencies
uv sync --all-extras --group test
# Run tests
uv run --group test pytest
# Run tests against all supported python versions
uv run just testThis repository uses pre-commit to automatically run code quality checks before commits. The following hooks are configured:
- File validation: Check JSON, TOML, and YAML syntax, prevent large files
- Dependency locking: Keep
uv.locksynchronized withpyproject.toml - Code formatting: Format code with Ruff
- Linting: Lint and auto-fix issues with Ruff
- Testing: Run tests with pytest and testmon for fast incremental testing
- Type checking: Verify type hints with ty
First, install pre-commit if you haven't already:
uv tool install pre-commitThen initialize pre-commit for your clone:
pre-commit install
pre-commit install-hooksPre-commit hooks will now run automatically on git commit. To run all hooks manually:
# Run all hooks on all files
pre-commit run --all-files
# Run all hooks on staged files only
pre-commit run
# Run a specific hook
pre-commit run ruff-format --all-filesMIT License - See LICENSE for details.