159 changes: 135 additions & 24 deletions backend/CLAUDE.md → backend/.claude/CLAUDE.md
@@ -6,13 +6,14 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

This is the FastAPI backend for the GitHub Knowledge Vault project. It provides:
- **REST API** for repository browsing and documentation retrieval
- **WebSocket Chat** for real-time AI conversations with Claude
- **WebSocket Chat** for real-time AI conversations with an LLM
- **MCP Integration** for external documentation server communication
- **Streaming LLM** responses with automatic tool execution
- **Streaming LLM** responses with automatic tool execution via litellm
- **Multi-provider support** (OpenRouter, Anthropic, OpenAI, 100+ models)

The backend uses a 3-layer architecture:
1. **FastAPI Layer** (`app/main.py`) - HTTP/WebSocket endpoints
2. **LLM Layer** (`app/llm.py`) - Claude streaming + tool orchestration
2. **LLM Layer** (`app/llm.py`) - litellm streaming + tool orchestration
3. **MCP Layer** (`app/mcp.py`) - External MCP server communication
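
To make the split concrete, here is a minimal sketch of how a chat request flows through the three layers. It is illustrative only: `llm_client` and `chat_stream` are real names from the module docs below, but the endpoint body, the incoming message shape, and the `context=None` argument are simplifying assumptions, not the actual implementation in `app/main.py`.

```python
# Illustrative flow only - not the actual handler in app/main.py.
from fastapi import FastAPI, WebSocket

from app.llm import llm_client  # LLM layer: litellm streaming + tool orchestration

app = FastAPI()

@app.websocket("/ws/chat/{conversation_id}")
async def chat(websocket: WebSocket, conversation_id: str):
    await websocket.accept()
    data = await websocket.receive_json()  # assumed shape: {"content": "..."}
    messages = [{"role": "user", "content": data["content"]}]

    # The LLM layer streams events; when the model requests a tool, the LLM
    # layer executes it through the MCP layer (app/mcp.py) before continuing.
    async for event in llm_client.chat_stream(messages, context=None):
        await websocket.send_json(event)  # text / tool_use_start / tool_result
```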

## Development Setup
@@ -42,8 +43,13 @@ uv add --dev package-name

### Environment Configuration
1. Copy `.env.example` to `.env`
2. Add your `ANTHROPIC_API_KEY` (required)
3. Configure optional settings (Claude model, MCP server URL, CORS origins)
2. Add your `OPENROUTER_API_KEY` (required - get one at https://openrouter.ai/keys)
3. Configure optional settings (model name, MCP server URL, CORS origins)

**Free Models Available:**
- `openrouter/meta-llama/llama-3.3-70b-instruct` - Best overall (recommended)
- `openrouter/qwen/qwen-coder-turbo` - Best for code
- `openrouter/deepseek/deepseek-r1` - Best for reasoning
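
Any of these can be sanity-checked with litellm directly before wiring it into the backend. A minimal sketch, assuming `OPENROUTER_API_KEY` is set in the environment; the prompt and `max_tokens` value are arbitrary:

```python
# Quick check that an OpenRouter model responds through litellm.
import asyncio
import os

import litellm

async def main() -> None:
    response = await litellm.acompletion(
        model="openrouter/meta-llama/llama-3.3-70b-instruct",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
        api_key=os.environ["OPENROUTER_API_KEY"],
        api_base="https://openrouter.ai/api/v1",
        max_tokens=64,
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```

Swapping the `model` string is all it takes to try a different provider or model.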

### Running the Application

@@ -65,11 +71,26 @@

Run tests:
```bash
uv run pytest
```

Run tests with verbose output:
```bash
uv run pytest -v
```

Run tests with coverage:
```bash
uv run pytest --cov=app
```

Run specific test file:
```bash
uv run pytest tests/test_rest_api.py -v
```

**Test Configuration:**
- Tests use a mock MCP server on port 3002
- pytest.ini is configured for asyncio auto mode
- All 28 tests must pass before committing
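
A hedged sketch of how that environment can be pinned for a test session; the fixture name is hypothetical and the real setup lives in `tests/conftest.py`, but the values match the test configuration above:

```python
# Hypothetical conftest-style fixture: point the backend at the mock MCP
# server on port 3002 and use a dummy key so no real LLM API is called.
import pytest

@pytest.fixture(autouse=True)
def test_environment(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setenv("MCP_SERVER_URL", "http://localhost:3002")
    monkeypatch.setenv("OPENROUTER_API_KEY", "test-key-12345")
```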

### Docker

Build the image:
@@ -92,11 +113,14 @@ docker run -p 3001:3001 --env-file .env github-knowledge-vault-backend
- Conversation management (in-memory storage)
- Application lifespan management (startup/shutdown)
- CORS middleware configuration
- Health check with service status

**2. LLM Layer** (`app/llm.py`)
- Claude API integration with streaming support
- Tool definitions for documentation operations
- **litellm** integration for multi-provider support
- Streaming chat with OpenAI-compatible API
- Tool definitions in OpenAI format (required by litellm)
- Automatic tool execution loop
- Event translation layer (maintains WebSocket compatibility)
- System prompt builder with repository context
- Singleton `llm_client` instance

@@ -110,7 +134,7 @@

**REST Endpoints**
- `GET /` - Root endpoint
- `GET /health` - Health check with service status
- `GET /health` - Health check with service status (provider, model, MCP)
- `GET /config` - Current configuration (excluding secrets)
- `GET /api/repos` - List all repositories
- `GET /api/repos/{name}/tree` - Get repository file tree
@@ -119,7 +143,7 @@
- `GET /api/conversations/{id}/messages` - Get conversation history

**WebSocket Endpoint**
- `WS /ws/chat/{conversation_id}` - Real-time chat with Claude
- `WS /ws/chat/{conversation_id}` - Real-time chat with streaming LLM

### Configuration Pattern
The application uses Pydantic Settings (`app/config.py`) for environment-based configuration:
@@ -131,10 +155,11 @@
### Key Dependencies
- **FastAPI**: Web framework
- **Uvicorn**: ASGI server
- **Anthropic SDK**: Claude API integration with streaming
- **litellm**: Multi-provider LLM library (OpenRouter, Anthropic, OpenAI, 100+ models)
- **httpx**: Async HTTP client for MCP server communication
- **Pydantic Settings**: Configuration management
- **WebSockets**: Real-time communication support
- **pytest**: Testing framework with asyncio support

## Common Patterns

@@ -143,18 +168,21 @@

```python
from app.config import settings

# Access configuration
api_key = settings.ANTHROPIC_API_KEY
model = settings.CLAUDE_MODEL
api_key = settings.OPENROUTER_API_KEY
model = settings.MODEL_NAME
api_base = settings.API_BASE
max_tokens = settings.MAX_TOKENS
```

### Environment Variables
All configuration is environment-driven. See `.env.example` for available options:
- `ANTHROPIC_API_KEY` (required)
- `CLAUDE_MODEL` (optional, defaults to claude-sonnet-4-20250514)
- `CLAUDE_MAX_TOKENS` (optional, defaults to 4096)
- `MCP_SERVER_URL` (optional, defaults to http://mcp-server:3000)
- `MCP_TIMEOUT` (optional, defaults to 30 seconds)
- `CORS_ORIGINS` (optional, defaults to localhost development ports)
- `OPENROUTER_API_KEY` (required) - Get one at https://openrouter.ai/keys
- `MODEL_NAME` (optional) - defaults to `openrouter/meta-llama/llama-3.3-70b-instruct`
- `API_BASE` (optional) - defaults to `https://openrouter.ai/api/v1`
- `MAX_TOKENS` (optional) - defaults to 4096
- `MCP_SERVER_URL` (optional) - defaults to `http://mcp-server:3000`
- `MCP_TIMEOUT` (optional) - defaults to 30 seconds
- `CORS_ORIGINS` (optional) - defaults to localhost development ports

## Module Documentation

Expand All @@ -164,11 +192,21 @@ Pydantic Settings for environment-based configuration:
from app.config import settings

# Access any setting
api_key = settings.ANTHROPIC_API_KEY
model = settings.CLAUDE_MODEL
api_key = settings.OPENROUTER_API_KEY
model = settings.MODEL_NAME
api_base = settings.API_BASE
cors = settings.CORS_ORIGINS
```

**Settings:**
- `OPENROUTER_API_KEY`: API key for OpenRouter
- `MODEL_NAME`: Model identifier (e.g., `openrouter/meta-llama/llama-3.3-70b-instruct`)
- `API_BASE`: API base URL for litellm
- `MAX_TOKENS`: Maximum tokens for LLM responses
- `MCP_SERVER_URL`: URL for MCP server
- `MCP_TIMEOUT`: Timeout for MCP requests in seconds
- `CORS_ORIGINS`: List of allowed CORS origins
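
Because these are plain Pydantic Settings fields, environment variables override the defaults whenever `Settings` is instantiated. A small sketch; importing the `Settings` class directly is an assumption for illustration, since the application normally uses the shared `settings` instance:

```python
# Environment variables take precedence over the defaults in config.py.
import os

from app.config import Settings  # assumed import of the class itself

os.environ["MODEL_NAME"] = "openrouter/deepseek/deepseek-r1"
os.environ["MAX_TOKENS"] = "2048"

settings = Settings()
assert settings.MODEL_NAME == "openrouter/deepseek/deepseek-r1"
assert settings.MAX_TOKENS == 2048  # Pydantic coerces the string to an int
```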

### `app/mcp.py` - MCP Client
HTTP client for MCP server communication:
```python
# ...
await mcp_client.disconnect()
```

@@ -195,7 +233,7 @@
- `list_repo_docs` - List all docs in a repository (requires: `repo`)

### `app/llm.py` - LLM Client
Claude API client with streaming and tool execution:
Multi-provider LLM client using litellm with streaming and tool execution:
```python
from app.llm import llm_client

async for event in llm_client.chat_stream(messages, context):
    ...
```

**Event Types:**
- `{"type": "text", "content": "..."}` - Text chunk from Claude
- `{"type": "text", "content": "..."}` - Text chunk from LLM
- `{"type": "tool_use_start", "toolId": "...", "name": "...", "input": {...}}` - Tool execution begins
- `{"type": "tool_result", "toolId": "...", "name": "...", "result": {...}, "duration": 123}` - Tool execution completes

@@ -222,12 +260,39 @@

**System Prompt:**
- When `context.scope == "repo"`, adds repository-specific instructions
- Automatically built with `build_system_prompt(context)`

**Tool Format:**
- Tools are defined in OpenAI format (required by litellm)
- litellm handles conversion between provider formats automatically
- Tools are executed via MCP client
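
For reference, an OpenAI-format tool definition looks like the sketch below. Only the tool name and its required `repo` argument come from the MCP tool list above; the description and schema wording are plausible assumptions, not the exact definition in `app/llm.py`:

```python
# One tool in OpenAI function-calling format, as litellm expects.
LIST_REPO_DOCS_TOOL = {
    "type": "function",
    "function": {
        "name": "list_repo_docs",
        "description": "List all documentation files in a repository.",
        "parameters": {
            "type": "object",
            "properties": {
                "repo": {"type": "string", "description": "Repository name"},
            },
            "required": ["repo"],
        },
    },
}
```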

**Streaming:**
- Uses litellm's `acompletion` with `stream=True`
- Handles tool execution loop with continuation
- Maintains backward-compatible event format for WebSocket clients
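
A simplified sketch of that streaming path, assuming a hard-coded model string for brevity; the tool-call accumulation, MCP execution, and continuation loop are elided, and the event shapes mirror the ones documented above:

```python
# Simplified streaming sketch: translate litellm chunks into the
# backward-compatible event dicts the WebSocket clients expect.
import litellm

async def stream_events(messages: list[dict], tools: list[dict]):
    response = await litellm.acompletion(
        model="openrouter/meta-llama/llama-3.3-70b-instruct",
        messages=messages,
        tools=tools,      # OpenAI-format tool definitions
        stream=True,
    )
    async for chunk in response:
        delta = chunk.choices[0].delta
        if delta.content:
            yield {"type": "text", "content": delta.content}
        if delta.tool_calls:
            # In app/llm.py these deltas are accumulated, the tool is executed
            # via the MCP client, and the loop continues with the tool result.
            ...
```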

### `app/main.py` - FastAPI Application
Main application with REST and WebSocket endpoints:
```python
# See API Endpoints section above for available routes
```

**Health Check:**
Returns provider, model, and service status:
```json
{
"status": "healthy",
"version": "1.0.0",
"services": {
"mcp_server": {"status": "connected"},
"llm_api": {
"status": "available",
"provider": "openrouter",
"model": "openrouter/meta-llama/llama-3.3-70b-instruct"
}
}
}
```

## WebSocket Protocol

### Connection
@@ -265,7 +330,7 @@ const ws = new WebSocket(`ws://localhost:3001/ws/chat/${conversationId}`);
```json
{
"type": "tool_use_start",
"toolId": "toolu_123",
"toolId": "tool_123",
"name": "list_repositories",
"input": {}
}
```

@@ -275,7 +340,7 @@
```json
{
"type": "tool_result",
"toolId": "toolu_123",
"toolId": "tool_123",
"name": "list_repositories",
"result": [...],
"duration": 234
}
```

@@ -296,3 +361,49 @@
```json
{"type": "pong"}
```
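
A minimal way to exercise the protocol end to end from Python uses the `websockets` package. This is a sketch under assumptions: the backend is running locally, the conversation id is a placeholder, and the ping payload shape `{"type": "ping"}` is inferred from the documented pong response:

```python
# Protocol smoke test: connect, send a ping, expect {"type": "pong"}.
import asyncio
import json

import websockets

async def main() -> None:
    url = "ws://localhost:3001/ws/chat/test-conversation"  # placeholder id
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({"type": "ping"}))
        reply = json.loads(await ws.recv())
        print(reply)  # expected: {"type": "pong"}

asyncio.run(main())
```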

## Testing

### Test Structure
- `tests/test_rest_api.py` - REST endpoint tests (health, repos, conversations)
- `tests/test_websocket.py` - WebSocket chat tests with mocked LLM
- `tests/test_integration.py` - End-to-end workflow tests
- `tests/test_error_scenarios.py` - Error handling and edge cases
- `tests/conftest.py` - Pytest fixtures and configuration
- `tests/mock_mcp_server.py` - Mock MCP server for testing
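
For flavor, a hedged sketch of what a REST test looks like; the assertions follow the health-check payload documented above, and importing `app` from `app.main` is the assumed entry point (the real tests in `tests/test_rest_api.py` may differ):

```python
# Sketch of a health-check test using FastAPI's TestClient.
from fastapi.testclient import TestClient

from app.main import app  # assumed location of the FastAPI instance

def test_health_reports_provider_and_model() -> None:
    with TestClient(app) as client:  # context manager runs the app lifespan
        response = client.get("/health")
    assert response.status_code == 200
    body = response.json()
    assert body["status"] == "healthy"
    assert "llm_api" in body["services"]
```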

### Running Tests in IDE
1. Configure pytest as test runner in IDE settings
2. Mark `tests` directory as Test Sources Root
3. Set environment variables in run configuration:
- `MCP_SERVER_URL=http://localhost:3002`
- `OPENROUTER_API_KEY=test-key-12345`
4. Run/debug individual tests by clicking play button in gutter

### Test Coverage
All 28 tests must pass:
- ✅ 4 error scenario tests
- ✅ 8 integration workflow tests
- ✅ 10 REST API tests
- ✅ 6 WebSocket tests

## Migration Notes

### From Anthropic SDK to litellm
The backend was migrated from the direct Anthropic SDK integration to litellm for:
- **Multi-provider support**: OpenRouter, Anthropic, OpenAI, 100+ models
- **Cost reduction**: Access to free models via OpenRouter
- **Flexibility**: Easy model switching via environment variables

**Key Changes:**
- `ANTHROPIC_API_KEY` → `OPENROUTER_API_KEY`
- `CLAUDE_MODEL` → `MODEL_NAME`
- Tool format: Anthropic format → OpenAI format (required by litellm)
- Streaming: `anthropic.messages.stream()` → `litellm.acompletion(stream=True)`
- Event format: Maintained backward compatibility via translation layer

**Benefits:**
- Free tier models available (Llama 3.3 70B, Qwen Coder, DeepSeek)
- Can switch between providers without code changes
- Same WebSocket protocol for frontend compatibility
- All tests continue to pass
10 changes: 9 additions & 1 deletion backend/.env.example
@@ -8,5 +8,13 @@ OPENROUTER_API_KEY=sk-or-v1-your_key_here
MODEL_NAME=openrouter/meta-llama/llama-3.3-70b-instruct

MAX_TOKENS=4096
MCP_SERVER_URL=http://mcp-server:3000

# MCP Server Configuration
# For local development: http://localhost:8003/mcp
# For Docker: http://mcp-server:3000/mcp
# For Hugging Face: https://your-space.hf.space/mcp
MCP_SERVER_URL=http://localhost:8003/mcp
MCP_TIMEOUT=30

# GitHub organization to query via MCP server
GITHUB_ORG=your-org-name
3 changes: 3 additions & 0 deletions backend/.gitignore
@@ -42,3 +42,6 @@ Thumbs.db

# Logs
*.log

# Claude Code local settings
.claude/settings.local.json
5 changes: 4 additions & 1 deletion backend/app/config.py
@@ -15,9 +15,12 @@ class Settings(BaseSettings):
API_BASE: str = "https://openrouter.ai/api/v1"
MAX_TOKENS: int = 4096

MCP_SERVER_URL: str = "http://mcp-server:3000"
MCP_SERVER_URL: str = "https://sperva-github-mcp-server.hf.space"
MCP_TIMEOUT: int = 30

# GitHub organization for MCP server
GITHUB_ORG: str = "lifter-ai"

CORS_ORIGINS: List[str] = ["http://localhost:5173", "http://localhost:3000"]

