159 changes: 135 additions & 24 deletions backend/CLAUDE.md → backend/.claude/CLAUDE.md
@@ -6,13 +6,14 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

This is the FastAPI backend for the GitHub Knowledge Vault project. It provides:
- **REST API** for repository browsing and documentation retrieval
- **WebSocket Chat** for real-time AI conversations with Claude
- **WebSocket Chat** for real-time AI conversations with an LLM
- **MCP Integration** for external documentation server communication
- **Streaming LLM** responses with automatic tool execution
- **Streaming LLM** responses with automatic tool execution via litellm
- **Multi-provider support** (OpenRouter, Anthropic, OpenAI, 100+ models)

The backend uses a 3-layer architecture:
1. **FastAPI Layer** (`app/main.py`) - HTTP/WebSocket endpoints
2. **LLM Layer** (`app/llm.py`) - Claude streaming + tool orchestration
2. **LLM Layer** (`app/llm.py`) - litellm streaming + tool orchestration
3. **MCP Layer** (`app/mcp.py`) - External MCP server communication
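
To make the split concrete, here is a minimal sketch of how a chat request flows through the three layers. It is illustrative only: `llm_client` and `chat_stream` are real names from the module docs below, but the endpoint body, the incoming message shape, and the `context=None` argument are simplifying assumptions, not the actual implementation in `app/main.py`.

```python
# Illustrative flow only - not the actual handler in app/main.py.
from fastapi import FastAPI, WebSocket

from app.llm import llm_client  # LLM layer: litellm streaming + tool orchestration

app = FastAPI()

@app.websocket("/ws/chat/{conversation_id}")
async def chat(websocket: WebSocket, conversation_id: str):
    await websocket.accept()
    data = await websocket.receive_json()  # assumed shape: {"content": "..."}
    messages = [{"role": "user", "content": data["content"]}]

    # The LLM layer streams events; when the model requests a tool, the LLM
    # layer executes it through the MCP layer (app/mcp.py) before continuing.
    async for event in llm_client.chat_stream(messages, context=None):
        await websocket.send_json(event)  # text / tool_use_start / tool_result
```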

## Development Setup
@@ -42,8 +43,13 @@ uv add --dev package-name

### Environment Configuration
1. Copy `.env.example` to `.env`
2. Add your `ANTHROPIC_API_KEY` (required)
3. Configure optional settings (Claude model, MCP server URL, CORS origins)
2. Add your `OPENROUTER_API_KEY` (required - get one at https://openrouter.ai/keys)
3. Configure optional settings (model name, MCP server URL, CORS origins)

**Free Models Available:**
- `openrouter/meta-llama/llama-3.3-70b-instruct` - Best overall (recommended)
- `openrouter/qwen/qwen-coder-turbo` - Best for code
- `openrouter/deepseek/deepseek-r1` - Best for reasoning
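
Any of these can be sanity-checked with litellm directly before wiring it into the backend. A minimal sketch, assuming `OPENROUTER_API_KEY` is set in the environment; the prompt and `max_tokens` value are arbitrary:

```python
# Quick check that an OpenRouter model responds through litellm.
import asyncio
import os

import litellm

async def main() -> None:
    response = await litellm.acompletion(
        model="openrouter/meta-llama/llama-3.3-70b-instruct",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
        api_key=os.environ["OPENROUTER_API_KEY"],
        api_base="https://openrouter.ai/api/v1",
        max_tokens=64,
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```

Swapping the `model` string is all it takes to try a different provider or model.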

### Running the Application

@@ -65,11 +71,26 @@

Run tests:
```bash
uv run pytest
```

Run tests with verbose output:
```bash
uv run pytest -v
```

Run tests with coverage:
```bash
uv run pytest --cov=app
```

Run specific test file:
```bash
uv run pytest tests/test_rest_api.py -v
```

**Test Configuration:**
- Tests use a mock MCP server on port 3002
- pytest.ini is configured for asyncio auto mode
- All 28 tests must pass before committing
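
A hedged sketch of how that environment can be pinned for a test session; the fixture name is hypothetical and the real setup lives in `tests/conftest.py`, but the values match the test configuration above:

```python
# Hypothetical conftest-style fixture: point the backend at the mock MCP
# server on port 3002 and use a dummy key so no real LLM API is called.
import pytest

@pytest.fixture(autouse=True)
def test_environment(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setenv("MCP_SERVER_URL", "http://localhost:3002")
    monkeypatch.setenv("OPENROUTER_API_KEY", "test-key-12345")
```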

### Docker

Build the image:
@@ -92,11 +113,14 @@ docker run -p 3001:3001 --env-file .env github-knowledge-vault-backend
- Conversation management (in-memory storage)
- Application lifespan management (startup/shutdown)
- CORS middleware configuration
- Health check with service status

**2. LLM Layer** (`app/llm.py`)
- Claude API integration with streaming support
- Tool definitions for documentation operations
- **litellm** integration for multi-provider support
- Streaming chat with OpenAI-compatible API
- Tool definitions in OpenAI format (required by litellm)
- Automatic tool execution loop
- Event translation layer (maintains WebSocket compatibility)
- System prompt builder with repository context
- Singleton `llm_client` instance

@@ -110,7 +134,7 @@

**REST Endpoints**
- `GET /` - Root endpoint
- `GET /health` - Health check with service status
- `GET /health` - Health check with service status (provider, model, MCP)
- `GET /config` - Current configuration (excluding secrets)
- `GET /api/repos` - List all repositories
- `GET /api/repos/{name}/tree` - Get repository file tree
@@ -119,7 +143,7 @@
- `GET /api/conversations/{id}/messages` - Get conversation history

**WebSocket Endpoint**
- `WS /ws/chat/{conversation_id}` - Real-time chat with Claude
- `WS /ws/chat/{conversation_id}` - Real-time chat with streaming LLM

### Configuration Pattern
The application uses Pydantic Settings (`app/config.py`) for environment-based configuration:
@@ -131,10 +155,11 @@
### Key Dependencies
- **FastAPI**: Web framework
- **Uvicorn**: ASGI server
- **Anthropic SDK**: Claude API integration with streaming
- **litellm**: Multi-provider LLM library (OpenRouter, Anthropic, OpenAI, 100+ models)
- **httpx**: Async HTTP client for MCP server communication
- **Pydantic Settings**: Configuration management
- **WebSockets**: Real-time communication support
- **pytest**: Testing framework with asyncio support

## Common Patterns

@@ -143,18 +168,21 @@

```python
from app.config import settings

# Access configuration
api_key = settings.ANTHROPIC_API_KEY
model = settings.CLAUDE_MODEL
api_key = settings.OPENROUTER_API_KEY
model = settings.MODEL_NAME
api_base = settings.API_BASE
max_tokens = settings.MAX_TOKENS
```

### Environment Variables
All configuration is environment-driven. See `.env.example` for available options:
- `ANTHROPIC_API_KEY` (required)
- `CLAUDE_MODEL` (optional, defaults to claude-sonnet-4-20250514)
- `CLAUDE_MAX_TOKENS` (optional, defaults to 4096)
- `MCP_SERVER_URL` (optional, defaults to http://mcp-server:3000)
- `MCP_TIMEOUT` (optional, defaults to 30 seconds)
- `CORS_ORIGINS` (optional, defaults to localhost development ports)
- `OPENROUTER_API_KEY` (required) - Get one at https://openrouter.ai/keys
- `MODEL_NAME` (optional) - defaults to `openrouter/meta-llama/llama-3.3-70b-instruct`
- `API_BASE` (optional) - defaults to `https://openrouter.ai/api/v1`
- `MAX_TOKENS` (optional) - defaults to 4096
- `MCP_SERVER_URL` (optional) - defaults to `http://mcp-server:3000`
- `MCP_TIMEOUT` (optional) - defaults to 30 seconds
- `CORS_ORIGINS` (optional) - defaults to localhost development ports

## Module Documentation

Expand All @@ -164,11 +192,21 @@ Pydantic Settings for environment-based configuration:
from app.config import settings

# Access any setting
api_key = settings.ANTHROPIC_API_KEY
model = settings.CLAUDE_MODEL
api_key = settings.OPENROUTER_API_KEY
model = settings.MODEL_NAME
api_base = settings.API_BASE
cors = settings.CORS_ORIGINS
```

**Settings:**
- `OPENROUTER_API_KEY`: API key for OpenRouter
- `MODEL_NAME`: Model identifier (e.g., `openrouter/meta-llama/llama-3.3-70b-instruct`)
- `API_BASE`: API base URL for litellm
- `MAX_TOKENS`: Maximum tokens for LLM responses
- `MCP_SERVER_URL`: URL for MCP server
- `MCP_TIMEOUT`: Timeout for MCP requests in seconds
- `CORS_ORIGINS`: List of allowed CORS origins
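
Because these are plain Pydantic Settings fields, environment variables override the defaults whenever `Settings` is instantiated. A small sketch; importing the `Settings` class directly is an assumption for illustration, since the application normally uses the shared `settings` instance:

```python
# Environment variables take precedence over the defaults in config.py.
import os

from app.config import Settings  # assumed import of the class itself

os.environ["MODEL_NAME"] = "openrouter/deepseek/deepseek-r1"
os.environ["MAX_TOKENS"] = "2048"

settings = Settings()
assert settings.MODEL_NAME == "openrouter/deepseek/deepseek-r1"
assert settings.MAX_TOKENS == 2048  # Pydantic coerces the string to an int
```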

### `app/mcp.py` - MCP Client
HTTP client for MCP server communication:
```python
# ...
await mcp_client.disconnect()
```

@@ -195,7 +233,7 @@
- `list_repo_docs` - List all docs in a repository (requires: `repo`)

### `app/llm.py` - LLM Client
Claude API client with streaming and tool execution:
Multi-provider LLM client using litellm with streaming and tool execution:
```python
from app.llm import llm_client

async for event in llm_client.chat_stream(messages, context):
    ...
```

**Event Types:**
- `{"type": "text", "content": "..."}` - Text chunk from Claude
- `{"type": "text", "content": "..."}` - Text chunk from LLM
- `{"type": "tool_use_start", "toolId": "...", "name": "...", "input": {...}}` - Tool execution begins
- `{"type": "tool_result", "toolId": "...", "name": "...", "result": {...}, "duration": 123}` - Tool execution completes

@@ -222,12 +260,39 @@

**System Prompt:**
- When `context.scope == "repo"`, adds repository-specific instructions
- Automatically built with `build_system_prompt(context)`

**Tool Format:**
- Tools are defined in OpenAI format (required by litellm)
- litellm handles conversion between provider formats automatically
- Tools are executed via MCP client
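
For reference, an OpenAI-format tool definition looks like the sketch below. Only the tool name and its required `repo` argument come from the MCP tool list above; the description and schema wording are plausible assumptions, not the exact definition in `app/llm.py`:

```python
# One tool in OpenAI function-calling format, as litellm expects.
LIST_REPO_DOCS_TOOL = {
    "type": "function",
    "function": {
        "name": "list_repo_docs",
        "description": "List all documentation files in a repository.",
        "parameters": {
            "type": "object",
            "properties": {
                "repo": {"type": "string", "description": "Repository name"},
            },
            "required": ["repo"],
        },
    },
}
```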

**Streaming:**
- Uses litellm's `acompletion` with `stream=True`
- Handles tool execution loop with continuation
- Maintains backward-compatible event format for WebSocket clients
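
A simplified sketch of that streaming path, assuming a hard-coded model string for brevity; the tool-call accumulation, MCP execution, and continuation loop are elided, and the event shapes mirror the ones documented above:

```python
# Simplified streaming sketch: translate litellm chunks into the
# backward-compatible event dicts the WebSocket clients expect.
import litellm

async def stream_events(messages: list[dict], tools: list[dict]):
    response = await litellm.acompletion(
        model="openrouter/meta-llama/llama-3.3-70b-instruct",
        messages=messages,
        tools=tools,      # OpenAI-format tool definitions
        stream=True,
    )
    async for chunk in response:
        delta = chunk.choices[0].delta
        if delta.content:
            yield {"type": "text", "content": delta.content}
        if delta.tool_calls:
            # In app/llm.py these deltas are accumulated, the tool is executed
            # via the MCP client, and the loop continues with the tool result.
            ...
```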

### `app/main.py` - FastAPI Application
Main application with REST and WebSocket endpoints:
```python
# See API Endpoints section above for available routes
```

**Health Check:**
Returns provider, model, and service status:
```json
{
"status": "healthy",
"version": "1.0.0",
"services": {
"mcp_server": {"status": "connected"},
"llm_api": {
"status": "available",
"provider": "openrouter",
"model": "openrouter/meta-llama/llama-3.3-70b-instruct"
}
}
}
```

## WebSocket Protocol

### Connection
@@ -265,7 +330,7 @@ const ws = new WebSocket(`ws://localhost:3001/ws/chat/${conversationId}`);
```json
{
"type": "tool_use_start",
"toolId": "toolu_123",
"toolId": "tool_123",
"name": "list_repositories",
"input": {}
}
```

@@ -275,7 +340,7 @@
```json
{
"type": "tool_result",
"toolId": "toolu_123",
"toolId": "tool_123",
"name": "list_repositories",
"result": [...],
"duration": 234
}
```

@@ -296,3 +361,49 @@
```json
{"type": "pong"}
```
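
A minimal way to exercise the protocol end to end from Python uses the `websockets` package. This is a sketch under assumptions: the backend is running locally, the conversation id is a placeholder, and the ping payload shape `{"type": "ping"}` is inferred from the documented pong response:

```python
# Protocol smoke test: connect, send a ping, expect {"type": "pong"}.
import asyncio
import json

import websockets

async def main() -> None:
    url = "ws://localhost:3001/ws/chat/test-conversation"  # placeholder id
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({"type": "ping"}))
        reply = json.loads(await ws.recv())
        print(reply)  # expected: {"type": "pong"}

asyncio.run(main())
```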

## Testing

### Test Structure
- `tests/test_rest_api.py` - REST endpoint tests (health, repos, conversations)
- `tests/test_websocket.py` - WebSocket chat tests with mocked LLM
- `tests/test_integration.py` - End-to-end workflow tests
- `tests/test_error_scenarios.py` - Error handling and edge cases
- `tests/conftest.py` - Pytest fixtures and configuration
- `tests/mock_mcp_server.py` - Mock MCP server for testing
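
For flavor, a hedged sketch of what a REST test looks like; the assertions follow the health-check payload documented above, and importing `app` from `app.main` is the assumed entry point (the real tests in `tests/test_rest_api.py` may differ):

```python
# Sketch of a health-check test using FastAPI's TestClient.
from fastapi.testclient import TestClient

from app.main import app  # assumed location of the FastAPI instance

def test_health_reports_provider_and_model() -> None:
    with TestClient(app) as client:  # context manager runs the app lifespan
        response = client.get("/health")
    assert response.status_code == 200
    body = response.json()
    assert body["status"] == "healthy"
    assert "llm_api" in body["services"]
```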

### Running Tests in IDE
1. Configure pytest as test runner in IDE settings
2. Mark `tests` directory as Test Sources Root
3. Set environment variables in run configuration:
- `MCP_SERVER_URL=http://localhost:3002`
- `OPENROUTER_API_KEY=test-key-12345`
4. Run/debug individual tests by clicking play button in gutter

### Test Coverage
All 28 tests must pass:
- ✅ 4 error scenario tests
- ✅ 8 integration workflow tests
- ✅ 10 REST API tests
- ✅ 6 WebSocket tests

## Migration Notes

### From Anthropic SDK to litellm
The backend was migrated from the direct Anthropic SDK integration to litellm for:
- **Multi-provider support**: OpenRouter, Anthropic, OpenAI, 100+ models
- **Cost reduction**: Access to free models via OpenRouter
- **Flexibility**: Easy model switching via environment variables

**Key Changes:**
- `ANTHROPIC_API_KEY` → `OPENROUTER_API_KEY`
- `CLAUDE_MODEL` → `MODEL_NAME`
- Tool format: Anthropic format → OpenAI format (required by litellm)
- Streaming: `anthropic.messages.stream()` → `litellm.acompletion(stream=True)`
- Event format: Maintained backward compatibility via translation layer

**Benefits:**
- Free tier models available (Llama 3.3 70B, Qwen Coder, DeepSeek)
- Can switch between providers without code changes
- Same WebSocket protocol for frontend compatibility
- All tests continue to pass
10 changes: 9 additions & 1 deletion backend/.env.example
@@ -8,5 +8,13 @@ OPENROUTER_API_KEY=sk-or-v1-your_key_here
MODEL_NAME=openrouter/meta-llama/llama-3.3-70b-instruct

MAX_TOKENS=4096
MCP_SERVER_URL=http://mcp-server:3000

# MCP Server Configuration
# For local development: http://localhost:8003/mcp
# For Docker: http://mcp-server:3000/mcp
# For Hugging Face: https://your-space.hf.space/mcp
MCP_SERVER_URL=http://localhost:8003/mcp
MCP_TIMEOUT=30

# GitHub organization to query via MCP server
GITHUB_ORG=your-org-name
3 changes: 3 additions & 0 deletions backend/.gitignore
@@ -42,3 +42,6 @@ Thumbs.db

# Logs
*.log

# Claude Code local settings
.claude/settings.local.json
5 changes: 4 additions & 1 deletion backend/app/config.py
@@ -15,9 +15,12 @@ class Settings(BaseSettings):
API_BASE: str = "https://openrouter.ai/api/v1"
MAX_TOKENS: int = 4096

MCP_SERVER_URL: str = "http://mcp-server:3000"
MCP_SERVER_URL: str = "https://sperva-github-mcp-server.hf.space"
MCP_TIMEOUT: int = 30

# GitHub organization for MCP server
GITHUB_ORG: str = "lifter-ai"

CORS_ORIGINS: List[str] = ["http://localhost:5173", "http://localhost:3000"]

