A production-ready Telegram bot that acts as a board game rules referee, using OpenAI's Agents SDK with a multi-stage conversational pipeline to search through PDF rulebooks and answer questions in multiple languages.
- Multi-Stage Conversational Pipeline: Interactive game selection and clarification flow
- Intelligent Rules Search: Uses OpenAI Agents SDK to understand and answer rules questions
- Schema-Guided Reasoning (SGR): Transparent reasoning chains with confidence scores
- Streaming Progress Updates: Fun, thematic progress messages during searches
- Fast PDF Search: Leverages ugrep and pdftotext for efficient text extraction
- Multilingual Support: Handles queries in English, Russian, and other languages
- Smart Filename Translation: Automatically translates localized game names to English filenames
- Per-User Sessions: Isolated SQLite conversation history for each user
- Game Context Memory: Remembers current game across conversation turns
- Observability: Optional Langfuse integration via OpenTelemetry for agent tracing and monitoring
- Fuzzy Game Search: /games command with smart search and suggestions
- Context-Aware Responses: Answers formatted with sources and related questions
- Progress Indicators: Real-time updates with creative status messages
- Clone the repository:
  git clone https://github.com/RomanShnurov/RulesLawyerBot.git
  cd RulesLawyerBot
- Set up environment variables:
  cp .env.example .env
  # Edit .env with your actual tokens
- Build and run:
  just build # or: make build
  just up # or: make up
- View logs:
  just logs # or: make logs
See docs/DOCKER_SETUP.md for detailed Docker documentation.
- Install dependencies:
  just install # or: make install
  # or directly: uv sync
- Set up environment variables:
  cp .env.example .env
  # Edit .env with your tokens
- Run the bot:
  python -m src.rules_lawyer_bot.main
- TELEGRAM_TOKEN: Your Telegram bot token (from @BotFather)
- OPENAI_API_KEY: Your OpenAI API key
- OPENAI_BASE_URL: OpenAI API endpoint (default: https://api.openai.com/v1)
- OPENAI_MODEL: Model to use (default: gpt-4o-mini)
- PDF_STORAGE_PATH: PDF storage directory (default: ./rules_pdfs)
- DATA_PATH: Data directory (default: ./data)
- LOG_LEVEL: Logging level (default: INFO)
- MAX_REQUESTS_PER_MINUTE: Rate limiting (default: 10)
- MAX_CONCURRENT_SEARCHES: Concurrent search limit (default: 4)
- ADMIN_USER_IDS: Comma-separated list of admin Telegram user IDs
- LANGFUSE_PUBLIC_KEY: Langfuse public API key for observability (optional, leave empty to disable)
- LANGFUSE_SECRET_KEY: Langfuse secret API key for observability (optional)
- LANGFUSE_BASE_URL: Langfuse API endpoint (default: https://cloud.langfuse.com)
- ENABLE_TRACING: Enable OpenTelemetry tracing to Langfuse (default: false)
- LANGFUSE_ENVIRONMENT: Environment name for Langfuse traces (default: production)
- BGG_API_TOKEN: BoardGameGeek API token for auto-generating game metadata (optional, see BGG API Setup)
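As a rough illustration of how a few of these variables and their documented defaults might be read, here is a standard-library sketch. The project lists pydantic-settings in its stack, so the real configuration code most likely looks different; `BotConfig` and `load_config` are hypothetical names used only for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BotConfig:
    """Hypothetical container for a subset of the documented settings."""

    telegram_token: str
    openai_api_key: str
    openai_model: str
    pdf_storage_path: str
    max_requests_per_minute: int
    max_concurrent_searches: int
    admin_user_ids: tuple


def load_config(env: Optional[Mapping[str, str]] = None) -> BotConfig:
    """Read settings from a mapping (os.environ by default), applying the README defaults."""
    env = os.environ if env is None else env
    # ADMIN_USER_IDS is documented as a comma-separated list of Telegram user IDs.
    admins = tuple(
        int(part) for part in env.get("ADMIN_USER_IDS", "").split(",") if part.strip()
    )
    return BotConfig(
        telegram_token=env["TELEGRAM_TOKEN"],  # required, no default
        openai_api_key=env["OPENAI_API_KEY"],  # required, no default
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
        pdf_storage_path=env.get("PDF_STORAGE_PATH", "./rules_pdfs"),
        max_requests_per_minute=int(env.get("MAX_REQUESTS_PER_MINUTE", "10")),
        max_concurrent_searches=int(env.get("MAX_CONCURRENT_SEARCHES", "4")),
        admin_user_ids=admins,
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise in tests without touching the real environment.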
- /start - Show welcome message and usage instructions
- /games - List all available games or search with /games <query>
/games # List all games
/games wingspan # Search for "Wingspan"
/games gloomy # Fuzzy search finds "Gloomhaven"
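The fuzzy lookup behind `/games gloomy` could be approximated with Python's standard `difflib` module. This is only a sketch: the document does not say which matching algorithm the bot uses, and `search_games` and the 0.5 cutoff are assumptions made for illustration.

```python
import difflib


def search_games(query: str, games: list, limit: int = 5) -> list:
    """Return substring matches first, then close fuzzy matches (hypothetical helper)."""
    q = query.strip().lower()
    by_lower = {name.lower(): name for name in games}
    # Exact substring hits are the most reliable, so they go first.
    hits = [name for low, name in by_lower.items() if q in low]
    # difflib ranks the remaining candidates by similarity ratio.
    for low in difflib.get_close_matches(q, list(by_lower), n=limit, cutoff=0.5):
        if by_lower[low] not in hits:
            hits.append(by_lower[low])
    return hits[:limit]


games = ["Wingspan", "Gloomhaven", "Arkham Horror"]
print(search_games("gloomy", games))
```

With this sample list, "gloomy" is not a substring of any title, so the result comes entirely from the similarity ranking.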
Automatically generate rules_pdfs/games_index.json from your PDF collection using the BoardGameGeek API:
uv run python scripts/generate_games_index.py
This script:
- Scans all PDF files in the rules_pdfs/ directory
- Fetches game metadata from BoardGameGeek (Russian names, categories, mechanics)
- Generates/updates the games index used by the bot
Requirements:
- BGG API token (see BGG API Setup)
- Add BGG_API_TOKEN to your .env file
Features:
- Incremental updates (only queries new games)
- Automatic Russian name detection
- Fallback for games not found in BGG
See docs/BGG_API_SETUP.md for detailed setup instructions.
The project supports both just (recommended) and make command runners.
just # Show available commands
just install # Install dependencies with uv
just build # Build Docker image
just up # Start bot in Docker
just down # Stop bot
just logs # View logs
just restart # Restart bot (down + up)
just test # Run tests
just lint # Run ruff linter
just format # Format code with ruff
just clean # Remove cache files
just setup # Setup project from scratch
just validate # Validate Docker setup
make help # Show available commands
make install # Install dependencies with uv
make build # Build Docker image
make up # Start bot in Docker
make down # Stop bot
make logs # View logs
make test # Run tests
make lint # Run ruff linter
make format # Format code with ruff
make clean # Remove cache files
The bot uses a conversational pipeline that adapts based on user input:
- User sends a message to the bot in any supported language
- Rate limiting - Checks that the user has not exceeded the request limit
- Game identification - Determines which game to search:
  - Uses session context if the game was discussed before
  - Shows inline keyboard buttons if multiple games match
  - Asks for clarification if the game name is unclear
- Agent processes the query using OpenAI's Agents SDK with streaming:
  - search_filenames(query): Find PDF files by game name
  - search_inside_file_ugrep(filename, keywords): Fast regex search inside PDFs
  - read_full_document(filename): Fallback PDF reader (pypdf)
  - Progress updates shown with fun thematic messages
- Clarification (if needed) - Agent asks follow-up questions for complex queries
- Response generation - Structured answer with:
  - Direct answer with quotes from rulebooks
  - Sources and page references
  - Confidence indicator (if < 80%)
  - Related questions suggestions
  - Full reasoning chain for admins
- Session persistence - Conversation and game context saved to the user's SQLite database
- CLARIFICATION_NEEDED: Bot asks text question to clarify ambiguity
- GAME_SELECTION: Bot shows inline keyboard for game selection
- SEARCH_IN_PROGRESS: Bot reports search progress + asks question
- FINAL_ANSWER: Bot sends complete answer with reasoning
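The four states above can be modelled as an enumeration that the handler switches on. The sketch below assumes they are the values of the ActionType discriminator mentioned elsewhere in this document; the `PipelineOutput` dataclass, its fields, and `describe_reply` are hypothetical and stand in for whatever structured output schema the bot actually uses.

```python
from dataclasses import dataclass
from enum import Enum


class ActionType(str, Enum):
    CLARIFICATION_NEEDED = "clarification_needed"
    GAME_SELECTION = "game_selection"
    SEARCH_IN_PROGRESS = "search_in_progress"
    FINAL_ANSWER = "final_answer"


@dataclass
class PipelineOutput:
    """Hypothetical structured output; field names are assumptions."""

    action_type: ActionType
    text: str
    options: tuple = ()


def describe_reply(output: PipelineOutput) -> str:
    """Map each action type to the kind of Telegram reply described above."""
    if output.action_type is ActionType.GAME_SELECTION:
        return f"inline keyboard: {', '.join(output.options)}"
    if output.action_type is ActionType.CLARIFICATION_NEEDED:
        return f"text question: {output.text}"
    if output.action_type is ActionType.SEARCH_IN_PROGRESS:
        return f"progress + question: {output.text}"
    return f"final answer: {output.text}"
```

Deriving from `str` lets a raw string from a JSON payload be converted with `ActionType("final_answer")`, which is convenient when the discriminator arrives as text.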
Multi-Stage Pipeline
- Conversational flow with game identification, clarification, search
- Inline keyboard buttons for interactive selection
- Context-aware responses using session history
- Structured outputs with ActionType discriminator
Streaming & Progress
- Real-time progress updates during agent execution
- Fun, thematic status messages (fantasy RPG theme)
- Debounced message updates to avoid spam
- Progress message deleted after final response
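One simple way to keep progress edits from flooding a chat is to drop updates that repeat the previous text or arrive too soon after the last one. The sketch below is a throttle-style variant of debouncing, not the bot's actual code; `ProgressDebouncer`, its one-second default, and the injectable clock are assumptions for illustration.

```python
import time


class ProgressDebouncer:
    """Decide whether a progress message edit should be sent (hypothetical helper)."""

    def __init__(self, min_interval: float = 1.0, clock=time.monotonic):
        self._min_interval = min_interval
        self._clock = clock  # injectable so tests can control time
        self._last_sent = None
        self._last_text = None

    def should_send(self, text: str) -> bool:
        now = self._clock()
        # Skip edits that would not change the visible message.
        if text == self._last_text:
            return False
        # Skip edits that arrive before the minimum interval has passed.
        if self._last_sent is not None and now - self._last_sent < self._min_interval:
            return False
        self._last_sent = now
        self._last_text = text
        return True
```

A handler would call `should_send()` for every streaming event and only edit the Telegram message when it returns `True`.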
Async-First Design
- All blocking operations wrapped in asyncio.to_thread()
- Non-blocking Telegram event loop
- Concurrent request handling
- Streaming agent execution with event processing
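The pattern of wrapping blocking work in `asyncio.to_thread()` can be shown in a few lines. Here `blocking_search` is a hypothetical stand-in for a slow call such as a subprocess or PDF parse; it is not a function from this project.

```python
import asyncio
import time


def blocking_search(keywords: str) -> str:
    """Simulate a blocking operation (hypothetical placeholder)."""
    time.sleep(0.05)
    return f"results for {keywords}"


async def handle_queries(queries: list) -> list:
    # Each blocking call runs in a worker thread, so the event loop stays free
    # to serve other users while the searches are in progress.
    tasks = [asyncio.to_thread(blocking_search, q) for q in queries]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(handle_queries(["setup", "scoring"])))
```

`asyncio.gather` returns results in the order the tasks were passed in, regardless of which thread finishes first.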
Per-User Isolation
- Separate SQLite session database for each user
- Prevents database locks
- Conversation history maintained per user
- Per-user state tracking for UI flow (game context, pending questions)
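Per-user isolation can be sketched as one SQLite file per Telegram user ID. The file naming scheme, the `messages` table, and the helper names below are assumptions made for this example; the bot's real session storage is handled through its own session layer and may differ.

```python
import sqlite3
from pathlib import Path


def open_user_session(sessions_dir, user_id: int) -> sqlite3.Connection:
    """Open (and create if needed) a database file dedicated to one user."""
    sessions_dir = Path(sessions_dir)
    sessions_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one file per user avoids cross-user locking.
    conn = sqlite3.connect(sessions_dir / f"user_{user_id}.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, role TEXT NOT NULL, content TEXT NOT NULL)"
    )
    return conn


def append_message(conn: sqlite3.Connection, role: str, content: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO messages (role, content) VALUES (?, ?)", (role, content))


def history(conn: sqlite3.Connection) -> list:
    return conn.execute("SELECT role, content FROM messages ORDER BY id").fetchall()
```

Because each user writes to a separate file, a long write for one user cannot block reads or writes for another.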
Schema-Guided Reasoning (SGR)
- Structured outputs with complete reasoning chains
- Confidence scores and source references
- Transparent decision-making process
- Verbose mode for admins shows full reasoning
Resource Management
- Semaphore limits concurrent ugrep processes (default: 4)
- In-memory rate limiting per user (default: 10 req/min)
- Automatic output truncation to prevent token overflow
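The two limits above can be illustrated with a sliding-window counter per user and an `asyncio.Semaphore` around the search step. This is a sketch under assumptions: `SlidingWindowRateLimiter` and `demo_peak_concurrency` are hypothetical names, and the bot's actual limiter may use a different algorithm.

```python
import asyncio
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """In-memory per-user limiter (hypothetical sketch)."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # user_id -> timestamps of recent requests

    def allow(self, user_id: int) -> bool:
        now = self._clock()
        hits = self._hits[user_id]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True


async def demo_peak_concurrency(limit: int = 4, jobs: int = 10) -> int:
    """Run `jobs` fake searches and report the highest number active at once."""
    semaphore = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def job() -> None:
        nonlocal active, peak
        async with semaphore:  # at most `limit` jobs pass this point together
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stands in for the ugrep subprocess
            active -= 1

    await asyncio.gather(*(job() for _ in range(jobs)))
    return peak
```

Since the limiter state lives in memory, counters reset whenever the process restarts, which matches the "in-memory" wording above.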
Error Handling
- User-friendly error messages
- Detailed logging for debugging
- Graceful degradation on tool failures
list_directory_tree()
- Lists all available PDF files with structure
- Used for game identification stage
search_filenames(query)
- Searches for PDF files in the rules_pdfs/ directory
- Case-insensitive filename matching
- Limits results to 50 files to prevent token overflow
- Returns GameCandidate objects with confidence scores
search_inside_file_ugrep(filename, keywords)
- Fast regex search using ugrep CLI
- PDF text extraction via pdftotext filter
- 2 lines of context around matches
- 30-second timeout protection
- Truncates output at 10,000 characters
- Semaphore-controlled for resource management
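The behaviour listed above (pdftotext filter, two lines of context, a 30-second timeout, truncation at 10,000 characters) could be wired together roughly as follows. This is a sketch, not the project's tool: the helper names are hypothetical, and the ugrep options shown (`--ignore-case`, `--context`, `--filter`, `--regexp`) should be checked against the installed ugrep version.

```python
import subprocess

MAX_OUTPUT_CHARS = 10_000
TIMEOUT_SECONDS = 30


def build_ugrep_command(pdf_path: str, pattern: str) -> list:
    """Assemble the argument list for a PDF search (assumed flag set)."""
    return [
        "ugrep",
        "--ignore-case",
        "--context=2",  # two lines of context around each match
        "--filter=pdf:pdftotext % -",  # pipe .pdf files through pdftotext first
        f"--regexp={pattern}",
        pdf_path,
    ]


def truncate(text: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Cap tool output so it cannot overflow the model's context."""
    if len(text) <= limit:
        return text
    return text[:limit] + "\n[output truncated]"


def search_pdf(pdf_path: str, pattern: str) -> str:
    try:
        proc = subprocess.run(
            build_ugrep_command(pdf_path, pattern),
            capture_output=True,
            text=True,
            timeout=TIMEOUT_SECONDS,
        )
    except subprocess.TimeoutExpired:
        return "Search timed out."
    except FileNotFoundError:
        return "ugrep is not installed."
    return truncate(proc.stdout)
```

Passing the arguments as a list rather than a shell string means the regex pattern is never interpreted by a shell, which matters when keywords come from user input.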
read_full_document(filename)
- Fallback PDF reader using pypdf library
- Extracts all pages with page markers
- Truncates at 100,000 characters
- Used when ugrep is unavailable or fails
Morphology Handling
- Uses regex patterns with word roots and synonyms
- Handles complex word endings automatically
- Example: For "movement", agent uses pattern перемещ|движен|ход|бег
- This matches: перемещение, перемещать, движение, передвижение, ход, etc.
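The root-based pattern from the example can be tried directly with Python's `re` module. The `matching_words` helper and the sample sentence are made up for illustration; only the pattern itself comes from the text above.

```python
import re

# Word roots for "movement" joined with alternation, as in the example above.
MOVEMENT_ROOTS = re.compile(r"перемещ|движен|ход|бег", re.IGNORECASE)


def matching_words(text: str) -> list:
    """Return the words in `text` that contain any of the roots."""
    return [word for word in re.findall(r"\w+", text) if MOVEMENT_ROOTS.search(word)]


sample = "Перемещение и передвижение: ход фишки, затем атака"
print(matching_words(sample))

# A root only covers forms that actually contain it: "переместить" uses the
# stem "перемест", so it would need that stem added to the pattern.
print(matching_words("переместить"))
```

Short roots such as "ход" also match inside unrelated words like "выход", so a pattern of this kind trades some precision for recall.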
Filename Translation
- PDFs named using original English game titles
- Agent translates localized names before searching
- Example translations stored in agent instructions:
- "Схватка в стиле фэнтези" → "Super Fantasy Brawl"
- "Время приключений" → "Time of Adventure"
- "Ужас Аркхэма" → "Arkham Horror"
- Python 3.13: Modern Python with latest features
- OpenAI Agents SDK: Agent framework with tool support
- python-telegram-bot: Async Telegram integration
- uv: Fast Python package manager
- ugrep: High-performance regex search (10x faster than grep)
- poppler-utils: PDF text extraction (pdftotext)
- pypdf: Pure Python PDF reader (fallback)
- Docker: Multi-stage containerization
- SQLite: Conversation persistence (per-user databases)
- pydantic-settings: Type-safe configuration
- Logfire: Pydantic's observability platform with OpenTelemetry
- Langfuse: LLM application monitoring via OpenTelemetry OTLP
- pytest: Testing framework with async support
- ruff: Fast Python linter and formatter
- mypy: Static type checking
- just/make: Command runners for development tasks
- Add PDF file to rules_pdfs/ using English name: rules_pdfs/Wingspan.pdf
- Add entry to games index (rules_pdfs/games_index.json):
  {
    "english_name": "Wingspan",
    "russian_names": ["Крылья", "Вингспан"],
    "pdf_files": ["Wingspan.pdf"],
    "tags": ["engine-building", "птицы"]
  }
- Done! Users can now search in Russian or English:
  - "Как играть в Крылья?" → finds Wingspan.pdf
  - "How to play Wingspan?" → finds Wingspan.pdf
The rules_pdfs/games_index.json file enables accurate Russian ↔ English game name matching for 100+ games.
Benefits:
- ✅ Reliable bilingual search (no LLM guessing)
- ✅ Multiple name variants per game (official, transliteration, slang)
- ✅ Support for games with multiple PDFs (expansions, FAQ)
- ✅ Fast lookup without token usage
See docs/GAMES_INDEX.md for complete documentation.
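A name lookup over index entries like the Wingspan one might look like the sketch below. It assumes the index file is a JSON list of such entries, which this document does not confirm; `build_lookup` and `find_pdfs` are hypothetical helpers, and docs/GAMES_INDEX.md remains the authority on the real format.

```python
import json

# Sample data mirroring the entry format shown in this README.
SAMPLE_INDEX = json.loads("""
[
  {
    "english_name": "Wingspan",
    "russian_names": ["Крылья", "Вингспан"],
    "pdf_files": ["Wingspan.pdf"],
    "tags": ["engine-building", "птицы"]
  }
]
""")


def build_lookup(entries: list) -> dict:
    """Map every known name variant (case-folded) to the game's PDF files."""
    lookup = {}
    for entry in entries:
        names = [entry["english_name"], *entry.get("russian_names", [])]
        for name in names:
            lookup[name.casefold()] = entry["pdf_files"]
    return lookup


def find_pdfs(lookup: dict, name: str) -> list:
    return lookup.get(name.strip().casefold(), [])


lookup = build_lookup(SAMPLE_INDEX)
print(find_pdfs(lookup, "крылья"))
```

Building the dictionary once at startup keeps each lookup to a single hash access, with no model call involved.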
Use the original English game name for PDF files:
- ✅ Wingspan.pdf
- ✅ Super Fantasy Brawl.pdf
- ✅ Gloomhaven - Forgotten Circles.pdf
- ❌ Крылья.pdf (Russian name - use games_index.json instead)
The project includes comprehensive testing:
just test # All tests with pytest
# or
pytest -v # Verbose output
pytest tests/test_tools.py # Specific test file
- Unit Tests (tests/test_tools.py): Test individual agent tools
- Integration Tests (tests/test_integration.py): End-to-end workflow testing
- Load Tests (tests/load_test.py): Performance and concurrency testing
The tests/conftest.py provides shared fixtures:
- Temporary PDF files for testing
- Mock environment variables
- Isolated test directories
- Set up environment:
  cp .env.example .env
  # Edit .env with production tokens
- Build image:
  just build
- Start bot:
  just up
- Monitor logs:
  just logs
See docs/DOCKER_SETUP.md for advanced deployment configurations.
Complete documentation is available in the docs/ directory:
- docs/INDEX.md - Documentation overview and navigation
- docs/QUICKSTART.md - Step-by-step getting started guide
- docs/DOCKER_SETUP.md - Docker deployment and troubleshooting
- docs/SGR_ARCHITECTURE.md - Schema-Guided Reasoning implementation guide
- docs/GAMES_INDEX.md - Multilingual games index setup and usage
- docs/BGG_API_SETUP.md - BoardGameGeek API setup for automatic game metadata
- Check .env file exists with valid TELEGRAM_TOKEN and OPENAI_API_KEY
- Verify Docker is running: docker ps
- Check logs: just logs
- Install ugrep: apt-get install ugrep (Linux) or brew install ugrep (macOS)
- Or use Docker deployment (ugrep pre-installed)
- Ensure each user has isolated session DB (implemented in current version)
- Check data/sessions/ directory permissions
- Adjust MAX_REQUESTS_PER_MINUTE in .env
- Increase for trusted users or decrease for public bots
This project is licensed under the MIT License - see the LICENSE file for details.
For issues and questions, please open an issue on GitHub.
Made with ♥ for board game enthusiasts
