A comprehensive AI research agent with web API that generates detailed reports using multiple LLMs and specialized research tools. Features intelligent agent orchestration, domain-specific research capabilities, and comprehensive data collection from multiple platforms.
- 4-stage pipeline: Query Enhancer → Orchestrator → Specialized Agents → Summarizer
- Multi-LLM setup: Groq (Llama 3.1) for planning, Gemini for report writing, OpenAI/Anthropic support
- Intelligent orchestration: Automatically routes queries to specialized agents based on content
- Concurrent execution: Multiple agents run simultaneously for faster research
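As a rough illustration, the four stages compose like this (plain-Python stand-ins; the real pipeline is a LangGraph graph built in `main.py`, and every function name below is hypothetical):

```python
# Hypothetical sketch of the 4-stage flow; stage logic is stubbed out.
def enhance_query(user_input: str) -> dict:
    # Stage 1: rewrite the query and derive research questions
    return {"query": user_input.strip().lower(),
            "questions": [f"What is known about {user_input}?"]}

def orchestrate(state: dict) -> list[str]:
    # Stage 2: pick specialized agents based on the enhanced query
    agents = []
    if "bitcoin" in state["query"] or "defi" in state["query"]:
        agents.append("crypto_agent")
    if not agents:
        agents.append("general_agent")
    return agents

def run_agents(agents: list[str], state: dict) -> list[str]:
    # Stage 3: each selected agent contributes findings (stubbed here)
    return [f"{name}: findings for '{state['query']}'" for name in agents]

def summarize(findings: list[str]) -> str:
    # Stage 4: synthesize findings into a markdown report
    return "# Report\n\n" + "\n".join(f"- {f}" for f in findings)

state = enhance_query("Bitcoin price analysis")
report = summarize(run_agents(orchestrate(state), state))
```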
- 🎮 Gaming Agent: Video games, gaming industry, esports, Steam data analysis
- ₿ Crypto Agent: Cryptocurrency, blockchain, DeFi, Web3, market analysis
- 📚 Academic Agent: Scholarly research, papers, Google Scholar integration
- 📱 Social Media Agent: Social trends, sentiment analysis, platform insights
- 🔍 General Research: Comprehensive web research and content extraction
- Web Search: Serper API for Google search results
- Content Crawling: Exa API for full article extraction
- Social Media: Twitter, LinkedIn, Reddit scraping
- Video Content: YouTube transcript extraction
- Cryptocurrency: CoinGecko API integration
- Gaming Data: Steam API integration
- Academic: Google Scholar research
- Web Scraping: Bright Data and FireCrawl integration
- FastAPI endpoint: `GET /research?q=query` returns markdown reports
- Auto file saving: saves reports to the `output/` folder with timestamps
- Intermediate outputs: saves agent outputs for debugging
- Token optimization: Prevents context overflow with smart chunking
- Error recovery: Robust error handling and retry mechanisms
- Observability: LangSmith and Langfuse integration for monitoring
- Python 3.13+
- API Keys (see the setup section for the complete list):
  - GROQ_API_KEY - Groq Console (Required)
  - SERPER_API_KEY - Serper.dev (Required)
  - EXA_API_KEY - Exa.ai (Required)
  - GOOGLE_API_KEY - Google AI Studio (Required)
  - OPENAI_API_KEY - OpenAI Platform (Optional)
  - ANTHROPIC_API_KEY - Anthropic Console (Optional)
  - COINGECKO_API_KEY - CoinGecko (Optional)
  - BRIGHT_DATA_TOKEN - Bright Data (Optional)
  - LANGSMITH_API_KEY - LangSmith (Optional)
  - LANGFUSE_PUBLIC_KEY - Langfuse (Optional)
With uv:

```bash
git clone <repo-url>
cd langgraph-agent
uv sync
```

Or with pip:

```bash
git clone <repo-url>
cd langgraph-agent
pip install -r requirements.txt
```

Add the API dependencies:

```bash
uv add fastapi uvicorn
# or with pip
pip install fastapi uvicorn
```

Create a `.env` file with your API keys. Here's the complete list of supported APIs:
```
# Required API Keys
GROQ_API_KEY="your_groq_api_key_here"
SERPER_API_KEY="your_serper_api_key_here"
EXA_API_KEY="your_exa_api_key_here"
GOOGLE_API_KEY="your_google_api_key_here"

# Optional LLM Providers
OPENAI_API_KEY="your_openai_api_key_here"
ANTHROPIC_API_KEY="your_anthropic_api_key_here"

# Optional Data Sources
COINGECKO_API_KEY="your_coingecko_api_key_here"
AVES_API_KEY="your_aves_api_key_here"
BRIGHT_DATA_TOKEN="your_bright_data_token_here"
BRIGHTDATA_API_KEY="your_brightdata_api_key_here"
FIRECRAWL_API_KEY="your_firecrawl_api_key_here"

# Optional Observability
LANGSMITH_TRACING="true"
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGSMITH_API_KEY="your_langsmith_api_key_here"
LANGSMITH_PROJECT="your_langsmith_project_name_here"
LANGFUSE_PUBLIC_KEY="your_langfuse_public_key_here"
LANGFUSE_SECRET_KEY="your_langfuse_secret_key_here"
LANGFUSE_HOST="https://cloud.langfuse.com"
```
Required keys (must have for basic functionality):
- Get a Groq API key from console.groq.com
- Get a Serper API key from serper.dev
- Get an Exa API key from exa.ai
- Get a Google API key from aistudio.google.com

Optional keys (enhance functionality):
- Cryptocurrency data: CoinGecko API
- Social media scraping: Bright Data
- Web crawling: FireCrawl
- Monitoring: LangSmith, Langfuse
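Since only four keys are strictly required, a fail-fast startup check is an easy guard. This is an illustrative helper (not part of the repo) that reports which required keys from the list above are still unset:

```python
# Hypothetical startup check: report missing required keys so the agent
# fails fast instead of erroring mid-research. In real use, pass os.environ.
REQUIRED_KEYS = ["GROQ_API_KEY", "SERPER_API_KEY", "EXA_API_KEY", "GOOGLE_API_KEY"]

def missing_keys(env: dict) -> list[str]:
    # A key counts as missing when absent or set to an empty string
    return [k for k in REQUIRED_KEYS if not env.get(k)]

# Example: only GROQ_API_KEY is set, so three keys are reported missing
gaps = missing_keys({"GROQ_API_KEY": "gsk_example"})
```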
```bash
python api.py
```

Then visit: http://localhost:8000/research?q=your-query
API Endpoints:
- `GET /` - Health check
- `GET /research?q=query` - Generate markdown research report
- `GET /docs` - Interactive API documentation
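Remember to percent-encode the query string when calling the endpoint programmatically. A minimal stdlib-only client helper might look like this (the helper name and default base URL are illustrative):

```python
from urllib.parse import urlencode

# Hypothetical client helper: build a safe /research URL, then fetch it
# with any HTTP client (e.g. urllib.request.urlopen or requests.get).
def research_url(query: str, base: str = "http://localhost:8000") -> str:
    return f"{base}/research?{urlencode({'q': query})}"

url = research_url("ai trends 2024")
```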
```bash
python main.py
```

The CLI will prompt for your Groq API key if it is not set in environment variables.
```python
from main import graph_builder

# Initialize the research agent
agent = graph_builder()

# Run a research query
result = agent.invoke({"user_input": "ai trends 2024"})
print(result["report_markdown"])
```

Gaming Research:
- "The Witcher 3 game review"
- "Steam sales analysis 2024"
- "Esports industry trends"
Cryptocurrency Research:
- "Bitcoin price analysis"
- "DeFi protocols comparison"
- "NFT market trends"
Academic Research:
- "Machine learning research papers"
- "Climate change studies"
- "Quantum computing developments"
Social Media Research:
- "Twitter trending topics"
- "Instagram influencer marketing"
- "TikTok viral content analysis"
General Research:
- "AI trends 2024"
- "Electric vehicle market"
- "Remote work statistics"
langgraph-agent/
├── agents/ # Research pipeline agents
│ ├── __init__.py # Agent module initialization
│ ├── orchestrator_agent.py # Intelligent agent routing & supervision
│ ├── query_enhancer.py # Query improvement & enhancement
│ ├── planner.py # Search planning & URL selection
│ ├── summarizer.py # Report generation & synthesis
│ ├── scraper_agent.py # Content extraction & processing
│ ├── gaming_agent.py # Gaming research & Steam data
│ ├── crypto_agent.py # Cryptocurrency & blockchain research
│ ├── academic_agent.py # Academic research & Google Scholar
│ └── social_media_agent.py # Social media trends & sentiment
├── tools/ # Data collection & API tools
│ ├── __init__.py # Tools module initialization
│ ├── serper_search.py # Google search via Serper API
│ ├── exa_search.py # Content crawling via Exa API
│ ├── coingecko_search.py # Cryptocurrency data & market info
│ ├── steam_api.py # Gaming data & Steam integration
│ ├── google_scholar.py # Academic research & citations
│ ├── reddit_scraper.py # Reddit content & discussions
│ ├── twitter_scraper.py # Twitter/X scraping & analysis
│ ├── linkedin_scraper.py # LinkedIn posts & professional content
│ ├── youtube_transcript.py # YouTube video transcripts
│ ├── web_crawler.py # Web crawling via FireCrawl
│ ├── web_scraper_api.py # Social media API integration
│ ├── query_enhancer.py # Query enhancement via AVES API
│ └── llama_feed.py # Web3 news & DeFi data
├── utils/ # Shared utilities & helpers
│ ├── __init__.py # Utils module initialization
│ ├── llm.py # LLM clients (Groq, Gemini, OpenAI, Anthropic)
│ ├── output_manager.py # File saving & output management
│ ├── prompts.py # Agent prompts & system messages
│ └── tool_wrappers.py # Tool utilities & decorators
├── output/ # Generated reports & outputs
│ ├── intermediate/ # Agent outputs & debugging files
│ │ ├── research_*/ # Research run directories
│ │ ├── *_agent_*.md # Agent output files
│ │ └── get_*_*.md # Tool output files
│ └── *.md # Final research reports
├── main.py # CLI workflow & entry point
├── api.py # FastAPI web server & endpoints
├── pyproject.toml # Project dependencies & metadata
├── requirements.txt # Pip dependencies fallback
├── uv.lock # UV lock file for dependency resolution
├── .python-version # Python version specification
├── .gitignore # Git ignore patterns
└── README.md # Project documentation
- `GET /` - Health check and status
- `GET /research?q=query` - Generate comprehensive markdown research report
- `GET /docs` - Interactive API documentation (Swagger UI)
- Query Enhancer (Gemini) - Improves user query, generates research questions and context
- Orchestrator (Groq) - Analyzes query and routes to appropriate specialized agents
- Specialized Agents (Concurrent) - Domain-specific research using relevant tools:
  - Gaming Agent: Steam API, gaming websites, Reddit gaming communities
  - Crypto Agent: CoinGecko API, DeFi data, crypto news sources
  - Academic Agent: Google Scholar, academic databases, research papers
  - Social Media Agent: Twitter, LinkedIn, Reddit, sentiment analysis
  - General Agent: Web search, content extraction, comprehensive analysis
- Scraper Agent - Extracts and processes content from selected URLs
- Summarizer (Gemini) - Synthesizes all collected data into comprehensive report
The orchestrator uses intelligent routing based on:
- Keyword matching: Identifies domain-specific terms
- Content analysis: Determines research intent
- Multi-agent coordination: Combines results from multiple agents when needed
- Confidence scoring: Selects agents with highest relevance
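The combination of keyword matching and confidence scoring can be sketched as follows. The actual orchestrator lives in `agents/orchestrator_agent.py`; the keyword sets and scoring rule here are illustrative only:

```python
# Simplified routing sketch: score each agent by keyword overlap with the
# query, select every agent above the threshold, fall back to general research.
AGENT_KEYWORDS = {
    "gaming_agent": {"game", "steam", "esports"},
    "crypto_agent": {"bitcoin", "defi", "blockchain", "nft"},
    "academic_agent": {"paper", "research", "study"},
    "social_media_agent": {"twitter", "tiktok", "instagram"},
}

def route(query: str, threshold: int = 0) -> list[str]:
    words = set(query.lower().split())
    scores = {name: len(words & kws) for name, kws in AGENT_KEYWORDS.items()}
    selected = [name for name, score in scores.items() if score > threshold]
    # No domain match: fall back to comprehensive web research
    return selected or ["general_agent"]
```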
- Concurrent execution: Multiple agents run simultaneously
- Error recovery: Automatic retries and fallback mechanisms
- Content validation: Ensures quality and relevance of collected data
- Source tracking: Maintains attribution and citation information
- Smart chunking: Groq processes lightweight search results (not full articles)
- External crawling: Exa crawling happens outside LLM context
- Content distribution: Gemini handles large content for final reports
- Context overflow prevention: avoids HTTP 413 errors caused by exceeding model token limits
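The chunking idea is roughly this: batch search snippets so each batch stays under a token budget, rather than handing the planner full articles. A sketch, with token counts crudely approximated by whitespace-split word counts (the repo's actual chunker may differ):

```python
# Illustrative chunker: pack snippets greedily into budget-bounded batches.
def chunk_snippets(snippets: list[str], budget: int = 6000) -> list[list[str]]:
    chunks, current, used = [], [], 0
    for snippet in snippets:
        cost = len(snippet.split())  # crude stand-in for a token count
        if current and used + cost > budget:
            chunks.append(current)   # budget exceeded: start a new batch
            current, used = [], 0
        current.append(snippet)
        used += cost
    if current:
        chunks.append(current)
    return chunks
```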
- Concurrent processing: Multiple agents run simultaneously
- Timeout management: Configurable timeouts for each agent
- Retry mechanisms: Automatic retries with exponential backoff
- Resource monitoring: Tracks execution time and success rates
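A retry-with-exponential-backoff wrapper in the spirit of the mechanism above (a generic sketch, not the repo's exact helper):

```python
import time

# Retry a callable, doubling the delay after each failure; re-raise on the
# final attempt so callers still see the underlying error.
def with_retries(fn, attempts: int = 3, base_delay: float = 0.01):
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...

# Demo: a function that fails twice before succeeding
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient failure")
    return "ok"

result = with_retries(flaky)
```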
- Intermediate saves: Each agent output is saved for debugging
- Run tracking: Unique run IDs for each research session
- Metadata storage: Execution metrics and timing information
- Error logging: Comprehensive error tracking and reporting
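The intermediate-save scheme can be pictured like this: each session gets a unique run ID, and every agent's output lands under `output/intermediate/research_<run_id>/`. The file layout and function below are illustrative, not the repo's exact implementation (see `utils/output_manager.py`):

```python
import tempfile
import uuid
from pathlib import Path

# Hypothetical saver: write one markdown file per agent inside a per-run
# directory so every run's intermediate outputs stay grouped for debugging.
def save_agent_output(root: Path, run_id: str, agent: str, markdown: str) -> Path:
    run_dir = root / "intermediate" / f"research_{run_id}"
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / f"{agent}_output.md"
    path.write_text(markdown, encoding="utf-8")
    return path

root = Path(tempfile.mkdtemp())          # stand-in for the output/ folder
run_id = uuid.uuid4().hex[:8]            # unique ID per research session
saved = save_agent_output(root, run_id, "crypto_agent", "# Findings\n")
```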
The research agent works on any Python hosting platform:
Cloud Platforms:
- Railway - Easy deployment with automatic scaling
- Render - Free tier available, automatic deployments
- Heroku - Traditional Python hosting
- Vercel - Serverless functions with edge deployment
- Netlify - Functions for serverless execution
Cloud Providers:
- AWS Lambda - Serverless with high scalability
- Google Cloud Run - Containerized deployment
- Azure Functions - Microsoft's serverless platform
- DigitalOcean App Platform - Simple container deployment
Local Development:
- Docker - Containerized deployment
- Local with ngrok - Expose local server for testing
```bash
# Environment variables for production
export GROQ_API_KEY="your_production_key"
export SERPER_API_KEY="your_production_key"
export EXA_API_KEY="your_production_key"
export GOOGLE_API_KEY="your_production_key"

# Optional: enable monitoring
export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="your_monitoring_key"
```

Example Dockerfile:

```dockerfile
FROM python:3.13-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]
```

The agent supports multiple LLM providers. Configure them in `utils/llm.py`:
```python
# Available models
MODEL_CONFIGS = {
    "groq_llama": "llama-3.1-8b-instant",
    "groq_mixtral": "mixtral-8x7b-32768",
    "gpt4o_mini": "gpt-4o-mini",
    "claude_35_sonnet": "claude-3-5-sonnet-20241022",
    "gemini_flash": "gemini-2.0-flash"
}
```

Each specialized agent can be customized:
- Temperature settings: Adjust creativity vs consistency
- Tool selection: Enable/disable specific data sources
- Prompt engineering: Modify agent behavior and output format
- Timeout configuration: Set execution time limits
- Report format: Customize markdown structure
- Content filtering: Set relevance thresholds
- Source inclusion: Configure citation format
- File naming: Customize output file patterns
Enable tracing and monitoring:

```bash
export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="your_key"
export LANGSMITH_PROJECT="research_agent"
```

Track LLM performance and costs:

```bash
export LANGFUSE_PUBLIC_KEY="your_key"
export LANGFUSE_SECRET_KEY="your_key"
export LANGFUSE_HOST="https://cloud.langfuse.com"
```

The agent provides comprehensive logging:
- Execution traces: Step-by-step agent execution
- Error tracking: Detailed error messages and stack traces
- Performance metrics: Execution time and success rates
- API usage: Track external API calls and costs
- Clone the repository
- Install dependencies: `uv sync`
- Set up environment variables
- Run tests: `python -m pytest`
- Start the development server: `python api.py`
- Create an agent file in the `agents/` directory
- Implement the required interface methods
- Add the agent to the orchestrator routing logic
- Update documentation and examples
- Create a tool file in the `tools/` directory
- Implement the tool interface with proper error handling
- Add the tool to the relevant agent configurations
- Update the API key documentation
This project is licensed under the MIT License - see the LICENSE file for details.
- LangGraph - For the powerful agent orchestration framework
- Groq - For ultra-fast LLM inference
- Google Gemini - For high-quality text generation
- Serper - For reliable web search capabilities
- Exa - For advanced content extraction