sachinarora707/deep-research-agent
LangGraph Research Agent

A comprehensive AI research agent with a web API that generates detailed reports using multiple LLMs and specialized research tools. It features intelligent agent orchestration, domain-specific research capabilities, and data collection from multiple platforms.

🚀 Features

Core Research Pipeline

  • 5-stage pipeline: Query Enhancer → Orchestrator → Specialized Agents → Scraper → Summarizer
  • Multi-LLM setup: Groq (Llama 3.1) for planning, Gemini for report writing, OpenAI/Anthropic support
  • Intelligent orchestration: Automatically routes queries to specialized agents based on content
  • Concurrent execution: Multiple agents run simultaneously for faster research

Specialized Research Agents

  • 🎮 Gaming Agent: Video games, gaming industry, esports, Steam data analysis
  • ₿ Crypto Agent: Cryptocurrency, blockchain, DeFi, Web3, market analysis
  • 📚 Academic Agent: Scholarly research, papers, Google Scholar integration
  • 📱 Social Media Agent: Social trends, sentiment analysis, platform insights
  • 🔍 General Research: Comprehensive web research and content extraction

Data Collection Tools

  • Web Search: Serper API for Google search results
  • Content Crawling: Exa API for full article extraction
  • Social Media: Twitter, LinkedIn, Reddit scraping
  • Video Content: YouTube transcript extraction
  • Cryptocurrency: CoinGecko API integration
  • Gaming Data: Steam API integration
  • Academic: Google Scholar research
  • Web Scraping: Bright Data and FireCrawl integration

Output & Management

  • FastAPI endpoint: /research?q=query returns markdown reports
  • Auto file saving: Saves reports to output/ folder with timestamps
  • Intermediate outputs: Saves agent outputs for debugging
  • Token optimization: Prevents context overflow with smart chunking
  • Error recovery: Robust error handling and retry mechanisms
  • Observability: LangSmith and Langfuse integration for monitoring

📋 Requirements

  • Python 3.13 (see .python-version; the Dockerfile below uses python:3.13-slim)
  • uv (recommended) or pip for dependency management
  • API keys for the required services listed under Setup

🛠️ Installation

Using uv (recommended)

```bash
git clone <repo-url>
cd langgraph-agent
uv sync
```

Using pip

```bash
git clone <repo-url>
cd langgraph-agent
pip install -r requirements.txt
```

For API usage

```bash
# add api dependencies
uv add fastapi uvicorn
# or with pip
pip install fastapi uvicorn
```

⚙️ Setup

Environment Configuration

Create a .env file with your API keys. Here's the complete list of supported APIs:

```bash
# Required API Keys
GROQ_API_KEY="your_groq_api_key_here"
SERPER_API_KEY="your_serper_api_key_here"
EXA_API_KEY="your_exa_api_key_here"
GOOGLE_API_KEY="your_google_api_key_here"

# Optional LLM Providers
OPENAI_API_KEY="your_openai_api_key_here"
ANTHROPIC_API_KEY="your_anthropic_api_key_here"

# Optional Data Sources
COINGECKO_API_KEY="your_coingecko_api_key_here"
AVES_API_KEY="your_aves_api_key_here"
BRIGHT_DATA_TOKEN="your_bright_data_token_here"
BRIGHTDATA_API_KEY="your_brightdata_api_key_here"
FIRECRAWL_API_KEY="your_firecrawl_api_key_here"

# Optional Observability
LANGSMITH_TRACING="true"
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGSMITH_API_KEY="your_langsmith_api_key_here"
LANGSMITH_PROJECT="your_langsmith_project_name_here"
LANGFUSE_PUBLIC_KEY="your_langfuse_public_key_here"
LANGFUSE_SECRET_KEY="your_langfuse_secret_key_here"
LANGFUSE_HOST="https://cloud.langfuse.com"
```

API Key Setup Guide

  1. Required Keys (must have for basic functionality):

    • Planning LLM: Groq API (GROQ_API_KEY)
    • Web search: Serper API (SERPER_API_KEY)
    • Content crawling: Exa API (EXA_API_KEY)
    • Report writing: Google Gemini (GOOGLE_API_KEY)

  2. Optional Keys (enhance functionality):

    • Cryptocurrency data: CoinGecko API
    • Social media scraping: Bright Data
    • Web crawling: FireCrawl
    • Monitoring: LangSmith, Langfuse
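
Failing fast when a required key is absent saves a wasted run. A minimal validation sketch, assuming only the key names from the sample .env above (the project itself may validate differently):

```python
import os

# Key names follow the sample .env above; this helper is illustrative.
REQUIRED_KEYS = ["GROQ_API_KEY", "SERPER_API_KEY", "EXA_API_KEY", "GOOGLE_API_KEY"]

def missing_keys(env=os.environ):
    """Return the required key names that are unset or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```

Calling `missing_keys()` at startup and refusing to run when it returns a non-empty list gives a clearer error than a failed API call deep in the pipeline.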

🚀 Usage

API Mode (Recommended)

```bash
python api.py
```

Then visit: http://localhost:8000/research?q=your-query

API Endpoints:

  • GET / - Health check
  • GET /research?q=query - Generate markdown research report
  • GET /docs - Interactive API documentation
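
Programmatic clients should percent-encode the query string. A minimal sketch (the base URL assumes the default local server):

```python
from urllib.parse import urlencode

BASE = "http://localhost:8000"

def research_url(query: str) -> str:
    """Build a /research request URL with the query safely percent-encoded."""
    return f"{BASE}/research?{urlencode({'q': query})}"

print(research_url("AI trends 2024"))
# → http://localhost:8000/research?q=AI+trends+2024
```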

Command Line Mode

```bash
python main.py
```

The CLI will prompt for your Groq API key if not set in environment variables.

Programmatic Usage

```python
from main import graph_builder

# Initialize the research agent
agent = graph_builder()

# Run a research query
result = agent.invoke({"user_input": "ai trends 2024"})
print(result["report_markdown"])
```

Example Queries by Domain

Gaming Research:

  • "The Witcher 3 game review"
  • "Steam sales analysis 2024"
  • "Esports industry trends"

Cryptocurrency Research:

  • "Bitcoin price analysis"
  • "DeFi protocols comparison"
  • "NFT market trends"

Academic Research:

  • "Machine learning research papers"
  • "Climate change studies"
  • "Quantum computing developments"

Social Media Research:

  • "Twitter trending topics"
  • "Instagram influencer marketing"
  • "TikTok viral content analysis"

General Research:

  • "AI trends 2024"
  • "Electric vehicle market"
  • "Remote work statistics"

🏗️ Architecture

```
langgraph-agent/
├── agents/                    # Research pipeline agents
│   ├── __init__.py            # Agent module initialization
│   ├── orchestrator_agent.py  # Intelligent agent routing & supervision
│   ├── query_enhancer.py      # Query improvement & enhancement
│   ├── planner.py             # Search planning & URL selection
│   ├── summarizer.py          # Report generation & synthesis
│   ├── scraper_agent.py       # Content extraction & processing
│   ├── gaming_agent.py        # Gaming research & Steam data
│   ├── crypto_agent.py        # Cryptocurrency & blockchain research
│   ├── academic_agent.py      # Academic research & Google Scholar
│   └── social_media_agent.py  # Social media trends & sentiment
├── tools/                     # Data collection & API tools
│   ├── __init__.py            # Tools module initialization
│   ├── serper_search.py       # Google search via Serper API
│   ├── exa_search.py          # Content crawling via Exa API
│   ├── coingecko_search.py    # Cryptocurrency data & market info
│   ├── steam_api.py           # Gaming data & Steam integration
│   ├── google_scholar.py      # Academic research & citations
│   ├── reddit_scraper.py      # Reddit content & discussions
│   ├── twitter_scraper.py     # Twitter/X scraping & analysis
│   ├── linkedin_scraper.py    # LinkedIn posts & professional content
│   ├── youtube_transcript.py  # YouTube video transcripts
│   ├── web_crawler.py         # Web crawling via FireCrawl
│   ├── web_scraper_api.py     # Social media API integration
│   ├── query_enhancer.py      # Query enhancement via AVES API
│   └── llama_feed.py          # Web3 news & DeFi data
├── utils/                     # Shared utilities & helpers
│   ├── __init__.py            # Utils module initialization
│   ├── llm.py                 # LLM clients (Groq, Gemini, OpenAI, Anthropic)
│   ├── output_manager.py      # File saving & output management
│   ├── prompts.py             # Agent prompts & system messages
│   └── tool_wrappers.py       # Tool utilities & decorators
├── output/                    # Generated reports & outputs
│   ├── intermediate/          # Agent outputs & debugging files
│   │   ├── research_*/        # Research run directories
│   │   ├── *_agent_*.md       # Agent output files
│   │   └── get_*_*.md         # Tool output files
│   └── *.md                   # Final research reports
├── main.py                    # CLI workflow & entry point
├── api.py                     # FastAPI web server & endpoints
├── pyproject.toml             # Project dependencies & metadata
├── requirements.txt           # Pip dependencies fallback
├── uv.lock                    # UV lock file for dependency resolution
├── .python-version            # Python version specification
├── .gitignore                 # Git ignore patterns
└── README.md                  # Project documentation
```


🔧 How It Works

Research Pipeline Flow

  1. Query Enhancer (Gemini) - Improves the user query and generates research questions and context
  2. Orchestrator (Groq) - Analyzes query and routes to appropriate specialized agents
  3. Specialized Agents (Concurrent) - Domain-specific research using relevant tools:
    • Gaming Agent: Steam API, gaming websites, Reddit gaming communities
    • Crypto Agent: CoinGecko API, DeFi data, crypto news sources
    • Academic Agent: Google Scholar, academic databases, research papers
    • Social Media Agent: Twitter, LinkedIn, Reddit, sentiment analysis
    • General Agent: Web search, content extraction, comprehensive analysis
  4. Scraper Agent - Extracts and processes content from selected URLs
  5. Summarizer (Gemini) - Synthesizes all collected data into a comprehensive report
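
The flow above can be sketched in plain Python as state passed through stage functions. The real project wires these stages as LangGraph nodes; the stage bodies here are simplified stand-ins, not the repository's actual logic:

```python
# Each stage reads and extends a shared state dict, mirroring the 5-stage flow.
def enhance_query(state):
    state["enhanced_query"] = state["user_input"].strip().lower()
    return state

def orchestrate(state):
    # Stand-in routing: a single keyword check instead of the real orchestrator.
    q = state["enhanced_query"]
    state["agents"] = ["crypto"] if "bitcoin" in q else ["general"]
    return state

def run_agents(state):
    state["findings"] = [f"{a}: findings for '{state['enhanced_query']}'"
                         for a in state["agents"]]
    return state

def scrape(state):
    state["content"] = [f"extracted content for {f}" for f in state["findings"]]
    return state

def summarize(state):
    state["report_markdown"] = "\n".join(["# Report"] + state["content"])
    return state

def run_pipeline(user_input):
    state = {"user_input": user_input}
    for stage in (enhance_query, orchestrate, run_agents, scrape, summarize):
        state = stage(state)
    return state
```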

Agent Selection Logic

The orchestrator uses intelligent routing based on:

  • Keyword matching: Identifies domain-specific terms
  • Content analysis: Determines research intent
  • Multi-agent coordination: Combines results from multiple agents when needed
  • Confidence scoring: Selects agents with highest relevance
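
A minimal keyword-scoring router illustrates the idea; the keyword sets and threshold below are assumptions, not the repository's actual lists:

```python
# Hypothetical domain keyword sets for illustration only.
DOMAIN_KEYWORDS = {
    "gaming": {"game", "steam", "esports"},
    "crypto": {"bitcoin", "defi", "blockchain", "nft"},
    "academic": {"paper", "research", "study"},
    "social": {"twitter", "tiktok", "instagram", "reddit"},
}

def route(query, threshold=1):
    """Score each domain by keyword overlap; fall back to general research."""
    words = set(query.lower().split())
    scores = {d: len(words & kw) for d, kw in DOMAIN_KEYWORDS.items()}
    chosen = [d for d, s in scores.items() if s >= threshold]
    return chosen or ["general"]
```

Queries matching several domains return multiple agents, which is how multi-agent coordination falls out of the same scoring pass.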

Data Collection Strategy

  • Concurrent execution: Multiple agents run simultaneously
  • Error recovery: Automatic retries and fallback mechanisms
  • Content validation: Ensures quality and relevance of collected data
  • Source tracking: Maintains attribution and citation information

📊 Performance & Optimization

Token Management

  • Smart chunking: Groq processes lightweight search results (not full articles)
  • External crawling: Exa crawling happens outside LLM context
  • Content distribution: Gemini handles large content for final reports
  • Context overflow prevention: Avoids HTTP 413 (payload too large) errors caused by exceeding token limits
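
The chunking idea can be sketched with a rough characters-per-token heuristic. Real code would use the model's tokenizer; the 4-chars-per-token ratio here is only an approximation:

```python
def chunk_text(text, max_tokens=512, chars_per_token=4):
    """Greedily pack words into chunks under an approximate token budget."""
    budget = max_tokens * chars_per_token  # heuristic character budget
    chunks, current, size = [], [], 0
    for word in text.split():
        if current and size + len(word) + 1 > budget:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(word)
        size += len(word) + 1  # +1 for the joining space
    if current:
        chunks.append(" ".join(current))
    return chunks
```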

Execution Optimization

  • Concurrent processing: Multiple agents run simultaneously
  • Timeout management: Configurable timeouts for each agent
  • Retry mechanisms: Automatic retries with exponential backoff
  • Resource monitoring: Tracks execution time and success rates

Output Management

  • Intermediate saves: Each agent output is saved for debugging
  • Run tracking: Unique run IDs for each research session
  • Metadata storage: Execution metrics and timing information
  • Error logging: Comprehensive error tracking and reporting
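
Timestamped report naming might look like the following sketch; the slug and timestamp pattern are illustrative, not necessarily the repo's exact scheme in utils/output_manager.py:

```python
import re
from datetime import datetime
from pathlib import Path

def save_report(markdown, query, out_dir="output"):
    """Save a report as <out_dir>/<slug>_<timestamp>.md and return the path."""
    slug = re.sub(r"[^a-z0-9]+", "_", query.lower()).strip("_")[:40]
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(out_dir) / f"{slug}_{stamp}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
    return path
```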

🚀 Deployment

Supported Platforms

The research agent works on any Python hosting platform:

Cloud Platforms:

  • Railway - Easy deployment with automatic scaling
  • Render - Free tier available, automatic deployments
  • Heroku - Traditional Python hosting
  • Vercel - Serverless functions with edge deployment
  • Netlify - Functions for serverless execution

Cloud Providers:

  • AWS Lambda - Serverless with high scalability
  • Google Cloud Run - Containerized deployment
  • Azure Functions - Microsoft's serverless platform
  • DigitalOcean App Platform - Simple container deployment

Local Development:

  • Docker - Containerized deployment
  • Local with ngrok - Expose local server for testing

Deployment Configuration

```bash
# Environment variables for production
export GROQ_API_KEY="your_production_key"
export SERPER_API_KEY="your_production_key"
export EXA_API_KEY="your_production_key"
export GOOGLE_API_KEY="your_production_key"

# Optional: Enable monitoring
export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="your_monitoring_key"
```

Docker Deployment

```dockerfile
FROM python:3.13-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
EXPOSE 8000

CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]
```

🔧 Configuration & Customization

Model Configuration

The agent supports multiple LLM providers. Configure in utils/llm.py:

```python
# Available models
MODEL_CONFIGS = {
    "groq_llama": "llama-3.1-8b-instant",
    "groq_mixtral": "mixtral-8x7b-32768",
    "gpt4o_mini": "gpt-4o-mini",
    "claude_35_sonnet": "claude-3-5-sonnet-20241022",
    "gemini_flash": "gemini-2.0-flash"
}
```
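
Provider fallback based on which API keys are configured could be sketched like this; the mapping from model name to environment variable is an assumption mirroring the config above, not the actual contents of utils/llm.py:

```python
import os

# Hypothetical subset of the model config and its provider-key mapping.
MODEL_CONFIGS = {
    "groq_llama": "llama-3.1-8b-instant",
    "gpt4o_mini": "gpt-4o-mini",
    "gemini_flash": "gemini-2.0-flash",
}

PROVIDER_ENV = {
    "groq_llama": "GROQ_API_KEY",
    "gpt4o_mini": "OPENAI_API_KEY",
    "gemini_flash": "GOOGLE_API_KEY",
}

def pick_model(preferred, env=os.environ):
    """Return (name, model_id) for the first model, starting with `preferred`,
    whose provider key is present in the environment."""
    order = [preferred] + [m for m in MODEL_CONFIGS if m != preferred]
    for name in order:
        if env.get(PROVIDER_ENV[name]):
            return name, MODEL_CONFIGS[name]
    raise RuntimeError("no LLM provider key configured")
```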

Agent Customization

Each specialized agent can be customized:

  • Temperature settings: Adjust creativity vs consistency
  • Tool selection: Enable/disable specific data sources
  • Prompt engineering: Modify agent behavior and output format
  • Timeout configuration: Set execution time limits

Output Customization

  • Report format: Customize markdown structure
  • Content filtering: Set relevance thresholds
  • Source inclusion: Configure citation format
  • File naming: Customize output file patterns

📈 Monitoring & Observability

LangSmith Integration

Enable tracing and monitoring:

```bash
export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="your_key"
export LANGSMITH_PROJECT="research_agent"
```

Langfuse Integration

Track LLM performance and costs:

```bash
export LANGFUSE_PUBLIC_KEY="your_key"
export LANGFUSE_SECRET_KEY="your_key"
export LANGFUSE_HOST="https://cloud.langfuse.com"
```

Logging

The agent provides comprehensive logging:

  • Execution traces: Step-by-step agent execution
  • Error tracking: Detailed error messages and stack traces
  • Performance metrics: Execution time and success rates
  • API usage: Track external API calls and costs
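
A standard-library logging setup along these lines would produce such traces; the logger name and format string are examples, not the project's actual configuration:

```python
import logging

def setup_logging(level=logging.INFO):
    """Return a named logger with a timestamped trace-style format."""
    logger = logging.getLogger("research_agent")
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s [%(name)s] %(message)s"))
        logger.addHandler(handler)
    return logger
```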

🤝 Contributing

Development Setup

  1. Clone the repository
  2. Install dependencies: uv sync
  3. Set up environment variables
  4. Run tests: python -m pytest
  5. Start development server: python api.py

Adding New Agents

  1. Create agent file in agents/ directory
  2. Implement required interface methods
  3. Add agent to orchestrator routing logic
  4. Update documentation and examples
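
The registration pattern could look like this decorator-based sketch; the `BaseAgent` interface and registry are hypothetical illustrations, not the project's actual classes:

```python
# Hypothetical agent registry; real routing lives in the orchestrator.
AGENT_REGISTRY = {}

def register_agent(name):
    """Class decorator that registers an agent under a routing name."""
    def decorator(cls):
        AGENT_REGISTRY[name] = cls
        return cls
    return decorator

class BaseAgent:
    tools: list = []
    def run(self, query: str) -> dict:
        raise NotImplementedError

@register_agent("gaming")
class GamingAgent(BaseAgent):
    tools = ["steam_api", "serper_search"]
    def run(self, query):
        return {"agent": "gaming", "query": query, "tools": self.tools}
```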

Adding New Tools

  1. Create tool file in tools/ directory
  2. Implement tool interface with proper error handling
  3. Add tool to relevant agent configurations
  4. Update API key documentation

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • LangGraph - For the powerful agent orchestration framework
  • Groq - For ultra-fast LLM inference
  • Google Gemini - For high-quality text generation
  • Serper - For reliable web search capabilities
  • Exa - For advanced content extraction
