AI-powered production alert analysis system using n8n, MCP servers, and OpenAI GPT-4o-mini.
This project provides an intelligent alert analysis workflow that:
- Monitors Slack channels for production alerts
- Analyzes alerts using AI, with context from GitHub code and Kibana logs
- Provides root cause analysis and actionable recommendations
- Integrates multiple data sources via MCP (Model Context Protocol) servers
┌─────────────┐      ┌──────────────┐      ┌─────────────────┐
│    Slack    │─────▶│     n8n      │─────▶│     OpenAI      │
│   Alerts    │      │   Workflow   │      │   GPT-4o-mini   │
└─────────────┘      └──────┬───────┘      └─────────────────┘
                            │
                ┌───────────┴───────────┐
                │                       │
        ┌───────▼──────┐        ┌───────▼──────┐
        │  GitHub MCP  │        │  Kibana MCP  │
        │  (Port 3000) │        │  (Port 3001) │
        └──────────────┘        └──────────────┘
Location: n8n-workflow/
- Production Alert Analyzer - Enhanced workflow with AI-powered analysis
- Slack Integration - Monitors alerts and posts responses
- AI Agent - Routes by severity (CRITICAL vs HIGH/MEDIUM)
- Code Context - Optionally includes relevant code snippets
Features:
- Alert parsing and structured data extraction (illustrated below)
- Severity-based routing (different analysis depth)
- Immediate acknowledgment responses
- Threaded Slack replies
- Metrics logging
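For example, the parsed alert handed to the AI agent might look roughly like this (field names are illustrative, not the workflow's exact schema):
{
  "service": "order-service",
  "severity": "CRITICAL",
  "title": "Database connection timeout",
  "channel": "#mcp-testing",
  "timestamp": "2025-01-15T10:32:00Z"
}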
Documentation: See n8n-workflow/WORKFLOW_GUIDE.md
Location: dockers/
Multi-MCP server setup with HTTP bridges for n8n integration:
GitHub MCP (port 3000):
- Code search across repositories
- File contents retrieval
- Commit history and PR management
- 40+ GitHub tools available
Kibana MCP (port 3001):
- Log search and analysis
- Visualization management
- Saved objects access
- Real-time error tracking
Documentation:
- dockers/MULTI_MCP_SETUP.md - Complete setup guide
- dockers/QUICK_START.md - Quick start guide
Location: sample-app/
Spring Boot Order Service that simulates realistic production failures:
- Database timeouts and connection pool exhaustion
- Payment gateway failures
- Inventory service unavailability
- High error rates and memory issues
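With the service running (see the Quick Start below), you can exercise it directly to provoke the simulated failures. A hedged sketch -- the endpoint, port, and payload here are hypothetical, so check the controllers under sample-app/src/main/java for the real routes:
curl -X POST http://localhost:8080/api/orders \
  -H "Content-Type: application/json" \
  -d '{"productId": "SKU-1001", "quantity": 2}'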
Location: sample-logs/
15 realistic log entries for testing:
- Elasticsearch/Kibana import ready
- Matches alert scenarios
- Includes stack traces with code references
- Structured JSON format
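A single entry might look roughly like this (field names are illustrative; see kibana-sample-logs.json for the actual structure):
{
  "@timestamp": "2025-01-15T10:31:58Z",
  "level": "ERROR",
  "service": "order-service",
  "message": "Timeout acquiring connection from pool",
  "logger": "com.example.orders.OrderRepository",
  "stack_trace": "java.sql.SQLTransientConnectionException: ... at OrderRepository.java:42"
}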
Documentation: sample-logs/README.md
Location: sample-alerts/
10 pre-formatted Slack alerts covering:
- Database timeouts (CRITICAL)
- Payment gateway failures (HIGH)
- Inventory issues (MEDIUM)
- High error rates (CRITICAL)
- Memory and performance issues
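An illustrative alert in the same spirit (the real ones live in sample-slack-alerts.md):
CRITICAL: Database Connection Timeout
Service: order-service
Error rate: 45% of requests failing
Message: Connection pool exhausted, timeouts after 30s
Time: 2025-01-15 10:32 UTC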
- Docker and Docker Compose
- GitHub Personal Access Token
- Kibana server access (URL, username, password)
- n8n instance (cloud or self-hosted)
- OpenAI API key
cd dockers
# Copy and configure environment
cp .env.multi-mcp.example .env
nano .env # Add your credentials
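# Example .env contents (key names here are illustrative -- use the keys defined in .env.multi-mcp.example):
# GITHUB_PERSONAL_ACCESS_TOKEN=ghp_xxxxxxxxxxxx
# KIBANA_URL=https://your-kibana.example.com
# KIBANA_USERNAME=elastic
# KIBANA_PASSWORD=your-password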
# Update repository path in multi-mcp-docker-compose.yml
# Edit the volumes section to point to your repo
# Start services
docker compose -f multi-mcp-docker-compose.yml up -d
# Verify
curl http://localhost:3000/health # GitHub MCP
curl http://localhost:3001/health # Kibana MCP
Next, import the sample logs:
cd sample-logs
# Generate bulk import file
node convert-to-bulk.js
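# The generated file uses the Elasticsearch bulk format: an action line followed by the document,
# one JSON object per line. Roughly (index name shown is illustrative):
# { "index": { "_index": "logs-order-service-2025.01" } }
# { "@timestamp": "2025-01-15T10:31:58Z", "level": "ERROR", "service": "order-service", ... }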
# Import to Elasticsearch (replace with your details)
curl -X POST "https://YOUR_KIBANA_URL:9200/_bulk" \
-H "Content-Type: application/x-ndjson" \
-H "Authorization: ApiKey YOUR_API_KEY" \
--data-binary @kibana-bulk-import.ndjson
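# Optional sanity check -- the index pattern below assumes the logs-order-service-* pattern used in the next step:
curl -X GET "https://YOUR_KIBANA_URL:9200/logs-order-service-*/_count" \
  -H "Authorization: ApiKey YOUR_API_KEY"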
# Create index pattern in Kibana UI: logs-order-service-*
Then import the n8n workflow:
- Open the n8n UI
- Go to Workflows → Import from File
- Select n8n-workflow/production-alert-analyzer.json
- Configure credentials:
- Slack API: Add your Slack workspace credentials
- OpenAI API: Add your OpenAI API key
- Configure MCP clients:
- GitHub MCP: http://localhost:3000/message (or http://host.docker.internal:3000/message if n8n runs in Docker)
- Kibana MCP: http://localhost:3001/message (or http://host.docker.internal:3001/message if n8n runs in Docker)
- Update channel IDs for your Slack workspace
- Activate workflow
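If the MCP client nodes fail to connect, it can help to smoke-test the bridge endpoints from the machine running n8n. A minimal sketch, assuming the bridges accept raw MCP JSON-RPC on /message (a session handshake may also be required -- check the bridge docs in dockers/ for the exact contract):
curl -s http://localhost:3000/message \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'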
# Option 1: Start sample application
cd sample-app
mvn clean install
mvn spring-boot:run
# Generate traffic
cd ../sample-alerts
./test-workflow.sh
# Option 2: Post sample alert to Slack
# Copy any alert from sample-alerts/sample-slack-alerts.md
# Paste into your #mcp-testing channel
To verify the full flow:
- Open Slack and navigate to your configured input channel (e.g., #mcp-testing)
- Copy an alert from sample-alerts/sample-slack-alerts.md
- Paste the alert into the channel
- Wait 1-2 seconds for the acknowledgment
- Check the output channel (e.g., #n8n-output) for the AI analysis (5-10 seconds)
Use the Python script to post Datadog alert configurations to Slack:
cd sample-alerts
# List all available alerts
python post_alerts_to_slack.py --list
# Post specific alerts by index
python post_alerts_to_slack.py --token YOUR_TOKEN --channel #alerts --alerts 1,3,5
# Post a range of alerts
python post_alerts_to_slack.py --token YOUR_TOKEN --channel #alerts --alerts 1-3
# Interactive mode - select which alerts to post
python post_alerts_to_slack.py --token YOUR_TOKEN --channel #alerts --interactive
# Using environment variables
export SLACK_TOKEN=xoxb-your-token
export SLACK_CHANNEL=#alerts
python post_alerts_to_slack.py --alerts 1,9
Get a Slack OAuth token:
- Go to https://api.slack.com/apps
- Create App → OAuth & Permissions
- Add scopes: chat:write, chat:write.public
- Install to workspace → copy the Bot User OAuth Token
The AI will provide:
- Root Cause Analysis: Identifies the primary issue
- Immediate Actions: Actionable next steps
- Investigation Checklist: What to check
- Long-term Prevention: Recommendations to prevent recurrence
- Code References: Specific files and line numbers (if code context enabled)
- Log Analysis: Related error patterns from Kibana
Edit in n8n workflow:
- Input Channel: where alerts are posted (default: #mcp-testing)
- Output Channel: where analysis is sent (default: #n8n-output)
Adjust the model temperature in the OpenAI node:
- 0.1-0.3: more deterministic, consistent responses
- 0.4-0.7: more creative, varied responses
Edit "Route by Severity" node to change which alerts get detailed analysis:
- CRITICAL: Full root cause analysis
- HIGH/MEDIUM: Quick analysis
hackathon-2025/
├── dockers/                          # MCP server setup
│   ├── multi-mcp-docker-compose.yml
│   ├── mcp-bridge/                   # HTTP bridge for MCP servers
│   ├── kibana-bridge/                # Kibana-specific bridge
│   ├── MULTI_MCP_SETUP.md
│   └── QUICK_START.md
├── n8n-workflow/                     # n8n workflows
│   ├── production-alert-analyzer.json
│   ├── WORKFLOW_GUIDE.md
│   └── CODE_CONTEXT_GUIDE.md
├── sample-app/                       # Spring Boot test application
│   └── src/main/java/...
├── sample-logs/                      # Sample Kibana logs
│   ├── kibana-sample-logs.json
│   ├── convert-to-bulk.js
│   └── README.md
├── sample-alerts/                    # Sample Slack alerts
│   ├── sample-slack-alerts.md
│   └── test-workflow.sh
└── TESTING_GUIDE.md                  # Complete testing guide
- TESTING_GUIDE.md - Complete testing guide with scenarios
- n8n-workflow/WORKFLOW_GUIDE.md - n8n workflow details
- n8n-workflow/CODE_CONTEXT_GUIDE.md - Adding code context
- dockers/MULTI_MCP_SETUP.md - Multi-MCP server setup
- sample-logs/README.md - Log import and testing
- sample-logs/KIBANA_IMPORT_GUIDE.md - Kibana import details
# Check logs
docker compose -f multi-mcp-docker-compose.yml logs
# Verify environment variables
cat .env
# Restart services
docker compose -f multi-mcp-docker-compose.yml restart
If the workflow doesn't trigger:
- Verify the workflow is Active (toggle in top-right)
- Check Slack credentials are valid
- Verify channel IDs match your workspace
- Test with simple message: "test CRITICAL alert"
If analysis quality is poor:
- Add more context to alerts (service, metrics, stack traces)
- Enable code context in workflow
- Lower AI temperature (0.1-0.3)
- Enhance system prompt with examples
# Test bridges
curl http://localhost:3000/health
curl http://localhost:3001/health
# If n8n in Docker, use:
# http://host.docker.internal:3000/message
# http://host.docker.internal:3001/message
Typical performance:
- Average Response Time: 5-10 seconds
- Token Usage: 500-1500 tokens per alert
- Cost: ~$0.001-0.003 per alert (GPT-4o-mini)
- Throughput: 100+ alerts/hour
- Never commit .env files to version control
- Use read-only repository mounts when possible
- Restrict network access to MCP bridge ports
- Use API keys with minimal required permissions
- Consider adding authentication to HTTP bridges in production
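One low-effort hardening step is binding the bridge ports to localhost only in the compose file, so they are not reachable from other hosts. A sketch (the service name is illustrative -- match it to multi-mcp-docker-compose.yml):
services:
  github-mcp-bridge:
    ports:
      - "127.0.0.1:3000:3000"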
- Test with all sample alerts
- Customize prompts for your services
- Add incident ticket creation (Jira/ServiceNow)
- Integrate with monitoring systems (Datadog/Prometheus)
- Build a runbook database for AI reference
- Add historical incident matching
- Create a metrics dashboard
- Set up on-call escalation
This is a hackathon project. Feel free to:
- Add more MCP servers
- Enhance AI prompts
- Add new alert types
- Improve error handling
- Add more test scenarios
MIT License - See LICENSE file for details
For issues or questions:
- Check relevant documentation in subdirectories
- Review Docker logs: docker compose logs
- Test MCP bridges: curl http://localhost:3000/health
- Verify n8n execution logs in the UI