# Code Review Agent

A containerized ML service for intelligent code review using GenAI and traditional ML techniques.

## Features
- Traditional ML: scikit-learn models for complexity scoring (see the pipeline sketch after this list)
- Semantic Analysis: Code embeddings for similarity analysis
- GenAI Integration: LLM-powered contextual feedback
- Multi-language Support: Python, JavaScript, TypeScript
- Docker Optimized: Multi-stage builds, security scanning, cross-platform
- Production Ready: Health checks, monitoring, caching
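
A rough sketch of how these pieces combine in a single review pass (every name and heuristic below is an illustrative assumption, not the service's real internals):

```python
# Illustrative review flow only; the real pipeline lives in src/.
def score_complexity(code: str) -> float:
    # Stand-in for the scikit-learn complexity model: a crude heuristic.
    return min(len(code.splitlines()) / 100.0, 1.0)

def review(code: str, include_llm_review: bool = True) -> dict:
    result = {
        "complexity_score": score_complexity(code),  # traditional ML
        "similarity_score": None,                    # embedding-based, when available
        "suggestions": [],
        "llm_feedback": None,
    }
    if include_llm_review:
        # The real service would call the configured LLM endpoint here.
        result["llm_feedback"] = "(LLM feedback placeholder)"
    return result

print(review('def hello(): print("Hello World")'))
```
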
## Quick Start

### Docker Compose

```bash
# Clone and setup
git clone <repository>
cd code-review-agent
# Generate sample training data
python scripts/generate_sample_data.py
# Train the complexity model
python scripts/train_model.py
# Start all services
docker-compose up -d
# Test the API
curl http://localhost:8000/health
```

### Local Development

```bash
# Install dependencies
pip install -r requirements/base.txt -r requirements/ml.txt
# Generate training data and train model
python scripts/generate_sample_data.py
python scripts/train_model.py
# Start the API
python src/api.py
```

## API Usage

Submit code for review:

```bash
curl -X POST http://localhost:8000/review \
  -H "Content-Type: application/json" \
  -d '{
    "code": "def hello(): print(\"Hello World\")",
    "language": "python",
    "review_level": "standard",
    "include_llm_review": true
  }'
```

"complexity_score": 0.15,
"similarity_score": null,
"issues": [],
"suggestions": ["Great job! Your code looks clean and well-structured"],
"overall_rating": "excellent",
"llm_feedback": "This is a simple, clean function...",
"analysis_time_ms": 45
## Tech Stack

- FastAPI: High-performance web framework
- scikit-learn: Traditional ML for complexity analysis
- Transformers: Code embeddings and similarity (see the sketch after this list)
- Docker: Containerization and orchestration
- Redis: Caching layer
- Local LLM: Optional local language model
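
The Transformers entry above drives the similarity analysis. A minimal sketch of embedding-based code similarity follows; the README does not name an embedding model, so microsoft/codebert-base is used purely for illustration:

```python
# Code-similarity sketch: mean-pooled transformer embeddings plus cosine
# similarity. The model choice is an assumption, not the service's own.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(code: str) -> torch.Tensor:
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)            # mean-pool to one vector

def similarity(a: str, b: str) -> float:
    return torch.nn.functional.cosine_similarity(embed(a), embed(b), dim=0).item()

print(similarity("def add(a, b): return a + b", "def total(x, y): return x + y"))
```
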
## Development

```bash
# Install development dependencies
pip install -r requirements/dev.txt
# Run tests
pytest tests/ -v
# Code formatting
black src/ tests/
isort src/ tests/
# Type checking
mypy src/
# Start development server
uvicorn src.api:app --reload --host 0.0.0.0 --port 8000
```

## Docker Features

- Multi-stage builds for optimized images
- Cross-platform support (AMD64, ARM64)
- Security scanning with Docker Scout
- SBOM generation for compliance
- Non-root containers for security
- Health checks for reliability (sketched below)
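
The health checks hit the service's /health route, used in the quick start above and by the Kubernetes probes further down. Here is a minimal sketch of such an endpoint; the real implementation lives in src/api.py, and the response shape below is an assumption:

```python
# Minimal FastAPI health endpoint sketch backing the container health checks.
# Keep it cheap: orchestrators poll it every few seconds.
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}  # response shape is an assumption
```
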
## Model Training

The service includes a trainable complexity model (a sketch of what such a model might look like follows the commands below):

```bash
# Generate sample data
python scripts/generate_sample_data.py
# Train the model
python scripts/train_model.py --data-dir data/sample_code --output-dir models
# Retrain via API
curl -X POST http://localhost:8000/models/train
```

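scripts/train_model.py is not reproduced in this README; purely as an assumption, a trainable complexity model of this kind could be a small scikit-learn regressor over hand-crafted code features:

```python
# Illustrative sketch only: a complexity regressor in the spirit of the
# trainable model above. Features, data, and model choice are assumptions.
from sklearn.ensemble import RandomForestRegressor

def features(code: str) -> list[float]:
    lines = code.splitlines()
    return [
        len(lines),                                                 # length
        max((len(l) - len(l.lstrip()) for l in lines), default=0),  # max indent
        sum(l.strip().startswith(("if ", "for ", "while ")) for l in lines),  # branching
    ]

# Tiny toy dataset: (snippet, hand-assigned complexity in [0, 1]).
samples = [
    ("def hello(): print('hi')", 0.1),
    ("def f(x):\n    for i in range(x):\n        if i % 2:\n            print(i)", 0.5),
]
X = [features(code) for code, _ in samples]
y = [score for _, score in samples]

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
print(model.predict([features("def add(a, b): return a + b")]))
```
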
## Configuration

Environment variables:

```env
# LLM Configuration
LLM_ENDPOINT=local # local, openai, anthropic
LLM_API_KEY=your_api_key # Required for external LLMs
# Model Configuration
MODEL_DIR=models # Directory for saved models
CACHE_DIR=.cache # Cache directory
# Development
DEBUG=false # Enable debug mode
LOG_LEVEL=INFO # Logging level
```

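A sketch of how the service might read these variables at startup (the actual configuration loading in src/api.py may differ):

```python
# Configuration loading sketch using stdlib os.getenv; variable names match
# the list above, and the defaults are assumptions.
import os

LLM_ENDPOINT = os.getenv("LLM_ENDPOINT", "local")
LLM_API_KEY = os.getenv("LLM_API_KEY")  # only needed for external LLMs
MODEL_DIR = os.getenv("MODEL_DIR", "models")
CACHE_DIR = os.getenv("CACHE_DIR", ".cache")
DEBUG = os.getenv("DEBUG", "false").lower() == "true"
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")

if LLM_ENDPOINT != "local" and not LLM_API_KEY:
    raise RuntimeError("LLM_API_KEY is required for external LLM endpoints")
```
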
## Production Deployment

### Multi-Platform Builds

```bash
# Enable BuildKit
export DOCKER_BUILDKIT=1
# Build multi-platform images
docker buildx create --use --name ml-builder
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --target production \
  --tag myregistry/code-review-agent:latest \
  --push .
```

### Security Scanning

```bash
# Generate SBOM
docker sbom myregistry/code-review-agent:latest
# Vulnerability scanning
docker scout cves myregistry/code-review-agent:latest
```

### Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: code-review-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: code-review-agent
  template:
    metadata:
      labels:
        app: code-review-agent
    spec:
      containers:
        - name: api
          image: myregistry/code-review-agent:latest
          ports:
            - containerPort: 8000
          env:
            - name: LLM_ENDPOINT
              value: "local"
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 60
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
```

## Contributing

- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request

## License

MIT License - see the LICENSE file for details.