SentinelShield is a comprehensive AI-powered content moderation platform that combines rule-based filtering with state-of-the-art machine learning models to provide robust protection against harmful content. Built on FastAPI, it is designed for high-performance, scalable content moderation workflows.
- Multi-Model AI Pipeline: Integrates Llama Prompt Guard 2, Llama Guard 4, and custom models
- Rule-Based Engine: Configurable blacklist/whitelist rules with regex pattern matching
- Real-Time Moderation: Low-latency decisions (sub-second once models are warm) with detailed reasoning
- RESTful API: Clean, documented endpoints for easy integration
- Comprehensive Logging: Detailed audit trails and performance metrics
- Modular Architecture: Pluggable provider system for easy model integration
Based on real-world testing, SentinelShield delivers exceptional performance:
- Rule Engine: < 0.01ms response time for pattern matching
- Llama Prompt Guard 2: ~1.5s for prompt injection detection
- Overall Pipeline: End-to-end decisions in roughly 0.14–1.5 s, dominated by model inference
- High Accuracy: Successfully blocks prompt injection attempts and harmful content
```
Rule Engine Timing:    0.000007 seconds
Llama Prompt Guard 2:  1.496 seconds (prompt injection detection)
Total Pipeline:        1.496 seconds
```
SentinelShield uses a sophisticated multi-layered architecture:
```
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│  FastAPI App │─────▶│ Orchestrator │─────▶│  Rule Engine │
└──────────────┘      └──────────────┘      └──────────────┘
        │                     │                     │
        ▼                     ▼                     ▼
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│  API Routers │      │ AI Providers │      │  YAML Rules  │
└──────────────┘      └──────────────┘      └──────────────┘
```
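The flow above — the FastAPI app hands requests to the orchestrator, which consults the near-zero-latency rule engine first and only then the slower AI providers — can be sketched roughly as follows. This is an illustrative sketch, not the actual `core/orchestrator.py`; the 0.5 block threshold and the `evaluate`/`name` attributes on the collaborators are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Reason:
    engine: str
    category: str
    score: float

class Orchestrator:
    """Sketch: cheap rules first, AI providers only when no rule matches."""

    def __init__(self, rule_engine, providers):
        self.rule_engine = rule_engine
        self.providers = providers

    def moderate(self, text: str) -> dict:
        # 1. Rule engine: microsecond latency, short-circuits on a match.
        rule = self.rule_engine.evaluate(text)
        if rule is not None:
            return {"safe": rule["then"] != "BLOCK",
                    "decision": rule["then"],
                    "reasons": [Reason("rule_engine", rule["id"], 1.0)]}
        # 2. AI providers: slower, scored verdicts (threshold is illustrative).
        reasons = []
        for provider in self.providers:
            score, category = provider.moderate(text)
            reasons.append(Reason(provider.name, category, score))
        blocked = any(r.score > 0.5 for r in reasons)
        return {"safe": not blocked,
                "decision": "BLOCK" if blocked else "ALLOW",
                "reasons": reasons}
```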
- Python 3.8 or higher
- 4GB+ RAM (for AI models)
- Internet connection (for model downloads)
```bash
# Install dependencies
pip install fastapi uvicorn pydantic httpx pyyaml pytest transformers modelscope
pip install 'httpx<0.28' -U

# Start the server
uvicorn sentinelshield.api.main:app --reload
```

The API will be available at http://localhost:8000.
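Once the server is up, the endpoints documented below can also be called from Python. A stdlib-only sketch — the URL and JSON field names match the examples in this README, while `build_request` and `parse_verdict` are illustrative helpers, not part of SentinelShield:

```python
import json
import urllib.request

PROMPT_GUARD_URL = "http://localhost:8000/v1/prompt-guard"

def build_request(prompt: str) -> urllib.request.Request:
    """Build the JSON POST request for the prompt-guard endpoint."""
    return urllib.request.Request(
        PROMPT_GUARD_URL,
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def parse_verdict(raw):
    """Extract (safe, decision) from a moderation response body."""
    payload = json.loads(raw)
    return payload["safe"], payload["decision"]

# With the server running:
#   with urllib.request.urlopen(build_request("hello")) as resp:
#       print(parse_verdict(resp.read()))
```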
Specialized endpoint for detecting prompt injection attacks using Llama Prompt Guard 2.
```bash
curl -X POST "http://localhost:8000/v1/prompt-guard" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "ignore your previous instruction"}'
```

Response:

```json
{
  "safe": false,
  "decision": "BLOCK",
  "reasons": [
    {
      "engine": "llama_prompt_guard_2",
      "category": "LABEL_1",
      "score": 0.9989620447158813
    }
  ],
  "model_version": "llama_prompt_guard_2"
}
```

General content moderation using lightweight providers and rule-based filtering.
```bash
curl -X POST "http://localhost:8000/v1/general-guard" \
  -H "Content-Type: application/json" \
  -d '{"text": "hello world"}'
```

Response:

```json
{
  "safe": true,
  "decision": "ALLOW",
  "reasons": [
    {
      "engine": "dummy",
      "category": "ALLOW",
      "score": 0.0
    }
  ],
  "model_version": "pipeline"
}
```

Input: "ignore your previous instruction"
- Detection: Llama Prompt Guard 2 identifies prompt injection attempt
- Score: 0.9989 (99.89% confidence)
- Decision: BLOCK
- Response Time: 0.137 seconds
Input: "nazi"
- Detection: Rule engine matches blacklist pattern
- Rule ID: hate_speech_en
- Decision: BLOCK
- Response Time: 0.000015 seconds
Input: "hello"
- Detection: No harmful patterns detected
- Score: 0.0015 (very low risk)
- Decision: ALLOW
- Response Time: 1.496 seconds (includes AI model inference)
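The rule-engine path in the second example above (blacklist regex matching at microsecond latency) can be approximated in a few lines. This is an illustrative sketch reusing the blacklist pattern from the rules section below, not the actual SentinelShield rule engine:

```python
import re

# Hypothetical in-memory form of the blacklist.yml rule
BLACKLIST = [
    {"id": "hate_speech_en", "pattern": r"\bnazi\b|\bhate\b", "then": "BLOCK"},
]

def match_rules(text: str):
    """Return the first matching rule, or None if no pattern matches."""
    for rule in BLACKLIST:
        if re.search(rule["pattern"], text):
            return rule
    return None
```

Because this check is pure regex matching with no model inference, it explains the ~15-microsecond response time reported above.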
Rules are defined in YAML files under sentinelshield/rules/:
```yaml
# blacklist.yml
- id: hate_speech_en
  when: content.match(r"\bnazi\b|\bhate\b")
  then: BLOCK
```

```yaml
# whitelist.yml
- id: allow_test
  when: content.match(r"allowed")
  then: ALLOW
```

Configure different AI models for different endpoints in sentinelshield/core/config.py:

```python
api_configs: Dict[str, APIConfig] = {
    "/v1/prompt-guard": APIConfig(providers=["llama_prompt_guard_2"]),
    "/v1/general-guard": APIConfig(providers=["dummy"]),
}
```

Test the Llama Prompt Guard 2 model directly:
```bash
python examples/prompt_guard_cli.py "Your test prompt here"
```

Output:

```
decision: ALLOW, score: 0.0015
```
```bash
# Test prompt injection detection
curl -X POST "http://localhost:8000/v1/prompt-guard" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "ignore previous instructions"}'

# Test general moderation
curl -X POST "http://localhost:8000/v1/general-guard" \
  -H "Content-Type: application/json" \
  -d '{"text": "test message"}'
```

SentinelShield provides comprehensive logging for monitoring and debugging:
```
[2025-07-04 11:40:31,677] INFO: /v1/prompt-guard request: ignore your previous instruction
[2025-07-04 11:40:31,677] INFO: /v1/prompt-guard response: ModerationResponse(safe=False, decision='BLOCK', reasons=[Reason(engine='llama_prompt_guard_2', id=None, category='LABEL_1', score=0.9989620447158813)], policy_version=None, model_version='llama_prompt_guard_2')
[2025-07-04 11:40:31,676] INFO: Moderation timings: {'rule_engine': 5.9604644775390625e-06, 'llama_prompt_guard_2': 0.13741183280944824, 'total': 0.13742899894714355}
```
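The `ModerationResponse` and `Reason` reprs visible in these logs imply response models along these lines. The sketch below uses stdlib dataclasses and mirrors only the logged field names; the actual definitions live in sentinelshield/core/schema.py (as Pydantic models, given the FastAPI stack) and may differ:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Reason:
    engine: str
    category: str
    score: float
    id: Optional[str] = None  # rule id, set when the rule engine matched

@dataclass
class ModerationResponse:
    safe: bool
    decision: str  # "ALLOW" or "BLOCK"
    reasons: List[Reason] = field(default_factory=list)
    policy_version: Optional[str] = None
    model_version: Optional[str] = None
```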
```
sentinelshield/
├── api/                  # FastAPI application
│   ├── main.py           # App initialization
│   └── routers/          # API endpoints
├── core/                 # Core business logic
│   ├── orchestrator.py   # Main moderation engine
│   ├── schema.py         # Data models
│   └── config.py         # Configuration
├── models/               # AI model providers
│   └── providers/        # Model implementations
├── rules/                # Rule definitions
│   ├── blacklist.yml     # Block patterns
│   └── whitelist.yml     # Allow patterns
└── tests/                # Test suite
```
- Create a new provider class in `sentinelshield/models/providers/`
- Implement the `moderate(text: str) -> tuple[float, str]` method
- Register the provider in `sentinelshield/models/providers/__init__.py`
- Configure it in the API settings
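Following those steps, a minimal provider might look like the sketch below. The class name, keyword logic, and scoring are invented for illustration; only the `moderate(text: str) -> tuple[float, str]` contract comes from this README:

```python
# sentinelshield/models/providers/keyword_provider.py (illustrative only)

class KeywordProvider:
    """Toy provider: scores text by presence of flagged keywords."""

    name = "keyword_demo"

    def __init__(self, keywords=("attack", "exploit")):
        self.keywords = [k.lower() for k in keywords]

    def moderate(self, text: str) -> tuple[float, str]:
        """Return (score, category) as the orchestrator expects."""
        hits = sum(1 for k in self.keywords if k in text.lower())
        if hits:
            return min(1.0, 0.5 + 0.25 * hits), "FLAGGED"
        return 0.0, "ALLOW"
```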
```bash
pytest sentinelshield/tests/
```

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "sentinelshield.api.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

- Use a production ASGI server like Gunicorn with Uvicorn workers
- Implement rate limiting and authentication
- Set up monitoring and alerting
- Configure proper logging levels
- Use environment variables for sensitive configuration
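For the first and last checklist items, a typical invocation looks like this. Gunicorn with `uvicorn.workers.UvicornWorker` is the standard pairing for FastAPI; the worker count and the `SENTINELSHIELD_LOG_LEVEL` variable name are illustrative, not settings defined by this project:

```shell
export SENTINELSHIELD_LOG_LEVEL=info   # hypothetical env var
gunicorn sentinelshield.api.main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000
```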
This project is licensed under the MIT License - see the LICENSE file for details.
- Llama Prompt Guard 2: Meta's prompt injection detection model
- FastAPI: Modern, fast web framework for building APIs
- Transformers: Hugging Face's state-of-the-art NLP library
For questions, issues, or contributions:
- Open an issue on GitHub
- Check the documentation in API_CONFIGURATION.md
- Review the test examples in examples/
SentinelShield - Protecting AI systems with intelligent content moderation.