This project tackles the challenge of moderating toxic content in internet memes, which combine images and text to spread potentially harmful or offensive messages. We propose a multimodal detection system that uses OCR, NLP, and Vision-Language Fusion via SigLIP to classify memes as toxic or non-toxic, ensuring safer social media environments.
Architecture Diagram: Fusion-based Toxic Meme Classifier with PaddleOCR, Kosmos-2, Vision Transformer, and SigLIP.
- Input Meme: Raw meme image containing both visual and textual content.
- Text Preprocessing:
  - Uses PaddleOCR and KOSMOS-2 for text extraction.
- Image Preprocessing:
  - Image is resized and transformed into pixel tensors.
- Feature Embedding:
  - Text and image inputs are embedded separately.
  - Embeddings are passed into the SigLIP model for multimodal fusion.
- Classification:
  - Uses sigmoid activation + cross-entropy loss.
  - Optimized to predict Toxic or Non-Toxic.
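The classification step above can be sketched in plain Python: a fused embedding is projected to a single logit, squashed by a sigmoid into a toxicity probability, and trained with binary cross-entropy. This is a minimal illustration of the math, not the project's actual PyTorch code (the logit value below is made up for demonstration):

```python
import math

def sigmoid(z: float) -> float:
    """Squash a raw logit into a toxicity probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(p: float, label: int) -> float:
    """Binary cross-entropy between a predicted probability and a 0/1 label."""
    eps = 1e-12  # guard against log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

# In the real pipeline, this logit would come from a linear head over the
# fused image+text embedding; here it is an illustrative value.
logit = 2.0
p_toxic = sigmoid(logit)
label = 1  # 1 = Toxic, 0 = Non-Toxic
loss = bce_loss(p_toxic, label)
print(f"P(toxic) = {p_toxic:.3f}, loss = {loss:.3f}")
```

In PyTorch this corresponds to `nn.BCEWithLogitsLoss`, which combines the sigmoid and the cross-entropy term in one numerically stable call.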
- 🔍 Detects toxicity in memes using a deep learning fusion approach.
- 🔤 Supports OCR from meme text using PaddleOCR & KOSMOS-2.
- 👁️🗨️ Uses SigLIP (Google) for image-text fusion and classification.
- 📈 Provides performance metrics and visualizations.
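Text pulled from memes by OCR is typically noisy (URLs, stray punctuation, inconsistent casing) and benefits from light normalization before tokenization. A minimal sketch of such a cleaning step; the exact rules here are illustrative, not the project's actual preprocessing:

```python
import re

def clean_ocr_text(raw: str) -> str:
    """Normalize raw OCR output before feeding it to the text encoder."""
    text = raw.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs picked up by OCR
    text = re.sub(r"[^a-z0-9'\s]", " ", text)   # keep words, digits, apostrophes
    text = re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace
    return text

print(clean_ocr_text("When  MONDAY hits...\nvisit https://example.com NOW!!"))
# -> "when monday hits visit now"
```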
| Component | Technology |
|---|---|
| Language | Python 3.8+ |
| OCR Engine | PaddleOCR, KOSMOS-2 |
| Text Encoder | BERT (transformers) |
| Image Encoder | Vision Transformer (ViT) |
| Fusion Model | SigLIP |
| DL Framework | PyTorch |
| Visualization | Matplotlib, Seaborn |
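The fusion model in the stack above, SigLIP, scores an image-text pair as the sigmoid of a scaled cosine similarity between the two embeddings plus a learned bias. A toy pure-Python sketch of that scoring rule (the embeddings, scale, and bias values are illustrative; in the real model they are learned):

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so the dot product is a cosine."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def siglip_pair_logit(img_emb, txt_emb, scale=10.0, bias=-10.0):
    """Pairwise logit: scaled cosine similarity plus a bias term.
    scale and bias are learned parameters in SigLIP; fixed here."""
    img, txt = l2_normalize(img_emb), l2_normalize(txt_emb)
    cos = sum(a * b for a, b in zip(img, txt))
    return scale * cos + bias

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy 2-D embeddings: an aligned image/text pair should score higher
# than a mismatched pair.
match = sigmoid(siglip_pair_logit([1.0, 0.2], [0.9, 0.3]))
mismatch = sigmoid(siglip_pair_logit([1.0, 0.2], [-0.8, 0.6]))
print(match > mismatch)  # prints True
```

Because each pair gets its own independent sigmoid probability, SigLIP avoids the batch-wide softmax that CLIP-style contrastive training requires.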
🔮 Future Scope
- 🌍 Multilingual toxic meme detection
- 🎥 Video meme frame-based detection
- 🌐 Web portal for real-time moderation
- 📦 Deploy as browser extension / REST API
📄 License
This project is licensed under the MIT License. See the LICENSE file for more details.
⭐ Support
If you found this project useful, please consider giving it a ⭐ and sharing it with others!
