Face Detection, Encoding, and Similarity Analysis with DeepFace + MediaPipe
FaceSim is a lightweight deep-learning pipeline for analyzing and comparing faces.
It uses DeepFace (ArcFace) for 512-D facial embeddings and MediaPipe for landmark detection.
The notebook produces similarity matrices, “same vs different” labels, and a color-coded heatmap, all with simple, reproducible Python code.
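To illustrate how a similarity matrix and “same vs different” labels can be derived from embeddings, here is a minimal NumPy sketch. The function names and the 0.5 threshold are assumptions made for this example; the notebook's actual threshold may differ.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity between row vectors."""
    emb = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def label_pairs(sim: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Label every pair "same" or "different" (threshold is an assumed value)."""
    return np.where(sim >= threshold, "same", "different")


# Toy 3-D vectors stand in for 512-D ArcFace embeddings.
toy = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]])
sim = cosine_similarity_matrix(toy)
labels = label_pairs(sim)
```

Normalizing each row first means the matrix product directly yields cosine similarities, so the diagonal is always 1 and the matrix is symmetric.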
✅ Face detection (RetinaFace backend)
✅ Facial landmarks via MediaPipe (eyes, lips, jawline)
✅ Face encoding with ArcFace (512-dimensional embeddings)
✅ Cosine similarity computation
✅ Automatic “same / different” labeling
✅ Top-N similarity report + heatmap visualization
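The landmark step might look roughly like the sketch below. It assumes the MediaPipe Face Mesh solution is used; `to_pixels` and `detect_landmarks` are hypothetical helper names, not functions taken from the notebook.

```python
from typing import List, Tuple


def to_pixels(points: List[Tuple[float, float]], width: int, height: int) -> List[Tuple[int, int]]:
    """Convert normalized (x, y) landmarks to pixel coordinates clamped to the image."""
    return [
        (max(0, min(int(x * width), width - 1)), max(0, min(int(y * height), height - 1)))
        for x, y in points
    ]


def detect_landmarks(image_path: str, max_faces: int = 5) -> List[List[Tuple[int, int]]]:
    """Return one list of pixel landmarks per face found in the image."""
    # Imported lazily so to_pixels works without the heavy dependencies.
    import cv2
    import mediapipe as mp

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    height, width = image.shape[:2]
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB input

    with mp.solutions.face_mesh.FaceMesh(
        static_image_mode=True, max_num_faces=max_faces, refine_landmarks=True
    ) as face_mesh:
        results = face_mesh.process(rgb)

    faces = []
    for face in results.multi_face_landmarks or []:
        points = [(lm.x, lm.y) for lm in face.landmark]
        faces.append(to_pixels(points, width, height))
    return faces
```

Region subsets such as the lips, eyes, and face outline can then be selected with the index-pair sets `mp.solutions.face_mesh.FACEMESH_LIPS`, `FACEMESH_LEFT_EYE`, `FACEMESH_RIGHT_EYE`, and `FACEMESH_FACE_OVAL`.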
Install dependencies:
pip install -r requirements.txt
Requirements:
deepface==0.0.93
mediapipe==0.10.14
retina-face==0.0.14
opencv-python==4.10.0.84
matplotlib==3.9.2
pandas
numpy
- Upload your JPG/PNG images (the notebook auto-detects local images).
- Open and run all cells in FaceSim.ipynb.
- Generated outputs:
  - face_embeddings_per_face.csv → 512-D embeddings per detected face
  - top_pairs.csv → Top-N similar pairs above threshold
  - pair_labels_matrix.csv → Full “same/different” label matrix
  - Heatmap visualization of cosine similarity
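A Top-N pair report of the kind listed above could be produced along these lines. This is a sketch only: `top_pairs`, its default threshold of 0.5, and the toy similarity values are assumptions for illustration, not the notebook's own code or results.

```python
import numpy as np


def top_pairs(face_ids, sim, n=10, threshold=0.5):
    """Return up to n (face_a, face_b, similarity) tuples at or above the threshold."""
    sim = np.asarray(sim, dtype=float)
    rows, cols = np.triu_indices(len(face_ids), k=1)  # each unordered pair once
    scores = sim[rows, cols]
    keep = scores >= threshold
    rows, cols, scores = rows[keep], cols[keep], scores[keep]
    order = np.argsort(-scores)[:n]  # highest similarity first
    return [(face_ids[rows[i]], face_ids[cols[i]], float(scores[i])) for i in order]


# Toy values chosen only to exercise the function.
ids = ["ABC.jpg#face1", "ABC.jpg#face2", "Park.jpg#face1"]
sim = np.array([[1.00, 0.86, 0.24], [0.86, 1.00, 0.31], [0.24, 0.31, 1.00]])
pairs = top_pairs(ids, sim, n=5, threshold=0.5)
```

The resulting list can be wrapped in `pandas.DataFrame(pairs, columns=["Face A", "Face B", "Cosine Similarity"])` and saved with `to_csv("top_pairs.csv", index=False)`.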
| Face A | Face B | Cosine Similarity | Label |
|---|---|---|---|
| ABC.jpg#face1 | ABC.jpg#face2 | 0.86 | ✅ Same |
| Park.jpg#face1 | Macbook.jpg#face3 | 0.24 | ❌ Different |
The notebook also displays a heatmap highlighting identity clusters.
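A heatmap like this can be drawn with Matplotlib's `imshow`. The sketch below is one possible approach; the function name, colormap, and output filename are assumptions made for the example.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_similarity_heatmap(sim, face_ids, path="similarity_heatmap.png"):
    """Draw a labeled cosine-similarity heatmap and save it to path."""
    fig, ax = plt.subplots(figsize=(6, 5))
    image = ax.imshow(np.asarray(sim), cmap="viridis", vmin=-1.0, vmax=1.0)
    ax.set_xticks(range(len(face_ids)))
    ax.set_yticks(range(len(face_ids)))
    ax.set_xticklabels(face_ids, rotation=90)
    ax.set_yticklabels(face_ids)
    fig.colorbar(image, ax=ax, label="Cosine similarity")
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return path
```

Faces of the same identity appear as bright blocks when their rows and columns are adjacent, which is what makes identity clusters visible.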
FaceSim/
│
├── FaceSim.ipynb # Main notebook
├── requirements.txt # Dependency list
├── README.md # Documentation
└── cropped_faces/ (optional) # Saved face crops
- Default model: ArcFace (512-D)
- Default detector: RetinaFace
- Works best for human faces; not optimized for anime or stylized art.
- Use detector_backend="skip" for pre-cropped faces.
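The defaults above map onto `DeepFace.represent` roughly as follows. This is a hedged sketch: `embed_faces` and `face_id` are hypothetical helpers, and the `#faceN` naming simply mirrors the identifiers shown in the example table.

```python
import os
from typing import Dict, List


def face_id(filename: str, index: int) -> str:
    """Build an identifier such as "ABC.jpg#face1" (1-based face index)."""
    return f"{filename}#face{index}"


def embed_faces(image_path: str, pre_cropped: bool = False) -> Dict[str, List[float]]:
    """Return {face_id: embedding} for each face DeepFace finds in the image."""
    # Imported lazily because DeepFace loads a deep-learning backend on import.
    from deepface import DeepFace

    results = DeepFace.represent(
        img_path=image_path,
        model_name="ArcFace",
        # "skip" bypasses detection and treats the whole image as one face.
        detector_backend="skip" if pre_cropped else "retinaface",
    )
    name = os.path.basename(image_path)
    return {face_id(name, i): r["embedding"] for i, r in enumerate(results, start=1)}
```

With the RetinaFace backend, `DeepFace.represent` returns one entry per detected face, so a group photo yields several embeddings; with `"skip"`, a pre-cropped image yields a single entry.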
- Identity similarity search
- Duplicate face detection
- Group photo analysis
- Facial embedding dataset generation
Minh Nguyen (@minh1608)
📘 Kaggle Notebook
📦 GitHub Repository
MIT License © 2025 Minh Nguyen
💬 If you use FaceSim for research or learning, please star ⭐ the repo — it helps visibility and encourages further improvements.