A comprehensive deep learning project that implements an advanced face detection system combining transfer learning with human feedback for continuous improvement. While the model achieves perfect metrics in controlled environments, real-world applications present diverse challenges that require adaptive learning. This project implements RLHF (Reinforcement Learning from Human Feedback) to bridge this gap, creating a system that learns from real-world usage patterns.
The implementation leverages MobileNetV2's architecture as its backbone and performs dual tasks: face detection with confidence scoring and precise bounding box prediction. Through a custom-built GUI application, users can detect faces in real-time and provide feedback on the model's performance. This feedback is systematically collected and analyzed through a two-phase training approach that prioritizes challenging cases, ensuring continuous improvement in real-world scenarios such as varying lighting conditions, different face angles, and occlusions.
What sets this project apart is its end-to-end implementation of the RLHF concept in computer vision, specifically designed to enhance model generalization. While traditional face detection models remain static after training, this system creates a continuous improvement loop where human feedback directly influences model behavior. The implementation includes comprehensive metrics tracking, automated parameter adjustment based on feedback patterns, and a structured approach to model enhancement through grid search optimization and RLHF-based fine-tuning.
The results demonstrate significant improvements in model generalization, with the RLHF-improved model showing enhanced performance particularly in bounding box precision (57% improvement in MSE) and overall loss reduction (64% improvement), while maintaining perfect classification metrics. This approach effectively bridges the gap between laboratory performance and real-world application, creating a more robust and adaptable face detection system.
- Overview
- Project Structure
- Dataset
- Model Architecture
- Training with Grid Search
- RLHF Implementation
- Results
- GUI Application
- Installation
- Usage
- License
This project implements a sophisticated face detection system that combines transfer learning with human feedback for continuous improvement. Built on MobileNetV2's architecture, the system performs dual tasks: face detection with confidence scoring and precise bounding box prediction, achieving robust performance through a carefully designed training pipeline.
The implementation features three key components:
- Transfer Learning Model: Leverages MobileNetV2's pre-trained weights, adapted for face detection through a custom dual-head architecture for classification and bounding box regression.
- RLHF Pipeline: Implements a systematic approach to collect and utilize human feedback, enabling continuous model improvement through a two-phase training strategy.
- Interactive GUI: Provides a user-friendly interface for real-time face detection and feedback collection, creating a seamless loop between model predictions and user interactions.
The project is trained on a balanced dataset of 11,985 images, with a comprehensive evaluation system that tracks both traditional metrics and user feedback. Through the RLHF implementation, the model adapts to challenging cases and improves its performance based on real-world usage.
```
face_detection/
├── Data/
│   ├── Test/
│   │   ├── Images/                  # Test set images
│   │   │   ├── x_y.jpg
│   │   │   └── ...
│   │   └── Labels/                  # Test set annotations
│   │       ├── x_y.json
│   │       └── ...
│   ├── Train/
│   │   ├── Images/                  # Training set images
│   │   │   ├── x_y.jpg
│   │   │   └── ...
│   │   └── Labels/                  # Training set annotations
│   │       ├── x_y.json
│   │       └── ...
│   ├── Validation/
│   │   ├── Images/                  # Validation set images
│   │   │   ├── x_y.jpg
│   │   │   └── ...
│   │   └── Labels/                  # Validation set annotations
│   │       ├── x_y.json
│   │       └── ...
│   └── Data.csv                     # Dataset metadata and specifications
├── feedback/                        # Feedback system
│   ├── criteria.txt                 # Feedback evaluation criteria
│   ├── feedback_data.json           # Collected feedback data
│   ├── feedback_metrics.json        # Feedback analysis metrics
│   └── verify_feedback.py           # Feedback verification tools
├── grid_search_results/             # Hyperparameter optimization
│   ├── combination_1.json           # Individual trial results
│   ├── grid_search_results.csv      # Results summary
│   └── grid_search.log              # Training logs
├── models/                          # Trained models
│   └── face_detection_XXXXXX/       # Model versions
│       ├── best_weights.weights.h5  # Best model weights
│       ├── evaluation_results.png   # Performance visualizations
│       ├── parameters.json          # Model parameters
│       └── training_history.json    # Training metrics
├── results/                         # Evaluation results
│   ├── best_model_improved_results/ # RLHF-improved model results
│   │   ├── orignal_dataset/         # Results on original data
│   │   ├── real_world_dataset/      # Results on real-world tests
│   │   └── rlhf_dataset/            # Results on RLHF data
│   └── rlhf/                        # RLHF analysis
│       └── analysis_feedback.png    # Feedback visualizations
├── rlhf/                            # RLHF implementation
│   ├── data/                        # RLHF training data
│   ├── augmentation.py              # Data augmentation
│   ├── dataset_creator.py           # Dataset management
│   ├── model_improver.py            # Model improvement
│   └── utils.py                     # Utility functions
├── scripts/                         # Training scripts
│   ├── train_gridSearch.py          # Grid search implementation
│   └── train.py                     # Base training script
├── src/                             # Core implementation
│   ├── feedback/                    # Feedback collection
│   ├── gui/                         # GUI implementation
│   ├── model/                       # Model architecture
│   └── utils/                       # Utility functions
└── requirements.txt                 # Project dependencies
```
- **Data Organization**
  - Structured dataset splits with images and labels
  - Comprehensive metadata tracking
  - Standardized annotation format
- **Model Development**
  - Grid search optimization
  - Multiple model versions
  - Training and evaluation scripts
  - Performance tracking
- **RLHF System**
  - Feedback collection and analysis
  - Model improvement pipeline
  - Results visualization
  - Data augmentation
- **User Interface**
  - Interactive GUI application
  - Real-time detection
  - Feedback submission
  - Result visualization
This structure ensures modular development, easy maintenance, and systematic tracking of experiments and improvements.
The project utilizes a carefully curated dataset combining images from two renowned sources: Labeled Faces in the Wild (LFW) and Jack Dataset, creating a balanced collection of 11,985 images for face detection training.
| Set | Total Images | % of Dataset | Face Images | % Faces | No Face Images | % No Faces |
|---|---|---|---|---|---|---|
| Train | 9588 | 80.00% | 4794 | 50.00% | 4794 | 50.00% |
| Test | 1197 | 9.99% | 598 | 49.96% | 599 | 50.04% |
| Validation | 1200 | 10.01% | 600 | 50.00% | 600 | 50.00% |
| Total | 11985 | 100.00% | 5992 | 50.00% | 5993 | 50.00% |
- Image Format: JPG
- Dimensions: 250x250 pixels (standardized)
- Color Space: RGB
- Class Balance: Near-perfect (50% faces, 50% non-faces)
- Split Ratio: 80% train, 10% validation, 10% test
- Annotations: Bounding box coordinates in JSON format
The dataset's balanced nature and diverse composition provide a solid foundation for training a robust face detection model, while its standardized format ensures consistent processing throughout the pipeline.
The face detection model is built using transfer learning with MobileNetV2 as the backbone, implementing a dual-head architecture for simultaneous face classification and bounding box regression. The model is designed to be efficient while maintaining high accuracy in both tasks.
- Backbone: MobileNetV2 (pre-trained on ImageNet)
- Input Shape: 224×224×3 (RGB images)
- Feature Extraction: Global Max Pooling on backbone output
- Trainable Base: False (frozen weights for transfer learning)
- **Classification Branch**:

```
# First Dense Block
Dense(1024) → BatchNorm → ReLU → Dropout
# Second Dense Block
Dense(512) → BatchNorm → ReLU → Dropout
# Output
Dense(1, sigmoid)   # Face / No-Face classification
```
- **Regression Branch**:

```
# First Dense Block
Dense(1024) → BatchNorm → ReLU → Dropout
# Second Dense Block
Dense(512) → BatchNorm → ReLU → Dropout
# Output
Dense(4, sigmoid)   # Bounding box coordinates [x1, y1, x2, y2]
```
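The two branches above can be assembled into a runnable Keras model. The following is a sketch based on the stated specs (frozen MobileNetV2 backbone, global max pooling, 1024/512 dense blocks, L2 of 0.02); the function and argument names are my own, not the project's actual code:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_face_detector(input_shape=(224, 224, 3), dropout_rate=0.5, weights="imagenet"):
    """Dual-head face detector on a frozen MobileNetV2 backbone (illustrative sketch)."""
    backbone = tf.keras.applications.MobileNetV2(
        include_top=False, weights=weights, input_shape=input_shape)
    backbone.trainable = False  # transfer learning: keep pre-trained weights frozen

    inputs = layers.Input(shape=input_shape)
    x = backbone(inputs, training=False)
    x = layers.GlobalMaxPooling2D()(x)

    def dense_block(t, units):
        # Dense → BatchNorm → ReLU → Dropout, as described above
        t = layers.Dense(units, kernel_regularizer=tf.keras.regularizers.l2(0.02))(t)
        t = layers.BatchNormalization()(t)
        t = layers.ReLU()(t)
        return layers.Dropout(dropout_rate)(t)

    # Classification head: face / no-face probability
    c = dense_block(x, 1024)
    c = dense_block(c, 512)
    class_out = layers.Dense(1, activation="sigmoid", name="class")(c)

    # Regression head: normalized [x1, y1, x2, y2]
    r = dense_block(x, 1024)
    r = dense_block(r, 512)
    bbox_out = layers.Dense(4, activation="sigmoid", name="bbox")(r)

    return Model(inputs, [class_out, bbox_out])
```

Both heads branch from the same pooled feature vector, so a single forward pass yields the classification score and the box in one call.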
- **Classification Loss**:
  - Binary cross-entropy with label smoothing (0.1)
  - Helps prevent overconfident predictions
- **Regression Loss**:

```python
import tensorflow as tf

def regression_loss(y_true, y_pred):
    # Squared error on the top-left corner coordinates
    delta_coord = tf.reduce_sum(tf.square(y_true[:, :2] - y_pred[:, :2]))
    # Squared error on box width and height
    h_true = y_true[:, 3] - y_true[:, 1]
    w_true = y_true[:, 2] - y_true[:, 0]
    h_pred = y_pred[:, 3] - y_pred[:, 1]
    w_pred = y_pred[:, 2] - y_pred[:, 0]
    delta_size = tf.reduce_sum(tf.square(w_true - w_pred) + tf.square(h_true - h_pred))
    return delta_coord + delta_size
```
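As a quick sanity check, the same loss can be re-implemented in NumPy and evaluated on a single box: shifting only `x1` by 0.1 contributes 0.01 of corner error and 0.01 of width error.

```python
import numpy as np

def regression_loss_np(y_true, y_pred):
    # NumPy mirror of the TF loss: corner error plus size error
    delta_coord = np.sum((y_true[:, :2] - y_pred[:, :2]) ** 2)
    h_true = y_true[:, 3] - y_true[:, 1]
    w_true = y_true[:, 2] - y_true[:, 0]
    h_pred = y_pred[:, 3] - y_pred[:, 1]
    w_pred = y_pred[:, 2] - y_pred[:, 0]
    delta_size = np.sum((w_true - w_pred) ** 2 + (h_true - h_pred) ** 2)
    return delta_coord + delta_size

y_true = np.array([[0.1, 0.1, 0.5, 0.5]])
y_pred = np.array([[0.2, 0.1, 0.5, 0.5]])  # x1 off by 0.1
print(regression_loss_np(y_true, y_pred))  # ≈ 0.02 (0.01 corner + 0.01 width)
```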
```
# Parameters
learning_rate
batch_size
epochs
class_weight
reg_weight
dropout_rate
```

- **Optimizer**: Adam with exponential learning rate decay

```python
lr_schedule = learning_rate * (decay_rate ** epoch)
```
- **Regularization**:
  - L2 regularization (0.02) on dense layers
  - Dropout (0.5) after each dense layer
  - Batch normalization for stable training
- **Early Stopping**:
  - Monitor: validation total loss
  - Patience: 5 epochs
  - Restore best weights
- Early Stopping: Prevents overfitting
- Model Checkpoint: Saves best weights
- Learning Rate Scheduler: Implements decay
- CSV Logger: Tracks training metrics
- Training Time Tracker: Monitors duration
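Under assumed names (this is a sketch, not the project's actual code), the callback stack above might be wired together like this in Keras:

```python
import tensorflow as tf

def make_callbacks(learning_rate=1e-4, decay_rate=0.9, log_path="training_log.csv"):
    """Early stopping, checkpointing, LR decay, and CSV logging in one list."""
    # Exponential decay: lr = learning_rate * decay_rate ** epoch
    scheduler = tf.keras.callbacks.LearningRateScheduler(
        lambda epoch: learning_rate * decay_rate ** epoch)
    return [
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True),
        tf.keras.callbacks.ModelCheckpoint(
            "best_weights.weights.h5", save_weights_only=True, save_best_only=True),
        scheduler,
        tf.keras.callbacks.CSVLogger(log_path),
    ]
```

The list would then be passed as `model.fit(..., callbacks=make_callbacks())`.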
```python
def predict(image_path, threshold=0.5):
    # Image preprocessing
    img = load_and_preprocess(image_path)
    # Model prediction
    class_prob, bbox_coords = model(img)
    # Threshold-based detection
    has_face = class_prob >= threshold
    return {
        'has_face': has_face,
        'confidence': class_prob,
        'bbox': bbox_coords if has_face else None
    }
```

The architecture is designed to balance accuracy and efficiency, making it suitable for real-time face detection while maintaining robust performance in both classification and localization tasks.
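`load_and_preprocess` is left undefined in the snippet above; a plausible implementation (assumed: resize to the 224×224 model input, scale pixels to [0, 1], add a batch dimension) would be:

```python
import numpy as np
from PIL import Image

def load_and_preprocess(image_path, target_size=(224, 224)):
    """Load an image, resize to the model input, and scale to [0, 1] (assumed convention)."""
    img = Image.open(image_path).convert("RGB").resize(target_size)
    arr = np.asarray(img, dtype="float32") / 255.0
    return arr[np.newaxis, ...]  # add batch dimension -> shape (1, 224, 224, 3)
```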
To improve the model's performance, we implemented a comprehensive grid search over key hyperparameters. The search explored 24 different combinations (2×3×1×2×2×1×1×1×1 = 24) of parameters to find the most effective configuration.
```python
param_grid = {
    'class_weight': [0.1, 0.2],       # Balance between classification and regression
    'reg_weight': [1.7, 1.8, 1.9],    # Importance of bounding box accuracy
    'learning_rate': [0.0001],        # Initial learning rate
    'batch_size': [96, 128],          # Training batch size
    'dropout_rate': [0.6, 0.7],       # Regularization strength
    'epochs': [25],                   # Maximum training epochs
    'early_stopping_patience': [7],   # Epochs before early stopping
    'reduce_lr_patience': [4],        # Epochs before LR reduction
    'lr_decay_rate': [0.9]            # Learning rate decay factor
}
```

The grid search identified the optimal configuration (Combination ID: 13):
```python
best_params = {
    'class_weight': 0.2,
    'reg_weight': 1.7,
    'learning_rate': 0.0001,
    'batch_size': 96,
    'dropout_rate': 0.6,
    'epochs': 25,
    'early_stopping_patience': 7,
    'reduce_lr_patience': 4,
    'lr_decay_rate': 0.9
}
```

The best model achieved exceptional results:
- **Classification Performance**:
  - Accuracy: 1.0000
  - Loss: 0.2028
  - Precision: 1.0000
  - Recall: 1.0000
  - F1 Score: 1.0000
- **Regression Performance**:
  - MAE: 0.1476
  - MSE: 0.0653
  - RMSE: 0.2556
  - Total Loss: 1.4716
- **Training Characteristics**:
  - Training Time: 7.38 minutes
  - Early Stopping: Yes (at epoch 24)
  - Final Validation Loss: 1.4733
- **Parameter Sensitivity**:
  - Higher class_weight (0.2) improved detection stability
  - Moderate reg_weight (1.7) provided the best bbox accuracy
  - Lower batch size (96) offered better generalization
- **Model Behavior**:
  - Perfect classification accuracy on the validation set
  - Strong bounding box prediction (MAE: 0.1476)
  - Efficient training convergence (early stopping at epoch 24/25)
- **Test Set Performance**:
  - Maintained perfect classification (Accuracy: 1.0)
  - Strong regression metrics (MAE: 0.1500)
  - Robust F1 Score (1.0)
These results demonstrate the effectiveness of the grid search in finding a balanced configuration that excels in both face detection and bounding box regression tasks.
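For reference, the 24 combinations enumerated by the grid search can be reproduced mechanically with `itertools.product` (a generic sketch, not the project's own loop):

```python
from itertools import product

param_grid = {
    'class_weight': [0.1, 0.2],
    'reg_weight': [1.7, 1.8, 1.9],
    'learning_rate': [0.0001],
    'batch_size': [96, 128],
    'dropout_rate': [0.6, 0.7],
    'epochs': [25],
    'early_stopping_patience': [7],
    'reduce_lr_patience': [4],
    'lr_decay_rate': [0.9],
}

# Cartesian product of all value lists -> one dict per trial
keys = list(param_grid)
combinations = [dict(zip(keys, vals)) for vals in product(*param_grid.values())]
print(len(combinations))  # 24 = 2 * 3 * 1 * 2 * 2 * 1 * 1 * 1 * 1
```

Each resulting dict can then be passed directly to a training function as keyword arguments.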
Despite achieving excellent performance in controlled environments through transfer learning and grid search optimization (accuracy: 1.000, precision: 1.000), the model faced challenges in real-world scenarios. The implementation of Reinforcement Learning from Human Feedback (RLHF) aims to bridge this gap, creating a continuous improvement loop that adapts to real-world conditions such as varying lighting, different face angles, diverse image qualities, and occlusions.
The RLHF process follows three stages:
1. **Feedback Collection**: Through a GUI interface where users evaluate model predictions, provide correct bounding boxes, rate performance, and add comments.
2. **Feedback Analysis**: Systematic evaluation of performance patterns, failure modes, user ratings, detection confidence, and bounding box accuracy.
3. **Model Improvement**: Targeted enhancement through automatic strategy determination and priority-based training.
- **GUI Interface**:
  - Custom interface built with CustomTkinter
  - Model and image selection capabilities
  - Real-time face detection visualization
  - Interactive bounding box correction tool
  - Rating system (0-5 scale)
  - Comments section for additional feedback
- **Feedback Collection Process**:
  - Used the best model from the grid search (Combination ID: 13)
  - Selected 100 external images for evaluation
  - For each image:
    - The model makes a prediction
    - The user draws the correct bounding box
    - The user provides a rating and comments
  - Feedback saved in JSON format
- **Feedback Structure**:

```python
feedback_data = {
    'image_path': str,
    'model_name': str,
    'model_prediction': {
        'has_face': bool,
        'confidence': float,
        'bbox': List[float]
    },
    'human_correction': List[float],
    'rating': float,
    'comments': str,
    'timestamp': str,
    'image_size': Tuple[int, int]
}
```

After collecting feedback on 100 images through the GUI interface, the analysis revealed significant insights about the model's performance:
- **Overall Performance Metrics**:

```python
metrics = {
    'total_feedback': 100,
    'average_rating': 2.0,        # Below-average performance
    'average_confidence': 0.622,  # Moderate confidence
    'average_iou': 0.317          # Low IoU score
}
```

- **IoU Distribution by Quality**:

```python
iou_ranges = {
    'excellent': 15,  # IoU >= 0.4
    'good': 24,       # 0.3 <= IoU < 0.4
    'fair': 20,       # 0.2 <= IoU < 0.3
    'poor': 10        # IoU < 0.2
}
```

- **Classification Performance**:

```python
classification_metrics = {
    'true_positives': 39,
    'false_positives': 30,
    'false_negatives': 31,
    'precision': 0.565,
    'recall': 0.557,
    'f1_score': 0.561
}
```

The feedback was collected using a standardized 0-5 rating scale:
| Rating | Criteria |
|---|---|
| 5 | Perfect detection and bbox (100% of face) |
| 4 | Good detection, minor bbox issues (>66% of face) |
| 3 | Correct detection, noticeable bbox issues (>33% of face) |
| 2 | Correct detection, poor bbox (<33% of face) |
| 1 | Poor detection and bbox (<33% face, <50% detection) |
| 0 | Completely wrong (<25% detection, wrong bbox) |
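The IoU figures reported above follow the standard intersection-over-union computation; a minimal sketch (the project's own utility may differ in details such as clipping) is:

```python
def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    # Intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    # Union = sum of areas minus intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A prediction shifted diagonally from the human correction
print(iou([0.2, 0.2, 0.6, 0.6], [0.3, 0.3, 0.7, 0.7]))  # ≈ 0.391, "good" on the scale above
```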
- **Detection Issues**:
  - High false negative rate (31%)
  - Significant false positives (30%)
  - Balanced but low precision-recall trade-off
- **Bounding Box Quality**:
  - Only 15% achieved excellent IoU
  - 44% good or excellent performance
  - 30% fair or poor performance
- **Temporal Trends**:
  - Initial average rating: 2.7
  - Final average rating: 2.2
  - Declining performance on challenging cases
  - IoU fluctuation between 0.25 and 0.45
These findings led to specific strategy adjustments in the improvement phase, particularly focusing on:
- Reducing false negatives
- Improving bounding box precision
- Enhancing confidence calibration
The RLHF implementation employs a systematic approach to improve model performance through feedback analysis and targeted training. The process consists of three main components: automatic strategy determination, phased training implementation, and performance evaluation.
- **Automatic Strategy Determination**
The system analyzes failure patterns in the feedback data to automatically determine the optimal training strategy. It considers four potential scenarios:
- Poor bounding box performance (>40% of failures): Emphasizes regression with higher reg_weight
- Low confidence issues (>40% of failures): Focuses on classification with higher class_weight
- High confidence errors (>40% of failures): Addresses false positives with adjusted learning rate
- Balanced issues: Uses moderate parameters across all aspects
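The four-way decision rule above can be sketched as a small function (names and return labels are assumed for illustration; the actual implementation lives in `rlhf/model_improver.py`):

```python
def choose_strategy(failure_patterns, threshold=0.4):
    """Pick a training focus from feedback failure counts (illustrative sketch)."""
    total = sum(failure_patterns.values())
    share = {k: v / total for k, v in failure_patterns.items()}
    if share.get('poor_bbox', 0) > threshold:
        return 'emphasize_regression'        # raise reg_weight
    if share.get('low_confidence', 0) > threshold:
        return 'emphasize_classification'    # raise class_weight
    if share.get('high_confidence_wrong', 0) > threshold:
        return 'adjust_learning_rate'        # target false positives
    return 'balanced'                        # moderate parameters everywhere

patterns = {'low_confidence': 31, 'high_confidence_wrong': 9,
            'poor_bbox': 25, 'false_positives': 1, 'false_negatives': 31}
print(choose_strategy(patterns))  # 'balanced' — no single pattern exceeds 40%
```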
In this case, the analysis revealed a distributed pattern of issues:

```python
failure_patterns = {
    'low_confidence': 31,
    'high_confidence_wrong': 9,
    'poor_bbox': 25,
    'false_positives': 1,
    'false_negatives': 31
}
```

Since no single failure pattern exceeded the 40% threshold, the system selected a balanced approach with the following parameters:
```python
strategy = {
    'epochs': 40,
    'batch_size': 48,
    'early_stopping_patience': 12,
    'reduce_lr_patience': 5,
    'lr_decay_rate': 0.98,
    'class_weight': 0.5,
    'reg_weight': 2.0,
    'learning_rate': 1e-4,
    'dropout_rate': 0.6
}
```

- **Two-Phase Training Implementation**
The improvement process implements a two-phase training approach to maximize the impact of feedback:
Phase 1 (Priority Training):
- Focuses on samples with ratings ≤ 2 (41 original samples)
- Applies aggressive augmentation (7 variations per sample)
- Emphasizes learning from problematic cases
- Uses 18 validation samples for performance monitoring
Phase 2 (Comprehensive Training):
- Includes all feedback samples (82 original samples)
- Applies standard augmentation (5 variations per sample)
- Ensures balanced learning from all feedback types
- Maintains consistent validation set for comparison
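The two-phase split above can be sketched as a simple filter over feedback samples (field names assumed to match the feedback JSON structure):

```python
def split_phases(feedback, priority_max_rating=2):
    """Phase 1: only low-rated (problematic) samples; Phase 2: all feedback samples."""
    phase1 = [s for s in feedback if s['rating'] <= priority_max_rating]
    phase2 = list(feedback)
    return phase1, phase2

# Toy feedback set: four of seven samples are rated <= 2
feedback = [{'rating': r} for r in [1, 2, 3, 4, 2, 5, 0]]
p1, p2 = split_phases(feedback)
print(len(p1), len(p2))  # 4 7
```

With the actual feedback this selects the 41 priority samples for Phase 1 while Phase 2 trains on all 82; augmentation (7× vs 5× per sample) is applied downstream.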
- **Performance Results**

The implementation achieved significant improvements across all metrics:

```python
final_metrics = {
    # Classification metrics
    'class_accuracy': 1.000,
    'class_precision': 1.000,
    'class_recall': 1.000,
    'f1_score': 1.000,
    # Regression metrics
    'reg_mae': 0.114,
    'reg_mse': 0.028,
    'reg_rmse': 0.167,
    # Overall performance
    'total_loss': 3.110,
    'class_loss': 0.240,
    'reg_loss': 1.495
}
```

These results demonstrate the effectiveness of our RLHF implementation in:
- Achieving perfect classification performance
- Significantly improving bounding box precision
- Maintaining balanced overall performance
- Successfully addressing identified failure patterns
The balanced strategy, automatically determined through feedback analysis, proved highly effective in improving both the classification accuracy and bounding box precision of the model.
The evaluation demonstrates the model's evolution through grid search optimization and RLHF improvement.
Grid search optimization (Combination ID: 13) achieved optimal performance with parameters:
```python
best_params = {
    'class_weight': 0.2,
    'reg_weight': 1.7,
    'learning_rate': 0.0001,
    'batch_size': 96,
    'dropout_rate': 0.6,
    'epochs': 25,
    'early_stopping_patience': 7,
    'reduce_lr_patience': 4,
    'lr_decay_rate': 0.9
}
```

Performance metrics:
```python
grid_search_metrics = {
    # Classification performance
    'test_class_accuracy': 1.000,
    'test_class_precision': 1.000,
    'test_class_recall': 1.000,
    'test_f1_score': 1.000,
    # Regression performance
    'test_reg_mae': 0.150,
    'test_reg_mse': 0.065,
    'test_reg_rmse': 0.255,
    # Overall performance
    'test_total_loss': 8.684,
    'test_class_loss': 0.207,
    'test_reg_loss': 5.084
}
```

After RLHF implementation with the balanced strategy:
```python
rlhf_strategy = {
    'epochs': 40,
    'batch_size': 48,
    'early_stopping_patience': 12,
    'reduce_lr_patience': 5,
    'lr_decay_rate': 0.98,
    'class_weight': 0.5,
    'reg_weight': 2.0,
    'learning_rate': 1e-4,
    'dropout_rate': 0.6
}
```

Final performance:
```python
rlhf_metrics = {
    # Classification performance
    'test_class_accuracy': 1.000,
    'test_class_precision': 1.000,
    'test_class_recall': 1.000,
    'test_f1_score': 1.000,
    # Regression performance
    'test_reg_mae': 0.114,    # 24% improvement
    'test_reg_mse': 0.028,    # 57% improvement
    'test_reg_rmse': 0.167,   # 34% improvement
    # Overall performance
    'test_total_loss': 3.110, # 64% improvement
    'test_class_loss': 0.240,
    'test_reg_loss': 1.495    # 71% improvement
}
```

The RLHF implementation significantly improved the model's performance, particularly in bounding box precision and overall loss reduction, while maintaining perfect classification metrics.
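The improvement percentages quoted above follow directly from the two metric sets, as a percentage reduction relative to the grid-search baseline:

```python
def improvement(before, after):
    """Percentage reduction from the grid-search model to the RLHF model."""
    return round(100 * (before - after) / before)

print(improvement(0.150, 0.114))  # 24  (MAE)
print(improvement(0.065, 0.028))  # 57  (MSE)
print(improvement(8.684, 3.110))  # 64  (total loss)
print(improvement(5.084, 1.495))  # 71  (regression loss)
```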
The GUI application serves as the interface for both face detection and feedback collection, built using CustomTkinter for a modern, user-friendly experience.
- **Model Selection and Configuration**
  - Model loading functionality
  - Detection threshold adjustment (0-1)
  - Real-time parameter updates
- **Image Processing**
  - Image loading and display
  - Real-time face detection
  - Bounding box visualization
  - Detection confidence display
- **Feedback Collection**
  - Toggle feedback mode
  - Interactive bounding box drawing
  - Rating system (0-5 scale)
  - Comments section
  - Automatic feedback storage

Typical workflow:

- **Model Loading**:
  - Click "Select Model"
  - Choose the model directory containing the weights
  - Model name and status are displayed
- **Image Processing**:
  - Click "Select Image"
  - Adjust the detection threshold if needed
  - View detection results:
    - Face detection status
    - Confidence score
    - Bounding box coordinates
- **Feedback Submission**:
  - Enable feedback mode
  - Draw a correction box
  - Rate model performance (0-5)
  - Add optional comments
  - Submit feedback
The interface provides a seamless workflow for both model evaluation and continuous improvement through user feedback.
- Python 3.8+ (3.10.15 recommended for this project)
- CUDA-capable GPU
- Git
- Clone the repository:

```shell
git clone https://github.com/AlvaroVasquezAI/Face_Detection.git
cd Face_Detection
```

- Create and activate a virtual environment:

```shell
# Windows
python -m venv venv
venv\Scripts\activate

# Linux/Mac
python3 -m venv venv
source venv/bin/activate
```

- Install dependencies:

```shell
pip install -r requirements.txt
```

- Test GPU support:

```shell
python -c "import tensorflow as tf; print('GPU Available:', tf.config.list_physical_devices('GPU'))"
```

For any installation issues, please refer to:
- **Grid Search Training**:

```shell
python -m scripts.train_gridSearch
```

This will:
- Load and preprocess the dataset
- Perform hyperparameter optimization
- Save results in `grid_search_results_xxxx/`
- **Collect Feedback Data** (required first step):

```shell
python -m src.gui.app
```

Using the GUI:
- Load the best model from the grid search
- Process multiple images (recommended: 100+)
- For each image:
  - Draw correction boxes
  - Rate model performance (0-5)
  - Provide feedback
- Feedback is saved in `feedback/feedback_data.json`
- **Verify Collected Feedback**:

```shell
python -m feedback.verify_feedback
```

This will:
- Display collected feedback visualizations
- Show bounding box comparisons
- Present rating distributions
- **RLHF Training**:

```shell
python -m rlhf.analysis_and_retrain
```

This will:
- Analyze the collected feedback data
- Determine the optimal strategy
- Retrain the model with feedback
- **Launch GUI**:

```shell
python -m src.gui.app
```

- **Load Model**:
  - Click "Select Model"
  - Navigate to the `models/` directory
  - Select the model folder containing `best_weights.weights.h5`
- **Process Images**:
  - Click "Select Image"
  - Adjust the detection threshold if needed (default: 0.5)
  - View results in real-time
- **Provide Feedback**:
  - Enable feedback mode
  - Draw a correction box if needed
  - Rate performance (0-5)
  - Add comments (optional)
  - Submit feedback
MIT License
Copyright (c) 2024 Alvaro Vasquez
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software.
The Software is provided "AS IS", without warranty of any kind. For the full license text, please see the LICENSE file in the repository.
The Labeled Faces in the Wild dataset used in this project is subject to its own licensing terms. Please refer to the LFW dataset website.