This project implements semantic segmentation of road scenes, assigning every pixel of an image to a semantic class. Pixel-level scene understanding of this kind is essential for autonomous driving.
The model is a U-Net with an EfficientNetB0 encoder pretrained on ImageNet, trained on the CamVid dataset with 32 semantic classes.
- Implemented U-Net with EfficientNetB0 encoder pretrained on ImageNet.
- Built a custom data generator for efficient batch loading, augmentation, and one-hot encoding.
- Achieved an IoU of 0.6100 and an F1 score of 0.6450 on the CamVid test set.
- Automated training with Early Stopping, LR Scheduling, and Model Checkpointing.
- Generated training curves and sample prediction visualizations for model interpretability.
- Name: CamVid (Cambridge-driving Labeled Video Database)
- Source: Kaggle
- Link: CamVid Dataset
- Classes: 32 semantic classes
Note: You may need to create a Kaggle account and accept terms to download the dataset.
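The Kaggle version of CamVid ships a `class_dict.csv` that maps each class name to an RGB color. A minimal sketch of loading it into a color-to-index lookup (the `name`, `r`, `g`, `b` column names are an assumption about the CSV layout):

```python
import pandas as pd

# Read the class dictionary (assumed columns: name, r, g, b)
class_df = pd.read_csv("CamVid/class_dict.csv")

# {(r, g, b): class_index} lookup, used later to turn color-coded masks into label maps
color_to_index = {
    (row["r"], row["g"], row["b"]): idx
    for idx, row in class_df.iterrows()
}

print(len(color_to_index))  # expected: 32
```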
- Languages: Python
- Frameworks/Libraries:
  - TensorFlow / Keras
  - segmentation_models
  - OpenCV
  - Albumentations
  - NumPy, pandas, scikit-learn
  - Matplotlib
```
road-scene-segmentation/
│
├── assets/
│   ├── training_curves.png
│   └── predictions.png
├── CamVid/
│   ├── test/
│   ├── test_labels/
│   ├── val/
│   └── ...
├── environment.yml
├── README.md
├── requirements.txt
├── road_scene_segmentation.ipynb
├── unet_efficientnetb0.keras
└── unet_efficientnetb0.weights.h5
```
**Data Preparation**
- Load images and masks
- Convert RGB mask colors to class indices (see the sketch below)
- Apply augmentations using Albumentations (resize, flips, brightness, rotations, noise)
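A minimal sketch of the mask conversion and augmentation steps, reusing the `color_to_index` lookup from the dataset snippet above (the 384x480 resize and the specific transforms/probabilities are assumptions, not necessarily the notebook's exact pipeline):

```python
import cv2
import numpy as np
import albumentations as A

def rgb_mask_to_class_indices(mask_rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB label image into an (H, W) map of class indices,
    using the color_to_index lookup built from class_dict.csv above."""
    class_map = np.zeros(mask_rgb.shape[:2], dtype=np.uint8)
    for color, idx in color_to_index.items():
        class_map[np.all(mask_rgb == color, axis=-1)] = idx
    return class_map

# Training-time augmentations (transforms and parameters are illustrative)
train_aug = A.Compose([
    A.Resize(384, 480),                # height, width; both divisible by 32, as the encoder requires
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.Rotate(limit=10, p=0.3),
    A.GaussNoise(p=0.2),
])

# Example: load one frame and its label, then apply identical spatial transforms to both
# (file names are illustrative; any train/ frame and its *_L label work)
image = cv2.cvtColor(cv2.imread("CamVid/train/0001TP_006690.png"), cv2.COLOR_BGR2RGB)
mask = rgb_mask_to_class_indices(
    cv2.cvtColor(cv2.imread("CamVid/train_labels/0001TP_006690_L.png"), cv2.COLOR_BGR2RGB)
)
augmented = train_aug(image=image, mask=mask)
image, mask = augmented["image"], augmented["mask"]
```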
**Custom Data Generator**
- Efficient batch loading with augmentation
- One-hot encoded masks
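The notebook's generator is not reproduced here; a `keras.utils.Sequence` sketch with the same responsibilities (batching, augmentation, one-hot masks) could look like this, assuming the `train_aug` pipeline and `rgb_mask_to_class_indices` helper from the previous sketch:

```python
import cv2
import numpy as np
from tensorflow.keras.utils import Sequence, to_categorical

class CamVidGenerator(Sequence):
    """Yields (images, one-hot masks) batches for model.fit()."""

    def __init__(self, image_paths, mask_paths, batch_size=8, n_classes=32, augment=None):
        super().__init__()
        self.image_paths = list(image_paths)
        self.mask_paths = list(mask_paths)
        self.batch_size = batch_size
        self.n_classes = n_classes
        self.augment = augment  # e.g. the Albumentations pipeline sketched above

    def __len__(self):
        # Number of batches per epoch
        return int(np.ceil(len(self.image_paths) / self.batch_size))

    def __getitem__(self, idx):
        batch = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        images, masks = [], []
        for img_path, mask_path in zip(self.image_paths[batch], self.mask_paths[batch]):
            image = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB)
            # rgb_mask_to_class_indices comes from the data-preparation sketch above
            mask = rgb_mask_to_class_indices(
                cv2.cvtColor(cv2.imread(mask_path), cv2.COLOR_BGR2RGB)
            )
            if self.augment is not None:
                sample = self.augment(image=image, mask=mask)
                image, mask = sample["image"], sample["mask"]
            images.append(image / 255.0)                        # simple [0, 1] scaling (assumption)
            masks.append(to_categorical(mask, self.n_classes))  # one-hot encode the label map
        return np.stack(images).astype("float32"), np.stack(masks).astype("float32")
```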
**Model**
- U-Net with an EfficientNetB0 backbone
- Loss: categorical cross-entropy + focal loss
- Metrics: IoU, F1-score
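A sketch of how such a model can be built and compiled with `segmentation_models` (the equal loss weighting and the learning rate are assumptions):

```python
import os
os.environ["SM_FRAMEWORK"] = "tf.keras"   # select the tf.keras backend of segmentation_models

import segmentation_models as sm
import tensorflow as tf

N_CLASSES = 32

# U-Net decoder on top of an ImageNet-pretrained EfficientNetB0 encoder
model = sm.Unet(
    backbone_name="efficientnetb0",
    encoder_weights="imagenet",
    classes=N_CLASSES,
    activation="softmax",
)

# Combined loss: categorical cross-entropy + categorical focal loss
total_loss = sm.losses.CategoricalCELoss() + sm.losses.CategoricalFocalLoss()

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=total_loss,
    metrics=[sm.metrics.IOUScore(threshold=0.5), sm.metrics.FScore(threshold=0.5)],
)
```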
**Training**
- Early stopping
- Learning-rate reduction on plateau
- Model checkpointing
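These behaviours map directly onto standard Keras callbacks; a sketch with assumed patience values, monitored metric, and epoch count:

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint

callbacks = [
    # Stop when validation loss stops improving and restore the best weights
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    # Halve the learning rate when validation loss plateaus
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=4, min_lr=1e-6),
    # Keep only the best weights seen so far
    ModelCheckpoint("unet_efficientnetb0.weights.h5", monitor="val_loss",
                    save_best_only=True, save_weights_only=True),
]

history = model.fit(
    train_gen,                 # e.g. CamVidGenerator instances as sketched above
    validation_data=val_gen,
    epochs=60,
    callbacks=callbacks,
)
```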
**Evaluation**
- Metrics (IoU, F1, loss)
- Training curves visualization
- Prediction visualization
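For example, the test metrics and training curves can be produced roughly as follows, assuming the `history` object and a `test_gen` generator built like the sketches above:

```python
import matplotlib.pyplot as plt

# Test-set metrics come back in the compile order: [loss, iou_score, f1-score]
test_loss, test_iou, test_f1 = model.evaluate(test_gen)
print(f"Loss: {test_loss:.4f}  IoU: {test_iou:.4f}  F1: {test_f1:.4f}")

# Training curves: one panel each for loss, IoU, and F1 across epochs
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, key in zip(axes, ["loss", "iou_score", "f1-score"]):
    ax.plot(history.history[key], label="train")
    ax.plot(history.history["val_" + key], label="val")
    ax.set_title(key)
    ax.set_xlabel("epoch")
    ax.legend()
fig.tight_layout()
plt.show()
```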
| Metric | Score |
|---|---|
| IoU | 0.6100 |
| F1 Score | 0.6450 |
| Test Loss | 0.0181 |
Loss, IoU, and F1-score progression across epochs:

![Training curves](assets/training_curves.png)
Input image vs. ground truth vs. model prediction:

![Sample predictions](assets/predictions.png)
Clone the repository:

```bash
git clone https://github.com/Dhanvika27/road-scene-segmentation.git
cd road-scene-segmentation
```

If using a virtual environment (recommended with pip):

```bash
# Create environment
python -m venv venv

# Activate environment (macOS/Linux)
source venv/bin/activate

# Activate environment (Windows)
venv\Scripts\activate
```

Install dependencies:

- Using pip:

  ```bash
  pip install -r requirements.txt
  ```

- Using conda:

  ```bash
  conda env create -f environment.yml
  conda activate semantic-seg
  ```

Note: Use either `pip` or `conda`, not both, to avoid environment conflicts.
Download the CamVid dataset from Kaggle and place it in the following structure:
```
CamVid/
│
├── train/
├── train_labels/
├── val/
├── val_labels/
├── test/
├── test_labels/
└── class_dict.csv
```
Open and run the notebook:

```bash
jupyter notebook road_scene_segmentation.ipynb
```

- Final trained model: `unet_efficientnetb0.keras`
- Best weights: `unet_efficientnetb0.weights.h5`
Load model:

```python
from tensorflow.keras.models import load_model
import segmentation_models as sm

model = load_model(
    "unet_efficientnetb0.keras",
    custom_objects={
        "CategoricalCELoss": sm.losses.CategoricalCELoss(),
        "CategoricalFocalLoss": sm.losses.CategoricalFocalLoss(),
        "iou_score": sm.metrics.IOUScore(threshold=0.5),
        "f1-score": sm.metrics.FScore(threshold=0.5),
    }
)
```

This project demonstrates the effectiveness of U-Net with an EfficientNetB0 backbone for road scene segmentation on the CamVid dataset. The model achieves a strong balance between accuracy and efficiency, making it suitable for real-world applications such as autonomous driving and advanced driver assistance systems (ADAS).
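For completeness, a quick single-frame inference check with the loaded model might look like this (a minimal sketch; the 384x480 resize, the [0, 1] scaling, and the file name are assumptions and must match whatever preprocessing was used for training):

```python
import cv2
import numpy as np

# Any CamVid test frame works here; the file name is illustrative
image = cv2.cvtColor(cv2.imread("CamVid/test/example.png"), cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (480, 384))          # (width, height); must match the training resolution

# Scale as during training, add a batch dimension, and predict per-pixel class probabilities
probs = model.predict((image / 255.0)[np.newaxis, ...].astype("float32"))  # (1, 384, 480, 32)

# Collapse the probabilities into a class-index map for visualization
pred_mask = np.argmax(probs[0], axis=-1)       # (384, 480)
```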
Note: Training was performed on CPU due to hardware constraints.
EfficientNetB0 was chosen as the backbone because it provides a good trade-off between accuracy and efficiency, making it practical for resource-limited environments.
This project is released under the MIT License. You are free to use, modify, and distribute it with proper attribution.