Work developed for the Deep Learning course in the Master's in Data Science and Advanced Analytics at NOVA IMS (Spring Semester 2024-2025).
This project applies advanced deep learning techniques to the challenge of rare-species classification from images. It uses the BioCLIP dataset, sourced from the Encyclopedia of Life (EOL), which contains over 11,000 images across 202 animal families together with taxonomic metadata (kingdom, phylum, family). We developed a robust pipeline to preprocess the imbalanced and noisy data, train multiple neural network architectures, and apply an innovative zero-shot classification pre-filtering step to improve model performance. The ultimate goal is a tool that supports biodiversity conservation through automated species identification.
The primary objective is to develop a highly accurate image classification model by:
- Exploring the complex BioCLIP dataset to understand its structure and inherent challenges, such as severe class imbalance.
- Preprocessing images and implementing data augmentation strategies to create a robust training pipeline.
- Developing and evaluating multiple deep learning models, from a baseline CNN to state-of-the-art pre-trained architectures.
- Innovating with a zero-shot classification pre-filtering step to remove noisy data and enhance model accuracy.
The dataset is derived from the BioCLIP project, with images and metadata sourced from the Encyclopedia of Life (EOL).
- Dataset: 11,983 images of rare species.
- Target: Classification across 202 unique `family` labels within the `Animalia` kingdom.
- Source Links: BioCLIP Project
The project follows the CRISP-DM framework, adapted for deep learning, guiding the process from problem understanding to deployment.
Figure 1: Project Flowchart.
- Business Understanding:
  - Problem: Classify rare species images into their `family` based on visual features.
  - Importance: Automate species identification to aid biodiversity conservation.
  - Data Source: BioCLIP dataset with `family` as the target variable.
- Data Understanding:
  - Dataset: 11,983 images, 7 metadata features, 202 families, all within `Animalia`.
  - Challenges: High class imbalance (Figure B2), potential non-animal outliers (Figure B3).
  - Exploration: Verified data types, checked for missing values/duplicates, and visualized the family distribution.
  - Splitting: Stratified split into 80% training, 10% validation, 10% test sets.
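The stratified 80/10/10 split can be sketched with scikit-learn; the `family` column name and the seed here are assumptions for illustration, not taken from the project code:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def stratified_split(df, label_col="family", seed=42):
    """Split a metadata DataFrame into 80/10/10 train/val/test sets,
    preserving the per-family class proportions in every split."""
    train_df, temp_df = train_test_split(
        df, test_size=0.20, stratify=df[label_col], random_state=seed)
    # Split the remaining 20% evenly into validation and test halves.
    val_df, test_df = train_test_split(
        temp_df, test_size=0.50, stratify=temp_df[label_col], random_state=seed)
    return train_df, val_df, test_df
```

Stratifying both splits keeps even the rarest families represented (where possible) in validation and test data.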
- Data Preparation:
  - Image Preprocessing: Resized to 224x224, maintained RGB mode, preserved aspect ratios.
  - Class Imbalance: Applied SMOTE-inspired augmentation (Keras `RandAugment`, Figure B6) and class weighting.
  - Transformations: Explored grayscale, contrast, and saturation adjustments (Figure B5).
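A minimal sketch of the resize and class-weighting steps, assuming a PIL/scikit-learn pipeline (function names are illustrative, not the project's actual helpers):

```python
import numpy as np
from PIL import Image
from sklearn.utils.class_weight import compute_class_weight

TARGET_SIZE = (224, 224)

def preprocess_image(path):
    """Load an image, force RGB mode, and resize to 224x224.
    (A padded resize could be substituted to keep aspect ratios exact.)"""
    with Image.open(path) as img:
        return np.asarray(img.convert("RGB").resize(TARGET_SIZE))

def family_class_weights(labels):
    """Balanced class weights: rare families receive proportionally
    larger weights, counteracting the dataset's imbalance."""
    classes = np.unique(labels)
    weights = compute_class_weight("balanced", classes=classes, y=labels)
    return dict(zip(classes, weights))
```

The resulting weight dictionary can be passed straight to `model.fit(..., class_weight=...)` in Keras.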
- Modeling:
  - Baseline CNN: Built a custom CNN using the Keras Functional API (Figure C1).
  - Transfer Learning: Tested pre-trained models (VGG19, ResNet152V2, ConvNeXtBase, EfficientNetV2B0) with frozen base layers and custom classification heads (Annex A).
  - Experiments: Evaluated combinations of preprocessing (original, contrast, saturation) and imbalance handling (original, SMOTE, class weights; Tables C1 & C2).
  - Hyperparameter Tuning: Used Keras Tuner (Hyperband strategy, Annex B) to optimize the best model (`ConvNeXtBase`), tuning learning rate, optimizer, and dropout (Table D1).
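The frozen-base transfer-learning setup could look roughly like this in Keras (the head layers and dropout rate are assumptions; the project's actual head may differ):

```python
from tensorflow import keras

NUM_FAMILIES = 202  # number of target family labels

def build_transfer_model(base=None, dropout=0.3):
    """Attach a small classification head to a frozen pre-trained base.
    Defaults to ConvNeXtBase with ImageNet weights; any Keras feature
    extractor can be passed in instead."""
    if base is None:
        base = keras.applications.ConvNeXtBase(
            include_top=False, weights="imagenet", input_shape=(224, 224, 3))
    base.trainable = False  # freeze the pre-trained layers
    inputs = keras.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)
    x = keras.layers.GlobalAveragePooling2D()(x)
    x = keras.layers.Dropout(dropout)(x)
    outputs = keras.layers.Dense(NUM_FAMILIES, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling the frozen base with `training=False` also keeps its batch-normalization statistics fixed during head training.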
- Evaluation:
  - Metrics: Macro F1-Score (primary, due to class imbalance), Accuracy, Precision, Recall, AUROC.
  - Analysis: Learning curves (Figure F1) assessed generalization; confusion matrices (Figure F4) and qualitative examples (Figures F2 & F3) identified misclassification patterns (e.g., visually similar species, poor image quality).
  - Callbacks: Used `ModelCheckpoint`, `CSVLogger`, `LearningRateScheduler`, and `EarlyStopping`.
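A sketch of that callback setup, assuming typical monitor targets, file paths, and patience values (the project's exact settings may differ):

```python
from tensorflow import keras

def training_callbacks(log_path="training_log.csv",
                       ckpt_path="best_model.keras"):
    """Checkpoint the best weights, log per-epoch metrics, decay the
    learning rate on a schedule, and stop early when validation loss
    plateaus."""
    def schedule(epoch, lr):
        # Halve the learning rate every 10 epochs (illustrative schedule).
        return lr * 0.5 if epoch > 0 and epoch % 10 == 0 else lr
    return [
        keras.callbacks.ModelCheckpoint(ckpt_path, monitor="val_loss",
                                        save_best_only=True),
        keras.callbacks.CSVLogger(log_path),
        keras.callbacks.LearningRateScheduler(schedule),
        keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                      restore_best_weights=True),
    ]
```

The list is passed to `model.fit(..., callbacks=training_callbacks())`.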
- Deployment:
  - Method: Applied CLIP (`clip-vit-base-patch16`) for zero-shot classification to filter out non-animal images (~15% of the dataset; Figure E2, Annex E).
  - Impact: Retrained the best model (`ConvNeXtBase` with SMOTE) on the filtered "OnlyAnimals" dataset, improving robustness and reducing overfitting.
  - Deliverables: Code, notebooks, and a comprehensive report detailing methodology and findings.
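The zero-shot pre-filter could be sketched with Hugging Face Transformers roughly as follows; the prompt set and threshold are assumptions, and only the model checkpoint name comes from the project:

```python
# Assumed prompt set: one "keep" class (index 0) plus distractor classes.
PROMPTS = ["a photo of an animal", "a photo of a plant",
           "a map or chart", "a page of text"]

def is_animal(probs, threshold=0.5):
    """Keep an image when the 'animal' prompt has the top probability
    with at least the given confidence."""
    return probs[0] >= threshold and probs[0] == max(probs)

def filter_images(images, model_name="openai/clip-vit-base-patch16"):
    """Zero-shot pre-filter: score each PIL image against the prompts
    and keep only those CLIP judges to be animal photos."""
    import torch
    from transformers import CLIPModel, CLIPProcessor  # heavy deps, loaded lazily
    model = CLIPModel.from_pretrained(model_name)
    processor = CLIPProcessor.from_pretrained(model_name)
    kept = []
    for img in images:
        inputs = processor(text=PROMPTS, images=img,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
        if is_animal(probs.tolist()):
            kept.append(img)
    return kept
```

Separating the decision rule (`is_animal`) from the model call makes the keep/discard threshold easy to tune against a labeled sample of outliers.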
The ConvNeXtBase model, trained on the CLIP-filtered "OnlyAnimals" dataset with SMOTE-inspired augmentation, emerged as the top-performing solution. It achieved a final Accuracy of 83.1% and a Macro F1-Score of 78.7% on the hold-out test set. This project demonstrates that a combination of advanced transfer learning, innovative data cleaning with zero-shot models, and robust imbalance handling can create a powerful and scalable solution for automated species classification, directly supporting biodiversity conservation efforts.
Feel free to explore the notebooks to see the implementation details of each phase!
- Data & Image Preparation
- Baseline Model - CNN
- Pre-trained Models
- Tuning Best Model
- Innovative Approach
- AndrΓ© Silvestre, 20240502
- Diogo Duarte, 20240525
- Filipa Pereira, 20240509
- Maria Cruz, 20230760
- Umeima Adam Mahomed, 20240543
