
πŸ” Neural & Evolutionary Learning: Predicting Protein in Chicken Carcasses 🧬


πŸ“ Description

This project delves into the challenging regression task of predicting crude protein weight in chicken carcasses using a variety of physical measurements. The core of this work is a rigorous comparative analysis of six distinct Neural and Evolutionary Learning (NEL) models. By applying a robust 10x10 Nested Cross-Validation methodology, we evaluate the behavior, performance, and applicability of each algorithm on a small but complex dataset, aiming to identify the most effective predictive models.

✨ Objective

The primary objectives of this project are to:

  • Implement and compare six different neural and evolutionary algorithms for a regression task.
  • Develop a robust evaluation pipeline using Nested Cross-Validation to ensure unbiased performance estimation and fair model comparison.
  • Perform hyperparameter tuning for each model to identify its optimal configuration.
  • Analyze and interpret the results to determine the most suitable models for predicting protein content, considering both accuracy and stability.

🎓 Project Context

This project was developed for the Neural and Evolutionary Learning (NEL) course as part of the Master's in Data Science and Advanced Analytics program at NOVA IMS. The work was completed during the 2nd Semester of the 2024-2025 academic year.

Dataset Note: The sustavianfeed.xlsx dataset used for this project is private and cannot be distributed. All analyses and conclusions are presented within the project notebooks and the final report.

🛠️ Technology Stack & Models

This project was implemented entirely in Python, leveraging a powerful stack of libraries for evolutionary computation, deep learning, and statistical analysis.

  • Core: Python, Pandas, NumPy
  • Modeling: Scikit-learn, PyTorch, neat-python, slim_gsgp
  • Visualization: Plotly, Matplotlib, Seaborn


πŸ—οΈ Project Methodology & Models

The project involved a structured approach to model development, hyperparameter tuning, and comparative evaluation.

Figure 1: Median and Interquartile Range (IQR) of Learning vs. Test RMSE Across All Outer Folds for the six models.

1. Models Compared

The following six models were implemented and compared:

  1. Genetic Programming (GP): Evolves tree-based symbolic expressions.

    • Implementation: slim_gsgp library.
    • Tuned Hyperparameters: max_depth, p_xo, prob_const, tournament_size.
  2. Geometric Semantic Genetic Programming (GSGP): A variant of GP using geometric semantic operators.

    • Implementation: slim_gsgp library.
    • Tuned Hyperparameters: init_depth, p_xo, prob_const, tournament_size.
  3. Semantic Learning algorithm with Inflate and deflate Mutations (SLIM): A semantic GP variant that controls program growth through dedicated inflate and deflate mutation operators.

    • Implementation: slim_gsgp library.
    • Tuned Hyperparameters: tournament_size, slim_versions, copy_parent.


  4. Neural Network (NN): A standard feedforward neural network trained with backpropagation.

    • Implementation: PyTorch.
    • Tuned Hyperparameters: hidden_layers, nodes_per_layer, learning_rate, optimizer.


  5. NeuroEvolution of Augmenting Topologies (NEAT): Evolves both network topology and weights from minimal structures.

    • Implementation: neat-python library.
    • Tuned Hyperparameters: compatibility_threshold, node_add_prob, weight_mutate_rate.


  6. Neural Network optimized with Genetic Algorithm (NN&GA) (Optional Exercise): A hybrid model where a GA optimizes the weights of a fixed-architecture PyTorch NN.

    • Implementation: Custom GA with PyTorch NN.
    • Focus: Implementing crossover and mutation operators for NN weights (a minimal sketch follows this list).


Note: The NN&GA model was included as an extra exercise and not fully tuned due to computational constraints.
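
To make the hybrid concrete, below is a minimal, illustrative sketch of the NN&GA idea: a small fixed-architecture PyTorch network whose flattened weight vector is evolved with uniform crossover and Gaussian mutation. The architecture, population settings, and synthetic data are assumptions chosen to keep the example self-contained, not the project's exact configuration.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

# Placeholder data standing in for the private carcass measurements (assumption).
X, y = torch.randn(96, 8), torch.randn(96, 1)

def make_net():
    # Small fixed architecture, assumed for illustration only.
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))

def rmse(net):
    with torch.no_grad():
        return torch.sqrt(nn.functional.mse_loss(net(X), y)).item()

def flatten(net):
    # Concatenate all weights and biases into a single genome vector.
    return torch.cat([p.data.flatten() for p in net.parameters()])

def load(net, vec):
    # Write a genome vector back into the network's parameters.
    i = 0
    for p in net.parameters():
        p.data.copy_(vec[i:i + p.numel()].view_as(p))
        i += p.numel()
    return net

def crossover(a, b):
    # Uniform crossover on flattened weight vectors.
    mask = torch.rand_like(a) < 0.5
    return torch.where(mask, a, b)

def mutate(vec, rate=0.1, sigma=0.1):
    # Gaussian perturbation applied to a random subset of weights.
    mask = (torch.rand_like(vec) < rate).float()
    return vec + mask * torch.randn_like(vec) * sigma

template = make_net()
pop = [flatten(make_net()) for _ in range(30)]
for _ in range(50):
    pop.sort(key=lambda v: rmse(load(template, v)))
    parents = pop[:10]  # truncation selection: keep the 10 best genomes
    children = [mutate(crossover(parents[torch.randint(10, (1,)).item()],
                                 parents[torch.randint(10, (1,)).item()]))
                for _ in range(20)]
    pop = parents + children

best = min(pop, key=lambda v: rmse(load(template, v)))
print(f"Best RMSE: {rmse(load(template, best)):.3f}")
```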


2. Evaluation Strategy

  • Nested Cross-Validation (NCV): A 10x10 NCV scheme was used for robust hyperparameter tuning (inner loop) and unbiased generalization performance estimation (outer loop). The same random seed and data splits were used across all models for a fair comparison (a schematic sketch follows this list).
  • Performance Metric: Root Mean Squared Error (RMSE) was the primary metric, penalizing larger errors more heavily. Median RMSE from inner validation sets guided hyperparameter selection.
  • Statistical Analysis: A non-parametric Friedman test followed by the Nemenyi post-hoc test was conducted on the outer fold test RMSE values to determine statistically significant performance differences between models.
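
As a schematic illustration of this evaluation loop (not the project's exact code), the sketch below wires a 10x10 nested CV around a stand-in scikit-learn model; the Ridge regressor, its hyperparameter grid, and the synthetic 96-sample data are assumptions used only to keep the example self-contained.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic stand-in for the private 96-sample dataset (assumption).
rng = np.random.default_rng(42)
X, y = rng.normal(size=(96, 8)), rng.normal(size=96)
param_grid = [{"alpha": a} for a in (0.1, 1.0, 10.0)]  # hypothetical grid

def rmse(params, X_tr, y_tr, X_te, y_te):
    pred = Ridge(**params).fit(X_tr, y_tr).predict(X_te)
    return mean_squared_error(y_te, pred) ** 0.5

outer = KFold(n_splits=10, shuffle=True, random_state=42)
outer_rmse = []
for tr, te in outer.split(X):
    inner = KFold(n_splits=10, shuffle=True, random_state=42)
    # Inner loop: pick the config with the best median validation RMSE.
    best = min(param_grid, key=lambda p: np.median(
        [rmse(p, X[tr][i], y[tr][i], X[tr][v], y[tr][v])
         for i, v in inner.split(X[tr])]))
    # Outer loop: refit on the full training split, score on the held-out fold.
    outer_rmse.append(rmse(best, X[tr], y[tr], X[te], y[te]))

print(f"Median outer test RMSE: {np.median(outer_rmse):.3f}")
```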

3. Weekly Deliverables & Development Process

The project was developed iteratively through weekly deliverables, each focusing on one of the core algorithms:

  1. GP: Implemented and tuned; analyzed bloat, overfitting, and premature convergence.
  2. GSGP: Implemented and tuned; analyzed similar characteristics.
  3. SLIM: Implemented and tuned; analyzed similar characteristics.
  4. NN: Implemented and tuned; focused on overfitting and premature convergence.
  5. NEAT: Implemented and tuned; focused on overfitting and premature convergence (see the fitness-evaluation sketch after this list).
  6. NN&GA (Extra): Implemented as part of the final submission.
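
For the NEAT deliverable, fitness evaluation reduces to scoring each genome's phenotype network on the training data. The sketch below shows the general neat-python pattern under assumed placeholder data and a hypothetical config file (which would define, e.g., compatibility_threshold, node_add_prob, and weight_mutate_rate); since neat-python maximizes fitness, the negated RMSE is used.

```python
import numpy as np
import neat  # pip install neat-python

# Placeholder regression data standing in for the private dataset (assumption).
rng = np.random.default_rng(42)
X_train = rng.normal(size=(96, 8))
y_train = rng.normal(size=96)

def eval_genomes(genomes, config):
    for genome_id, genome in genomes:
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        preds = np.array([net.activate(x)[0] for x in X_train])
        # neat-python maximizes fitness, so negate the RMSE.
        genome.fitness = -float(np.sqrt(np.mean((preds - y_train) ** 2)))

# "neat_config.ini" is a hypothetical config file; it must declare
# num_inputs=8 and num_outputs=1 to match the data above.
config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation,
                     "neat_config.ini")
population = neat.Population(config)
winner = population.run(eval_genomes, 100)  # evolve for up to 100 generations
```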

📊 Key Results & Findings

Figure 2: Learning and Test RMSE Across Models (10-Fold Cross-Validation). Note: Y-axis limited for readability.



Table 1: Median and Interquartile Range (IQR) of Test RMSE for All Models Across 10 Folds.

  • Best Performing Models: Genetic Programming (GP) and NEAT demonstrated superior and stable predictive performance.
    • GP achieved the lowest median test RMSE (4.545).
    • NEAT followed closely with a median test RMSE of 5.690 and a notably small test IQR, indicating stable performance.
  • Neural Networks (NN): Proved competitive with a median test RMSE of 7.211.
  • Semantic GP Variants (GSGP & SLIM): Showed higher test RMSEs and some underfitting/premature convergence.
    • GSGP median test RMSE: ~10.517
    • SLIM median test RMSE: ~10.618
  • NN&GA: Exhibited the highest median test RMSE (19.967) and clear overfitting, likely due to limited hyperparameter tuning (constrained by computational cost) and the difficulty of optimizing NN weights with a GA on this dataset.
  • Statistical Significance: The Friedman test (p-value ≈ 0.00) confirmed significant performance differences. The Nemenyi post-hoc test revealed that GP and NEAT significantly outperformed NN&GA, and that SLIM performed significantly worse than GP and NEAT. No significant differences were found between GP, NEAT, and NN, or between GSGP and SLIM (a sketch of this test procedure follows this list).
  • Challenges: The small dataset size (96 samples) was a primary challenge, amplifying performance variability and making hyperparameter tuning very sensitive. Computational intensity, especially for NEAT and NN&GA, constrained the extent of hyperparameter exploration. Evolutionary algorithms often faced premature convergence or bloat.
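
For reference, the statistical comparison can be reproduced with SciPy and the scikit-posthocs package along the lines of the sketch below; the random fold-wise RMSE matrix is a placeholder, not the project's actual results.

```python
import numpy as np
import pandas as pd
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp  # pip install scikit-posthocs

# Placeholder matrix of outer-fold test RMSEs: rows = 10 folds, cols = models.
rng = np.random.default_rng(0)
rmse = pd.DataFrame(rng.normal(loc=10.0, scale=3.0, size=(10, 6)),
                    columns=["GP", "GSGP", "SLIM", "NN", "NEAT", "NN&GA"])

# Friedman test: are the models' fold-wise rankings significantly different?
stat, p = friedmanchisquare(*(rmse[c] for c in rmse.columns))
print(f"Friedman statistic = {stat:.2f}, p-value = {p:.4f}")

# Nemenyi post-hoc test: pairwise p-values between models.
print(sp.posthoc_nemenyi_friedman(rmse))
```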

🚀 Deliverables & Outputs

  • Weekly Notebooks: Submitted throughout the course, detailing the implementation and tuning of each algorithm.
  • Final Report: A comprehensive report (max. 4 pages + references) focusing on results, discussion, and model comparison (the provided PDF serves as this report).
  • Final Code: Aggregation of all weekly codes and final analysis scripts.

🏁 Conclusion

This project provided a thorough comparative analysis of six Neural and Evolutionary Learning models for predicting crude protein content. GP and NEAT emerged as the most robust and accurate predictors for this specific regression task and dataset. The study highlighted the critical role of robust evaluation methodologies like NCV, especially with small datasets, and underscored the unique challenges and strengths of different NEL paradigms. Future work could involve applying these models to larger datasets, exploring more extensive hyperparameter tuning, and investigating different architectural or evolutionary strategies to further enhance performance and generalization.


For detailed implementation and results, please refer to the project notebooks and the final report. Good luck with your own NEL projects! 🚀


📄 References

[1] Vanneschi, L., & Silva, S. (2023). Lectures on Intelligent Systems. Springer Nature.

[2] Vanneschi, L. (2024). SLIM_GSGP: The Non-bloating Geometric Semantic Genetic Programming. Lecture Notes in Computer Science, 125–141. https://doi.org/10.1007/978-3-031-56957-9_8

[3] Stanley, K. O., & Miikkulainen, R. (2002). Evolving Neural Networks through Augmenting Topologies. Evolutionary Computation, 10(2), 99–127. https://doi.org/10.1162/106365602320169811

[4] Rainio, O., Teuho, J., & Klén, R. (2024). Evaluation metrics and statistical tests for machine learning. Scientific Reports, 14(1), 1–14. https://doi.org/10.1038/s41598-024-56706-x


👥 Team Members (Group 1)

  • AndrΓ© Silvestre (20240502)
  • Filipa Pereira (20240509)
  • Umeima Mahomed (20240543)
