This project delves into the challenging regression task of predicting crude protein weight in chicken carcasses using a variety of physical measurements. The core of this work is a rigorous comparative analysis of six distinct Neural and Evolutionary Learning (NEL) models. By applying a robust 10x10 Nested Cross-Validation methodology, we evaluate the behavior, performance, and applicability of each algorithm on a small but complex dataset, aiming to identify the most effective predictive models.
The primary objectives of this project are to:
- Implement and compare six different neural and evolutionary algorithms for a regression task.
- Develop a robust evaluation pipeline using Nested Cross-Validation to ensure unbiased performance estimation and fair model comparison.
- Perform hyperparameter tuning for each model to identify its optimal configuration.
- Analyze and interpret the results to determine the most suitable models for predicting protein content, considering both accuracy and stability.
This project was developed for the Neural and Evolutionary Learning (NEL) course as part of the Master's in Data Science and Advanced Analytics program at NOVA IMS. The work was completed during the 2nd Semester of the 2024-2025 academic year.
Dataset Note: The `sustavianfeed.xlsx` dataset used for this project is private and cannot be distributed. All analyses and conclusions are presented within the project notebooks and the final report.
This project was implemented entirely in Python, leveraging a powerful stack of libraries for evolutionary computation, deep learning, and statistical analysis.
The project involved a structured approach to model development, hyperparameter tuning, and comparative evaluation.
Figure 1: Median and Interquartile Range (IQR) of Learning vs Test RMSE Across All Outer Folds for the six models.
The following six models were implemented and compared:
- Genetic Programming (GP): Evolves tree-based symbolic expressions.
  - Implementation: `slim_gsgp` library.
  - Tuned Hyperparameters: `max_depth`, `p_xo`, `prob_const`, `tournament_size`.
- Geometric Semantic Genetic Programming (GSGP): A variant of GP using geometric semantic operators.
  - Implementation: `slim_gsgp` library.
  - Tuned Hyperparameters: `init_depth`, `p_xo`, `prob_const`, `tournament_size`.
- Semantic Learning algorithm with Inflate and deflate Mutations (SLIM): Another semantic GP variant focusing on specific mutation operators.
  - Implementation: `slim_gsgp` library.
  - Tuned Hyperparameters: `tournament_size`, `slim_versions`, `copy_parent`.
- Neural Network (NN): A standard feedforward neural network trained with backpropagation.
  - Implementation: PyTorch.
  - Tuned Hyperparameters: `hidden_layers`, `nodes_per_layer`, `learning_rate`, `optimizer`.
- NeuroEvolution of Augmenting Topologies (NEAT): Evolves both network topology and weights from minimal structures.
  - Implementation: `neat-python` library.
  - Tuned Hyperparameters: `compatibility_threshold`, `node_add_prob`, `weight_mutate_rate`.
- Neural Network optimized with a Genetic Algorithm (NN&GA) (Optional Exercise): A hybrid model in which a GA optimizes the weights of a fixed-architecture PyTorch NN.
  - Implementation: Custom GA with a PyTorch NN.
  - Focus: Implementing crossover and mutation operators for NN weights.

Note: The NN&GA model was included as an extra exercise and was not fully tuned due to computational constraints.
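To illustrate the NN&GA idea, the sketch below evolves the flattened weight vector of a tiny fixed-architecture network with arithmetic crossover and Gaussian mutation. It is a minimal NumPy stand-in, not the project's actual PyTorch implementation; the architecture, data, and GA settings are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data standing in for the (private) carcass dataset: 96 samples, 5 features.
X = rng.normal(size=(96, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=96)

# Fixed architecture 5 -> 8 -> 1; all weights live in one flat vector.
SHAPES = [(5, 8), (8,), (8, 1), (1,)]
DIM = sum(int(np.prod(s)) for s in SHAPES)

def unflatten(vec):
    params, i = [], 0
    for s in SHAPES:
        n = int(np.prod(s))
        params.append(vec[i:i + n].reshape(s))
        i += n
    return params

def predict(vec, X):
    W1, b1, W2, b2 = unflatten(vec)
    h = np.tanh(X @ W1 + b1)       # hidden layer
    return (h @ W2 + b2).ravel()   # linear output

def rmse(vec):
    return float(np.sqrt(np.mean((predict(vec, X) - y) ** 2)))

pop = rng.normal(scale=0.5, size=(40, DIM))       # initial population
init_best = min(rmse(ind) for ind in pop)

for gen in range(60):
    fit = np.array([rmse(ind) for ind in pop])
    elites = pop[np.argsort(fit)[:10]]            # elitist truncation selection
    children = []
    while len(children) < len(pop) - len(elites):
        p1, p2 = elites[rng.integers(10, size=2)]
        alpha = rng.random()
        child = alpha * p1 + (1 - alpha) * p2     # arithmetic crossover
        mask = rng.random(DIM) < 0.1              # per-gene Gaussian mutation
        child[mask] += rng.normal(scale=0.1, size=mask.sum())
        children.append(child)
    pop = np.vstack([elites, children])

best_rmse = min(rmse(ind) for ind in pop)
print(init_best, "->", best_rmse)
```

Because the elites survive unchanged each generation, the best fitness can only improve; the crossover and mutation operators shown here are the part the project implemented by hand for NN weights.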
- Nested Cross-Validation (NCV): A 10x10 NCV scheme was used for robust hyperparameter tuning (inner loop) and unbiased generalization performance estimation (outer loop). The same random seed and data splits were used across all models for fair comparison.
- Performance Metric: Root Mean Squared Error (RMSE) was the primary metric, penalizing larger errors more heavily. Median RMSE from inner validation sets guided hyperparameter selection.
- Statistical Analysis: A non-parametric Friedman test followed by the Nemenyi post-hoc test was conducted on the outer fold test RMSE values to determine statistically significant performance differences between models.
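The nested cross-validation scheme above can be sketched with scikit-learn, using a stand-in regressor since the project's GP and NEAT models expose different APIs; the estimator, grid, and synthetic data here are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the 96-sample carcass dataset.
X, y = make_regression(n_samples=96, n_features=5, noise=5.0, random_state=0)

inner = KFold(n_splits=10, shuffle=True, random_state=0)  # hyperparameter tuning
outer = KFold(n_splits=10, shuffle=True, random_state=0)  # generalization estimate

# Inner loop: grid search picks hyperparameters on each outer training fold.
tuner = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [3, 5, 7]},
    scoring="neg_root_mean_squared_error",
    cv=inner,
)

# Outer loop: test RMSE of the tuned model on each held-out fold.
scores = cross_val_score(tuner, X, y, cv=outer,
                         scoring="neg_root_mean_squared_error")
rmse_per_fold = -scores
median_rmse = float(np.median(rmse_per_fold))
print(rmse_per_fold.round(2), median_rmse)
```

Fixing `random_state` on both splitters mirrors the project's use of identical seeds and splits across all six models, so per-fold RMSEs stay directly comparable.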
The project was developed iteratively through weekly deliverables, each focusing on one of the core algorithms:
- GP: Implemented and tuned; analyzed bloat, overfitting, and premature convergence.
- GSGP: Implemented and tuned; analyzed similar characteristics.
- SLIM: Implemented and tuned; analyzed similar characteristics.
- NN: Implemented and tuned; focused on overfitting and premature convergence.
- NEAT: Implemented and tuned; focused on overfitting and premature convergence.
- NN&GA (Extra): Implemented as part of the final submission.
Figure 2: Learning and Test RMSE Across Models (10-Fold Cross-Validation).
Note: Y-axis limited for readability.
Table 1: Median and Interquartile Range (IQR) of Test RMSE for All Models Across 10 Folds.
- Best Performing Models: Genetic Programming (GP) and NEAT demonstrated superior and stable predictive performance.
- GP achieved the lowest median test RMSE (4.545).
- NEAT followed closely with a median test RMSE of 5.690 and a notably small test IQR, indicating stable performance.
- Neural Networks (NN): Proved competitive with a median test RMSE of 7.211.
- Semantic GP Variants (GSGP & SLIM): Showed higher test RMSEs and some underfitting/premature convergence.
- GSGP median test RMSE: ~10.517
- SLIM median test RMSE: ~10.618
- NN&GA: Exhibited the highest median test RMSE (19.967) and clear overfitting, likely due to limited hyperparameter tuning (due to computational cost) and the complexities of optimizing NN weights with a GA on this dataset.
- Statistical Significance: The Friedman test (p-value ≈ 0.00) confirmed significant performance differences. The Nemenyi post-hoc test revealed that GP and NEAT significantly outperformed NN&GA, and SLIM performed significantly worse than GP and NEAT. No significant differences were found between GP, NEAT, and NN, or between GSGP and SLIM.
- Challenges: The small dataset size (96 samples) was a primary challenge, amplifying performance variability and making hyperparameter tuning very sensitive. Computational intensity, especially for NEAT and NN&GA, constrained the extent of hyperparameter exploration. Evolutionary algorithms often faced premature convergence or bloat.
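The Friedman step of this statistical analysis can be reproduced with SciPy; the per-fold RMSE values below are synthetic stand-ins, not the project's actual results.

```python
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(1)

# Illustrative outer-fold test RMSEs for three models over the same 10 folds.
gp   = rng.normal(4.5, 0.5, size=10)
neat = rng.normal(5.7, 0.4, size=10)
nnga = rng.normal(20.0, 3.0, size=10)

# Friedman test: non-parametric, ranks the models within each matched fold.
stat, p = friedmanchisquare(gp, neat, nnga)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
# If p is below the significance level, a Nemenyi post-hoc test
# (e.g. scikit-posthocs' posthoc_nemenyi_friedman) identifies which
# pairs of models actually differ.
```

Using the same folds for every model is what makes the within-fold ranking of the Friedman test valid here.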
- Weekly Notebooks: Submitted throughout the course, detailing the implementation and tuning of each algorithm.
- Final Report: A comprehensive report (max 4 pages + references) focusing on results, discussion, and model comparison (the provided PDF serves as this report).
- Final Code: Aggregation of all weekly codes and final analysis scripts.
This project provided a thorough comparative analysis of six Neural and Evolutionary Learning models for predicting crude protein content. GP and NEAT emerged as the most robust and accurate predictors for this specific regression task and dataset. The study highlighted the critical role of robust evaluation methodologies like NCV, especially with small datasets, and underscored the unique challenges and strengths of different NEL paradigms. Future work could involve applying these models to larger datasets, exploring more extensive hyperparameter tuning, and investigating different architectural or evolutionary strategies to further enhance performance and generalization.
For detailed implementation and results, please refer to the project notebooks and the final report. Good luck with your own NEL projects!
[1] Vanneschi, L., & Silva, S. (2023). Lectures on Intelligent Systems. Springer Nature.
[2] Vanneschi, L. (2024). SLIM_GSGP: The Non-bloating Geometric Semantic Genetic Programming. Lecture Notes in Computer Science, 125–141. https://doi.org/10.1007/978-3-031-56957-9_8
[3] Stanley, K. O., & Miikkulainen, R. (2002). Evolving Neural Networks through Augmenting Topologies. Evolutionary Computation, 10(2), 99–127. https://doi.org/10.1162/106365602320169811
[4] Rainio, O., Teuho, J., & Klén, R. (2024). Evaluation metrics and statistical tests for machine learning. Scientific Reports, 14(1), 1–14. Nature. https://doi.org/10.1038/s41598-024-56706-x
- André Silvestre (20240502)
- Filipa Pereira (20240509)
- Umeima Mahomed (20240543)


