This project developed predictive models to estimate processing times using real historical data from an engineer-to-order (ETO) company that specializes in producing unique, highly complex industrial equipment. A complete data science pipeline was implemented, covering data preprocessing, exploratory analysis, and predictive modeling. In total, 60 models were generated by combining five data encoding strategies with twelve machine learning algorithms. The modeling process was supported by statistical analyses of the data distribution, using visualization tools and goodness-of-fit tests. Two modeling approaches were explored: one using a dataset that covers all product types, and another restricted to a single product category (boilers). Model performance was assessed with several metrics, including the coefficient of determination (R²) and multiple error-based indicators.
This project contains parts written in R and parts written in Python, which were developed in different cycles and are organized in separate folders. The 60 models mentioned above, which combine regression algorithms with encoding techniques, were developed entirely in Python and require scikit-learn 1.5.2.
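The idea of crossing encoding strategies with regression algorithms can be illustrated with a minimal sketch. This is not the project's code: the data is synthetic, the column names (`product_type`, `weight_t`, `processing_hours`) are hypothetical, and only two encoders and three regressors are used as stand-ins for the five encoding strategies and twelve algorithms, which are not listed here.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Synthetic stand-in data (assumption): one categorical and one numeric feature.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "product_type": rng.choice(["boiler", "vessel", "exchanger"], size=n),
    "weight_t": rng.uniform(1.0, 50.0, size=n),
})
offsets = {"boiler": 40.0, "vessel": 10.0, "exchanger": 25.0}
df["processing_hours"] = (
    df["product_type"].map(offsets) + 2.0 * df["weight_t"] + rng.normal(0.0, 5.0, size=n)
)

X = df[["product_type", "weight_t"]]
y = df["processing_hours"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Illustrative subsets only; the project's actual encoders and algorithms may differ.
encoders = {
    "one_hot": OneHotEncoder(handle_unknown="ignore"),
    "ordinal": OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
}
regressors = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fit one pipeline per (encoder, regressor) pair and record test-set metrics.
results = []
for enc_name, encoder in encoders.items():
    for reg_name, regressor in regressors.items():
        preprocess = ColumnTransformer(
            [("cat", clone(encoder), ["product_type"])],
            remainder="passthrough",  # numeric columns are passed through unchanged
        )
        model = Pipeline([("preprocess", preprocess), ("regressor", clone(regressor))])
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        results.append({
            "encoder": enc_name,
            "regressor": reg_name,
            "r2": r2_score(y_test, predictions),
            "mae": mean_absolute_error(y_test, predictions),
        })

results_df = pd.DataFrame(results).sort_values("r2", ascending=False)
print(results_df.to_string(index=False))
```

Each row of `results_df` corresponds to one encoder and regressor combination, so extending either dictionary grows the grid multiplicatively (five encoders and twelve regressors would give 60 rows).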
The figure below summarizes the workflow of the pipeline development process.
This project was funded by the Rio de Janeiro State Foundation to Support Research (FAPERJ - E26/0.10.001920/2019).
