This is an open-source machine learning repository in Python developed by research students at Colgate University. The package employs some of the most popular feature selection and classification methods to build predictive and analytical models for any dataset.
The package imputes missing data, performs a grid search to find the best predictive model, and oversamples the data to correct class imbalance. This makes it easy for people with no coding experience to apply it in their own field of research.
The package is run from main.py, where you enter your file name, specify the dependent variable, and configure the predictive models. In addition, every function in the package comes with a detailed reference for easy navigation.
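To make the imputation and oversampling steps mentioned above more concrete, here is a minimal, dependency-free sketch of the two ideas. It is not the package's own code: the function names `impute_mean` and `random_oversample` and the toy data are hypothetical, mean imputation and random duplication are only one simple choice for each step, and the grid search step is left out.

```python
import random
import statistics
from collections import Counter


def impute_mean(rows):
    """Replace None in each column with the mean of that column's observed values."""
    columns = list(zip(*rows))
    means = [statistics.mean(v for v in col if v is not None) for col in columns]
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: two features, one missing value, classes imbalanced 4 to 2.
X = [[1.0, 2.0], [2.0, None], [3.0, 6.0], [4.0, 8.0], [5.0, 1.0], [6.0, 3.0]]
y = [0, 0, 0, 0, 1, 1]

X_filled = impute_mean(X)
X_balanced, y_balanced = random_oversample(X_filled, y)

print(X_filled[1])          # the missing entry is replaced by the column mean
print(Counter(y_balanced))  # both classes now have the same number of rows
```

In practice, oversampling is usually applied only to the training split so that duplicated rows do not leak into the evaluation data.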
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
Besides the Anaconda package, these libraries will be needed:
- pandas
- os
- numpy
- imblearn
- scikit-learn
- pyreadstat
- statistics
- math
- statsmodels.api
- joblib
- xlsxwriter
- openpyxl
- glob
- seaborn
- matplotlib
- ReliefF
- xgboost
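You can install these libraries from the command line by running "conda install -c conda-forge" or "pip install -U" followed by the name of the library on Windows/macOS, and "conda install -c conda-forge" or "pip3 install -U" followed by the name of the library on Linux. Note that os, statistics, math, and glob are part of the Python standard library and need no installation, that imblearn is installed under the name imbalanced-learn, and that statsmodels.api is included in the statsmodels library.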
Please refer to the Anaconda docs (https://docs.anaconda.com/anaconda/install/) on how to install the Anaconda package.
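After installing, a quick way to confirm that everything is in place is to check which libraries Python can find. The short script below is a sketch, not part of the package: the helper name `missing_modules` is hypothetical, and the import names in `REQUIRED` are assumptions derived from the list above (for example, scikit-learn is imported as `sklearn`, and `ReliefF` is assumed to be importable under that name).

```python
import importlib.util

# Top-level import names assumed from the dependency list above.
# Standard-library modules (os, statistics, math, glob) are omitted.
REQUIRED = [
    "pandas", "numpy", "imblearn", "sklearn", "pyreadstat", "statsmodels",
    "joblib", "xlsxwriter", "openpyxl", "seaborn", "matplotlib",
    "ReliefF", "xgboost",
]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries were found.")
```

Any name reported as missing can then be installed with the conda or pip commands described above.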
Ziad Attia: zattia@colgate.edu