Modular_ML_functions

This repo contains a set of ML functions to assess performance of different combinations of cluster-assigned features.

The purpose of these functions is to conduct an exploratory analysis to identify an optimal machine learning algorithm for the user's data, and to fine-tune the feature space to use in the identified algorithm via a combinatorix exercise. Additionally, there is functionality to create an ensemble model from these functions and feed the outputs into a specific ensemble model function here.

Running the model search functions

To execute these functions, users will require the following:

CARET and progressr libraries installed (caretEnsemble if the ensemble function will be used)
Training and validation data matrices
Training and validation metadata tables with the actual classification assignments as factors a) For the svm and ML_suite functions, an id_column will be required to combine the data matrices and metadata tables since caret's svm wrapper requires formulaic input
A data.frame of the complete feature space and their cluster assignments, known as modules a) Clusters can be self-assigned based on self-defined heuristics, or assigned via clustering algorithms such as hierarchical clustering
A vector of strings containing all desired module combinations to test, with modules within each combination seperated by an underscore (i.e. "_")

Optinal components for running the model search functions

While the functions are set to run on their own, users have the option to tune the model design and outputs via the following:

Caret's trainControl object can be included by the user should they want to define different paramaters than the function-defined default (i.e. folds, repeats, method, etc.)
The metric used by the train function, as well as whether to maximize or minimize, are definable. a) NOTE: Caret's twoClassSummary and multiClassSummary are used here, and so the selected metrics must align with either of these
All functions are capable of running on either a single thread or in parallel. Should parallel processing be desired, the user is responsible for establishing their own parallel back-end a) For an example of establishing a parallel back-end see doFutures from Tidyverse for more information

Function outputs

All functions will output their generated model and confusion matrix objects for further assessment. Additionally, all (except the ML_suite and ensemble functions) will print performance statistics for each combination tested as the function runs so users can view them in real-time.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.gitattributes		.gitattributes
Binary.Multiclass module model functions.R		Binary.Multiclass module model functions.R
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Modular_ML_functions

Running the model search functions

Optinal components for running the model search functions

Function outputs

About

Uh oh!

Releases

Packages

Languages

rafev/Modular_ML_functions

Folders and files

Latest commit

History

Repository files navigation

Modular_ML_functions

Running the model search functions

Optinal components for running the model search functions

Function outputs

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages