Skip to content

KnowEnG/Phenotype_Prediction_Pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

KnowEnG's Phenotype Prediction Pipeline

This is the Knowledge Engine for Genomics (KnowEnG), an NIH BD2K Center of Excellence, Phenotype Prediction Pipeline.

This pipeline **predicts" the relative importance of a set of genes associated with a given phenotype.

This pipeline supports two regression models:

Options Method Parameters
Elastic Net Elastic elastic_net
Lasso Lasso Lasso

How to run this pipeline with our data


1. Clone the Phenotype_Prediction_Pipeline

 git clone https://github.com/KnowEnG/Phenotype_Prediction_Pipeline.git

2. Install the following (Ubuntu or Linux)

apt-get install -y python3-pip
apt-get install -y libblas-dev liblapack-dev libatlas-base-dev gfortran
pip3 install numpy==1.11.1
pip3 install pandas==0.18.1
pip3 install scipy==0.18.0
pip3 install scikit-learn==0.17.1
apt-get install -y libfreetype6-dev libxft-dev
pip3 install matplotlib=1.4.2
pip3 install pyyaml
pip3 install knpackage

3. Change directory to Phenotype_Prediction_Pipeline

cd Phenotype_Prediction_Pipeline

4. Change directory to test

cd test

5. Create a local directory "run_dir" and place all the run files in it

make env_setup

6. Select and run a phenotype prediction option:

  • Run Elastic Net pipeline
make run_elastic_net
  • Run Lasso pipeline
make run_lasso

Follow steps 1-3 above then do the following:

* Create your results directory

mkdir results_directory

* Create run_parameters file (YAML Format)

Look for examples of run_parameters in:

Phenotype_Prediction_Pipeline/data/run_files BENCHMARK_2_ElasticNet.yml

Phenotype_Prediction_Pipeline/data/run_files BENCHMARK_1_Lasso.yml

* Run

Using Elastic net

python3 ../src/phenotype_prediction.py -run_directory ./run_dir -run_file BENCHMARK_2_ElasticNet.yml

Using Lasso

python3 ../src/phenotype_prediction.py -run_directory ./run_dir -run_file BENCHMARK_1_Lasso.yml

Description of "run_parameters" file


Key Value Comments
Method elastic_net_predict scikit-learn.org elastic-net
Method lasso_predict scikit-learn.org lasso
results_directory directory Directory to save the output files
spreadsheet_name_full_path spreadsheet_name Input Gene Expression data
response_name_full_path response_name Input Drug Response data
test_spreadsheet_name_full_path test_spreadsheet_name Input testing feature data
min_alpha float number Minimum number in alpha list
max_alpha float number Maximum number in alpha list
tolerance float number The tolerance for the optimization
fit_intercept boolean value whether to calculate the intercept for this model
normalize boolean value whether the regressors will be normalized
max_iter integer number The maximum number of iterations
n_alpha integer number Number of alphas in alpha list
min_l1 float number Minimum l1 in the grid of l1
max_l1 float number Maximum l1 in the grid of l1
n_l1 integer number Length of grid of l1
eps float number Length of the path

spreadsheet_name = features_train_clean.df
response_name = response_train_clean.df
test_spreadsheet_name = features_test_clean.df


Description of Output files are saved in results directory


Gene Name Relarive Importance
User Gene 1 Float
... ...
User Gene n Float

About

Regression to infer an 'omic'-phenotype association

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •