RNAseq_ML

Investigating methods to undertake feature selection and reduction on RNA-seq data.

Most Gleason score 7 samples have been temporarily removed (from 80 of 105 samples down to 10 of 35) because they were causing bias (the data has 23,281 features).
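The downsampling described above can be sketched as follows. This is only an illustration: `downsample_class` is a hypothetical helper, not a function from this repository, and the split of the 25 non-7 samples into scores 6 and 8 is invented for the example.

```python
import numpy as np


def downsample_class(labels, target_label, keep, seed=0):
    """Return sorted row indices keeping at most `keep` samples of `target_label`.

    Hypothetical helper for illustration; not part of this repository.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    target_idx = np.flatnonzero(labels == target_label)
    other_idx = np.flatnonzero(labels != target_label)
    if len(target_idx) > keep:
        # Randomly pick which over-represented samples survive.
        target_idx = rng.choice(target_idx, size=keep, replace=False)
    return np.sort(np.concatenate([other_idx, target_idx]))


# Synthetic labels mirroring the counts quoted above: 80 score-7 samples out of 105.
# The 15/10 split of the remaining samples is an assumption.
labels = np.array([7] * 80 + [6] * 15 + [8] * 10)
kept = downsample_class(labels, target_label=7, keep=10)
print(len(kept), int((labels[kept] == 7).sum()))
```

The returned indices can then be used to subset both the expression matrix and the label vector so the two stay aligned.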

Draft scripts for finding the best feature selection methods for RNA-seq data (using the sklearn package). To run:
$ python3 test_script.py

Methods include:

PCA is also performed.
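A minimal PCA sketch with sklearn, assuming a samples-by-genes matrix of FPKM values. The random matrix, the log transform, and the choice of two components are assumptions made for illustration, not the settings used in `pipeline.py`.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an FPKM matrix (35 samples x 200 genes).
rng = np.random.default_rng(0)
X = rng.lognormal(size=(35, 200))

# Log-transform and standardise each gene before PCA (assumed preprocessing).
X_scaled = StandardScaler().fit_transform(np.log2(X + 1))

pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_
print(components.shape, explained)
```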

Multilabel confusion matrix (normalised for true data):
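One way to produce a matrix normalised over the true labels is `confusion_matrix` with `normalize="true"`, so that each row sums to 1. The labels below are made up, and whether the scripts use this exact call is an assumption.

```python
from sklearn.metrics import confusion_matrix

# Made-up true and predicted Gleason scores, for illustration only.
y_true = [6, 6, 7, 7, 7, 8, 8, 8]
y_pred = [6, 7, 7, 7, 8, 8, 8, 6]

# normalize="true" divides each row by the number of true samples in that class.
cm = confusion_matrix(y_true, y_pred, labels=[6, 7, 8], normalize="true")
print(cm)
```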

Validation curve:
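A validation curve can be computed with `sklearn.model_selection.validation_curve`. In this sketch the random forest classifier, the `max_depth` parameter, and the synthetic data are all assumptions; the scripts may vary a different estimator or hyperparameter.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

# Synthetic three-class data standing in for the expression matrix.
X, y = make_classification(
    n_samples=90, n_features=50, n_informative=8, n_classes=3, random_state=0
)

param_range = [1, 2, 4, 8]
train_scores, test_scores = validation_curve(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X,
    y,
    param_name="max_depth",
    param_range=param_range,
    cv=3,
)

# One row per parameter value, one column per cross-validation fold.
print(train_scores.mean(axis=1), test_scores.mean(axis=1))
```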

At the end, cross-validation scores are visualised for each feature selection method, in the order the methods were used:
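A sketch of how such a comparison might be set up. The two selectors, the logistic regression classifier, and the synthetic data are assumptions chosen for illustration. Placing the selector inside the pipeline means it is refitted on each training fold, which avoids leaking test-fold information into the selection step.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=90, n_features=50, n_informative=8, n_classes=3, random_state=0
)

# Illustrative selectors; the repository's own list of methods may differ.
selectors = {
    "variance_threshold": VarianceThreshold(threshold=0.0),
    "anova_f_top10": SelectKBest(f_classif, k=10),
}

scores = {}
for name, selector in selectors.items():
    model = make_pipeline(selector, StandardScaler(), LogisticRegression(max_iter=1000))
    scores[name] = cross_val_score(model, X, y, cv=3).mean()

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```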

For the high correlation filter, a heatmap is generated:
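A high correlation filter is commonly implemented by dropping one feature from every pair whose absolute Pearson correlation exceeds a threshold. The sketch below follows that approach; `high_correlation_filter` and the 0.9 threshold are hypothetical, not taken from `methods_functions.py`.

```python
import numpy as np


def high_correlation_filter(X, threshold=0.9):
    """Return (indices of kept columns, absolute correlation matrix).

    A column is dropped if it correlates above `threshold` with any
    earlier column. Hypothetical helper for illustration.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    upper = np.triu(corr, k=1)  # look at each pair of columns only once
    to_drop = np.flatnonzero((upper > threshold).any(axis=0))
    keep = np.setdiff1d(np.arange(X.shape[1]), to_drop)
    return keep, corr


rng = np.random.default_rng(0)
a = rng.normal(size=50)
c = rng.normal(size=50)
X = np.column_stack([a, 2 * a, c])  # column 1 duplicates column 0 up to scale

keep, corr = high_correlation_filter(X)
print(keep)
```

The returned `corr` matrix is what a heatmap would display. With tens of thousands of genes the full matrix becomes very large, so filtering to a smaller feature set first may be necessary.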

Feature importance is also extracted and plotted at each step:
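Feature importance can be read from any fitted estimator that exposes `feature_importances_`. The random forest and synthetic data here are assumptions; the scripts may use a different model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(
    n_samples=90, n_features=30, n_informative=5, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_

# Indices of the ten most important features, highest first.
top = np.argsort(importances)[::-1][:10]
for idx in top:
    print(f"feature {idx}: {importances[idx]:.4f}")
```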
