QC pipeline #19

@shraddhapai

We need a QC pipeline that assesses data quality and cleans the data before the predictor is run. No such mechanism is currently in place. The pipeline would:

  • Identify structure in the missingness of the data.
  • Identify and flag outlier samples.
  • Run unsupervised analyses on the samples, e.g. PCA and hierarchical clustering.
  • For continuous-valued data, compare several similarity metrics to find the one that best separates the classes, e.g. RNAcorr.R, written by SP for PanCancer.
  • Run hierarchical clustering and PCA on the classes, following the same idea.
  • Run univariate tests to prune the matrix of variables that goes into netDx.
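
To make the scope concrete, here is a rough sketch of three of the steps above (missingness summary, outlier flagging, univariate pruning). It is illustration only, not a proposed implementation: netDx is an R package and this is plain Python using only the standard library. All function names, the `None`-for-missing convention, the modified z-score cutoff of 3.5, the choice of Welch's t-statistic, and the `min_abs_t` threshold are assumptions, and the pruning step assumes exactly two classes.

```python
import math
import statistics


def missing_fraction(matrix):
    """Return (per-sample, per-variable) fractions of missing values.

    `matrix` is a list of samples, each a list of values; None marks missing.
    """
    n_samples, n_vars = len(matrix), len(matrix[0])
    per_sample = [sum(v is None for v in row) / n_vars for row in matrix]
    per_var = [sum(row[j] is None for row in matrix) / n_samples
               for j in range(n_vars)]
    return per_sample, per_var


def flag_outlier_samples(matrix, cutoff=3.5):
    """Flag samples whose mean observed value is far from the cohort median.

    Uses the modified z-score (median and MAD). Assumes every sample has at
    least one observed value. The cutoff is an illustrative default.
    """
    means = [statistics.fmean([v for v in row if v is not None])
             for row in matrix]
    med = statistics.median(means)
    mad = statistics.median([abs(m - med) for m in means])
    if mad == 0:
        return [False] * len(means)
    return [abs(0.6745 * (m - med) / mad) > cutoff for m in means]


def welch_t(x, y):
    """Welch's t-statistic for two groups (each needs at least 2 values)."""
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    if se == 0:
        return 0.0
    return (statistics.fmean(x) - statistics.fmean(y)) / se


def prune_variables(matrix, labels, min_abs_t=2.0):
    """Return indices of variables with |t| >= min_abs_t between two classes."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch assumes exactly two classes")
    keep = []
    for j in range(len(matrix[0])):
        groups = [[row[j] for row, lab in zip(matrix, labels)
                   if lab == c and row[j] is not None] for c in classes]
        # Skip variables with too few observed values in either class.
        if all(len(g) >= 2 for g in groups) \
                and abs(welch_t(*groups)) >= min_abs_t:
            keep.append(j)
    return keep


# Toy data: 6 samples x 3 variables, two classes.
toy_matrix = [
    [1.0, 5.0, None],
    [1.2, 5.1, 0.3],
    [0.9, 4.9, 0.2],
    [3.0, 5.0, None],
    [3.1, 5.2, 0.4],
    [2.9, 4.8, 0.1],
]
toy_labels = ["a", "a", "a", "b", "b", "b"]

per_sample_missing, per_var_missing = missing_fraction(toy_matrix)
outlier_flags = flag_outlier_samples(toy_matrix)
kept_variables = prune_variables(toy_matrix, toy_labels)
```

On the toy data only the first variable differs between the two classes, so it is the only one that survives pruning. In a real pipeline the flags and missingness fractions would be reported for review rather than applied automatically.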
