
Config System YAML #9

@jfear

Starting to think about the config system YAML.

#### General Settings ####
settings: # location of system level settings
    title: My Very Cool Project
    Author: Bob
    data: /data/bob/original_data # path-like settings
    python2: py2.7 # conda environment names
    env: HOME # names by which to access specific envs

#### Experiment Level Settings ####
exp.settings: # experiment-level settings that apply to all samples
    sampleinfo: sample_metadata.csv # sample information relating sample-specific settings to sample IDs
    fastq_suffix: '.fastq.gz'  # it would be nice to be able to define a setting here that applies to all samples, or to define it per sample in the sampleinfo table in case the samples differ
    annotation: # need some way to specify which annotation to use; maybe this is not the best place
        genic: /data/...
        transcript: /data/....
        intergenic: /data/...
    models: # add modeling information here
        formula: ~ sex + tissue + time
        factors: # tell which columns in sample table should be treated like factors
            - sex
            - tissue
            - time

#### Workflow Settings ####
# I think a naming scheme that follows the folder structure would be useful. For example,
# if there is a workflows folder then we would have
workflows.qc: # could define workflow-specific settings
    steps_to_run: # list the pieces of the pipeline to run (or a list of steps to skip may be better)
        - fastqc
        - rseqc
    trim: True # or could have boolean switches to change workflow behavior

workflows.align:
    aligner: 'tophat2=2.1.0' # define what software to use and optionally what version
    aggregated_output_dir: /data/...
    report_output_dir: /data/...

workflows.rnaseq: ...

workflows.references: ... 

#### Rule Specific Settings ####
rules.align.bowtie2: # rule-level settings, again named after the folder structure (if we need a folder structure)
    cluster: # it would be nice to keep cluster settings with the rule settings, but I can't think of a way to get this to work; we probably just need a separate cluster config
        threads: 16
        mem: 60g
        walltime: '8:00:00' # quoted so that YAML 1.1 loaders do not read it as a base-60 integer
    index: /data/... # bowtie index prefix
    params: # access to any parameters that need to be set
        options: -p 16 -k 8 # place to change the options
    aln_suffix: '.bt2.bam'  # place to change how files are named
    log_suffix: '.bt2.log'
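
Two of the open questions above (a per-sample override of an experiment-level setting, and keeping `cluster` settings next to the rule while still producing a separate cluster config) could be handled after loading. This is only a rough sketch under assumptions, not a proposed implementation: the `config` dict is a hand-written stand-in for whatever a YAML loader would return, the sample table columns are made up, and `sample_setting` and `cluster_config` are hypothetical helper names.

```python
import csv
import io

# Hand-written stand-in for the parsed YAML draft (assumption: a loader
# would return nested dicts keyed by the dotted section names).
config = {
    "exp.settings": {
        "sampleinfo": "sample_metadata.csv",
        "fastq_suffix": ".fastq.gz",
    },
    "rules.align.bowtie2": {
        "cluster": {"threads": 16, "mem": "60g", "walltime": "8:00:00"},
        "aln_suffix": ".bt2.bam",
    },
}

# Hypothetical sample table; the column names are assumptions.
SAMPLE_TABLE = """sample_id,sex,fastq_suffix
s1,female,
s2,male,.fq.gz
"""


def sample_setting(config, row, key):
    """Return the per-sample value for key if the row sets one,
    otherwise fall back to the experiment-level default."""
    value = row.get(key)
    if value:  # a missing column or an empty cell falls through
        return value
    return config["exp.settings"][key]


def cluster_config(config):
    """Collect the cluster sub-mapping of every 'rules.*' section,
    keyed by the last component of the dotted section name."""
    out = {}
    for section, settings in config.items():
        if section.startswith("rules.") and "cluster" in settings:
            rule_name = section.rsplit(".", 1)[-1]
            out[rule_name] = dict(settings["cluster"])
    return out


rows = list(csv.DictReader(io.StringIO(SAMPLE_TABLE)))
suffixes = {
    row["sample_id"]: sample_setting(config, row, "fastq_suffix")
    for row in rows
}
print(suffixes)
print(cluster_config(config))
```

With something like this, the sampleinfo table only needs a `fastq_suffix` column when samples actually differ, and the separate cluster config could be generated from the rule sections rather than maintained by hand.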
