Demultiplexing tip for undetermined fastq files #22

@bfremin

We have been getting data back as one giant fastq file of undetermined reads (instead of bcl), with the barcode in the read name. Most tools that demultiplex from fastq were very slow, could not be parallelized, and/or failed. This is just a pre-preprocessing tip.

You need two files: a file that lists your barcodes, and a script.

barcodes.txt:
samplenameA GGACTCCT+AGAGGATA
samplenameB TAGGCATG+AGAGGATA
samplenameC CTCTCTAC+AGAGGATA
...all your samples
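
Before submitting jobs, it may be worth checking the barcode list, since a duplicated barcode or a stray extra column would silently produce wrong or empty output files. The following is only a sketch: `check_barcodes` is a made-up helper name, and it assumes the two-column, whitespace-separated layout shown above.

```shell
# Hypothetical helper (not part of the original tip): sanity-check barcodes.txt.
# Assumes one "samplename barcode" pair per line, separated by whitespace.
check_barcodes() {
    awk '
        NF == 0    { next }    # ignore blank lines
        NF != 2    { printf "line %d: expected 2 fields, got %d\n", NR, NF; bad = 1; next }
        $2 in seen { printf "line %d: barcode %s already used by %s\n", NR, $2, seen[$2]; bad = 1 }
                   { seen[$2] = $1 }
        END        { exit bad + 0 }    # non-zero exit status if any problem was reported
    ' "$1"
}
```

Calling `check_barcodes barcodes.txt` prints one message per problem line and returns a non-zero status if anything looks off.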

demultiplex.sh:
#!/bin/bash
# Usage: demultiplex.sh <samplename> <barcode>
module load sickle/1.33

# Demultiplex samples: keep each header line containing the barcode ($2)
# plus the 3 lines after it. -F matches the barcode as a fixed string.
grep -F -A3 --no-group-separator -i "$2" {giant_UndeterminedFile_1.fq} | gzip > "${1}_1.fq.gz" &
grep -F -A3 --no-group-separator -i "$2" {giant_UndeterminedFile_2.fq} | gzip > "${1}_2.fq.gz" &
wait

# Remove reads that do not have pairs (trimming will fail if you do not)
sickle pe -f "${1}_1.fq.gz" -r "${1}_2.fq.gz" -t sanger -o "paired_${1}_1.fq" -p "paired_${1}_2.fq" -s "${1}_single.fq"
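
One caveat with the grep step: it searches every line, so a quality line that happens to contain the barcode string would also match and pull in misaligned lines. If that ever becomes a problem, a header-only match can be sketched with awk. This is an alternative, not the method used in this tip; `demux_by_header` is a hypothetical name, it assumes strict 4-line fastq records, and it will likely be slower than grep on very large files.

```shell
# Hypothetical alternative to the grep step: test the barcode against header
# lines only (line 1 of every 4-line record), case-insensitively like grep -i.
# $1 = barcode, $2 = fastq file; matching records are written to stdout.
demux_by_header() {
    awk -v bc="$1" '
        NR % 4 == 1 { keep = (index(toupper($0), toupper(bc)) > 0) }
        keep        # print the header and the 3 lines that follow it
    ' "$2"
}
```

Inside demultiplex.sh it would be used the same way as grep, for example `demux_by_header "$2" {giant_UndeterminedFile_1.fq} | gzip > "${1}_1.fq.gz" &`.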

Run (one job per line of barcodes.txt; $0 is the sample name and $1 is the barcode):
cat barcodes.txt | xargs -L 1 bash -c 'sbatch ..... demultiplex.sh $0 $1'
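
If the `$0`/`$1` trick with xargs feels opaque, the same fan-out can be written as a plain read loop, which also allows a dry run before anything is submitted. This is a sketch under assumptions: `submit_all` is a made-up name, and the submit command (with whatever sbatch options you normally use) is passed in by the caller.

```shell
# Hypothetical wrapper: run "<command...> demultiplex.sh <sample> <barcode>"
# once per line of the barcode list.
# $1 = barcode list; remaining arguments = the command used to launch each job.
submit_all() {
    barcodes="$1"
    shift
    while read -r sample barcode || [ -n "$sample" ]; do
        [ -n "$sample" ] || continue    # skip blank lines
        # stdin is redirected so the launched command cannot eat the barcode list
        "$@" demultiplex.sh "$sample" "$barcode" < /dev/null
    done < "$barcodes"
}
```

`submit_all barcodes.txt echo` only prints the commands that would be run; replacing `echo` with `sbatch` and your usual options submits them.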

This will save you a lot of time compared with trying existing tools.
