Demultiplexing tip for undetermined fastq files

We have been getting data back as one giant FASTQ file of undetermined reads (instead of BCL files), with the barcode in the read name. Most tools that demultiplex from FASTQ were very slow, could not be parallelized, and/or failed outright. This is just a tip for the step that comes before preprocessing.

You need two files: a tab-separated file that lists your sample names and barcodes, and a script.

**barcodes.txt:**
samplenameA	GGACTCCT+AGAGGATA
samplenameB	TAGGCATG+AGAGGATA
samplenameC	CTCTCTAC+AGAGGATA
...all your samples
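
Because each line of barcodes.txt is later split into a sample name and a barcode, it may be worth checking the file before submitting anything. The following is only a sketch: `check_barcodes` is a hypothetical helper, the toy file is made up, and the pattern assumes dual-index barcodes written as `INDEX1+INDEX2` with the letters A, C, G, T, or N, as in the lines above.

```shell
# Toy barcode file; the second line is deliberately malformed (space instead of "+").
workdir=$(mktemp -d)
printf 'samplenameA\tGGACTCCT+AGAGGATA\nsamplenameB\tTAGGCATG AGAGGATA\n' > "$workdir/barcodes.txt"

# Hypothetical helper: report every line that is not exactly "name<whitespace>INDEX1+INDEX2".
# Exits non-zero if at least one bad line was found.
check_barcodes() {
    awk 'NF != 2 || $2 !~ /^[ACGTN]+\+[ACGTN]+$/ { printf "line %d: %s\n", NR, $0; bad = 1 }
         END { exit bad + 0 }' "$1"
}

check_barcodes "$workdir/barcodes.txt" || echo "fix the lines listed above before submitting"
```

A placeholder line such as the last one in the listing above would also be reported, since it does not split into a name and a barcode.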


**demultiplex.sh:**
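#!/bin/bash
# Usage: demultiplex.sh <sample_name> <barcode>
module load sickle/1.33

# Demultiplex: keep each read whose header contains the barcode ($2).
# -A3 prints the matching header plus the 3 lines that complete the FASTQ record;
# -F matches the barcode as a fixed string, so the "+" is taken literally.
grep -A3 --no-group-separator -i -F -- "$2" {giant_UndeterminedFile_1.fq} | gzip > "${1}_1.fq.gz" &
grep -A3 --no-group-separator -i -F -- "$2" {giant_UndeterminedFile_2.fq} | gzip > "${1}_2.fq.gz" &
wait

# Remove reads that do not have pairs (trimming will fail if you do not)
sickle pe -f "${1}_1.fq.gz" -r "${1}_2.fq.gz" -t sanger -o "paired_${1}_1.fq" -p "paired_${1}_2.fq" -s "${1}_single.fq"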
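
To see what the grep step does before running it on the real file, here is a small sketch on a made-up three-read FASTQ. The file names, read headers, and the `extract_by_barcode` helper are all hypothetical, and it assumes GNU grep (needed for `--no-group-separator`) and headers that end in the index pair.

```shell
# Toy data only: three 4-line FASTQ records with the index pair at the end of each header.
workdir=$(mktemp -d)
cat > "$workdir/toy_1.fq" <<'EOF'
@M1:1:FC:1:1101:1000:2000 1:N:0:GGACTCCT+AGAGGATA
ACGTACGT
+
IIIIIIII
@M1:1:FC:1:1101:1000:2001 1:N:0:TAGGCATG+AGAGGATA
TTTTAAAA
+
IIIIIIII
@M1:1:FC:1:1101:1000:2002 1:N:0:GGACTCCT+AGAGGATA
CCCCGGGG
+
IIIIIIII
EOF

# Hypothetical helper: print every record whose header contains the barcode.
# -A3 adds the 3 lines after each matching header; -F treats the barcode as a fixed string.
extract_by_barcode() {
    local barcode=$1 fastq=$2
    grep -A3 --no-group-separator -i -F -- "$barcode" "$fastq"
}

extract_by_barcode "GGACTCCT+AGAGGATA" "$workdir/toy_1.fq" > "$workdir/sampleA_1.fq"

# Each record is 4 lines, so the read count is the line count divided by 4.
n_reads=$(( $(wc -l < "$workdir/sampleA_1.fq") / 4 ))
echo "sampleA reads: $n_reads"
```

Running the same count on the `_1` and `_2` outputs of a real sample is a quick way to confirm that both files contain the same number of reads.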

**Run:**
cat barcodes.txt | xargs -L 1 bash -c 'sbatch ..... demultiplex.sh "$0" "$1"'
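
Before submitting jobs, a dry run can show what each line of barcodes.txt turns into. This is a sketch under assumptions: `build_commands` is a hypothetical helper, the barcode file is a toy copy, and it only prints the script call, so the scheduler options you pass to sbatch still have to be added by you.

```shell
# Toy barcode file with two samples.
workdir=$(mktemp -d)
printf 'samplenameA\tGGACTCCT+AGAGGATA\nsamplenameB\tTAGGCATG+AGAGGATA\n' > "$workdir/barcodes.txt"

# Hypothetical helper: print one demultiplex.sh call per barcode line instead of submitting it.
build_commands() {
    local barcode_file=$1 sample barcode
    while read -r sample barcode; do
        # Skip blank or incomplete lines.
        [ -n "$sample" ] && [ -n "$barcode" ] || continue
        echo "demultiplex.sh $sample $barcode"
    done < "$barcode_file"
}

build_commands "$workdir/barcodes.txt"
```

Once the printed lines look right, the xargs command above does the actual submission.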

This will save you a lot of time compared with trying existing tools.



Demultiplexing tip for undetermined fastq files #22
