
Test_data_examples

dzhao2019 edited this page Sep 23, 2025 · 6 revisions

Example dataset to test Eukfinder with short read data

1. Prepare short reads using Eukfinder read_prep

 

Input files

Paired fastq files

  •  test_R1.fastq

  •  test_R2.fastq

Test host genome

  •  test.host.fasta
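Before running read_prep, it can help to confirm that both fastq files are present and contain the same number of reads. The following is a minimal sketch of such a check, not part of Eukfinder itself: the function names `count_fastq_reads` and `check_pairs` are hypothetical, and the count assumes standard four-line FASTQ records.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of Eukfinder.

# Count reads in a FASTQ file, assuming four lines per record.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Check that both files of a pair exist, are non-empty,
# and hold the same number of reads.
check_pairs() {
    local r1="$1" r2="$2"
    local f n1 n2
    for f in "$r1" "$r2"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    n1=$(count_fastq_reads "$r1")
    n2=$(count_fastq_reads "$r2")
    if [ "$n1" -ne "$n2" ]; then
        echo "read count mismatch: $n1 vs $n2" >&2
        return 1
    fi
    echo "OK: $n1 read pairs"
}
```

With the test data, this would be called as `check_pairs test_R1.fastq test_R2.fastq`. A non-zero exit status signals a missing file or unequal read counts.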

 

Example command line to run Eukfinder read_prep with test data

     conda activate Eukfinder

     Eukfinder read_prep --r1 test_R1.fastq --r2 test_R2.fastq \
                         -n 10 --hcrop 10 -l 15 -t 15 --wsize 40 --qscore 25 --mlen 40 --mhlen 40 \
                         -o read_prep -i "adaptor_file_location" --hg test.host.fasta

Output files

  • Two paired fastq files

    • read_prep_p.1.fastq

    • read_prep_p.2.fastq

  • one unpaired fastq file

    • read_prep_un.fastq
  • Two centrifuge result files for paired and unpaired reads

    • read_prep_centrifuge_P

    • read_prep_centrifuge_UP

The read_prep output files can also be downloaded from here: prep output
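The five files listed above are the inputs to the next step, so it is worth confirming they all exist before moving on. The sketch below assumes the files are written to the current working directory under the prefix given to `-o` (here `read_prep`); the function name `check_read_prep_outputs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of Eukfinder.
# Usage: check_read_prep_outputs PREFIX [DIR]
# DIR defaults to the current directory (an assumption about where
# read_prep writes its results).
check_read_prep_outputs() {
    local prefix="$1"
    local dir="${2:-.}"
    local missing=0
    local suffix
    # Suffixes taken from the output file names listed in this page.
    for suffix in _p.1.fastq _p.2.fastq _un.fastq _centrifuge_P _centrifuge_UP; do
        if [ -e "$dir/$prefix$suffix" ]; then
            echo "found:   $dir/$prefix$suffix"
        else
            echo "missing: $dir/$prefix$suffix" >&2
            missing=$(( missing + 1 ))
        fi
    done
    # Succeed only when every expected file is present.
    [ "$missing" -eq 0 ]
}
```

For the example run, `check_read_prep_outputs read_prep` should list all five files and exit with status 0.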

2. Short read classification using Eukfinder short_seqs

 

Input files

  • Use the five output files from Eukfinder read_prep above

    • Two paired fastq files

      • read_prep_p.1.fastq

      • read_prep_p.2.fastq

    • one unpaired fastq file

      • read_prep_un.fastq
    • Two centrifuge result files for paired and unpaired reads

      • read_prep_centrifuge_P

      • read_prep_centrifuge_UP

 

Example command to run Eukfinder short_seqs with test data

     conda activate Eukfinder

     Eukfinder short_seqs --r1 read_prep_p.1.fastq --r2 read_prep_p.2.fastq --un read_prep_un.fastq \
                          -o shortread_test -n 10 -z 10 -t T --max_m 100 \
                          -e 0.01 --pid 60 --cov 30 --mhlen 50  \
                          --pclass read_prep_centrifuge_P --uclass read_prep_centrifuge_UP 

Output files

  • Located in the directory Eukfinder_results

  • Up to six fastq files are possible (bacterial, archaeal, eukaryotic, viral, unknown and eukaryotic+unknown).
     Note: this example includes only archaeal, bacterial, eukaryotic and unknown sequences, so no Misc.fq file will be created.

    • shortread_test.Arch.fq (reads classified as archaea)
    • shortread_test.Bact.fq (reads classified as bacteria)
    • shortread_test.Euk.fq (reads classified as eukaryote)
    • shortread_test.Unk.fq (unclassified reads)
    • shortread_test.EUnk.fq (reads classified as eukaryote or unclassified, combined)

Example output results can be downloaded from here: short_seqs output
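To see at a glance how the reads were distributed across categories, the records in each output file can be counted. This is a minimal sketch that assumes four-line FASTQ records and the `PREFIX.CATEGORY.fq` naming shown above; the function name `summarize_fastq_dir` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of Eukfinder.
# Usage: summarize_fastq_dir DIR PREFIX
# Prints one tab-separated line per file: file name, read count.
summarize_fastq_dir() {
    local dir="$1" prefix="$2"
    local f
    for f in "$dir/$prefix".*.fq; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$f" ] || continue
        printf '%s\t%s\n' "$(basename "$f")" "$(awk 'END { print int(NR / 4) }' "$f")"
    done
}
```

For this example the call would be `summarize_fastq_dir Eukfinder_results shortread_test`. Categories with no file, such as Misc here, are simply absent from the listing.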

 

 

Example dataset to test Eukfinder with long read data

 

1. Long read classification using Eukfinder long_seqs

 

Input files

  • longreads.fastq (example long-read sequence data)

 

Example command to run Eukfinder long_seqs with test data

     conda activate Eukfinder

     python eukfinder.py long_seqs -l longreads.fastq -o longreads_test \
                                   -n 48 -z 6 -t False \
                                   -e 0.01 --pid 60 --cov 30 --mhlen 100

Output files

  • Located in the directory Eukfinder_results

  • Up to six fastq files are possible (bacterial, archaeal, eukaryotic, viral, unknown and eukaryotic+unknown).
     Note: this example includes only bacterial, eukaryotic and unknown sequences, so no Arch.fq or Misc.fq file will be created.

    • longreads_test.Bact.fq (sequences classified as bacteria)
    • longreads_test.Euk.fq (sequences classified as eukaryote)
    • longreads_test.Unk.fq (unclassified sequences)
    • longreads_test.EUnk.fq (sequences classified as eukaryote or unclassified)

Example output files can be downloaded from here: long_seqs output
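Since the EUnk file is described as the combination of the eukaryotic and unclassified sequences, one quick sanity check is that its record count equals the sum of the other two. The sketch below rests on two assumptions that this page does not confirm: that the Euk and Unk sets do not overlap, and that the files use four-line FASTQ records. The function names `fastq_count` and `check_eunk` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of Eukfinder.

# Count records in a FASTQ file, assuming four lines per record.
fastq_count() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Usage: check_eunk DIR PREFIX
# Succeeds when count(EUnk) == count(Euk) + count(Unk).
check_eunk() {
    local dir="$1" prefix="$2"
    local cat euk unk eunk
    for cat in Euk Unk EUnk; do
        if [ ! -e "$dir/$prefix.$cat.fq" ]; then
            echo "missing: $dir/$prefix.$cat.fq" >&2
            return 2
        fi
    done
    euk=$(fastq_count "$dir/$prefix.Euk.fq")
    unk=$(fastq_count "$dir/$prefix.Unk.fq")
    eunk=$(fastq_count "$dir/$prefix.EUnk.fq")
    echo "Euk=$euk Unk=$unk EUnk=$eunk"
    [ "$eunk" -eq $(( euk + unk )) ]
}
```

For the long-read example this would be run as `check_eunk Eukfinder_results longreads_test`. A mismatch is not necessarily an error in Eukfinder; it may only mean one of the assumptions above does not hold for the data.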