fastqc -o ./reads/fastqc reads/SRR292678sub_S1_L001_R1_001.fastq.gz reads/SRR292678sub_S1_L001_R2_001.fastq.gz
Reads count:
SRR292678sub_S1_L001_R1_001.fastq.gz - 5499346
SRR292678sub_S1_L001_R2_001.fastq.gz - 5499346
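The read counts above can also be checked directly from the compressed files, independently of the FastQC report. This is a minimal sketch, assuming each FASTQ record occupies exactly four lines (no wrapped sequences); `count_reads` is a hypothetical helper name, not part of any tool used here.

```shell
# count_reads FILE.fastq.gz
# Prints the number of reads, assuming the standard 4-lines-per-record FASTQ layout.
count_reads() {
    # gzip -cd is used rather than zcat because zcat behaves differently on some systems
    gzip -cd "$1" | awk 'END { print int(NR / 4) }'
}
```

For example, `count_reads reads/SRR292678sub_S1_L001_R1_001.fastq.gz` should agree with the FastQC "Total Sequences" value if the assumption holds.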
The assembled genome was downloaded into the refs folder and extracted manually using the following command:
unzip SRR292678.zip
quast.py refs/SRR292678/contigs.fasta refs/SRR292678/scaffolds.fasta -t 1 -o quast/short_reads > quast/short_reads/quast.stdout.log 2> quast/short_reads/quast.stderr.log
The assembled genome (short + long reads) was downloaded to the refs folder:
quast.py refs/SRR292678/contigs.fasta refs/SRR292678/scaffolds.fasta refs/scaffolds.fasta -t 1 -o quast/long_reads > quast/long_reads/quast.stdout.log 2> quast/long_reads/quast.stderr.log
contigs: short-read contigs
SRR292678_scaffolds: short-read scaffolds
refs_scaffolds: scaffolds from the short + long read assembly
Task: Also explain how the quality of the assembly improved and why.
Answer:
The assembly showed a clear improvement: the number of contigs dropped sharply (Contigs: 210/221 → 26) while their length increased significantly (Largest contig: 300,763 → 2,579,755 / N50: 111,860 → 968,098).
This improvement comes from the long reads: they span repetitive regions that short reads cannot resolve, which simplifies the assembly graph and lets contigs be joined.
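To make the N50 figure concrete, it can be recomputed from any of the FASTA files. The sketch below is not how QUAST is implemented; it is a hedged illustration of the definition (the length of the shortest sequence in the smallest set of longest sequences that covers at least half of the total assembly). `n50` is a hypothetical helper name. QUAST ignores contigs shorter than 500 bp by default, so its value may differ slightly from this one on assemblies with many short contigs.

```shell
# n50 FILE.fasta
# Prints the N50 of all sequences in a (possibly multi-line) FASTA file.
n50() {
    # Step 1: print one length per sequence
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    # Step 2: longest first
    sort -rn |
    # Step 3: walk down until the running sum reaches half of the total
    awk '{ lens[NR] = $1; total += $1 }
         END {
             for (i = 1; i <= NR; i++) {
                 running += lens[i]
                 if (running * 2 >= total) { print lens[i]; exit }
             }
         }'
}
```

Usage: `n50 refs/scaffolds.fasta`.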
prokka --outdir prokka --force --centre XXX refs/scaffolds.fasta
barrnap refs/scaffolds.fasta --kingdom bac --threads 6 > barrnap/barrnap_my_bac.gff 2> barrnap/barrnap_my_bac.stderr.log
awk 'NR==1 || $9 ~ /^Name=16S_rRNA/' barrnap/barrnap_my_bac.gff > barrnap/barrnap_my_bac.16s_rrna.gff
awk '!/^#/ {print $1 "\t" $4-1 "\t" $5 "\t" $9}' barrnap/barrnap_my_bac.16s_rrna.gff > barrnap/barrnap_my_bac.16s_rrna.bed
bedtools getfasta -fi refs/scaffolds.fasta -bed barrnap/barrnap_my_bac.16s_rrna.bed > barrnap/barrnap_my_bac.16s_rrna.fa
We found 8 regions that match 16S rRNA.
One of these regions aligned to only 25% of the full 16S rRNA sequence.
Four regions are on the forward strand, and four are on the reverse strand.
All 8 regions occupy different positions on the chromosome. They are therefore most likely separate loci (some transcribed in opposite directions) and not the same sequence reported once per strand.
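The forward/reverse split can be tallied from the barrnap GFF rather than counted by eye. This is a minimal sketch, assuming the standard GFF3 layout (strand in column 7, attributes in column 9) and the `Name=16S_rRNA` attribute prefix used in the filter above; `strand_counts` is a hypothetical helper name. Note that the BED file built above has no strand column, so `bedtools getfasta` returns every region as it appears on the forward strand of the scaffold.

```shell
# strand_counts FILE.gff
# Prints the number of 16S rRNA hits on the + and - strands, one per line.
strand_counts() {
    awk -F '\t' '!/^#/ && $9 ~ /^Name=16S_rRNA/ { n[$7]++ }
         END { printf "+\t%d\n-\t%d\n", n["+"], n["-"] }' "$1"
}
```

Usage: `strand_counts barrnap/barrnap_my_bac.gff`.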
Name (DEFINITION): Escherichia coli 55989, complete sequence.
Annotation Name: GCF_000026245.1-RS_2025_06_09
ACCESSION: NC_011748
Mauve was downloaded from the source.

In the E. coli X strain we studied, we found two genes that encode the two subunits of a Shiga toxin:
- stxA (4445290-4446249)
- stxB (4446261-4446530)
Yes, my file has many "hypothetical proteins". However, the nearby nohA_3 gene is a clue. This gene codes for a prophage DNA-packing protein. This suggests that the stxA and stxB genes were likely acquired by horizontal transfer. Specifically, this probably happened through lysogeny, which is when a phage genome integrates into the bacterial chromosome.
- nohA_3 (Prophage DNA-packing protein NohA) (4448169-4448717)
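One way to inspect the genes surrounding stxA and stxB is to print the neighbouring rows of the Prokka feature table. This is a rough sketch under two assumptions that should be checked against the actual file: that the Prokka `.tsv` lists features in genome order, and that the gene name is in the fourth tab-separated column. `neighbours` is a hypothetical helper name.

```shell
# neighbours FILE.tsv GENE_REGEX [N]
# Prints every row that lies within N rows (default 5) of a row whose
# gene column (assumed to be column 4) matches GENE_REGEX.
neighbours() {
    awk -F '\t' -v pat="$2" -v n="${3:-5}" '
        { row[NR] = $0 }
        NR > 1 && $4 ~ pat { hit[NR] = 1 }   # skip the header row
        END {
            for (i = 1; i <= NR; i++)
                for (j = i - n; j <= i + n; j++)
                    if (j in hit) { print row[i]; break }
        }' "$1"
}
```

Usage: `neighbours prokka/PROKKA_12132025.tsv '^stx[AB]' 5`. Rows adjacent in the table are only adjacent in the genome when they belong to the same scaffold, so hits near a scaffold boundary need a manual check.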

The likely mechanism involves mobile genetic elements, specifically plasmids that carry transposons.
- tnpR_1 (Transposon Tn3 resolvase) (2915041-2915292)
- tnpR_2 (Transposon Tn3 resolvase) (5263638-5264195)
.
├── README.md
├── barrnap
│ ├── barrnap_my_bac.16s_rrna.bed
│ ├── barrnap_my_bac.16s_rrna.fa
│ ├── barrnap_my_bac.16s_rrna.gff
│ ├── barrnap_my_bac.gff
│ └── barrnap_my_bac.stderr.log
├── blast
│ └── 55989.fasta
├── images
│ ├── 55989_ar.png
│ ├── bla.png
│ ├── blast_results.png
│ ├── quast_output.png
│ ├── shiga_genes.png
│ └── x_ar.png
├── mauve
│ ├── mauve_results
│ ├── mauve_results.backbone
│ ├── mauve_results.bbcols
│ └── mauve_results.guide_tree
├── prokka
│ ├── PROKKA_12132025.err
│ ├── PROKKA_12132025.faa
│ ├── PROKKA_12132025.ffn
│ ├── PROKKA_12132025.fna
│ ├── PROKKA_12132025.fsa
│ ├── PROKKA_12132025.gbf-r
│ ├── PROKKA_12132025.gbk
│ ├── PROKKA_12132025.gbk.sslist
│ ├── PROKKA_12132025.gff
│ ├── PROKKA_12132025.log
│ ├── PROKKA_12132025.sqn
│ ├── PROKKA_12132025.tbl
│ ├── PROKKA_12132025.tsv
│ └── PROKKA_12132025.txt
├── quast
│ ├── long_reads
│ │ ├── basic_stats
│ │ │ ├── GC_content_plot.pdf
│ │ │ ├── Nx_plot.pdf
│ │ │ ├── SRR292678_scaffolds_GC_content_plot.pdf
│ │ │ ├── SRR292678_scaffolds_coverage_histogram.pdf
│ │ │ ├── contigs_GC_content_plot.pdf
│ │ │ ├── contigs_coverage_histogram.pdf
│ │ │ ├── coverage_histogram.pdf
│ │ │ ├── cumulative_plot.pdf
│ │ │ ├── refs_scaffolds_GC_content_plot.pdf
│ │ │ └── refs_scaffolds_coverage_histogram.pdf
│ │ ├── icarus.html
│ │ ├── icarus_viewers
│ │ │ └── contig_size_viewer.html
│ │ ├── quast.log
│ │ ├── quast.stderr.log
│ │ ├── quast.stdout.log
│ │ ├── report.html
│ │ ├── report.pdf
│ │ ├── report.tex
│ │ ├── report.tsv
│ │ ├── report.txt
│ │ ├── transposed_report.tex
│ │ ├── transposed_report.tsv
│ │ └── transposed_report.txt
│ └── short_reads
│   ├── basic_stats
│   │ ├── GC_content_plot.pdf
│   │ ├── Nx_plot.pdf
│   │ ├── contigs_GC_content_plot.pdf
│   │ ├── contigs_coverage_histogram.pdf
│   │ ├── coverage_histogram.pdf
│   │ ├── cumulative_plot.pdf
│   │ ├── scaffolds_GC_content_plot.pdf
│   │ └── scaffolds_coverage_histogram.pdf
│   ├── icarus.html
│   ├── icarus_viewers
│   │ └── contig_size_viewer.html
│   ├── quast.log
│   ├── quast.stderr.log
│   ├── quast.stdout.log
│   ├── report.html
│   ├── report.pdf
│   ├── report.tex
│   ├── report.tsv
│   ├── report.txt
│   ├── transposed_report.tex
│   ├── transposed_report.tsv
│   └── transposed_report.txt
├── reads
│ ├── SRR292678sub_S1_L001_R1_001.fastq.gz
│ ├── SRR292678sub_S1_L001_R2_001.fastq.gz
│ └── fastqc
│   ├── SRR292678sub_S1_L001_R1_001_fastqc.html
│   ├── SRR292678sub_S1_L001_R1_001_fastqc.zip
│   ├── SRR292678sub_S1_L001_R2_001_fastqc.html
│   └── SRR292678sub_S1_L001_R2_001_fastqc.zip
├── refs
│ ├── SRR292678
│ │ ├── contigs.fasta
│ │ ├── scaffolds.fasta
│ │ └── spades.log
│ ├── SRR292678.zip
│ ├── assembly_graph_with_scaffolds.gfa
│ ├── scaffolds.fasta
│ └── scaffolds.fasta.fai
├── setup.sh
└── tree.txt
17 directories, 90 files


