CaptainLabMan/practicum_project_3

1. Exploring the dataset

mkdir -p reads/fastqc
fastqc -o reads/fastqc reads/SRR292678sub_S1_L001_R1_001.fastq.gz reads/SRR292678sub_S1_L001_R2_001.fastq.gz

Read counts:

SRR292678sub_S1_L001_R1_001.fastq.gz - 5499346
SRR292678sub_S1_L001_R2_001.fastq.gz - 5499346
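
The read counts can be cross-checked on the command line. This is a minimal sketch, assuming each FASTQ record occupies exactly four lines; the `count_reads` helper and the toy file are hypothetical and not part of the original workflow.

```shell
# Count reads in a gzipped FASTQ file, assuming 4 lines per record.
count_reads() {
    gzip -dc < "$1" | awk 'END { print NR / 4 }'
}

# Toy two-read file, created only to make the sketch runnable.
demo_fastq="$(mktemp)"
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip -c > "$demo_fastq"

count_reads "$demo_fastq"
```

On the real data the call would be `count_reads reads/SRR292678sub_S1_L001_R1_001.fastq.gz`, and likewise for the R2 file.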

2. K-mer profile and genome size estimation (completely optional)

3. Assembling E. coli X genome from paired reads

The assembled genome (short reads only) was downloaded into the refs folder, extracted there, and evaluated with QUAST:

(cd refs && unzip SRR292678.zip)
mkdir -p quast/short_reads
quast.py refs/SRR292678/contigs.fasta refs/SRR292678/scaffolds.fasta -t 1 -o quast/short_reads > quast/short_reads/quast.stdout.log 2> quast/short_reads/quast.stderr.log

4. Impact of long reads

The assembled genome (short + long reads) was downloaded to the refs folder as refs/scaffolds.fasta and compared with the short-read assembly:

mkdir -p quast/long_reads
quast.py refs/SRR292678/contigs.fasta refs/SRR292678/scaffolds.fasta refs/scaffolds.fasta -t 1 -o quast/long_reads > quast/long_reads/quast.stdout.log 2> quast/long_reads/quast.stderr.log

QUAST report:
[Figure: quast_output (images/quast_output.png)]

Assembly labels in the report:

contigs: short-read contigs
SRR292678_scaffolds: short-read scaffolds
refs_scaffolds: short + long read scaffolds

Task: Also answer how the quality of the assembly improved and why.
Answer:

The assembly improved markedly: the number of contigs dropped from 210/221 in the short-read assemblies to 26, the largest contig grew from 300,763 to 2,579,755 bp, and N50 rose from 111,860 to 968,098 bp.
Long reads span repetitive regions that short reads cannot resolve, which simplifies the assembly graph and lets the assembler join contigs that were previously broken at repeats.
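
For readers unfamiliar with N50, the metric can be sketched as follows: sort the contig lengths in descending order and report the length at which the running sum first reaches half of the total assembly size. The `n50` helper and the toy lengths are hypothetical and are not taken from the QUAST report; QUAST computes this value itself.

```shell
# N50: the length L such that contigs of length >= L
# together cover at least half of the assembly.
n50() {
    # Input: one contig length per line on stdin.
    sort -rn | awk '{ len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                run += len[i]
                if (run * 2 >= total) { print len[i]; exit }
            }
        }'
}

# Toy lengths: total is 1000, and 400 + 300 is the first sum to reach 500.
printf '%s\n' 100 400 300 200 | n50   # prints 300
```

Fewer, longer contigs push the halfway point onto a longer contig, which is why N50 rises when long reads merge fragments.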

5. Genome Annotation

prokka --outdir prokka --force --centre XXX refs/scaffolds.fasta

6. Finding the closest relative of E. coli X

barrnap refs/scaffolds.fasta --kingdom bac --threads 6 > barrnap/barrnap_my_bac.gff 2> barrnap/barrnap_my_bac.stderr.log  
awk 'NR==1 || $9 ~ /^Name=16S_rRNA/' barrnap/barrnap_my_bac.gff > barrnap/barrnap_my_bac.16s_rrna.gff
awk '!/^#/ {print $1 "\t" $4-1 "\t" $5 "\t" $9}' barrnap/barrnap_my_bac.16s_rrna.gff > barrnap/barrnap_my_bac.16s_rrna.bed
bedtools getfasta -fi refs/scaffolds.fasta -bed barrnap/barrnap_my_bac.16s_rrna.bed > barrnap/barrnap_my_bac.16s_rrna.fa

We found 8 regions that match 16S rRNA.
One of these regions aligned to only 25% of the full 16S rRNA sequence.
Four regions are on the forward strand, and four are on the reverse strand.
All 8 regions lie at different positions on the chromosome, so they are most likely distinct loci (transcribed in opposite directions) rather than one locus reported on both strands as reverse-complement copies.
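
The strand split can be tallied directly from the GFF, where column 7 holds the strand. This is a minimal sketch, assuming a tab-separated GFF3 file; the `count_strands` helper and the toy records (contig names and coordinates) are hypothetical and do not reproduce the barrnap output.

```shell
# Count features per strand (GFF column 7), skipping comment lines.
count_strands() {
    awk -F'\t' '!/^#/ { n[$7]++ }
        END { printf "+ %d\n- %d\n", n["+"], n["-"] }' "$1"
}

# Toy GFF with one forward and two reverse features.
demo_gff="$(mktemp)"
{
    printf '##gff-version 3\n'
    printf 'ctg1\tbarrnap\trRNA\t100\t1600\t0\t+\t.\tName=16S_rRNA\n'
    printf 'ctg1\tbarrnap\trRNA\t9000\t10500\t0\t-\t.\tName=16S_rRNA\n'
    printf 'ctg2\tbarrnap\trRNA\t50\t1550\t0\t-\t.\tName=16S_rRNA\n'
} > "$demo_gff"

count_strands "$demo_gff"
```

On the real data the call would be `count_strands barrnap/barrnap_my_bac.16s_rrna.gff`.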

BLAST results:
[Figure: blast_results (images/blast_results.png)]

Name (DEFINITION): Escherichia coli 55989, complete sequence.
Annotation Name: GCF_000026245.1-RS_2025_06_09
ACCESSION: NC_011748

7. What is the genetic cause of HUS?

Mauve was downloaded from the source.
[Figure: shiga_genes (images/shiga_genes.png)]
In the E. coli X strain we studied, we found two genes that together encode a Shiga toxin, one for each subunit:

  1. stxA (4445290-4446249)
  2. stxB (4446261-4446530)

8. Tracing the source of toxin genes in E. coli X

Yes, the annotation contains many "hypothetical proteins". However, the nearby nohA_3 gene, which encodes a prophage DNA-packing protein, is a clue: it suggests that stxA and stxB were acquired by horizontal gene transfer, most likely through lysogeny, in which a phage genome integrates into the bacterial chromosome.

  • nohA_3 (Prophage DNA-packing protein NohA) (4448169-4448717)
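
To make "nearby" concrete, the feature lengths and the gap between stxB and nohA_3 can be worked out from the coordinates listed above. This is a small sketch, assuming the coordinates are 1-based and inclusive (the GFF/GenBank convention); the helper names are hypothetical.

```shell
# Length of a feature given 1-based inclusive start and end coordinates.
feature_len() { echo $(( $2 - $1 + 1 )); }

# Number of bases strictly between the end of one feature
# and the start of the next.
gap_between() { echo $(( $2 - $1 - 1 )); }

feature_len 4445290 4446249   # stxA   -> 960
feature_len 4446261 4446530   # stxB   -> 270
feature_len 4448169 4448717   # nohA_3 -> 549
gap_between 4446530 4448169   # stxB end to nohA_3 start -> 1638
```

Under that assumption the prophage gene starts less than 2 kb downstream of stxB.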

9. Antibiotic resistance detection

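E. coli X antibiotic resistance (AR):
[Figure: X_AR (images/x_ar.png)]

55989 antibiotic resistance (AR):
[Figure: 55989_AR (images/55989_ar.png)]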

10. Antibiotic resistance mechanism

[Figure: bla (images/bla.png)]
The likely mechanism involves mobile genetic elements, specifically plasmids that carry transposons. Two Tn3 resolvase genes are annotated in the assembly:

  1. tnpR_1 (Transposon Tn3 resolvase) (2915041-2915292)
  2. tnpR_2 (Transposon Tn3 resolvase) (5263638-5264195)

Project's tree:

.
├── README.md
├── barrnap
│   ├── barrnap_my_bac.16s_rrna.bed
│   ├── barrnap_my_bac.16s_rrna.fa
│   ├── barrnap_my_bac.16s_rrna.gff
│   ├── barrnap_my_bac.gff
│   └── barrnap_my_bac.stderr.log
├── blast
│   └── 55989.fasta
├── images
│   ├── 55989_ar.png
│   ├── bla.png
│   ├── blast_results.png
│   ├── quast_output.png
│   ├── shiga_genes.png
│   └── x_ar.png
├── mauve
│   ├── mauve_results
│   ├── mauve_results.backbone
│   ├── mauve_results.bbcols
│   └── mauve_results.guide_tree
├── prokka
│   ├── PROKKA_12132025.err
│   ├── PROKKA_12132025.faa
│   ├── PROKKA_12132025.ffn
│   ├── PROKKA_12132025.fna
│   ├── PROKKA_12132025.fsa
│   ├── PROKKA_12132025.gbf-r
│   ├── PROKKA_12132025.gbk
│   ├── PROKKA_12132025.gbk.sslist
│   ├── PROKKA_12132025.gff
│   ├── PROKKA_12132025.log
│   ├── PROKKA_12132025.sqn
│   ├── PROKKA_12132025.tbl
│   ├── PROKKA_12132025.tsv
│   └── PROKKA_12132025.txt
├── quast
│   ├── long_reads
│   │   ├── basic_stats
│   │   │   ├── GC_content_plot.pdf
│   │   │   ├── Nx_plot.pdf
│   │   │   ├── SRR292678_scaffolds_GC_content_plot.pdf
│   │   │   ├── SRR292678_scaffolds_coverage_histogram.pdf
│   │   │   ├── contigs_GC_content_plot.pdf
│   │   │   ├── contigs_coverage_histogram.pdf
│   │   │   ├── coverage_histogram.pdf
│   │   │   ├── cumulative_plot.pdf
│   │   │   ├── refs_scaffolds_GC_content_plot.pdf
│   │   │   └── refs_scaffolds_coverage_histogram.pdf
│   │   ├── icarus.html
│   │   ├── icarus_viewers
│   │   │   └── contig_size_viewer.html
│   │   ├── quast.log
│   │   ├── quast.stderr.log
│   │   ├── quast.stdout.log
│   │   ├── report.html
│   │   ├── report.pdf
│   │   ├── report.tex
│   │   ├── report.tsv
│   │   ├── report.txt
│   │   ├── transposed_report.tex
│   │   ├── transposed_report.tsv
│   │   └── transposed_report.txt
│   └── short_reads
│       ├── basic_stats
│       │   ├── GC_content_plot.pdf
│       │   ├── Nx_plot.pdf
│       │   ├── contigs_GC_content_plot.pdf
│       │   ├── contigs_coverage_histogram.pdf
│       │   ├── coverage_histogram.pdf
│       │   ├── cumulative_plot.pdf
│       │   ├── scaffolds_GC_content_plot.pdf
│       │   └── scaffolds_coverage_histogram.pdf
│       ├── icarus.html
│       ├── icarus_viewers
│       │   └── contig_size_viewer.html
│       ├── quast.log
│       ├── quast.stderr.log
│       ├── quast.stdout.log
│       ├── report.html
│       ├── report.pdf
│       ├── report.tex
│       ├── report.tsv
│       ├── report.txt
│       ├── transposed_report.tex
│       ├── transposed_report.tsv
│       └── transposed_report.txt
├── reads
│   ├── SRR292678sub_S1_L001_R1_001.fastq.gz
│   ├── SRR292678sub_S1_L001_R2_001.fastq.gz
│   └── fastqc
│       ├── SRR292678sub_S1_L001_R1_001_fastqc.html
│       ├── SRR292678sub_S1_L001_R1_001_fastqc.zip
│       ├── SRR292678sub_S1_L001_R2_001_fastqc.html
│       └── SRR292678sub_S1_L001_R2_001_fastqc.zip
├── refs
│   ├── SRR292678
│   │   ├── contigs.fasta
│   │   ├── scaffolds.fasta
│   │   └── spades.log
│   ├── SRR292678.zip
│   ├── assembly_graph_with_scaffolds.gfa
│   ├── scaffolds.fasta
│   └── scaffolds.fasta.fai
├── setup.sh
└── tree.txt

17 directories, 90 files
