diff --git a/README.md b/README.md index 2050e57..bb7e53f 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,11 @@ - logo +microBioRust logo light mode +microBioRust logo dark mode [![Docs](https://img.shields.io/badge/docs-mkdocs-blue.svg)](https://lcrossman.github.io/microBioRust/) ![Crates.io Version](https://img.shields.io/crates/v/microBioRust?style=flat&link=https%3A%2F%2Fcrates.io%2Fcrates%2FmicroBioRust) - - ## A Rust bioinformatics crate aimed at Microbial genomics
The aim of this crate is to provide Microbiology friendly Rust functions for bioinformatics.
@@ -16,6 +15,8 @@ The aim of this crate is to provide Microbiology friendly Rust functions for bio Some concepts with many thanks to Rust-bio
Please see the Roadmap for futher details [here](ROADMAP.md) +Check out the [docs here](https://microBioRust.github.io/microBioRust) + To install Rust - please see here [Rust install](https://www.rust-lang.org/tools/install) or with Conda
If you would like to contribute please follow the [Rust code of conduct](https://www.rust-lang.org/policies/code-of-conduct) diff --git a/ROADMAP.md b/ROADMAP.md index 95064d5..4e741c4 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -16,6 +16,7 @@ Integration of common types and parsers such as: - [ ] fastq gzipped version parsing - [ ] SAM format parser - [ ] RPKM output parser +- [ ] Further RNA-seq transcriptomics analysis - [ ] Support for other compressed files such as BAM and CRAM - [ ] Writer support for those types diff --git a/assets/BIO W.png b/assets/BIO W.png new file mode 100644 index 0000000..e2fbf31 Binary files /dev/null and b/assets/BIO W.png differ diff --git a/docs/assets/MICROBIO B.svg b/assets/MICROBIO B.svg similarity index 100% rename from docs/assets/MICROBIO B.svg rename to assets/MICROBIO B.svg diff --git a/docs/assets/attributes_diagram.png b/docs/assets/attributes_diagram.png deleted file mode 100644 index 29c0f10..0000000 Binary files a/docs/assets/attributes_diagram.png and /dev/null differ diff --git a/docs/assets/pc_specs.png b/docs/assets/pc_specs.png deleted file mode 100644 index 6da32ac..0000000 Binary files a/docs/assets/pc_specs.png and /dev/null differ diff --git a/docs/assets/records_diagram.pdf b/docs/assets/records_diagram.pdf deleted file mode 100644 index 26df57f..0000000 Binary files a/docs/assets/records_diagram.pdf and /dev/null differ diff --git a/docs/assets/records_diagram.png b/docs/assets/records_diagram.png deleted file mode 100644 index ad70627..0000000 Binary files a/docs/assets/records_diagram.png and /dev/null differ diff --git a/docs/assets/system_model.png b/docs/assets/system_model.png deleted file mode 100644 index 7e8396e..0000000 Binary files a/docs/assets/system_model.png and /dev/null differ diff --git a/docs/assets/window_code.png b/docs/assets/window_code.png deleted file mode 100644 index 7f7f1f7..0000000 Binary files a/docs/assets/window_code.png and /dev/null differ diff --git 
a/docs/formats_and_parsing.md b/docs/formats_and_parsing.md deleted file mode 100644 index f5a5c15..0000000 --- a/docs/formats_and_parsing.md +++ /dev/null @@ -1,98 +0,0 @@ -**File types and Parsing behaviour** - -**Genbank (.gbk) & Embl (.embl)** - - -- We also provide the ability to convert these formats to gff3 - -Each genome file is a basic text file following a specific ruleset and genbank (gbk) and embl are similar but differ in the required formatting. - - -*Structure and Parsing* - -Top Level is the Records type. There is one Records type per genome file. -Three types of macro have getters and setters for the data, these are SourceAttributes, FeatureAttributes, SequenceAttributes. - -The next level is the Record type. There may be one or many Record in a Records (up to ~ 2000 but more usually ~ 50). Each Record has a DNA sequence which is calculated on the fly by slicing the total sequence of Records with the start and stop coordinates. Each Record also has a SourceAttributes macro which stores ID, total start and stop of the Record sequence (different to the CDS features start and stop below). It also stores the Organism among some other database comments. 
- -![explanatory diagram for the file datatypes](assets/records_diagram.png){ loading=lazy } - -The full structure of the SourceAttributes, FeatureAttributes and SequenceAttributes is: - -![explanatory diagram for the Attribute macros](assets/attributes_diagram.png){ width=500 } - -SourceAttributes stores the following in an enum: - -``` -pub enum SourceAttributes { - Start { value: RangeValue }, - Stop { value: RangeValue }, - Organism { value: String }, - MolType { value: String}, - Strain { value: String}, - CultureCollection { value: String}, - TypeMaterial { value: String}, - DbXref { value:String} -} -``` - -Where RangeValue can be either of: - -``` -RangeValue::Exact(value) -RangeValue::LessThan(value) -RangeValue::GreaterThan(value) -``` - -Most RangeValues are Exact(value) with exceptions usually at the start and end of sequences, indicating that they are truncated. - -Note that the start and stop of SourceAttributes relate to the sequence of the whole record - -Each Record can have None or many hundreds of coding sequences, CDS (stored in the FeatureAttributes). -These are the predicted genes and contain annotation data per gene such as locus_tag (id), gene (may be empty), start, stop, strand (-1 or +1), codon start (1,2 or 3), product. - -FeatureAttributes stores the following in an enum: - -``` -pub enum FeatureAttributes { - Start { value: RangeValue }, - Stop { value: RangeValue }, - Gene { value: String }, - Product { value: String }, - CodonStart { value: u8 }, - Strand { value: i8 }, - // ec_number { value: String } -} -``` - -currently EC_number is commented out but could be added back if there is demand - -Each CDS also has a DNA sequence .ffn (calculated on the fly from the start, stop and strand) and a protein sequence .faa (translated on the fly from the start, stop, strand and codon_start). The sequences are stored in the SequenceAttributes. 
- -SequenceAttributes stores the following in an enum: - -``` -pub enum SequenceAttributes { - Start { value: RangeValue }, - Stop { value: RangeValue }, - SequenceFfn { value: String }, - SequenceFaa { value: String }, - CodonStart { value: u8 }, - Strand { value: i8 }, -} -``` - -Note the start, stop, strand and codon start of SequenceAttributes and FeatureAttributes are identical - -Sequences are stored separately in SequenceAttributes for efficiency. Although start, stop, locus_tag, and strand are duplicated in SequenceAttributes and FeatureAttributes, keeping them together may make it easier to slice the sequence and access the specific feature metadata at the same time. - - - - - - - - - - - diff --git a/docs/formats_and_parsing.md~ b/docs/formats_and_parsing.md~ deleted file mode 100644 index 7d78275..0000000 --- a/docs/formats_and_parsing.md~ +++ /dev/null @@ -1,94 +0,0 @@ -**File types and Parsing behaviour** - -**Genbank (.gbk) & Embl (.embl)** - - -- We also provide the ability to convert these formats to gff3 - -Each genome file is a basic text file following a specific ruleset and genbank (gbk) and embl are similar but differ in the required formatting. - - -*Structure and Parsing* - -Top Level is the Records type. There is one Records type per genome file. -Three types of macro have getters and setters for the data, these are SourceAttributes, FeatureAttributes, SequenceAttributes. - -The next level is the Record type. There may be one or many Record in a Records (up to ~ 2000 but more usually ~ 50). Each Record has a DNA sequence which is calculated on the fly by slicing the total sequence of Records with the start and stop coordinates. Each Record also has a SourceAttributes macro which stores ID, total start and stop of the Record sequence (different to the CDS features start and stop below). It also stores the Organism among some other database comments. 
- -![explanatory diagram for the file datatypes](assets/records_diagram.png){ loading=lazy } - -SourceAttributes stores the following in an enum: - -``` -pub enum SourceAttributes { - Start { value: RangeValue }, - Stop { value: RangeValue }, - Organism { value: String }, - MolType { value: String}, - Strain { value: String}, - CultureCollection { value: String}, - TypeMaterial { value: String}, - DbXref { value:String} -} -``` - -Where RangeValue can be either of: - -``` -RangeValue::Exact(value) -RangeValue::LessThan(value) -RangeValue::GreaterThan(value) -``` - -Most RangeValues are Exact(value) with exceptions usually at the start and end of sequences, indicating that they are truncated. - -Note that the start and stop of SourceAttributes relate to the sequence of the whole record - -Each Record can have None or many hundreds of coding sequences, CDS (stored in the FeatureAttributes). -These are the predicted genes and contain annotation data per gene such as locus_tag (id), gene (may be empty), start, stop, strand (-1 or +1), codon start (1,2 or 3), product. - -FeatureAttributes stores the following in an enum: - -``` -pub enum FeatureAttributes { - Start { value: RangeValue }, - Stop { value: RangeValue }, - Gene { value: String }, - Product { value: String }, - CodonStart { value: u8 }, - Strand { value: i8 }, - // ec_number { value: String } -} -``` - -currently EC_number is commented out but could be added back if there is demand - -Each CDS also has a DNA sequence .ffn (calculated on the fly from the start, stop and strand) and a protein sequence .faa (translated on the fly from the start, stop, strand and codon_start). The sequences are stored in the SequenceAttributes. 
- -SequenceAttributes stores the following in an enum: - -``` -pub enum SequenceAttributes { - Start { value: RangeValue }, - Stop { value: RangeValue }, - SequenceFfn { value: String }, - SequenceFaa { value: String }, - CodonStart { value: u8 }, - Strand { value: i8 }, -} -``` - -Note the start, stop, strand and codon start of SequenceAttributes and FeatureAttributes are identical - -Sequences are stored separately in SequenceAttributes for efficiency. Although start, stop, locus_tag, and strand are duplicated in SequenceAttributes and FeatureAttributes, keeping them together may make it easier to slice the sequence and access its metadata at the same time. - - - - - - - - - - - diff --git a/docs/index.md b/docs/index.md deleted file mode 100644 index 6e79c4e..0000000 --- a/docs/index.md +++ /dev/null @@ -1,39 +0,0 @@ -# Welcome to micro**BioRust** - -A blazing-fast, sustainable bioinformatics toolkit written in [Rust](https://www.rust-lang.org/) — for microbial genomics rresearch, and optimised for functions used in data exploration. - ---- - -## Features - -- 🦀 Built in Rust programming language for speed and safety -- 🔄 Python bindings _via_ pyo3 for InterOp - Rust meets Python -- 📦 Open source and community-driven - ---- - -## Get Started!! -See Installation for details on how to install Rust for Linux, MacOSX and Windows -Interested in microbiorust-py? Check out the microbiorust-py section for quick-start & more! 
- -Start a new project -```cargo new microBioRust_test``` - -Add to your Cargo.toml -```cargo add -p microBioRust``` - -to add the whole workspace including file parsing, sequence metrics, coming soon data viz (heatmap demonstration) and python bindings (microbiorust-py) - -```cargo add -p seqmetrics``` -```cargo add -p heatmap``` -```cargo add -p microbiorust-py``` - -or clone the repo -```git clone https://github.com/LCrossman/microBioRust.git``` - -Build the project -```cargo build``` - -Run the tests -```cargo test``` - diff --git a/docs/installation.md b/docs/installation.md deleted file mode 100644 index e825813..0000000 --- a/docs/installation.md +++ /dev/null @@ -1,35 +0,0 @@ -**How to Install** - -**Install Rust (recommended method):** - -Please see [here](https://www.rust-lang.org/tools/install) for base environment installation details for Rust programming language - -For MacOSX, Linux or other unix OS you can install Rust _via_: - -```curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh``` - -For further details on Windows please see: [other install](https://forge.rust-lang.org/infra/other-installation-methods.html) - -**Install with Conda:** -*Disclaimer:* While conda is a powerful package manager, it is primarily designed for managing Python environments, and the -Rust package available may not always be up-to-date or include all the necessary components - -```conda install conda-forge::rust``` - - -**Install with Python for Rust with Python InterOp** -To use Python Interoperability you will need to install Python and additionally you may need (even if in a conda environment): - -For Linux: -```export LD_LIBRARY_PATH=[directory where python is located]``` - -For MacOSX: -```export DYLD_LIBRARY_PATH=[directory where python is located]``` - -For Windows please see this StackOverflow issue for a fix: [fix here](https://stackoverflow.com/questions/79627918/cant-set-python-version-when-running-rust-analyzer-and-pyo3-on-wsl/79627921#79627921) and 
please also see our specific windows_install page - - - - - - diff --git a/docs/stylesheets/extra.css b/docs/stylesheets/extra.css deleted file mode 100644 index 117f791..0000000 --- a/docs/stylesheets/extra.css +++ /dev/null @@ -1,38 +0,0 @@ -@import url('https://fonts.googleapis.com/css2?family=Roboto+Slab:wght@400..600&display=swap'); - -/* Light mode (scheme: default) */ -[data-md-color-scheme="default"] { - --md-default-bg-color: #fefcfb; - --md-default-fg-color: #1f2128; - --md-primary-fg-color: #e45a28; - --md-accent-fg-color: #f4a63a; -} - -/* Dark mode (scheme: slate) */ -[data-md-color-scheme="slate"] { - --md-default-bg-color: #1f2128; - --md-default-fg-color: #fefcfb; - --md-primary-fg-color: #F4A63A; - --md-accent-fg-color: #E45928; -} - -/* Light mode */ -[data-md-color-scheme="default"] h1, -[data-md-color-scheme="default"] h1 strong, -[data-md-color-scheme="default"] h2 { - font-family: 'Roboto Slab', sans-serif; - font-weight: 400 !important; - color: #333333 !important; /* Choose your light mode color */ -} - -/* Dark mode */ -[data-md-color-scheme="slate"] h1, -[data-md-color-scheme="slate"] h1 strong, -[data-md-color-scheme="slate"] h2 { - font-family: 'Roboto Slab', sans-serif; - font-weight: 400 !important; - color: #FEFCFB !important; /* Your dark mode color */ -} -.md-typeset h1 strong { - font-weight: 600 !important; -} diff --git a/docs/usage.md b/docs/usage.md deleted file mode 100644 index 955f35f..0000000 --- a/docs/usage.md +++ /dev/null @@ -1,234 +0,0 @@ -In microBioRust: - -You can parse genbank files and save as a GFF (gff3) format as well as extracting DNA sequences, gene DNA sequences (ffn) and protein fasta sequences (faa) -Super simple way: - -```rust -pub fn genbank_to_faa() -> Result<(), anyhow::Error> { - let args = Arguments::parse(); - let records = genbank!(&args.filename); - for record in records.iter() { - for (k, v) in &record.cds.attributes { - if let Some(seq) = record.seq_features.get_sequence_faa(k) { - 
println!(">{}|{}\n{}", &record.id, &k, seq); - } - } - } - return Ok(()); -} - -``` - -Better for Debugging: - -```rust -pub fn genbank_to_faa() -> Result<(), anyhow::Error> { - let args: Vec<String> = env::args().collect(); - let config = Config::new(&args).unwrap_or_else(|err| { - println!("Problem with parsing file arguments: {}", err); - process::exit(1); - }); - let file_gbk = fs::File::open(config.filename)?; - let mut reader = Reader::new(file_gbk); - let mut records = reader.records(); - let mut cds_counter: u32 = 0; - loop { - //collect from each record advancing on a next record basis, count cds records - match records.next() { - Some(Ok(mut record)) => { - for (k, v) in &record.cds.attributes { - match record.seq_features.get_sequence_faa(&k) { - Some(value) => { - let seq_faa = value.to_string(); - println!(">{}|{}\n{}", &record.id, &k, seq_faa); - } - _ => (), - }; - } - cds_counter += 1; - } - Some(Err(e)) => { - println!("Error encountered - an err {:?}", e); - } - None => { - println!("finished iteration"); - break; - } - } - } - println!("Total records processed: {}", read_counter); - return Ok(()); -} -``` - -Example to save a provided multi- or single genbank file as a GFF file (by joining any multi-genbank) - -```rust -pub fn genbank_to_gff() -> io::Result<()> { - let args: Vec<String> = env::args().collect(); - let config = Config::new(&args).unwrap_or_else(|err| { - println!("Problem with parsing file arguments: {}", err); - process::exit(1); - }); - let file_gbk = fs::File::open(&config.filename)?; - let prev_start: u32 = 0; - let mut prev_end: u32 = 0; - let mut reader = Reader::new(file_gbk); - let mut records = reader.records(); - let mut read_counter: u32 = 0; - let mut seq_region: BTreeMap<String, (u32, u32)> = BTreeMap::new(); - let mut record_vec: Vec<Record> = Vec::new(); - loop { - match records.next() { - Some(Ok(mut record)) => { - //println!("next record"); - //println!("Record id: {:?}", record.id); - let source = record.source_map.source_name.clone().expect("issue 
collecting source name"); - let beginning = match record.source_map.get_start(&source) { - Some(value) => value.get_value(), - _ => 0, - }; - let ending = match record.source_map.get_stop(&source) { - Some(value) => value.get_value(), - _ => 0, - }; - if ending + prev_end < beginning + prev_end { - } - seq_region.insert(source, (beginning + prev_end, ending + prev_end)); - record_vec.push(record); - // Add additional fields to print if needed - read_counter+=1; - prev_end+=ending; // create the joined record if there are multiple - }, - Some(Err(e)) => { println!("theres an err {:?}", e); }, - None => { - println!("finished iteration"); - break; }, - } - } - let output_file = format!("{}.gff", &config.filename); - gff_write(seq_region.clone(), record_vec, &output_file, true); - println!("Total records processed: {}", read_counter); - return Ok(()); -} -``` - -Example to create a completely new record, use of setters or set_ functionality - -To write into GFF format requires gff_write(seq_region, record_vec, filename, true or false) - -The seq_region is the region of interest to save with name and DNA coordinates such as `seqregion.entry("source_1".to_string(), (1,897))` - -This makes it possible to save the whole file or to subset it - -record_vec is a list of the records. 
If there is only one record, include this as a vec using `vec![record]` - -The boolean true/false describes whether the DNA sequence should be included in the GFF3 file - -To write into genbank format requires gbk_write(seq_region, record_vec, filename), no true or false since genbank format will include the DNA sequence - - ```rust -pub fn create_new_record() -> Result<(), anyhow::Error> { - let filename = format!("new_record.gff"); - let mut record = Record::new(); - let mut seq_region: BTreeMap<String, (u32, u32)> = BTreeMap::new(); - //example from E.coli K12 - seq_region.insert("source_1".to_string(), (1, 897)); - //Add the source into SourceAttributes - record - .source_map - .set_counter("source_1".to_string()) - .set_start(RangeValue::Exact(1)) - .set_stop(RangeValue::Exact(897)) - .set_organism("Escherichia coli".to_string()) - .set_mol_type("DNA".to_string()) - .set_strain("K-12 substr. MG1655".to_string()) - .set_type_material("type strain of Escherichia coli K12".to_string()) - .set_db_xref("PRJNA57779".to_string()); - //Add the features into FeatureAttributes, here we are setting two features, i.e.
coding sequences or genes - record - .cds - .set_counter("b3304".to_string()) - .set_start(RangeValue::Exact(1)) - .set_stop(RangeValue::Exact(354)) - .set_gene("rplR".to_string()) - .set_product("50S ribosomal subunit protein L18".to_string()) - .set_codon_start(1) - .set_strand(-1); - record - .cds - .set_counter("b3305".to_string()) - .set_start(RangeValue::Exact(364)) - .set_stop(RangeValue::Exact(897)) - .set_gene("rplF".to_string()) - .set_product("50S ribosomal subunit protein L6".to_string()) - .set_codon_start(1) - .set_strand(-1); - //Add the sequences for the coding sequence (CDS) into SequenceAttributes - record - .seq_features - .set_counter("b3304".to_string()) - .set_start(RangeValue::Exact(1)) - .set_stop(RangeValue::Exact(354)) - .set_sequence_ffn( - "ATGGATAAGAAATCTGCTCGTATCCGTCGTGCGACCCGCGCACGCCGCAAGCTCCAGGAG -CTGGGCGCAACTCGCCTGGTGGTACATCGTACCCCGCGTCACATTTACGCACAGGTAATT -GCACCGAACGGTTCTGAAGTTCTGGTAGCTGCTTCTACTGTAGAAAAAGCTATCGCTGAA -CAACTGAAGTACACCGGTAACAAAGACGCGGCTGCAGCTGTGGGTAAAGCTGTCGCTGAA -CGCGCTCTGGAAAAAGGCATCAAAGATGTATCCTTTGACCGTTCCGGGTTCCAATATCAT -GGTCGTGTCCAGGCACTGGCAGATGCTGCCCGTGAAGCTGGCCTTCAGTTCTAA" - .to_string(), - ) - .set_sequence_faa( - "MDKKSARIRRATRARRKLQELGATRLVVHRTPRHIYAQVIAPNGSEVLVAASTVEKAIAE -QLKYTGNKDAAAAVGKAVAERALEKGIKDVSFDRSGFQYHGRVQALADAAREAGLQF" - .to_string(), - ) - .set_codon_start(1) - .set_strand(-1); - record - .seq_features - .set_counter("bb3305".to_string()) - .set_start(RangeValue::Exact(364)) - .set_stop(RangeValue::Exact(897)) - .set_sequence_ffn( - "ATGTCTCGTGTTGCTAAAGCACCGGTCGTTGTTCCTGCCGGCGTTGACGTAAAAATCAAC -GGTCAGGTTATTACGATCAAAGGTAAAAACGGCGAGCTGACTCGTACTCTCAACGATGCT -GTTGAAGTTAAACATGCAGATAATACCCTGACCTTCGGTCCGCGTGATGGTTACGCAGAC -GGTTGGGCACAGGCTGGTACCGCGCGTGCCCTGCTGAACTCAATGGTTATCGGTGTTACC -GAAGGCTTCACTAAGAAGCTGCAGCTGGTTGGTGTAGGTTACCGTGCAGCGGTTAAAGGC -AATGTGATTAACCTGTCTCTGGGTTTCTCTCATCCTGTTGACCATCAGCTGCCTGCGGGT -ATCACTGCTGAATGTCCGACTCAGACTGAAATCGTGCTGAAAGGCGCTGATAAGCAGGTG 
-ATCGGCCAGGTTGCAGCGGATCTGCGCGCCTACCGTCGTCCTGAGCCTTATAAAGGCAAG -GGTGTTCGTTACGCCGACGAAGTCGTGCGTACCAAAGAGGCTAAGAAGAAGTAA" - .to_string(), - ) - .set_sequence_faa( - "MSRVAKAPVVVPAGVDVKINGQVITIKGKNGELTRTLNDAVEVKHADNTLTFGPRDGYAD -GWAQAGTARALLNSMVIGVTEGFTKKLQLVGVGYRAAVKGNVINLSLGFSHPVDHQLPAG -ITAECPTQTEIVLKGADKQVIGQVAADLRAYRRPEPYKGKGVRYADEVVRTKEAKKK" - .to_string(), - ) - .set_codon_start(1) - .set_strand(-1); - //Add the full sequence of the entire record into the record.sequence - record.sequence = "TTAGAACTGAAGGCCAGCTTCACGGGCAGCATCTGCCAGTGCCTGGACACGACCATGATA -TTGGAACCCGGAACGGTCAAAGGATACATCTTTGATGCCTTTTTCCAGAGCGCGTTCAGC -GACAGCTTTACCCACAGCTGCAGCCGCGTCTTTGTTACCGGTGTACTTCAGTTGTTCAGC -GATAGCTTTTTCTACAGTAGAAGCAGCTACCAGAACTTCAGAACCGTTCGGTGCAATTAC -CTGTGCGTAAATGTGACGCGGGGTACGATGTACCACCAGGCGAGTTGCGCCCAGCTCCTG -GAGCTTGCGGCGTGCGCGGGTCGCACGACGGATACGAGCAGATTTCTTATCCATAGTGTT -ACCTTACTTCTTCTTAGCCTCTTTGGTACGCACGACTTCGTCGGCGTAACGAACACCCTT -GCCTTTATAAGGCTCAGGACGACGGTAGGCGCGCAGATCCGCTGCAACCTGGCCGATCAC -CTGCTTATCAGCGCCTTTCAGCACGATTTCAGTCTGAGTCGGACATTCAGCAGTGATACC -CGCAGGCAGCTGATGGTCAACAGGATGAGAGAAACCCAGAGACAGGTTAATCACATTGCC -TTTAACCGCTGCACGGTAACCTACACCAACCAGCTGCAGCTTCTTAGTGAAGCCTTCGGT -AACACCGATAACCATTGAGTTCAGCAGGGCACGCGCGGTACCAGCCTGTGCCCAACCGTC -TGCGTAACCATCACGCGGACCGAAGGTCAGGGTATTATCTGCATGTTTAACTTCAACAGC -ATCGTTGAGAGTACGAGTCAGCTCGCCGTTTTTACCTTTGATCGTAATAACCTGACCGTT -GATTTTTACGTCAACGCCGGCAGGAACAACGACCGGTGCTTTAGCAACACGAGACAT" - .to_string(); - gff_write(seq_region, vec![record], &filename, true); - return Ok(()); -} -``` diff --git a/docs/windows_install.md b/docs/windows_install.md deleted file mode 100644 index 1ec9711..0000000 --- a/docs/windows_install.md +++ /dev/null @@ -1,81 +0,0 @@ -#Installation on Windows - -:pencil2: Author: Sreeram Peela - -**Pre-requisites** - -Rust is the programming language of choice for complex tasks these days. Installing micro**BioRust** -in windows requires Rust to be installed and the PATH variables to be added. 
- -We recommend installing Rust (and micro**BioRust**) using Windows Powershell (logged as admin). - -Alternately, the latest executables for GUI-based installation can be found here: -[https://www.rust-lang.org/tools/install](https://www.rust-lang.org/tools/install) - -1. Navigate to the directory of your choice and open Powershell in admin mode. -2. Download Rust executable from Powershell using the command: - -``` -Invoke-WebRequest --Uri https://static.rust-lang.org/rustup/dist/x86_64-pc-windows-msvc/rustup-init.exe --OutFile rustup-init.exe -# Run the below command to start installation -.\rustup-init.exe``` - -``` -> :mega: Installing Rust typically can be made through Visual Studio Community installer (Select option 1 when prompted). This documentation is written by selecting this option - -Installation VS community installer requires downloading additional packages - Win11_SDK and .. Please note that both these require almost 6GB of memory !! - -3. Select ‘Default installation’ when prompted. -4. After a typical installation is over, close the powershell and reopen it to make changes in the PATH. - -**# Check if installation is successful** -``` -cargo help -``` - -For a successful installation, the above command will display different options and subcommands in Cargo. We will be using this for installing microBioRust (or any Rust package). - -**Install microBioRust** - -The micro**BioRust** repo is being hosted on GitHub [here](https://github.com/LCrossman/microBioRust). We recommend using Git for smooth installation. Alternatively, users can download the repo as a ZIP file, uncompress it (with your own choice of tools), and navigate to the directory. - - -**# Clone the repo using Git** -``` -git clone https://github.com/LCrossman/microBioRust.git -``` - -**# Navigate to the dir** -``` -cd microBioRust -``` - -Inside the directory, one can use Cargo to build the library. 
- -``` -# Inside the project directory -cargo build -``` - ->:arrow_right: The above command downloads and installs necessary dependencies for smooth functioning of the package. Please wait until all the required dependencies are installed and compilation for micro**BioRust** is completed. - - -Once the package has been built, it is a general practise to test whether installation was successful. For testing the installation, run the command: - -``` -# test installation of the package -cargo test -``` - -The above test instance runs over multiple files packed with the repo, and a final output message can help us in understanding errors. Typically, successful installation gives the following last few lines: - -![Screen Image](assets/window_code.png){ loading=lazy } - -Congratulations!! You have successfully installed microBioRust in your system. You can proceed with Getting started section of the documentation. - -> Session Information: - -![PC Specs](assets/pc_specs.png){ loading=lazy } -![System model](assets/system_model.png){ loading=lazy } diff --git a/heatmap/static/index.html b/heatmap/static/index.html deleted file mode 100644 index a4090ad..0000000 --- a/heatmap/static/index.html +++ /dev/null @@ -1,39 +0,0 @@ - - - - - - - Using Rust and WebAssembly with d3 to create Heatmaps - - - - - - -

Rust, Wasm & d3.js Heatmap

- - - - - - diff --git a/heatmap/static/style.css b/heatmap/static/style.css deleted file mode 100644 index 80e9b56..0000000 --- a/heatmap/static/style.css +++ /dev/null @@ -1,3 +0,0 @@ -#heatmap { - overflow: hidden; -} diff --git a/microBioRust/expand.rs b/microBioRust/expand.rs deleted file mode 100644 index 163e5b0..0000000 --- a/microBioRust/expand.rs +++ /dev/null @@ -1,5981 +0,0 @@ -#![feature(prelude_import)] -//! The aim of this crate is to provide Microbiology friendly Rust functions for bioinformatics. -//! -//! -//! With the genbank parser, you are able to parse a genbank format file, then write into gff3 format -//! -//! It is also possible to print the DNA sequences extracted from the coding sequences (genes, ffn format), -//! plus the protein fasta sequences (faa format). -//! -//! Additionally, you can create new features and records and save them either in genbank or gff3 format -//! -#![allow(non_snake_case)] -#[prelude_import] -use std::prelude::rust_2021::*; -#[macro_use] -extern crate std; -pub mod embl { - //! # An EMBL format to GFF parser - //! - //! - //! You are able to parse genbank and save as a GFF (gff3) format as well as extracting DNA sequences, gene DNA sequences (ffn) and protein fasta sequences (faa) - //! - //! You can also create new records and save as a embl (gbk) format - //! - //! ## Detailed Explanation - //! - //! - //! The Embl parser contains: - //! - //! Records - a top level structure which consists of either one record (single embl) or multiple instances of record (multi-embl). - //! - //! Each Record contains: - //! - //! 1. A source, ```SourceAttributes```, construct(enum) of counter (source name), start, stop [of source or contig], organism, mol_type, strain, type_material, db_xref - //! 2. Features, ```FeatureAttributes```, construct(enum) of counter (locus tag), gene (if present), product, codon start, strand, start, stop [of cds/gene] - //! 3. 
Sequence features, ```SequenceAttributes```, construct(enum) of counter (locus tag), sequence_ffn (DNA gene sequence) sequence_faa (protein translation), strand, codon start, start, stop [cds/gene] - //! 4. The DNA sequence of the whole record (or contig) - //! - //! Example to extract and print all the protein sequence fasta, example using getters (or get_ functionality), simplified embl! macro - //! - //!```rust - //! use clap::Parser; - //! use std::fs::File; - //! use microBioRust::embl::Reader; - //! use std::io; - //! use microBioRust::embl; - //! - //! - //! #[derive(Parser, Debug)] - //! #[clap(author, version, about)] - //! struct Arguments { - //! #[clap(short, long)] - //! filename: String, - //! } - //! - //! pub fn genbank_to_faa() -> Result<(), anyhow::Error> { - //! let args = Arguments::parse(); - //! let records = embl!(&args.filename); - //! for record in records { - //! for (k, v) in &record.cds.attributes { - //! if let Some(seq) = record.seq_features.get_sequence_faa(k) { - //! println!(">{}|{}\n{}", &record.id, &k, seq); - //! } - //! } - //! } - //! return Ok(()); - //! } - //!``` - //! - //! Example to extract protein sequence from embl file, debugging use - //!```rust - //! use clap::Parser; - //! use std::fs::File; - //! use microBioRust::embl::Reader; - //! use std::io; - //! - //! #[derive(Parser, Debug)] - //! #[clap(author, version, about)] - //! struct Arguments { - //! #[clap(short, long)] - //! filename: String, - //! } - //! - //! pub fn embl_to_faa() -> Result<(), anyhow::Error> { - //! let args = Arguments::parse(); - //! let file_embl = File::open(args.filename)?; - //! let mut reader = Reader::new(file_embl); - //! let mut records = reader.records(); - //! loop { - //! //collect from each record advancing on a next record basis, count cds records - //! match records.next() { - //! Some(Ok(mut record)) => { - //! for (k, v) in &record.cds.attributes { - //! match record.seq_features.get_sequence_faa(&k) { - //! 
Some(value) => { let seq_faa = value.to_string(); - //! println!(">{}|{}\n{}", &record.id, &k, seq_faa); - //! }, - //! _ => (), - //! }; - //! } - //! }, - //! Some(Err(e)) => { println!("Error encountered - an err {:?}", e); }, - //! None => break, - //! } - //! } - //! return Ok(()); - //! } - //!``` - //! - //! - //! Example to save a provided multi- or single genbank file as a GFF file (by joining any multi-genbank) - //! - //! - //! ```rust - //! use microBioRust::embl::gff_write; - //! use microBioRust::embl::Reader; - //! use microBioRust::embl::Record; - //! use std::collections::BTreeMap; - //! use std::fs::File; - //! use clap::Parser; - //! use std::io; - //! - //! #[derive(Parser, Debug)] - //! #[clap(author, version, about)] - //! struct Arguments { - //! #[clap(short, long)] - //! filename: String, - //! } - //! - //! pub fn embl_to_gff() -> io::Result<()> { - //! let args = Arguments::parse(); - //! let file_embl = File::open(&args.filename)?; - //! let prev_start: u32 = 0; - //! let mut prev_end: u32 = 0; - //! let mut reader = Reader::new(file_embl); - //! let mut records = reader.records(); - //! let mut read_counter: u32 = 0; - //! let mut seq_region: BTreeMap<String, (u32,u32)> = BTreeMap::new(); - //! let mut record_vec: Vec<Record> = Vec::new(); - //! loop { - //! match records.next() { - //! Some(Ok(mut record)) => { - //! //println!("next record"); - //! //println!("Record id: {:?}", record.id); - //! let source = record.source_map.source_name.clone().expect("issue collecting source name"); - //! let beginning = match record.source_map.get_start(&source) { - //! Some(value) => value.get_value(), - //! _ => 0, - //! }; - //! let ending = match record.source_map.get_stop(&source) { - //! Some(value) => value.get_value(), - //! _ => 0, - //! }; - //! if ending + prev_end < beginning + prev_end { - //! println!("debug: end value smaller is than the start {:?}", beginning); - //! } - //! seq_region.insert(source, (beginning + prev_end, ending + prev_end)); - //!
record_vec.push(record); - //! // Add additional fields to print if needed - //! read_counter+=1; - //! prev_end+=ending; // create the joined record if there are multiple - //! }, - //! Some(Err(e)) => { println!("theres an err {:?}", e); }, - //! None => { - //! println!("finished iteration"); - //! break; }, - //! } - //! } - //! let output_file = format!("{}.gff", &args.filename); - //! gff_write(seq_region.clone(), record_vec, &output_file, true); - //! println!("Total records processed: {}", read_counter); - //! return Ok(()); - //! } - //!``` - //! Example to create a completely new record, use of setters or set_ functionality - //! - //! To write into GFF format requires gff_write(seq_region, record_vec, filename, true or false) - //! - //! The seq_region is the region of interest to save with name and DNA coordinates such as ``` seqregion.entry("source_1".to_string(), (1,897))``` - //! This makes it possible to save the whole file or to subset it - //! - //! record_vec is a list of the records. If there is only one record, include this as a vec using ``` vec![record] ``` - //! - //! The boolean true/false describes whether the DNA sequence should be included in the GFF3 file - //! - //! To write into embl format requires embl_write(seq_region, record_vec, filename), no true or false since embl format will include the DNA sequence - //! - //! - //! ```rust - //! use microBioRust::embl::gff_write; - //! use microBioRust::embl::RangeValue; - //! use microBioRust::embl::Record; - //! use std::collections::BTreeMap; - //! - //! pub fn create_new_record() -> Result<(), anyhow::Error> { - //! let filename = format!("new_record.gff"); - //! let mut record = Record::new(); - //! let mut seq_region: BTreeMap = BTreeMap::new(); - //! //example from E.coli K12 - //! seq_region.insert("source_1".to_string(), (1,897)); - //! //Add the source into SourceAttributes - //! record.source_map - //! .set_counter("source_1".to_string()) - //! 
.set_start(RangeValue::Exact(1)) - //! .set_stop(RangeValue::Exact(897)) - //! .set_organism("Escherichia coli".to_string()) - //! .set_mol_type("DNA".to_string()) - //! .set_strain("K-12 substr. MG1655".to_string()) - //! .set_type_material("type strain of Escherichia coli K12".to_string()) - //! .set_db_xref("PRJNA57779".to_string()); - //! //Add the features into FeatureAttributes, here we are setting two features, i.e. coding sequences or genes - //! record.cds - //! .set_counter("b3304".to_string()) - //! .set_start(RangeValue::Exact(1)) - //! .set_stop(RangeValue::Exact(354)) - //! .set_gene("rplR".to_string()) - //! .set_product("50S ribosomal subunit protein L18".to_string()) - //! .set_codon_start(1) - //! .set_strand(-1); - //! record.cds - //! .set_counter("b3305".to_string()) - //! .set_start(RangeValue::Exact(364)) - //! .set_stop(RangeValue::Exact(897)) - //! .set_gene("rplF".to_string()) - //! .set_product("50S ribosomal subunit protein L6".to_string()) - //! .set_codon_start(1) - //! .set_strand(-1); - //! //Add the sequences for the coding sequence (CDS) into SequenceAttributes - //! record.seq_features - //! .set_counter("b3304".to_string()) - //! .set_start(RangeValue::Exact(1)) - //! .set_stop(RangeValue::Exact(354)) - //! .set_sequence_ffn("ATGGATAAGAAATCTGCTCGTATCCGTCGTGCGACCCGCGCACGCCGCAAGCTCCAGGAG - //!CTGGGCGCAACTCGCCTGGTGGTACATCGTACCCCGCGTCACATTTACGCACAGGTAATT - //!GCACCGAACGGTTCTGAAGTTCTGGTAGCTGCTTCTACTGTAGAAAAAGCTATCGCTGAA - //!CAACTGAAGTACACCGGTAACAAAGACGCGGCTGCAGCTGTGGGTAAAGCTGTCGCTGAA - //!CGCGCTCTGGAAAAAGGCATCAAAGATGTATCCTTTGACCGTTCCGGGTTCCAATATCAT - //!GGTCGTGTCCAGGCACTGGCAGATGCTGCCCGTGAAGCTGGCCTTCAGTTCTAA".to_string()) - //! .set_sequence_faa("MDKKSARIRRATRARRKLQELGATRLVVHRTPRHIYAQVIAPNGSEVLVAASTVEKAIAE - //!QLKYTGNKDAAAAVGKAVAERALEKGIKDVSFDRSGFQYHGRVQALADAAREAGLQF".to_string()) - //! .set_codon_start(1) - //! .set_strand(-1); - //! record.seq_features - //! .set_counter("bb3305".to_string()) - //! 
.set_start(RangeValue::Exact(364)) - //! .set_stop(RangeValue::Exact(897)) - //! .set_sequence_ffn("ATGTCTCGTGTTGCTAAAGCACCGGTCGTTGTTCCTGCCGGCGTTGACGTAAAAATCAAC - //!GGTCAGGTTATTACGATCAAAGGTAAAAACGGCGAGCTGACTCGTACTCTCAACGATGCT - //!GTTGAAGTTAAACATGCAGATAATACCCTGACCTTCGGTCCGCGTGATGGTTACGCAGAC - //!GGTTGGGCACAGGCTGGTACCGCGCGTGCCCTGCTGAACTCAATGGTTATCGGTGTTACC - //!GAAGGCTTCACTAAGAAGCTGCAGCTGGTTGGTGTAGGTTACCGTGCAGCGGTTAAAGGC - //!AATGTGATTAACCTGTCTCTGGGTTTCTCTCATCCTGTTGACCATCAGCTGCCTGCGGGT - //!ATCACTGCTGAATGTCCGACTCAGACTGAAATCGTGCTGAAAGGCGCTGATAAGCAGGTG - //!ATCGGCCAGGTTGCAGCGGATCTGCGCGCCTACCGTCGTCCTGAGCCTTATAAAGGCAAG - //!GGTGTTCGTTACGCCGACGAAGTCGTGCGTACCAAAGAGGCTAAGAAGAAGTAA".to_string()) - //! .set_sequence_faa("MSRVAKAPVVVPAGVDVKINGQVITIKGKNGELTRTLNDAVEVKHADNTLTFGPRDGYAD - //!GWAQAGTARALLNSMVIGVTEGFTKKLQLVGVGYRAAVKGNVINLSLGFSHPVDHQLPAG - //!ITAECPTQTEIVLKGADKQVIGQVAADLRAYRRPEPYKGKGVRYADEVVRTKEAKKK".to_string()) - //! .set_codon_start(1) - //! .set_strand(-1); - //! //Add the full sequence of the entire record into the record.sequence - //! 
record.sequence = "TTAGAACTGAAGGCCAGCTTCACGGGCAGCATCTGCCAGTGCCTGGACACGACCATGATA - //!TTGGAACCCGGAACGGTCAAAGGATACATCTTTGATGCCTTTTTCCAGAGCGCGTTCAGC - //!GACAGCTTTACCCACAGCTGCAGCCGCGTCTTTGTTACCGGTGTACTTCAGTTGTTCAGC - //!GATAGCTTTTTCTACAGTAGAAGCAGCTACCAGAACTTCAGAACCGTTCGGTGCAATTAC - //!CTGTGCGTAAATGTGACGCGGGGTACGATGTACCACCAGGCGAGTTGCGCCCAGCTCCTG - //!GAGCTTGCGGCGTGCGCGGGTCGCACGACGGATACGAGCAGATTTCTTATCCATAGTGTT - //!ACCTTACTTCTTCTTAGCCTCTTTGGTACGCACGACTTCGTCGGCGTAACGAACACCCTT - //!GCCTTTATAAGGCTCAGGACGACGGTAGGCGCGCAGATCCGCTGCAACCTGGCCGATCAC - //!CTGCTTATCAGCGCCTTTCAGCACGATTTCAGTCTGAGTCGGACATTCAGCAGTGATACC - //!CGCAGGCAGCTGATGGTCAACAGGATGAGAGAAACCCAGAGACAGGTTAATCACATTGCC - //!TTTAACCGCTGCACGGTAACCTACACCAACCAGCTGCAGCTTCTTAGTGAAGCCTTCGGT - //!AACACCGATAACCATTGAGTTCAGCAGGGCACGCGCGGTACCAGCCTGTGCCCAACCGTC - //!TGCGTAACCATCACGCGGACCGAAGGTCAGGGTATTATCTGCATGTTTAACTTCAACAGC - //!ATCGTTGAGAGTACGAGTCAGCTCGCCGTTTTTACCTTTGATCGTAATAACCTGACCGTT - //!GATTTTTACGTCAACGCCGGCAGGAACAACGACCGGTGCTTTAGCAACACGAGACAT".to_string(); - //! gff_write(seq_region, vec![record], &filename, true); - //! return Ok(()); - //! } - //!``` - //! - use std::io::{self, Write}; - use std::fs; - use regex::Regex; - use std::vec::Vec; - use std::str; - use std::convert::AsRef; - use protein_translate::translate; - use std::path::Path; - use bio::alphabets::dna::revcomp; - use anyhow::anyhow; - use std::collections::BTreeMap; - use std::fs::{OpenOptions, File}; - use anyhow::Context; - use std::collections::HashSet; - use paste::paste; - use std::convert::TryInto; - use chrono::prelude::*; - /// import macro to create get_ functions for the values - use crate::create_getters; - /// import macro to create the set_ functions for the values in a Builder format - use crate::create_builder; - /// An EMBL reader. 
- pub struct Records - where - B: io::BufRead, - { - reader: Reader, - error_has_occurred: bool, - } - #[automatically_derived] - impl ::core::fmt::Debug for Records - where - B: io::BufRead, - { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - ::core::fmt::Formatter::debug_struct_field2_finish( - f, - "Records", - "reader", - &self.reader, - "error_has_occurred", - &&self.error_has_occurred, - ) - } - } - impl Records - where - B: io::BufRead, - { - #[allow(unused_mut)] - pub fn new(mut reader: Reader) -> Self { - Records { - reader: reader, - error_has_occurred: false, - } - } - } - impl Iterator for Records - where - B: io::BufRead, - { - type Item = Result; - fn next(&mut self) -> Option> { - if self.error_has_occurred { - { - ::std::io::_print( - format_args!("error was encountered in iteration\n"), - ); - }; - None - } else { - let mut record = Record::new(); - match self.reader.read(&mut record) { - Ok(_) => if record.is_empty() { None } else { Some(Ok(record)) } - Err(err) => { - self.error_has_occurred = true; - Some( - Err( - ::anyhow::Error::msg( - ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("next record read error {0:?}", err), - ); - res - }), - ), - ), - ) - } - } - } - } - } - pub trait EmblRead { - fn read(&mut self, record: &mut Record) -> Result; - } - ///per line reader for the file - pub struct Reader { - reader: B, - line_buffer: String, - } - #[automatically_derived] - impl ::core::fmt::Debug for Reader { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - ::core::fmt::Formatter::debug_struct_field2_finish( - f, - "Reader", - "reader", - &self.reader, - "line_buffer", - &&self.line_buffer, - ) - } - } - #[automatically_derived] - impl ::core::default::Default for Reader { - #[inline] - fn default() -> Reader { - Reader { - reader: ::core::default::Default::default(), - line_buffer: ::core::default::Default::default(), - } - } - } - impl 
Reader> { - /// Read Embl from given file path in given format. - pub fn from_file + std::fmt::Debug>( - path: P, - ) -> anyhow::Result { - fs::File::open(&path) - .map(Reader::new) - .with_context(|| ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("Failed to read Embl from {0:#?}", path), - ); - res - })) - } - } - impl Reader> - where - R: io::Read, - { - pub fn new(reader: R) -> Self { - Reader { - reader: io::BufReader::new(reader), - line_buffer: String::new(), - } - } - } - impl Reader - where - B: io::BufRead, - { - pub fn from_bufread(bufreader: B) -> Self { - Reader { - reader: bufreader, - line_buffer: String::new(), - } - } - pub fn records(self) -> Records { - Records { - reader: self, - error_has_occurred: false, - } - } - } - ///main embl parser - impl<'a, B> EmblRead for Reader - where - B: io::BufRead, - { - #[allow(unused_mut)] - #[allow(unused_variables)] - #[allow(unused_assignments)] - fn read(&mut self, record: &mut Record) -> Result { - record.rec_clear(); - let mut sequences = String::new(); - let mut source_map = SourceAttributeBuilder::new(); - let mut cds = FeatureAttributeBuilder::new(); - let mut seq_features = SequenceAttributeBuilder::new(); - let mut cds_counter: i32 = 0; - let mut source_counter: i32 = 0; - let mut prev_end: u32 = 0; - let mut organism = String::new(); - let mut mol_type = String::new(); - let mut strain = String::new(); - let mut source_name = String::new(); - let mut type_material = String::new(); - let mut theend: u32 = 0; - let mut thestart: u32 = 0; - let mut db_xref = String::new(); - if self.line_buffer.is_empty() { - self.reader.read_line(&mut self.line_buffer)?; - if self.line_buffer.is_empty() { - return Ok(record.to_owned()); - } - } - 'outer: while !self.line_buffer.is_empty() { - if self.line_buffer.starts_with("ID") { - record.rec_clear(); - let mut header_fields: Vec<&str> = self - .line_buffer - .split_whitespace() - .collect(); - let header_len = header_fields.len(); - 
let mut header_iter = header_fields.iter(); - header_iter.next(); - record.id = header_iter - .next() - .ok_or_else(|| ::anyhow::__private::must_use({ - let error = ::anyhow::__private::format_err( - format_args!("missing record id"), - ); - error - }))? - .to_string(); - if record.id.ends_with(";") { - record.id.pop(); - } - header_iter.next(); - header_iter.next(); - header_iter.next(); - header_iter.next(); - header_iter.next(); - header_iter.next(); - header_iter.next(); - let lens = header_iter - .next() - .ok_or_else(|| ::anyhow::__private::must_use({ - let error = ::anyhow::__private::format_err( - format_args!("missing record length"), - ); - error - }))? - .to_string(); - record.length = lens.trim().parse::()?; - self.line_buffer.clear(); - } - if self.line_buffer.starts_with("FT source") { - let re = Regex::new(r"([0-9]+)[[:punct:]]+([0-9]+)")?; - let location = re - .captures(&self.line_buffer) - .ok_or_else(|| ::anyhow::__private::must_use({ - let error = ::anyhow::__private::format_err( - format_args!("missing location"), - ); - error - }))?; - let start = &location[1]; - let end = &location[2]; - thestart = start.trim().parse::()?; - source_counter += 1; - source_name = ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("source_{0}_{1}", record.id, source_counter), - ); - res - }) - .to_string(); - thestart += prev_end; - theend = end.trim().parse::()? 
+ prev_end; - loop { - self.line_buffer.clear(); - self.reader.read_line(&mut self.line_buffer)?; - if self.line_buffer.starts_with("FT CDS") { - record - .source_map - .set_counter(source_name.to_string()) - .set_start(RangeValue::Exact(thestart)) - .set_stop(RangeValue::Exact(theend)) - .set_organism(organism.clone()) - .set_mol_type(mol_type.clone()) - .set_strain(strain.clone()) - .set_type_material(type_material.clone()) - .set_db_xref(db_xref.clone()); - continue 'outer; - } - if self.line_buffer.contains("/organism") { - let org: Vec<&str> = self.line_buffer.split('\"').collect(); - organism = org[1].to_string(); - } - if self.line_buffer.contains("/mol_type") { - let mol: Vec<&str> = self.line_buffer.split('\"').collect(); - mol_type = mol[1].to_string(); - } - if self.line_buffer.contains("/strain") { - let stra: Vec<&str> = self.line_buffer.split('\"').collect(); - strain = stra[1].to_string(); - } - if self.line_buffer.contains("/type_material") { - let mat: Vec<&str> = self.line_buffer.split('\"').collect(); - type_material = mat[1].to_string(); - } - if self.line_buffer.contains("/db_xref") { - let db: Vec<&str> = self.line_buffer.split('\"').collect(); - db_xref = db[1].to_string(); - } - } - } - if self.line_buffer.starts_with("FT CDS") { - let mut startiter: Vec<_> = Vec::new(); - let mut enditer: Vec<_> = Vec::new(); - let mut thestart: u32 = 0; - let mut thend: u32 = 0; - let mut joined: bool = false; - let joined = if self.line_buffer.contains("join") { - true - } else { - false - }; - let re = Regex::new(r"([0-9]+)[[:punct:]]+([0-9]+)")?; - for cap in re.captures_iter(&self.line_buffer) { - cds_counter += 1; - thestart = cap[1] - .parse() - .expect("failed to match and parse numerical start"); - theend = cap[2] - .parse() - .expect("failed to match and parse numerical end"); - startiter.push(thestart); - enditer.push(theend); - } - let mut gene = String::new(); - let mut product = String::new(); - let strand: i8 = if 
self.line_buffer.contains("complement") { - -1 - } else { - 1 - }; - let mut locus_tag = String::new(); - let mut codon_start: u8 = 1; - loop { - self.line_buffer.clear(); - self.reader.read_line(&mut self.line_buffer)?; - if self.line_buffer.contains("/locus_tag=") { - let loctag: Vec<&str> = self - .line_buffer - .split('\"') - .collect(); - locus_tag = loctag[1].to_string(); - } - if self.line_buffer.contains("/codon_start") { - let codstart: Vec<&str> = self - .line_buffer - .split('=') - .collect(); - let valstart = codstart[1].trim().parse::()?; - codon_start = valstart; - } - if self.line_buffer.contains("/gene=") { - let gen: Vec<&str> = self.line_buffer.split('\"').collect(); - gene = gen[1].to_string(); - } - if self.line_buffer.contains("/product") { - let prod: Vec<&str> = self.line_buffer.split('\"').collect(); - product = substitute_odd_punctuation(prod[1].to_string())?; - } - if self.line_buffer.starts_with("FT CDS") - || self.line_buffer.starts_with("SQ Sequence") - || self.line_buffer.starts_with("FT intron") - || self.line_buffer.starts_with("FT exon") - || self.line_buffer.starts_with(" misc_feature") - { - if locus_tag.is_empty() { - locus_tag = ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("CDS_{0}", cds_counter), - ); - res - }) - .to_string(); - } - if joined { - for (i, m) in startiter.iter().enumerate() { - let loc_tag = ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("{0}_{1}", locus_tag.clone(), i), - ); - res - }); - record - .cds - .set_counter(loc_tag) - .set_start(RangeValue::Exact(*m)) - .set_stop(RangeValue::Exact(enditer[i])) - .set_gene(gene.to_string()) - .set_product(product.to_string()) - .set_codon_start(codon_start) - .set_strand(strand); - } - continue 'outer; - } else { - record - .cds - .set_counter(locus_tag.clone()) - .set_start(RangeValue::Exact(thestart)) - .set_stop(RangeValue::Exact(theend)) - .set_gene(gene.to_string()) - 
.set_product(product.to_string()) - .set_codon_start(codon_start) - .set_strand(strand); - continue 'outer; - } - } - } - } - if self.line_buffer.starts_with("SQ Sequence") { - let mut sequences = String::new(); - let result_seq = loop { - self.line_buffer.clear(); - self.reader.read_line(&mut self.line_buffer)?; - if self.line_buffer.starts_with("//") { - break sequences; - } else { - let s: Vec<&str> = self - .line_buffer - .split_whitespace() - .collect(); - let sequence = if s.len() > 1 { - s[0..s.len() - 1].join("") - } else { - String::new() - }; - sequences.push_str(&sequence); - } - }; - record.sequence = result_seq.to_string(); - let mut iterablecount: u32 = 0; - for (key, val) in record.cds.iter_sorted() { - let ( - mut a, - mut b, - mut c, - mut d, - ): (Option, Option, Option, Option) = ( - None, - None, - None, - None, - ); - for value in val { - match value { - FeatureAttributes::Start { value } => { - a = match value { - RangeValue::Exact(v) => Some(*v), - RangeValue::LessThan(v) => Some(*v), - RangeValue::GreaterThan(v) => Some(*v), - }; - } - FeatureAttributes::Stop { value } => { - b = match value { - RangeValue::Exact(v) => Some(*v), - RangeValue::LessThan(v) => Some(*v), - RangeValue::GreaterThan(v) => Some(*v), - }; - } - FeatureAttributes::Strand { value } => { - c = match value { - value => Some(*value), - }; - } - FeatureAttributes::CodonStart { value } => { - d = match value { - value => Some(value.clone()), - }; - } - _ => {} - } - } - let sta = a - .map(|o| o as usize) - .ok_or( - ::anyhow::__private::must_use({ - let error = ::anyhow::__private::format_err( - format_args!("No value for start"), - ); - error - }), - )?; - let sto = b - .map(|t| t as usize) - .ok_or( - ::anyhow::__private::must_use({ - let error = ::anyhow::__private::format_err( - format_args!("No value for stop"), - ); - error - }), - )? 
- 1; - let stra = c - .map(|u| u as i8) - .ok_or( - ::anyhow::__private::must_use({ - let error = ::anyhow::__private::format_err( - format_args!("No value for strand"), - ); - error - }), - )?; - let cod = d - .map(|v| v as usize - 1) - .ok_or( - ::anyhow::__private::must_use({ - let error = ::anyhow::__private::format_err( - format_args!("No value for strand"), - ); - error - }), - )?; - let star = sta.try_into()?; - let stow = sto.try_into()?; - let codd = cod.try_into()?; - let mut sliced_sequence: &str = ""; - if stra == -1 { - if cod > 1 { - { - ::std::io::_print( - format_args!( - "reverse strand coding start more than one {0:?}\n", - &iterablecount, - ), - ); - }; - if sto + 1 <= record.sequence.len() { - sliced_sequence = &record.sequence[sta + cod..sto + 1]; - } else { - sliced_sequence = &record.sequence[sta + cod..sto]; - } - } else { - { - ::std::io::_print( - format_args!( - "record sta {0:?} sto {1:?} cod {2:?} stra {3:?} record.seq length {4:?}\n", - &sta, - &sto, - &cod, - &stra, - &record.sequence.len(), - ), - ); - }; - { - ::std::io::_print( - format_args!( - "sliced sta {0:?} sliced sto {1:?} record.id {2:?}\n", - sta, - sto, - &record.id, - ), - ); - }; - { - ::std::io::_print( - format_args!( - "iterable count is {0:?} reverse strand codon start one\n", - &iterablecount, - ), - ); - }; - { - ::std::io::_print( - format_args!( - "this is the sequence len {0:?}\n", - &record.sequence.len(), - ), - ); - }; - if sto + 1 <= record.sequence.len() { - sliced_sequence = &record.sequence[sta..sto + 1]; - } else { - sliced_sequence = &record.sequence[sta..sto]; - } - { - ::std::io::_print( - format_args!( - "iterable count after is {0:?}\n", - &iterablecount, - ), - ); - }; - } - let cds_char = sliced_sequence; - let prot_seq = translate(&revcomp(cds_char.as_bytes())); - let parts: Vec<&str> = prot_seq.split('*').collect(); - { - ::std::io::_print( - format_args!("this is the prot_seq {0:?}\n", &prot_seq), - ); - }; - record - .seq_features - 
.set_counter(key.to_string()) - .set_start(RangeValue::Exact(star)) - .set_stop(RangeValue::Exact(stow)) - .set_sequence_ffn(cds_char.to_string()) - .set_sequence_faa(parts[0].to_string()) - .set_codon_start(codd) - .set_strand(stra); - } else { - if cod > 1 { - sliced_sequence = &record.sequence[sta + cod - 1..sto]; - } else { - sliced_sequence = &record.sequence[sta - 1..sto]; - } - let cds_char = sliced_sequence; - let prot_seq = translate(cds_char.as_bytes()); - let parts: Vec<&str> = prot_seq.split('*').collect(); - record - .seq_features - .set_counter(key.to_string()) - .set_start(RangeValue::Exact(star)) - .set_stop(RangeValue::Exact(stow)) - .set_sequence_ffn(cds_char.to_string()) - .set_sequence_faa(parts[0].to_string()) - .set_codon_start(codd) - .set_strand(stra); - } - } - return Ok(record.to_owned()); - } - self.line_buffer.clear(); - self.reader.read_line(&mut self.line_buffer)?; - } - Ok(record.to_owned()) - } - } - ///stores a value for start or stop (end) which can be denoted as a < value or > value. 
- pub enum RangeValue { - Exact(u32), - LessThan(u32), - GreaterThan(u32), - } - #[automatically_derived] - impl ::core::fmt::Debug for RangeValue { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - match self { - RangeValue::Exact(__self_0) => { - ::core::fmt::Formatter::debug_tuple_field1_finish( - f, - "Exact", - &__self_0, - ) - } - RangeValue::LessThan(__self_0) => { - ::core::fmt::Formatter::debug_tuple_field1_finish( - f, - "LessThan", - &__self_0, - ) - } - RangeValue::GreaterThan(__self_0) => { - ::core::fmt::Formatter::debug_tuple_field1_finish( - f, - "GreaterThan", - &__self_0, - ) - } - } - } - } - #[automatically_derived] - impl ::core::hash::Hash for RangeValue { - #[inline] - fn hash<__H: ::core::hash::Hasher>(&self, state: &mut __H) -> () { - let __self_discr = ::core::intrinsics::discriminant_value(self); - ::core::hash::Hash::hash(&__self_discr, state); - match self { - RangeValue::Exact(__self_0) => ::core::hash::Hash::hash(__self_0, state), - RangeValue::LessThan(__self_0) => { - ::core::hash::Hash::hash(__self_0, state) - } - RangeValue::GreaterThan(__self_0) => { - ::core::hash::Hash::hash(__self_0, state) - } - } - } - } - #[automatically_derived] - impl ::core::marker::StructuralPartialEq for RangeValue {} - #[automatically_derived] - impl ::core::cmp::PartialEq for RangeValue { - #[inline] - fn eq(&self, other: &RangeValue) -> bool { - let __self_discr = ::core::intrinsics::discriminant_value(self); - let __arg1_discr = ::core::intrinsics::discriminant_value(other); - __self_discr == __arg1_discr - && match (self, other) { - (RangeValue::Exact(__self_0), RangeValue::Exact(__arg1_0)) => { - __self_0 == __arg1_0 - } - (RangeValue::LessThan(__self_0), RangeValue::LessThan(__arg1_0)) => { - __self_0 == __arg1_0 - } - ( - RangeValue::GreaterThan(__self_0), - RangeValue::GreaterThan(__arg1_0), - ) => __self_0 == __arg1_0, - _ => unsafe { ::core::intrinsics::unreachable() } - } - } - } - 
#[automatically_derived] - impl ::core::cmp::Eq for RangeValue { - #[inline] - #[doc(hidden)] - #[coverage(off)] - fn assert_receiver_is_total_eq(&self) -> () { - let _: ::core::cmp::AssertParamIsEq; - } - } - #[automatically_derived] - impl ::core::clone::Clone for RangeValue { - #[inline] - fn clone(&self) -> RangeValue { - match self { - RangeValue::Exact(__self_0) => { - RangeValue::Exact(::core::clone::Clone::clone(__self_0)) - } - RangeValue::LessThan(__self_0) => { - RangeValue::LessThan(::core::clone::Clone::clone(__self_0)) - } - RangeValue::GreaterThan(__self_0) => { - RangeValue::GreaterThan(::core::clone::Clone::clone(__self_0)) - } - } - } - } - impl RangeValue { - pub fn get_value(&self) -> u32 { - match self { - RangeValue::Exact(value) => *value, - RangeValue::LessThan(value) => *value, - RangeValue::GreaterThan(value) => *value, - } - } - } - ///stores the details of the source features in genbank (contigs) - pub enum SourceAttributes { - Start { value: RangeValue }, - Stop { value: RangeValue }, - Organism { value: String }, - MolType { value: String }, - Strain { value: String }, - CultureCollection { value: String }, - TypeMaterial { value: String }, - DbXref { value: String }, - } - #[automatically_derived] - impl ::core::fmt::Debug for SourceAttributes { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - match self { - SourceAttributes::Start { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Start", - "value", - &__self_0, - ) - } - SourceAttributes::Stop { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Stop", - "value", - &__self_0, - ) - } - SourceAttributes::Organism { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Organism", - "value", - &__self_0, - ) - } - SourceAttributes::MolType { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "MolType", - "value", - 
&__self_0, - ) - } - SourceAttributes::Strain { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Strain", - "value", - &__self_0, - ) - } - SourceAttributes::CultureCollection { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "CultureCollection", - "value", - &__self_0, - ) - } - SourceAttributes::TypeMaterial { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "TypeMaterial", - "value", - &__self_0, - ) - } - SourceAttributes::DbXref { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "DbXref", - "value", - &__self_0, - ) - } - } - } - } - #[automatically_derived] - impl ::core::cmp::Eq for SourceAttributes { - #[inline] - #[doc(hidden)] - #[coverage(off)] - fn assert_receiver_is_total_eq(&self) -> () { - let _: ::core::cmp::AssertParamIsEq; - let _: ::core::cmp::AssertParamIsEq; - } - } - #[automatically_derived] - impl ::core::marker::StructuralPartialEq for SourceAttributes {} - #[automatically_derived] - impl ::core::cmp::PartialEq for SourceAttributes { - #[inline] - fn eq(&self, other: &SourceAttributes) -> bool { - let __self_discr = ::core::intrinsics::discriminant_value(self); - let __arg1_discr = ::core::intrinsics::discriminant_value(other); - __self_discr == __arg1_discr - && match (self, other) { - ( - SourceAttributes::Start { value: __self_0 }, - SourceAttributes::Start { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SourceAttributes::Stop { value: __self_0 }, - SourceAttributes::Stop { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SourceAttributes::Organism { value: __self_0 }, - SourceAttributes::Organism { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SourceAttributes::MolType { value: __self_0 }, - SourceAttributes::MolType { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SourceAttributes::Strain { value: __self_0 }, - SourceAttributes::Strain { value: __arg1_0 }, - ) => 
__self_0 == __arg1_0, - ( - SourceAttributes::CultureCollection { value: __self_0 }, - SourceAttributes::CultureCollection { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SourceAttributes::TypeMaterial { value: __self_0 }, - SourceAttributes::TypeMaterial { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SourceAttributes::DbXref { value: __self_0 }, - SourceAttributes::DbXref { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - _ => unsafe { ::core::intrinsics::unreachable() } - } - } - } - #[automatically_derived] - impl ::core::hash::Hash for SourceAttributes { - #[inline] - fn hash<__H: ::core::hash::Hasher>(&self, state: &mut __H) -> () { - let __self_discr = ::core::intrinsics::discriminant_value(self); - ::core::hash::Hash::hash(&__self_discr, state); - match self { - SourceAttributes::Start { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SourceAttributes::Stop { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SourceAttributes::Organism { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SourceAttributes::MolType { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SourceAttributes::Strain { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SourceAttributes::CultureCollection { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SourceAttributes::TypeMaterial { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SourceAttributes::DbXref { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - } - } - } - #[automatically_derived] - impl ::core::clone::Clone for SourceAttributes { - #[inline] - fn clone(&self) -> SourceAttributes { - match self { - SourceAttributes::Start { value: __self_0 } => { - SourceAttributes::Start { - value: ::core::clone::Clone::clone(__self_0), - } - } - SourceAttributes::Stop { value: __self_0 } => { - SourceAttributes::Stop { - value: 
::core::clone::Clone::clone(__self_0), - } - } - SourceAttributes::Organism { value: __self_0 } => { - SourceAttributes::Organism { - value: ::core::clone::Clone::clone(__self_0), - } - } - SourceAttributes::MolType { value: __self_0 } => { - SourceAttributes::MolType { - value: ::core::clone::Clone::clone(__self_0), - } - } - SourceAttributes::Strain { value: __self_0 } => { - SourceAttributes::Strain { - value: ::core::clone::Clone::clone(__self_0), - } - } - SourceAttributes::CultureCollection { value: __self_0 } => { - SourceAttributes::CultureCollection { - value: ::core::clone::Clone::clone(__self_0), - } - } - SourceAttributes::TypeMaterial { value: __self_0 } => { - SourceAttributes::TypeMaterial { - value: ::core::clone::Clone::clone(__self_0), - } - } - SourceAttributes::DbXref { value: __self_0 } => { - SourceAttributes::DbXref { - value: ::core::clone::Clone::clone(__self_0), - } - } - } - } - } - impl SourceAttributeBuilder { - pub fn get_start(&self, key: &str) -> Option<&RangeValue> { - self.source_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SourceAttributes::Start { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_stop(&self, key: &str) -> Option<&RangeValue> { - self.source_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SourceAttributes::Stop { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_organism(&self, key: &str) -> Option<&String> { - self.source_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SourceAttributes::Organism { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_mol_type(&self, key: &str) -> Option<&String> { - self.source_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SourceAttributes::MolType { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn 
get_strain(&self, key: &str) -> Option<&String> { - self.source_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SourceAttributes::Strain { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_type_material(&self, key: &str) -> Option<&String> { - self.source_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SourceAttributes::TypeMaterial { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_db_xref(&self, key: &str) -> Option<&String> { - self.source_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SourceAttributes::DbXref { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - } - ///builder for the source information on a per record basis - pub struct SourceAttributeBuilder { - pub source_attributes: BTreeMap>, - pub source_name: Option, - } - #[automatically_derived] - impl ::core::fmt::Debug for SourceAttributeBuilder { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - ::core::fmt::Formatter::debug_struct_field2_finish( - f, - "SourceAttributeBuilder", - "source_attributes", - &self.source_attributes, - "source_name", - &&self.source_name, - ) - } - } - #[automatically_derived] - impl ::core::default::Default for SourceAttributeBuilder { - #[inline] - fn default() -> SourceAttributeBuilder { - SourceAttributeBuilder { - source_attributes: ::core::default::Default::default(), - source_name: ::core::default::Default::default(), - } - } - } - #[automatically_derived] - impl ::core::clone::Clone for SourceAttributeBuilder { - #[inline] - fn clone(&self) -> SourceAttributeBuilder { - SourceAttributeBuilder { - source_attributes: ::core::clone::Clone::clone(&self.source_attributes), - source_name: ::core::clone::Clone::clone(&self.source_name), - } - } - } - impl SourceAttributeBuilder { - pub fn set_source_name(&mut self, name: String) { - 
self.source_name = Some(name); - } - pub fn get_source_name(&self) -> Option<&String> { - self.source_name.as_ref() - } - pub fn add_source_attribute( - &mut self, - key: String, - attribute: SourceAttributes, - ) { - self.source_attributes - .entry(key) - .or_insert_with(HashSet::new) - .insert(attribute); - } - pub fn get_source_attributes( - &self, - key: &str, - ) -> Option<&HashSet> { - self.source_attributes.get(key) - } - } - impl SourceAttributeBuilder { - pub fn new() -> Self { - SourceAttributeBuilder { - source_attributes: BTreeMap::new(), - source_name: None, - } - } - pub fn set_counter(&mut self, counter: String) -> &mut Self { - self.source_name = Some(counter); - self - } - pub fn insert_to(&mut self, value: SourceAttributes) { - if let Some(counter) = &self.source_name { - self.source_attributes - .entry(counter.to_string()) - .or_insert_with(HashSet::new) - .insert(value); - } else { - { - ::core::panicking::panic_fmt(format_args!("Counter key not set")); - }; - } - } - pub fn set_start(&mut self, value: RangeValue) -> &mut Self { - self.insert_to(SourceAttributes::Start { value }); - self - } - pub fn set_stop(&mut self, value: RangeValue) -> &mut Self { - self.insert_to(SourceAttributes::Stop { value }); - self - } - pub fn set_organism(&mut self, value: String) -> &mut Self { - self.insert_to(SourceAttributes::Organism { - value, - }); - self - } - pub fn set_mol_type(&mut self, value: String) -> &mut Self { - self.insert_to(SourceAttributes::MolType { value }); - self - } - pub fn set_strain(&mut self, value: String) -> &mut Self { - self.insert_to(SourceAttributes::Strain { value }); - self - } - pub fn set_type_material(&mut self, value: String) -> &mut Self { - self.insert_to(SourceAttributes::TypeMaterial { - value, - }); - self - } - pub fn set_db_xref(&mut self, value: String) -> &mut Self { - self.insert_to(SourceAttributes::DbXref { value }); - self - } - pub fn build(self) -> BTreeMap> { - self.source_attributes - } - pub fn 
iter_sorted( - &self, - ) -> std::collections::btree_map::Iter> { - self.source_attributes.iter() - } - pub fn default() -> Self { - SourceAttributeBuilder { - source_attributes: BTreeMap::new(), - source_name: None, - } - } - } - ///attributes for each feature, cds or gene - pub enum FeatureAttributes { - Start { value: RangeValue }, - Stop { value: RangeValue }, - Gene { value: String }, - Product { value: String }, - CodonStart { value: u8 }, - Strand { value: i8 }, - } - #[automatically_derived] - impl ::core::fmt::Debug for FeatureAttributes { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - match self { - FeatureAttributes::Start { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Start", - "value", - &__self_0, - ) - } - FeatureAttributes::Stop { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Stop", - "value", - &__self_0, - ) - } - FeatureAttributes::Gene { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Gene", - "value", - &__self_0, - ) - } - FeatureAttributes::Product { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Product", - "value", - &__self_0, - ) - } - FeatureAttributes::CodonStart { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "CodonStart", - "value", - &__self_0, - ) - } - FeatureAttributes::Strand { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Strand", - "value", - &__self_0, - ) - } - } - } - } - #[automatically_derived] - impl ::core::cmp::Eq for FeatureAttributes { - #[inline] - #[doc(hidden)] - #[coverage(off)] - fn assert_receiver_is_total_eq(&self) -> () { - let _: ::core::cmp::AssertParamIsEq; - let _: ::core::cmp::AssertParamIsEq; - let _: ::core::cmp::AssertParamIsEq; - let _: ::core::cmp::AssertParamIsEq; - } - } - #[automatically_derived] - impl ::core::hash::Hash for 
FeatureAttributes { - #[inline] - fn hash<__H: ::core::hash::Hasher>(&self, state: &mut __H) -> () { - let __self_discr = ::core::intrinsics::discriminant_value(self); - ::core::hash::Hash::hash(&__self_discr, state); - match self { - FeatureAttributes::Start { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - FeatureAttributes::Stop { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - FeatureAttributes::Gene { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - FeatureAttributes::Product { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - FeatureAttributes::CodonStart { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - FeatureAttributes::Strand { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - } - } - } - #[automatically_derived] - impl ::core::marker::StructuralPartialEq for FeatureAttributes {} - #[automatically_derived] - impl ::core::cmp::PartialEq for FeatureAttributes { - #[inline] - fn eq(&self, other: &FeatureAttributes) -> bool { - let __self_discr = ::core::intrinsics::discriminant_value(self); - let __arg1_discr = ::core::intrinsics::discriminant_value(other); - __self_discr == __arg1_discr - && match (self, other) { - ( - FeatureAttributes::Start { value: __self_0 }, - FeatureAttributes::Start { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - FeatureAttributes::Stop { value: __self_0 }, - FeatureAttributes::Stop { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - FeatureAttributes::Gene { value: __self_0 }, - FeatureAttributes::Gene { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - FeatureAttributes::Product { value: __self_0 }, - FeatureAttributes::Product { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - FeatureAttributes::CodonStart { value: __self_0 }, - FeatureAttributes::CodonStart { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - FeatureAttributes::Strand { value: __self_0 }, - 
FeatureAttributes::Strand { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - _ => unsafe { ::core::intrinsics::unreachable() } - } - } - } - #[automatically_derived] - impl ::core::clone::Clone for FeatureAttributes { - #[inline] - fn clone(&self) -> FeatureAttributes { - match self { - FeatureAttributes::Start { value: __self_0 } => { - FeatureAttributes::Start { - value: ::core::clone::Clone::clone(__self_0), - } - } - FeatureAttributes::Stop { value: __self_0 } => { - FeatureAttributes::Stop { - value: ::core::clone::Clone::clone(__self_0), - } - } - FeatureAttributes::Gene { value: __self_0 } => { - FeatureAttributes::Gene { - value: ::core::clone::Clone::clone(__self_0), - } - } - FeatureAttributes::Product { value: __self_0 } => { - FeatureAttributes::Product { - value: ::core::clone::Clone::clone(__self_0), - } - } - FeatureAttributes::CodonStart { value: __self_0 } => { - FeatureAttributes::CodonStart { - value: ::core::clone::Clone::clone(__self_0), - } - } - FeatureAttributes::Strand { value: __self_0 } => { - FeatureAttributes::Strand { - value: ::core::clone::Clone::clone(__self_0), - } - } - } - } - } - impl FeatureAttributeBuilder { - pub fn get_start(&self, key: &str) -> Option<&RangeValue> { - self.attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let FeatureAttributes::Start { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_stop(&self, key: &str) -> Option<&RangeValue> { - self.attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let FeatureAttributes::Stop { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_gene(&self, key: &str) -> Option<&String> { - self.attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let FeatureAttributes::Gene { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_product(&self, key: &str) -> Option<&String> { - self.attributes - .get(key) - 
.and_then(|set| { - set.iter() - .find_map(|attr| { - if let FeatureAttributes::Product { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_codon_start(&self, key: &str) -> Option<&u8> { - self.attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let FeatureAttributes::CodonStart { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_strand(&self, key: &str) -> Option<&i8> { - self.attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let FeatureAttributes::Strand { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - } - ///builder for the feature information on a per coding sequence (CDS) basis - pub struct FeatureAttributeBuilder { - pub attributes: BTreeMap>, - locus_tag: Option, - } - #[automatically_derived] - impl ::core::fmt::Debug for FeatureAttributeBuilder { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - ::core::fmt::Formatter::debug_struct_field2_finish( - f, - "FeatureAttributeBuilder", - "attributes", - &self.attributes, - "locus_tag", - &&self.locus_tag, - ) - } - } - #[automatically_derived] - impl ::core::default::Default for FeatureAttributeBuilder { - #[inline] - fn default() -> FeatureAttributeBuilder { - FeatureAttributeBuilder { - attributes: ::core::default::Default::default(), - locus_tag: ::core::default::Default::default(), - } - } - } - #[automatically_derived] - impl ::core::clone::Clone for FeatureAttributeBuilder { - #[inline] - fn clone(&self) -> FeatureAttributeBuilder { - FeatureAttributeBuilder { - attributes: ::core::clone::Clone::clone(&self.attributes), - locus_tag: ::core::clone::Clone::clone(&self.locus_tag), - } - } - } - impl FeatureAttributeBuilder { - pub fn new() -> Self { - FeatureAttributeBuilder { - attributes: BTreeMap::new(), - locus_tag: None, - } - } - pub fn set_counter(&mut self, counter: String) -> &mut Self { - self.locus_tag = Some(counter); - 
self - } - pub fn insert_to(&mut self, value: FeatureAttributes) { - if let Some(counter) = &self.locus_tag { - self.attributes - .entry(counter.to_string()) - .or_insert_with(HashSet::new) - .insert(value); - } else { - { - ::core::panicking::panic_fmt(format_args!("Counter key not set")); - }; - } - } - pub fn set_start(&mut self, value: RangeValue) -> &mut Self { - self.insert_to(FeatureAttributes::Start { value }); - self - } - pub fn set_stop(&mut self, value: RangeValue) -> &mut Self { - self.insert_to(FeatureAttributes::Stop { value }); - self - } - pub fn set_gene(&mut self, value: String) -> &mut Self { - self.insert_to(FeatureAttributes::Gene { value }); - self - } - pub fn set_product(&mut self, value: String) -> &mut Self { - self.insert_to(FeatureAttributes::Product { - value, - }); - self - } - pub fn set_codon_start(&mut self, value: u8) -> &mut Self { - self.insert_to(FeatureAttributes::CodonStart { - value, - }); - self - } - pub fn set_strand(&mut self, value: i8) -> &mut Self { - self.insert_to(FeatureAttributes::Strand { value }); - self - } - pub fn build(self) -> BTreeMap> { - self.attributes - } - pub fn iter_sorted( - &self, - ) -> std::collections::btree_map::Iter> { - self.attributes.iter() - } - pub fn default() -> Self { - FeatureAttributeBuilder { - attributes: BTreeMap::new(), - locus_tag: None, - } - } - } - ///stores the sequences of the coding sequences (genes) and proteins. 
Also stores start, stop, codon_start and strand information - pub enum SequenceAttributes { - Start { value: RangeValue }, - Stop { value: RangeValue }, - SequenceFfn { value: String }, - SequenceFaa { value: String }, - CodonStart { value: u8 }, - Strand { value: i8 }, - } - #[automatically_derived] - impl ::core::fmt::Debug for SequenceAttributes { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - match self { - SequenceAttributes::Start { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Start", - "value", - &__self_0, - ) - } - SequenceAttributes::Stop { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Stop", - "value", - &__self_0, - ) - } - SequenceAttributes::SequenceFfn { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "SequenceFfn", - "value", - &__self_0, - ) - } - SequenceAttributes::SequenceFaa { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "SequenceFaa", - "value", - &__self_0, - ) - } - SequenceAttributes::CodonStart { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "CodonStart", - "value", - &__self_0, - ) - } - SequenceAttributes::Strand { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Strand", - "value", - &__self_0, - ) - } - } - } - } - #[automatically_derived] - impl ::core::cmp::Eq for SequenceAttributes { - #[inline] - #[doc(hidden)] - #[coverage(off)] - fn assert_receiver_is_total_eq(&self) -> () { - let _: ::core::cmp::AssertParamIsEq; - let _: ::core::cmp::AssertParamIsEq; - let _: ::core::cmp::AssertParamIsEq; - let _: ::core::cmp::AssertParamIsEq; - } - } - #[automatically_derived] - impl ::core::marker::StructuralPartialEq for SequenceAttributes {} - #[automatically_derived] - impl ::core::cmp::PartialEq for SequenceAttributes { - #[inline] - fn eq(&self, other: &SequenceAttributes) -> 
bool { - let __self_discr = ::core::intrinsics::discriminant_value(self); - let __arg1_discr = ::core::intrinsics::discriminant_value(other); - __self_discr == __arg1_discr - && match (self, other) { - ( - SequenceAttributes::Start { value: __self_0 }, - SequenceAttributes::Start { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SequenceAttributes::Stop { value: __self_0 }, - SequenceAttributes::Stop { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SequenceAttributes::SequenceFfn { value: __self_0 }, - SequenceAttributes::SequenceFfn { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SequenceAttributes::SequenceFaa { value: __self_0 }, - SequenceAttributes::SequenceFaa { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SequenceAttributes::CodonStart { value: __self_0 }, - SequenceAttributes::CodonStart { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SequenceAttributes::Strand { value: __self_0 }, - SequenceAttributes::Strand { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - _ => unsafe { ::core::intrinsics::unreachable() } - } - } - } - #[automatically_derived] - impl ::core::hash::Hash for SequenceAttributes { - #[inline] - fn hash<__H: ::core::hash::Hasher>(&self, state: &mut __H) -> () { - let __self_discr = ::core::intrinsics::discriminant_value(self); - ::core::hash::Hash::hash(&__self_discr, state); - match self { - SequenceAttributes::Start { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SequenceAttributes::Stop { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SequenceAttributes::SequenceFfn { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SequenceAttributes::SequenceFaa { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SequenceAttributes::CodonStart { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SequenceAttributes::Strand { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - 
} - } - } - #[automatically_derived] - impl ::core::clone::Clone for SequenceAttributes { - #[inline] - fn clone(&self) -> SequenceAttributes { - match self { - SequenceAttributes::Start { value: __self_0 } => { - SequenceAttributes::Start { - value: ::core::clone::Clone::clone(__self_0), - } - } - SequenceAttributes::Stop { value: __self_0 } => { - SequenceAttributes::Stop { - value: ::core::clone::Clone::clone(__self_0), - } - } - SequenceAttributes::SequenceFfn { value: __self_0 } => { - SequenceAttributes::SequenceFfn { - value: ::core::clone::Clone::clone(__self_0), - } - } - SequenceAttributes::SequenceFaa { value: __self_0 } => { - SequenceAttributes::SequenceFaa { - value: ::core::clone::Clone::clone(__self_0), - } - } - SequenceAttributes::CodonStart { value: __self_0 } => { - SequenceAttributes::CodonStart { - value: ::core::clone::Clone::clone(__self_0), - } - } - SequenceAttributes::Strand { value: __self_0 } => { - SequenceAttributes::Strand { - value: ::core::clone::Clone::clone(__self_0), - } - } - } - } - } - impl SequenceAttributeBuilder { - pub fn get_start(&self, key: &str) -> Option<&RangeValue> { - self.seq_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SequenceAttributes::Start { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_stop(&self, key: &str) -> Option<&RangeValue> { - self.seq_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SequenceAttributes::Stop { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_sequence_ffn(&self, key: &str) -> Option<&String> { - self.seq_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SequenceAttributes::SequenceFfn { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_sequence_faa(&self, key: &str) -> Option<&String> { - self.seq_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if 
let SequenceAttributes::SequenceFaa { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_codon_start(&self, key: &str) -> Option<&u8> { - self.seq_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SequenceAttributes::CodonStart { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_strand(&self, key: &str) -> Option<&i8> { - self.seq_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SequenceAttributes::Strand { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - } - ///builder for the sequence information on a per coding sequence (CDS) basis - pub struct SequenceAttributeBuilder { - pub seq_attributes: BTreeMap>, - locus_tag: Option, - } - #[automatically_derived] - impl ::core::fmt::Debug for SequenceAttributeBuilder { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - ::core::fmt::Formatter::debug_struct_field2_finish( - f, - "SequenceAttributeBuilder", - "seq_attributes", - &self.seq_attributes, - "locus_tag", - &&self.locus_tag, - ) - } - } - #[automatically_derived] - impl ::core::default::Default for SequenceAttributeBuilder { - #[inline] - fn default() -> SequenceAttributeBuilder { - SequenceAttributeBuilder { - seq_attributes: ::core::default::Default::default(), - locus_tag: ::core::default::Default::default(), - } - } - } - #[automatically_derived] - impl ::core::clone::Clone for SequenceAttributeBuilder { - #[inline] - fn clone(&self) -> SequenceAttributeBuilder { - SequenceAttributeBuilder { - seq_attributes: ::core::clone::Clone::clone(&self.seq_attributes), - locus_tag: ::core::clone::Clone::clone(&self.locus_tag), - } - } - } - impl SequenceAttributeBuilder { - pub fn new() -> Self { - SequenceAttributeBuilder { - seq_attributes: BTreeMap::new(), - locus_tag: None, - } - } - pub fn set_counter(&mut self, counter: String) -> &mut Self { - self.locus_tag = Some(counter); - 
self - } - pub fn insert_to(&mut self, value: SequenceAttributes) { - if let Some(counter) = &self.locus_tag { - self.seq_attributes - .entry(counter.to_string()) - .or_insert_with(HashSet::new) - .insert(value); - } else { - { - ::core::panicking::panic_fmt(format_args!("Counter key not set")); - }; - } - } - pub fn set_start(&mut self, value: RangeValue) -> &mut Self { - self.insert_to(SequenceAttributes::Start { value }); - self - } - pub fn set_stop(&mut self, value: RangeValue) -> &mut Self { - self.insert_to(SequenceAttributes::Stop { value }); - self - } - pub fn set_sequence_ffn(&mut self, value: String) -> &mut Self { - self.insert_to(SequenceAttributes::SequenceFfn { - value, - }); - self - } - pub fn set_sequence_faa(&mut self, value: String) -> &mut Self { - self.insert_to(SequenceAttributes::SequenceFaa { - value, - }); - self - } - pub fn set_codon_start(&mut self, value: u8) -> &mut Self { - self.insert_to(SequenceAttributes::CodonStart { - value, - }); - self - } - pub fn set_strand(&mut self, value: i8) -> &mut Self { - self.insert_to(SequenceAttributes::Strand { - value, - }); - self - } - pub fn build(self) -> BTreeMap> { - self.seq_attributes - } - pub fn iter_sorted( - &self, - ) -> std::collections::btree_map::Iter> { - self.seq_attributes.iter() - } - pub fn default() -> Self { - SequenceAttributeBuilder { - seq_attributes: BTreeMap::new(), - locus_tag: None, - } - } - } - ///product lines can contain difficult to parse punctuation such as biochemical symbols like unclosed single quotes, superscripts, single and double brackets etc. 
- ///here we substitute these for an underscore - pub fn substitute_odd_punctuation(input: String) -> Result { - let re = Regex::new(r"[/?()',`]|[α-ωΑ-Ω]")?; - let cleaned = input.trim_end_matches(&['\r', '\n'][..]); - Ok(re.replace_all(cleaned, "_").to_string()) - } - ///GFF3 field9 construct - pub struct GFFInner { - id: String, - name: String, - locus_tag: String, - gene: String, - product: String, - } - #[automatically_derived] - impl ::core::fmt::Debug for GFFInner { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - ::core::fmt::Formatter::debug_struct_field5_finish( - f, - "GFFInner", - "id", - &self.id, - "name", - &self.name, - "locus_tag", - &self.locus_tag, - "gene", - &self.gene, - "product", - &&self.product, - ) - } - } - impl GFFInner { - pub fn new( - id: String, - name: String, - locus_tag: String, - gene: String, - product: String, - ) -> Self { - GFFInner { - id, - name, - locus_tag, - gene, - product, - } - } - } - ///The main GFF3 construct - pub struct GFFOuter<'a> { - seqid: String, - source: String, - type_val: String, - start: u32, - end: u32, - score: f64, - strand: String, - phase: u8, - attributes: &'a GFFInner, - } - #[automatically_derived] - impl<'a> ::core::fmt::Debug for GFFOuter<'a> { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - let names: &'static _ = &[ - "seqid", - "source", - "type_val", - "start", - "end", - "score", - "strand", - "phase", - "attributes", - ]; - let values: &[&dyn ::core::fmt::Debug] = &[ - &self.seqid, - &self.source, - &self.type_val, - &self.start, - &self.end, - &self.score, - &self.strand, - &self.phase, - &&self.attributes, - ]; - ::core::fmt::Formatter::debug_struct_fields_finish( - f, - "GFFOuter", - names, - values, - ) - } - } - impl<'a> GFFOuter<'a> { - pub fn new( - seqid: String, - source: String, - type_val: String, - start: u32, - end: u32, - score: f64, - strand: String, - phase: u8, - attributes: &'a GFFInner, - ) -> 
Self { - GFFOuter { - seqid, - source, - type_val, - start, - end, - score, - strand, - phase, - attributes, - } - } - pub fn field9_attributes_build(&self) -> String { - let mut full_field9 = Vec::new(); - if !self.attributes.id.is_empty() { - full_field9 - .push( - ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("id={0}", self.attributes.id), - ); - res - }), - ); - } - if !self.attributes.name.is_empty() { - full_field9 - .push( - ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("name={0}", self.attributes.name), - ); - res - }), - ); - } - if !self.attributes.gene.is_empty() { - full_field9 - .push( - ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("gene={0}", self.attributes.gene), - ); - res - }), - ); - } - if !self.attributes.locus_tag.is_empty() { - full_field9 - .push( - ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("locus_tag={0}", self.attributes.locus_tag), - ); - res - }), - ); - } - if !self.attributes.product.is_empty() { - full_field9 - .push( - ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("product={0}", self.attributes.product), - ); - res - }), - ); - } - full_field9.join(";") - } - } - ///formats the translation string which can be mulitple lines, for embl - pub fn format_translation(translation: &str) -> String { - let mut formatted = String::new(); - let cleaned_translation = translation.replace("\n", ""); - formatted.push_str(" /translation=\""); - let line_length: usize = 60; - let final_num = line_length - 15; - formatted - .push_str( - &::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("{0}\n", &cleaned_translation[0..final_num]), - ); - res - }), - ); - for i in (47..translation.len()).step_by(60) { - let end = i + 60 - 1; - let valid_end = if end >= translation.len() { - &cleaned_translation.len() - 1 - } else { - end - }; - formatted - .push_str( - 
&::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!( - " {0}", - &cleaned_translation[i..valid_end], - ), - ); - res - }), - ); - { - ::std::io::_print( - format_args!( - "cleaned translation leng is {0:?}\n", - &cleaned_translation[i..valid_end].len(), - ), - ); - }; - if *&cleaned_translation[i..valid_end].len() < 59 { - formatted.push('\"'); - } else { - formatted.push('\n'); - } - } - formatted - } - ///writes the DNA sequence in gbk format with numbering - pub fn write_gbk_format_sequence(sequence: &str, file: &mut File) -> io::Result<()> { - file.write_fmt(format_args!("ORIGIN\n"))?; - let mut formatted = String::new(); - let cleaned_input = sequence.replace("\n", ""); - let mut index = 1; - for (_i, chunk) in cleaned_input.as_bytes().chunks(60).enumerate() { - formatted - .push_str( - &::alloc::__export::must_use({ - let res = ::alloc::fmt::format(format_args!("{0:>5} ", index)); - res - }), - ); - for (j, sub_chunk) in chunk.chunks(10).enumerate() { - if j > 0 { - formatted.push(' '); - } - formatted.push_str(&String::from_utf8_lossy(sub_chunk)); - } - formatted.push('\n'); - index += 60; - } - file.write_fmt(format_args!("{0:>6}\n", &formatted))?; - file.write_fmt(format_args!("//\n"))?; - Ok(()) - } - ///saves the parsed data in genbank format - pub fn gbk_write( - seq_region: BTreeMap, - record_vec: Vec, - filename: &str, - ) -> io::Result<()> { - let now = Local::now(); - let formatted_date = now.format("%d-%b-%Y").to_string().to_uppercase(); - let mut file = OpenOptions::new() - .write(true) - .append(true) - .create(true) - .open(filename)?; - for (i, (key, _val)) in seq_region.iter().enumerate() { - let strain = match &record_vec[i].source_map.get_strain(key) { - Some(value) => value.to_string(), - None => "Unknown".to_string(), - }; - let organism = match &record_vec[i].source_map.get_organism(key) { - Some(value) => value.to_string(), - None => "Unknown".to_string(), - }; - let mol_type = match 
&record_vec[i].source_map.get_mol_type(key) { - Some(value) => value.to_string(), - None => "Unknown".to_string(), - }; - let type_material = match &record_vec[i].source_map.get_type_material(&key) { - Some(value) => value.to_string(), - None => "Unknown".to_string(), - }; - let db_xref = match &record_vec[i].source_map.get_db_xref(key) { - Some(value) => value.to_string(), - None => "Unknown".to_string(), - }; - let source_stop = match &record_vec[i].source_map.get_stop(key) { - Some(value) => value.get_value(), - None => { - { - { - ::std::io::_print(format_args!("stop value not found\n")); - }; - None - } - .expect("stop value not received") - } - }; - file.write_fmt( - format_args!( - "LOCUS {0} {1} bp DNA linear CON {2}\n", - &key, - &record_vec[i].sequence.len(), - &formatted_date, - ), - )?; - file.write_fmt(format_args!("DEFINITION {0} {1}.\n", &organism, &strain))?; - file.write_fmt(format_args!("ACCESSION {0}\n", &key))?; - file.write_fmt(format_args!("KEYWORDS .\n"))?; - file.write_fmt(format_args!("SOURCE {0} {1}\n", &organism, &strain))?; - file.write_fmt(format_args!(" ORGANISM {0} {1}\n", &organism, &strain))?; - file.write_fmt(format_args!("FEATURES Location/Qualifiers\n"))?; - file.write_fmt(format_args!(" source 1..{0}\n", &source_stop))?; - file.write_fmt( - format_args!(" /organism=\"{0}\"\n", &strain), - )?; - file.write_fmt( - format_args!(" /mol_type=\"{0}\"\n", &mol_type), - )?; - file.write_fmt( - format_args!(" /strain=\"{0}\"\n", &strain), - )?; - if type_material != *"Unknown".to_string() { - file.write_fmt( - format_args!( - " /type_material=\"{0}\"\n", - &type_material, - ), - )?; - } - file.write_fmt( - format_args!(" /db_xref=\"{0}\"\n", &db_xref), - )?; - for (locus_tag, _value) in &record_vec[i].cds.attributes { - let start = match &record_vec[i].cds.get_start(locus_tag) { - Some(value) => value.get_value(), - None => { - { - { - ::std::io::_print(format_args!("start value not found\n")); - }; - None - } - .expect("start value not 
received") - } - }; - let stop = match &record_vec[i].cds.get_stop(locus_tag) { - Some(value) => value.get_value(), - None => { - { - { - ::std::io::_print(format_args!("stop value not found\n")); - }; - None - } - .expect("stop value not received") - } - }; - let product = match &record_vec[i].cds.get_product(locus_tag) { - Some(value) => value.to_string(), - None => "unknown product".to_string(), - }; - let strand = match &record_vec[i].cds.get_strand(locus_tag) { - Some(value) => **value, - None => 0, - }; - let codon_start = match &record_vec[i].cds.get_codon_start(locus_tag) { - Some(value) => **value, - None => 0, - }; - let gene = match &record_vec[i].cds.get_gene(locus_tag) { - Some(value) => value.to_string(), - None => "unknown".to_string(), - }; - let translation = match &record_vec[i] - .seq_features - .get_sequence_faa(locus_tag) - { - Some(value) => value.to_string(), - None => "unknown".to_string(), - }; - if strand == 1 { - file.write_fmt( - format_args!(" gene {0}..{1}\n", &start, &stop), - )?; - } else { - file.write_fmt( - format_args!( - " gene complement({0}..{1})\n", - &start, - &stop, - ), - )?; - } - file.write_fmt( - format_args!(" /locus_tag=\"{0}\"\n", &locus_tag), - )?; - if strand == 1 { - file.write_fmt( - format_args!(" CDS {0}..{1}\n", &start, &stop), - )?; - } else { - file.write_fmt( - format_args!( - " CDS complement({0}..{1})\n", - &start, - &stop, - ), - )?; - } - file.write_fmt( - format_args!(" /locus_tag=\"{0}\"\n", &locus_tag), - )?; - file.write_fmt( - format_args!( - " /codon_start=\"{0}\"\n", - &codon_start, - ), - )?; - if gene != "unknown" { - file.write_fmt( - format_args!(" /gene=\"{0}\"\n", &gene), - )?; - } - if translation != "unknown" { - let formatted_translation = format_translation(&translation); - file.write_fmt(format_args!("{0}\n", &formatted_translation))?; - } - file.write_fmt( - format_args!(" /product=\"{0}\"\n", &product), - )?; - } - write_gbk_format_sequence(&record_vec[i].sequence, &mut file)?; - } - 
Ok(()) - } - ///saves the parsed data in gff3 format - #[allow(unused_assignments)] - #[allow(unused_variables)] - pub fn gff_write( - seq_region: BTreeMap, - mut record_vec: Vec, - filename: &str, - dna: bool, - ) -> io::Result<()> { - let mut file = OpenOptions::new().append(true).create(true).open(filename)?; - if file.metadata()?.len() == 0 { - file.write_fmt(format_args!("##gff-version 3\n"))?; - } - let mut full_seq = String::new(); - let mut prev_end: u32 = 0; - for (k, v) in seq_region.iter() { - file.write_fmt( - format_args!("##sequence-region\t{0}\t{1}\t{2}\n", &k, v.0, v.1), - )?; - } - for ((source_name, (seq_start, seq_end)), record) in seq_region - .iter() - .zip(record_vec.drain(..)) - { - if dna == true { - full_seq.push_str(&record.sequence); - } - for (locus_tag, _valu) in &record.cds.attributes { - let start = match record.cds.get_start(&locus_tag) { - Some(value) => value.get_value(), - None => { - { - { - ::std::io::_print(format_args!("start value not found\n")); - }; - None - } - .expect("start value not received") - } - }; - let stop = match record.cds.get_stop(&locus_tag) { - Some(value) => value.get_value(), - None => { - { - { - ::std::io::_print(format_args!("stop value not found\n")); - }; - None - } - .expect("stop value not received") - } - }; - let gene = match record.cds.get_gene(&locus_tag) { - Some(value) => value.to_string(), - None => "unknown".to_string(), - }; - let product = match record.cds.get_product(&locus_tag) { - Some(value) => value.to_string(), - None => "unknown product".to_string(), - }; - let strand = match record.cds.get_strand(&locus_tag) { - Some(valu) => { - match valu { - 1 => "+".to_string(), - -1 => "-".to_string(), - _ => { - { - ::std::io::_print( - format_args!( - "unexpected strand value {0} for locus_tag {1}\n", - valu, - &locus_tag, - ), - ); - }; - "unknownstrand".to_string() - } - } - } - None => "unknownvalue".to_string(), - }; - let phase = match record.cds.get_codon_start(&locus_tag) { - 
Some(valuer) => { - match valuer { - 1 => 0, - 2 => 1, - 3 => 2, - _ => { - { - ::std::io::_print( - format_args!( - "unexpected phase value {0} in the bagging area for locus_tag {1}\n", - valuer, - &locus_tag, - ), - ); - }; - 1 - } - } - } - None => 1, - }; - let gff_inner = GFFInner::new( - locus_tag.to_string(), - source_name.clone(), - locus_tag.to_string(), - gene, - product, - ); - let gff_outer = GFFOuter::new( - source_name.clone(), - ".".to_string(), - "CDS".to_string(), - start + prev_end, - stop + prev_end, - 0.0, - strand, - phase, - &gff_inner, - ); - let field9_attributes = gff_outer.field9_attributes_build(); - file.write_fmt( - format_args!( - "{0}\t{1}\t{2}\t{3:?}\t{4:?}\t{5}\t{6}\t{7}\t{8}\n", - gff_outer.seqid, - gff_outer.source, - gff_outer.type_val, - gff_outer.start, - gff_outer.end, - gff_outer.score, - gff_outer.strand, - gff_outer.phase, - field9_attributes, - ), - )?; - } - prev_end = *seq_end; - } - if dna { - file.write_fmt(format_args!("##FASTA\n"))?; - file.write_fmt(format_args!("{0}\n", full_seq))?; - } - Ok(()) - } - ///internal record containing data from a single source or contig. Has multiple features. 
- pub struct Record { - pub id: String, - pub length: u32, - pub sequence: String, - pub start: usize, - pub end: usize, - pub strand: i32, - pub cds: FeatureAttributeBuilder, - pub source_map: SourceAttributeBuilder, - pub seq_features: SequenceAttributeBuilder, - } - #[automatically_derived] - impl ::core::fmt::Debug for Record { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - let names: &'static _ = &[ - "id", - "length", - "sequence", - "start", - "end", - "strand", - "cds", - "source_map", - "seq_features", - ]; - let values: &[&dyn ::core::fmt::Debug] = &[ - &self.id, - &self.length, - &self.sequence, - &self.start, - &self.end, - &self.strand, - &self.cds, - &self.source_map, - &&self.seq_features, - ]; - ::core::fmt::Formatter::debug_struct_fields_finish( - f, - "Record", - names, - values, - ) - } - } - #[automatically_derived] - impl ::core::clone::Clone for Record { - #[inline] - fn clone(&self) -> Record { - Record { - id: ::core::clone::Clone::clone(&self.id), - length: ::core::clone::Clone::clone(&self.length), - sequence: ::core::clone::Clone::clone(&self.sequence), - start: ::core::clone::Clone::clone(&self.start), - end: ::core::clone::Clone::clone(&self.end), - strand: ::core::clone::Clone::clone(&self.strand), - cds: ::core::clone::Clone::clone(&self.cds), - source_map: ::core::clone::Clone::clone(&self.source_map), - seq_features: ::core::clone::Clone::clone(&self.seq_features), - } - } - } - impl Record { - /// Create a new instance. 
- pub fn new() -> Self { - Record { - id: "".to_owned(), - length: 0, - sequence: "".to_owned(), - start: 0, - end: 0, - strand: 0, - source_map: SourceAttributeBuilder::new(), - cds: FeatureAttributeBuilder::new(), - seq_features: SequenceAttributeBuilder::new(), - } - } - pub fn is_empty(&mut self) -> bool { - self.id.is_empty() && self.length == 0 - } - pub fn check(&mut self) -> Result<(), &str> { - if self.id().is_empty() { - return Err("Expecting id for Embl record."); - } - Ok(()) - } - pub fn id(&mut self) -> &str { - &self.id - } - pub fn length(&mut self) -> u32 { - self.length - } - pub fn sequence(&mut self) -> &str { - &self.sequence - } - pub fn start(&mut self) -> u32 { - self.start.try_into().unwrap() - } - pub fn end(&mut self) -> u32 { - self.end.try_into().unwrap() - } - pub fn strand(&mut self) -> i32 { - self.strand - } - pub fn cds(&mut self) -> FeatureAttributeBuilder { - self.cds.clone() - } - pub fn source_map(&mut self) -> SourceAttributeBuilder { - self.source_map.clone() - } - pub fn seq_features(&mut self) -> SequenceAttributeBuilder { - self.seq_features.clone() - } - fn rec_clear(&mut self) { - self.id.clear(); - self.length = 0; - self.sequence.clear(); - self.start = 0; - self.end = 0; - self.strand = 0; - self.source_map = SourceAttributeBuilder::new(); - self.cds = FeatureAttributeBuilder::new(); - self.seq_features = SequenceAttributeBuilder::new(); - } - } - impl Default for Record { - fn default() -> Self { - Self::new() - } - } - #[allow(dead_code)] - pub struct Config { - filename: String, - } - impl Config { - pub fn new(args: &[String]) -> Result { - if args.len() < 2 { - { - ::core::panicking::panic_fmt( - format_args!("not enough arguments, please provide filename"), - ); - }; - } - let filename = args[1].clone(); - Ok(Config { filename }) - } - } -} -pub mod gbk { - //! # A Genbank to GFF parser - //! - //! - //! 
You are able to parse genbank and save as a GFF (gff3) format as well as extracting DNA sequences, gene DNA sequences (ffn) and protein fasta sequences (faa) - //! - //! You can also create new records and save as a genbank (gbk) format - //! - //! ## Detailed Explanation - //! - //! - //! The Genbank parser contains: - //! - //! Records - a top level structure which consists of either one record (single genbank) or multiple instances of record (multi-genbank). - //! - //! Each Record contains: - //! - //! 1. A source, ```SourceAttributes```, construct(enum) of counter (source name), start, stop [of source or contig], organism, mol_type, strain, type_material, db_xref - //! 2. Features, ```FeatureAttributes```, construct(enum) of counter (locus tag), gene (if present), product, codon start, strand, start, stop [of cds/gene] - //! 3. Sequence features, ```SequenceAttributes```, construct(enum) of counter (locus tag), sequence_ffn (DNA gene sequence) sequence_faa (protein translation), strand, codon start, start, stop [cds/gene] - //! 4. The DNA sequence of the whole record (or contig) - //! - //! Example to extract and print all the protein sequence fasta, example using getters or get_ functionality - //! - //! - //!```rust - //! use clap::Parser; - //! use std::fs::File; - //! use microBioRust::gbk::Reader; - //! use std::io; - //! - //! #[derive(Parser, Debug)] - //! #[clap(author, version, about)] - //! struct Arguments { - //! #[clap(short, long)] - //! filename: String, - //! } - //! - //! pub fn genbank_to_faa() -> Result<(), anyhow::Error> { - //! let args = Arguments::parse(); - //! let file_gbk = File::open(args.filename)?; - //! let mut reader = Reader::new(file_gbk); - //! let mut records = reader.records(); - //! loop { - //! //collect from each record advancing on a next record basis, count cds records - //! match records.next() { - //! Some(Ok(mut record)) => { - //! for (k, v) in &record.cds.attributes { - //! 
match record.seq_features.get_sequence_faa(&k) { - //! Some(value) => { let seq_faa = value.to_string(); - //! println!(">{}|{}\n{}", &record.id, &k, seq_faa); - //! }, - //! _ => (), - //! }; - //! } - //! }, - //! Some(Err(e)) => { println!("Error encountered - an err {:?}", e); }, - //! None => break, - //! } - //! } - //! return Ok(()); - //! } - //!``` - //! - //! Example to extract the protein sequences with simplified genbank! macro use - //! - //!```rust - //! use clap::Parser; - //! use std::fs::File; - //! use microBioRust::gbk::Reader; - //! use std::io; - //! use microBioRust::genbank; - //! - //! - //! #[derive(Parser, Debug)] - //! #[clap(author, version, about)] - //! struct Arguments { - //! #[clap(short, long)] - //! filename: String, - //! } - //! - //! pub fn genbank_to_faa() -> Result<(), anyhow::Error> { - //! let args = Arguments::parse(); - //! let records = genbank!(&args.filename); - //! for record in records { - //! for (k, v) in &record.cds.attributes { - //! if let Some(seq) = record.seq_features.get_sequence_faa(k) { - //! println!(">{}|{}\n{}", &record.id, &k, seq); - //! } - //! } - //! } - //! return Ok(()); - //! } - //! - //!``` - //! Example to save a provided multi- or single genbank file as a GFF file (by joining any multi-genbank) - //! - //! ```rust - //! use microBioRust::gbk::gff_write; - //! use microBioRust::gbk::Reader; - //! use microBioRust::gbk::Record; - //! use std::collections::BTreeMap; - //! use std::fs::File; - //! use clap::Parser; - //! use std::io; - //! - //! #[derive(Parser, Debug)] - //! #[clap(author, version, about)] - //! struct Arguments { - //! #[clap(short, long)] - //! filename: String, - //! } - //! - //! pub fn genbank_to_gff() -> io::Result<()> { - //! let args = Arguments::parse(); - //! let file_gbk = File::open(&args.filename)?; - //! let prev_start: u32 = 0; - //! let mut prev_end: u32 = 0; - //! let mut reader = Reader::new(file_gbk); - //! let mut records = reader.records(); - //! 
let mut read_counter: u32 = 0; - //! let mut seq_region: BTreeMap = BTreeMap::new(); - //! let mut record_vec: Vec = Vec::new(); - //! loop { - //! match records.next() { - //! Some(Ok(mut record)) => { - //! println!("next record"); - //! println!("Record id: {:?}", record.id); - //! let source = record.source_map.source_name.clone().expect("issue collecting source name"); - //! let beginning = match record.source_map.get_start(&source) { - //! Some(value) => value.get_value(), - //! _ => 0, - //! }; - //! let ending = match record.source_map.get_stop(&source) { - //! Some(value) => value.get_value(), - //! _ => 0, - //! }; - //! if ending + prev_end < beginning + prev_end { - //! println!("debug: end value smaller is than the start {:?}", beginning); - //! } - //! seq_region.insert(source, (beginning + prev_end, ending + prev_end)); - //! record_vec.push(record); - //! // Add additional fields to print if needed - //! read_counter+=1; - //! prev_end+=ending; // create the joined record if there are multiple - //! }, - //! Some(Err(e)) => { println!("theres an err {:?}", e); }, - //! None => { - //! println!("finished iteration"); - //! break; }, - //! } - //! } - //! let output_file = format!("{}.gff", &args.filename); - //! if std::path::Path::new(&output_file).exists() { - //! println!("Deleting existing file: {}", &output_file); - //! std::fs::remove_file(&output_file).expect("NOOO"); - //! } - //! gff_write(seq_region.clone(), record_vec, &output_file, true); - //! println!("Total records processed: {}", read_counter); - //! return Ok(()); - //! } - //!``` - //! Example to create a completely new record, use of setters or set_ functionality - //! - //! To write into GFF format requires gff_write(seq_region, record_vec, filename, true or false) - //! - //! The seq_region is the region of interest to save with name and DNA coordinates such as ``` seqregion.entry("source_1".to_string(), (1,897))``` - //! 
This makes it possible to save the whole file or to subset it - //! - //! record_vec is a list of the records. If there is only one record, include this as a vec using ``` vec![record] ``` - //! - //! The boolean true/false describes whether the DNA sequence should be included in the GFF3 file - //! - //! To write into genbank format requires gbk_write(seq_region, record_vec, filename), no true or false since genbank format will include the DNA sequence - //! - //! - //! ```rust - //! use microBioRust::gbk::gff_write; - //! use microBioRust::gbk::RangeValue; - //! use microBioRust::gbk::Record; - //! use std::fs::File; - //! use std::collections::BTreeMap; - //! - //! pub fn create_new_record() -> Result<(), anyhow::Error> { - //! let filename = format!("new_record.gff"); - //! if std::path::Path::new(&filename).exists() { - //! std::fs::remove_file(&filename)?; - //! } - //! let mut record = Record::new(); - //! let mut seq_region: BTreeMap = BTreeMap::new(); - //! //example from E.coli K12 - //! seq_region.insert("source_1".to_string(), (1,897)); - //! //Add the source into SourceAttributes - //! record.source_map - //! .set_counter("source_1".to_string()) - //! .set_start(RangeValue::Exact(1)) - //! .set_stop(RangeValue::Exact(897)) - //! .set_organism("Escherichia coli".to_string()) - //! .set_mol_type("DNA".to_string()) - //! .set_strain("K-12 substr. MG1655".to_string()) - //! .set_type_material("type strain of Escherichia coli K12".to_string()) - //! .set_db_xref("PRJNA57779".to_string()); - //! //Add the features into FeatureAttributes, here we are setting two features, i.e. coding sequences or genes - //! record.cds - //! .set_counter("b3304".to_string()) - //! .set_start(RangeValue::Exact(1)) - //! .set_stop(RangeValue::Exact(354)) - //! .set_gene("rplR".to_string()) - //! .set_product("50S ribosomal subunit protein L18".to_string()) - //! .set_codon_start(1) - //! .set_strand(-1); - //! record.cds - //! .set_counter("b3305".to_string()) - //! 
.set_start(RangeValue::Exact(364)) - //! .set_stop(RangeValue::Exact(897)) - //! .set_gene("rplF".to_string()) - //! .set_product("50S ribosomal subunit protein L6".to_string()) - //! .set_codon_start(1) - //! .set_strand(-1); - //! //Add the sequences for the coding sequence (CDS) into SequenceAttributes - //! record.seq_features - //! .set_counter("b3304".to_string()) - //! .set_start(RangeValue::Exact(1)) - //! .set_stop(RangeValue::Exact(354)) - //! .set_sequence_ffn("ATGGATAAGAAATCTGCTCGTATCCGTCGTGCGACCCGCGCACGCCGCAAGCTCCAGGAG - //!CTGGGCGCAACTCGCCTGGTGGTACATCGTACCCCGCGTCACATTTACGCACAGGTAATT - //!GCACCGAACGGTTCTGAAGTTCTGGTAGCTGCTTCTACTGTAGAAAAAGCTATCGCTGAA - //!CAACTGAAGTACACCGGTAACAAAGACGCGGCTGCAGCTGTGGGTAAAGCTGTCGCTGAA - //!CGCGCTCTGGAAAAAGGCATCAAAGATGTATCCTTTGACCGTTCCGGGTTCCAATATCAT - //!GGTCGTGTCCAGGCACTGGCAGATGCTGCCCGTGAAGCTGGCCTTCAGTTCTAA".to_string()) - //! .set_sequence_faa("MDKKSARIRRATRARRKLQELGATRLVVHRTPRHIYAQVIAPNGSEVLVAASTVEKAIAE - //!QLKYTGNKDAAAAVGKAVAERALEKGIKDVSFDRSGFQYHGRVQALADAAREAGLQF".to_string()) - //! .set_codon_start(1) - //! .set_strand(-1); - //! record.seq_features - //! .set_counter("bb3305".to_string()) - //! .set_start(RangeValue::Exact(364)) - //! .set_stop(RangeValue::Exact(897)) - //! .set_sequence_ffn("ATGTCTCGTGTTGCTAAAGCACCGGTCGTTGTTCCTGCCGGCGTTGACGTAAAAATCAAC - //!GGTCAGGTTATTACGATCAAAGGTAAAAACGGCGAGCTGACTCGTACTCTCAACGATGCT - //!GTTGAAGTTAAACATGCAGATAATACCCTGACCTTCGGTCCGCGTGATGGTTACGCAGAC - //!GGTTGGGCACAGGCTGGTACCGCGCGTGCCCTGCTGAACTCAATGGTTATCGGTGTTACC - //!GAAGGCTTCACTAAGAAGCTGCAGCTGGTTGGTGTAGGTTACCGTGCAGCGGTTAAAGGC - //!AATGTGATTAACCTGTCTCTGGGTTTCTCTCATCCTGTTGACCATCAGCTGCCTGCGGGT - //!ATCACTGCTGAATGTCCGACTCAGACTGAAATCGTGCTGAAAGGCGCTGATAAGCAGGTG - //!ATCGGCCAGGTTGCAGCGGATCTGCGCGCCTACCGTCGTCCTGAGCCTTATAAAGGCAAG - //!GGTGTTCGTTACGCCGACGAAGTCGTGCGTACCAAAGAGGCTAAGAAGAAGTAA".to_string()) - //! 
.set_sequence_faa("MSRVAKAPVVVPAGVDVKINGQVITIKGKNGELTRTLNDAVEVKHADNTLTFGPRDGYAD - //!GWAQAGTARALLNSMVIGVTEGFTKKLQLVGVGYRAAVKGNVINLSLGFSHPVDHQLPAG - //!ITAECPTQTEIVLKGADKQVIGQVAADLRAYRRPEPYKGKGVRYADEVVRTKEAKKK".to_string()) - //! .set_codon_start(1) - //! .set_strand(-1); - //! //Add the full sequence of the entire record into the record.sequence - //! record.sequence = "TTAGAACTGAAGGCCAGCTTCACGGGCAGCATCTGCCAGTGCCTGGACACGACCATGATA - //!TTGGAACCCGGAACGGTCAAAGGATACATCTTTGATGCCTTTTTCCAGAGCGCGTTCAGC - //!GACAGCTTTACCCACAGCTGCAGCCGCGTCTTTGTTACCGGTGTACTTCAGTTGTTCAGC - //!GATAGCTTTTTCTACAGTAGAAGCAGCTACCAGAACTTCAGAACCGTTCGGTGCAATTAC - //!CTGTGCGTAAATGTGACGCGGGGTACGATGTACCACCAGGCGAGTTGCGCCCAGCTCCTG - //!GAGCTTGCGGCGTGCGCGGGTCGCACGACGGATACGAGCAGATTTCTTATCCATAGTGTT - //!ACCTTACTTCTTCTTAGCCTCTTTGGTACGCACGACTTCGTCGGCGTAACGAACACCCTT - //!GCCTTTATAAGGCTCAGGACGACGGTAGGCGCGCAGATCCGCTGCAACCTGGCCGATCAC - //!CTGCTTATCAGCGCCTTTCAGCACGATTTCAGTCTGAGTCGGACATTCAGCAGTGATACC - //!CGCAGGCAGCTGATGGTCAACAGGATGAGAGAAACCCAGAGACAGGTTAATCACATTGCC - //!TTTAACCGCTGCACGGTAACCTACACCAACCAGCTGCAGCTTCTTAGTGAAGCCTTCGGT - //!AACACCGATAACCATTGAGTTCAGCAGGGCACGCGCGGTACCAGCCTGTGCCCAACCGTC - //!TGCGTAACCATCACGCGGACCGAAGGTCAGGGTATTATCTGCATGTTTAACTTCAACAGC - //!ATCGTTGAGAGTACGAGTCAGCTCGCCGTTTTTACCTTTGATCGTAATAACCTGACCGTT - //!GATTTTTACGTCAACGCCGGCAGGAACAACGACCGGTGCTTTAGCAACACGAGACAT".to_string(); - //! gff_write(seq_region, vec![record], &filename, true); - //! return Ok(()); - //! } - //!``` - //! - use std::io::{self, Write}; - use std::fs; - use regex::Regex; - use itertools::Itertools; - use std::vec::Vec; - use std::str; - use std::convert::AsRef; - use protein_translate::translate; - use std::path::Path; - use bio::alphabets::dna::revcomp; - use anyhow::anyhow; - use std::collections::BTreeMap; - use std::fs::{OpenOptions, File}; - use anyhow::Context; - use std::collections::HashSet; - use paste::paste; - use std::convert::TryInto; - use chrono::prelude::*; - /// A Gbk reader. 
- #[allow(unused_mut)] - pub struct Records - where - B: io::BufRead, - { - reader: Reader, - error_has_occurred: bool, - } - #[automatically_derived] - #[allow(unused_mut)] - impl ::core::fmt::Debug for Records - where - B: io::BufRead, - { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - ::core::fmt::Formatter::debug_struct_field2_finish( - f, - "Records", - "reader", - &self.reader, - "error_has_occurred", - &&self.error_has_occurred, - ) - } - } - impl Records - where - B: io::BufRead, - { - #[allow(unused_mut)] - pub fn new(mut reader: Reader) -> Self { - Records { - reader: reader, - error_has_occurred: false, - } - } - } - impl Iterator for Records - where - B: io::BufRead, - { - type Item = Result; - fn next(&mut self) -> Option { - if self.error_has_occurred { - { - ::std::io::_print( - format_args!("error was encountered in iteration\n"), - ); - }; - None - } else { - let mut record = Record::new(); - match self.reader.read(&mut record) { - Ok(_) => if record.is_empty() { None } else { Some(Ok(record)) } - Err(err) => { - self.error_has_occurred = true; - Some( - Err( - ::anyhow::Error::msg( - ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("next record read error {0:?}", err), - ); - res - }), - ), - ), - ) - } - } - } - } - } - pub trait GbkRead { - fn read(&mut self, record: &mut Record) -> Result; - } - ///per line reader for the file - pub struct Reader { - reader: B, - line_buffer: String, - } - #[automatically_derived] - impl ::core::fmt::Debug for Reader { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - ::core::fmt::Formatter::debug_struct_field2_finish( - f, - "Reader", - "reader", - &self.reader, - "line_buffer", - &&self.line_buffer, - ) - } - } - #[automatically_derived] - impl ::core::default::Default for Reader { - #[inline] - fn default() -> Reader { - Reader { - reader: ::core::default::Default::default(), - line_buffer: 
::core::default::Default::default(), - } - } - } - impl Reader> { - /// Read Gbk from given file path in given format. - pub fn from_file + std::fmt::Debug>( - path: P, - ) -> anyhow::Result { - fs::File::open(&path) - .map(Reader::new) - .with_context(|| ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("Failed to read Gbk from {0:#?}", path), - ); - res - })) - } - } - impl Reader> - where - R: io::Read, - { - pub fn new(reader: R) -> Self { - Reader { - reader: io::BufReader::new(reader), - line_buffer: String::new(), - } - } - } - impl Reader - where - B: io::BufRead, - { - pub fn from_bufread(bufreader: B) -> Self { - Reader { - reader: bufreader, - line_buffer: String::new(), - } - } - pub fn records(self) -> Records { - Records { - reader: self, - error_has_occurred: false, - } - } - } - ///main gbk parser - impl<'a, B> GbkRead for Reader - where - B: io::BufRead, - { - #[allow(unused_mut)] - #[allow(unused_variables)] - #[allow(unused_assignments)] - fn read(&mut self, record: &mut Record) -> Result { - record.rec_clear(); - let mut sequences = String::new(); - let mut source_map = SourceAttributeBuilder::new(); - let mut cds = FeatureAttributeBuilder::new(); - let mut seq_features = SequenceAttributeBuilder::new(); - let mut cds_counter: i32 = 0; - let mut source_counter: i32 = 0; - let mut prev_end: u32 = 0; - let mut organism = String::new(); - let mut mol_type = String::new(); - let mut strain = String::new(); - let mut source_name = String::new(); - let mut type_material = String::new(); - let mut theend: u32 = 0; - let mut thestart: u32 = 0; - let mut db_xref = String::new(); - if self.line_buffer.is_empty() { - self.reader.read_line(&mut self.line_buffer)?; - if self.line_buffer.is_empty() { - return Ok(record.to_owned()); - } - } - 'outer: while !self.line_buffer.is_empty() { - if self.line_buffer.starts_with("LOCUS") { - record.rec_clear(); - let mut header_fields: Vec<&str> = self - .line_buffer - .split_whitespace() - 
.collect(); - let mut header_iter = header_fields.iter(); - header_iter.next(); - record.id = header_iter - .next() - .ok_or_else(|| ::anyhow::__private::must_use({ - let error = ::anyhow::__private::format_err( - format_args!("missing record id"), - ); - error - }))? - .to_string(); - let lens = header_iter - .next() - .ok_or_else(|| ::anyhow::__private::must_use({ - let error = ::anyhow::__private::format_err( - format_args!("missing record length"), - ); - error - }))? - .to_string(); - record.length = lens.trim().parse::()?; - self.line_buffer.clear(); - } - if self.line_buffer.starts_with(" source") { - let re = Regex::new(r"([0-9]+)[[:punct:]]+([0-9]+)")?; - let location = re - .captures(&self.line_buffer) - .ok_or_else(|| ::anyhow::__private::must_use({ - let error = ::anyhow::__private::format_err( - format_args!("missing location"), - ); - error - }))?; - let start = &location[1]; - let end = &location[2]; - thestart = start.trim().parse::()?; - source_counter += 1; - source_name = ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("source_{0}_{1}", record.id, source_counter), - ); - res - }) - .to_string(); - thestart += prev_end; - theend = end.trim().parse::()? 
+ prev_end; - loop { - self.line_buffer.clear(); - self.reader.read_line(&mut self.line_buffer)?; - if self.line_buffer.starts_with(" CDS") { - record - .source_map - .set_counter(source_name.to_string()) - .set_start(RangeValue::Exact(thestart)) - .set_stop(RangeValue::Exact(theend)) - .set_organism(organism.clone()) - .set_mol_type(mol_type.clone()) - .set_strain(strain.clone()) - .set_type_material(type_material.clone()) - .set_db_xref(db_xref.clone()); - continue 'outer; - } - if self.line_buffer.contains("/organism") { - let org: Vec<&str> = self.line_buffer.split('\"').collect(); - organism = org[1].to_string(); - } - if self.line_buffer.contains("/mol_type") { - let mol: Vec<&str> = self.line_buffer.split('\"').collect(); - mol_type = mol[1].to_string(); - } - if self.line_buffer.contains("/strain") { - let stra: Vec<&str> = self.line_buffer.split('\"').collect(); - strain = stra[1].to_string(); - } - if self.line_buffer.contains("/type_material") { - let mat: Vec<&str> = self.line_buffer.split('\"').collect(); - type_material = mat[1].to_string(); - } - if self.line_buffer.contains("/db_xref") { - let db: Vec<&str> = self.line_buffer.split('\"').collect(); - db_xref = db[1].to_string(); - } - } - } - if self.line_buffer.starts_with(" CDS") { - let mut startiter: Vec<_> = Vec::new(); - let mut enditer: Vec<_> = Vec::new(); - let mut thestart: u32 = 0; - let mut thend: u32 = 0; - let mut joined: bool = false; - let joined = if self.line_buffer.contains("join") { - true - } else { - false - }; - let re = Regex::new(r"([0-9]+)[[:punct:]]+([0-9]+)")?; - for cap in re.captures_iter(&self.line_buffer) { - cds_counter += 1; - thestart = cap[1] - .parse() - .expect("failed to match and parse numerical start"); - theend = cap[2] - .parse() - .expect("failed to match and parse numerical end"); - startiter.push(thestart); - enditer.push(theend); - } - let mut gene = String::new(); - let mut product = String::new(); - let strand: i8 = if 
self.line_buffer.contains("complement") { - -1 - } else { - 1 - }; - let mut locus_tag = String::new(); - let mut codon_start: u8 = 1; - loop { - self.line_buffer.clear(); - self.reader.read_line(&mut self.line_buffer)?; - if self.line_buffer.contains("/locus_tag=") { - let loctag: Vec<&str> = self - .line_buffer - .split('\"') - .collect(); - locus_tag = loctag[1].to_string(); - } - if self.line_buffer.contains("/codon_start") { - let codstart: Vec<&str> = self - .line_buffer - .split('=') - .collect(); - let valstart = codstart[1].trim().parse::()?; - codon_start = valstart; - } - if self.line_buffer.contains("/gene=") { - let gen: Vec<&str> = self.line_buffer.split('\"').collect(); - gene = gen[1].to_string(); - } - if self.line_buffer.contains("/product") { - let prod: Vec<&str> = self.line_buffer.split('\"').collect(); - product = substitute_odd_punctuation(prod[1].to_string())?; - } - if self.line_buffer.starts_with(" CDS") - || self.line_buffer.starts_with("ORIGIN") - || self.line_buffer.starts_with(" gene") - || self.line_buffer.starts_with(" misc_feature") - { - if locus_tag.is_empty() { - locus_tag = ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("CDS_{0}", cds_counter), - ); - res - }) - .to_string(); - } - if joined { - for (i, m) in startiter.iter().enumerate() { - let loc_tag = ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("{0}_{1}", locus_tag.clone(), i), - ); - res - }); - record - .cds - .set_counter(loc_tag) - .set_start(RangeValue::Exact(*m)) - .set_stop(RangeValue::Exact(enditer[i])) - .set_gene(gene.to_string()) - .set_product(product.to_string()) - .set_codon_start(codon_start) - .set_strand(strand); - } - continue 'outer; - } else { - record - .cds - .set_counter(locus_tag.clone()) - .set_start(RangeValue::Exact(thestart)) - .set_stop(RangeValue::Exact(theend)) - .set_gene(gene.to_string()) - .set_product(product.to_string()) - .set_codon_start(codon_start) - 
.set_strand(strand); - continue 'outer; - } - } - } - } - if self.line_buffer.starts_with("ORIGIN") { - let mut sequences = String::new(); - let result_seq = loop { - self.line_buffer.clear(); - self.reader.read_line(&mut self.line_buffer)?; - if self.line_buffer.starts_with("//") { - break sequences; - } else { - let s: Vec<&str> = self - .line_buffer - .split_whitespace() - .collect(); - let s = &s[1..]; - let sequence = s.iter().join(""); - sequences.push_str(&sequence); - } - }; - record.sequence = result_seq.to_string(); - let mut iterablecount: u32 = 0; - for (key, val) in record.cds.iter_sorted() { - let ( - mut a, - mut b, - mut c, - mut d, - ): (Option, Option, Option, Option) = ( - None, - None, - None, - None, - ); - for value in val { - match value { - FeatureAttributes::Start { value } => { - a = match value { - RangeValue::Exact(v) => Some(*v), - RangeValue::LessThan(v) => Some(*v), - RangeValue::GreaterThan(v) => Some(*v), - }; - } - FeatureAttributes::Stop { value } => { - b = match value { - RangeValue::Exact(v) => Some(*v), - RangeValue::LessThan(v) => Some(*v), - RangeValue::GreaterThan(v) => Some(*v), - }; - } - FeatureAttributes::Strand { value } => { - c = match value { - value => Some(*value), - }; - } - FeatureAttributes::CodonStart { value } => { - d = match value { - value => Some(value.clone()), - }; - } - _ => {} - } - } - let sta = a - .map(|o| o as usize) - .ok_or( - ::anyhow::__private::must_use({ - let error = ::anyhow::__private::format_err( - format_args!("No value for start"), - ); - error - }), - )?; - let sto = b - .map(|t| t as usize) - .ok_or( - ::anyhow::__private::must_use({ - let error = ::anyhow::__private::format_err( - format_args!("No value for stop"), - ); - error - }), - )? 
- 1; - let stra = c - .map(|u| u as i8) - .ok_or( - ::anyhow::__private::must_use({ - let error = ::anyhow::__private::format_err( - format_args!("No value for strand"), - ); - error - }), - )?; - let cod = d - .map(|v| v as usize - 1) - .ok_or( - ::anyhow::__private::must_use({ - let error = ::anyhow::__private::format_err( - format_args!("No value for strand"), - ); - error - }), - )?; - let star = sta.try_into()?; - let stow = sto.try_into()?; - let codd = cod.try_into()?; - let mut sliced_sequence: &str = ""; - if stra == -1 { - if cod > 1 { - if sto + 1 <= record.sequence.len() { - sliced_sequence = &record.sequence[sta + cod..sto + 1]; - } else { - sliced_sequence = &record.sequence[sta + cod..sto]; - } - } else { - if sto + 1 <= record.sequence.len() { - sliced_sequence = &record.sequence[sta..sto + 1]; - } else { - sliced_sequence = &record.sequence[sta..sto]; - } - } - let cds_char = sliced_sequence; - let prot_seq = translate(&revcomp(cds_char.as_bytes())); - let parts: Vec<&str> = prot_seq.split('*').collect(); - record - .seq_features - .set_counter(key.to_string()) - .set_start(RangeValue::Exact(star)) - .set_stop(RangeValue::Exact(stow)) - .set_sequence_ffn(cds_char.to_string()) - .set_sequence_faa(parts[0].to_string()) - .set_codon_start(codd) - .set_strand(stra); - } else { - if cod > 1 { - sliced_sequence = &record.sequence[sta + cod - 1..sto]; - } else { - sliced_sequence = &record.sequence[sta - 1..sto]; - } - let cds_char = sliced_sequence; - let prot_seq = translate(cds_char.as_bytes()); - let parts: Vec<&str> = prot_seq.split('*').collect(); - record - .seq_features - .set_counter(key.to_string()) - .set_start(RangeValue::Exact(star)) - .set_stop(RangeValue::Exact(stow)) - .set_sequence_ffn(cds_char.to_string()) - .set_sequence_faa(parts[0].to_string()) - .set_codon_start(codd) - .set_strand(stra); - } - } - return Ok(record.to_owned()); - } - self.line_buffer.clear(); - self.reader.read_line(&mut self.line_buffer)?; - } - 
Ok(record.to_owned()) - } - } - ///stores a value for start or stop (end) which can be denoted as a < value or > value. - pub enum RangeValue { - Exact(u32), - LessThan(u32), - GreaterThan(u32), - } - #[automatically_derived] - impl ::core::fmt::Debug for RangeValue { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - match self { - RangeValue::Exact(__self_0) => { - ::core::fmt::Formatter::debug_tuple_field1_finish( - f, - "Exact", - &__self_0, - ) - } - RangeValue::LessThan(__self_0) => { - ::core::fmt::Formatter::debug_tuple_field1_finish( - f, - "LessThan", - &__self_0, - ) - } - RangeValue::GreaterThan(__self_0) => { - ::core::fmt::Formatter::debug_tuple_field1_finish( - f, - "GreaterThan", - &__self_0, - ) - } - } - } - } - #[automatically_derived] - impl ::core::hash::Hash for RangeValue { - #[inline] - fn hash<__H: ::core::hash::Hasher>(&self, state: &mut __H) -> () { - let __self_discr = ::core::intrinsics::discriminant_value(self); - ::core::hash::Hash::hash(&__self_discr, state); - match self { - RangeValue::Exact(__self_0) => ::core::hash::Hash::hash(__self_0, state), - RangeValue::LessThan(__self_0) => { - ::core::hash::Hash::hash(__self_0, state) - } - RangeValue::GreaterThan(__self_0) => { - ::core::hash::Hash::hash(__self_0, state) - } - } - } - } - #[automatically_derived] - impl ::core::marker::StructuralPartialEq for RangeValue {} - #[automatically_derived] - impl ::core::cmp::PartialEq for RangeValue { - #[inline] - fn eq(&self, other: &RangeValue) -> bool { - let __self_discr = ::core::intrinsics::discriminant_value(self); - let __arg1_discr = ::core::intrinsics::discriminant_value(other); - __self_discr == __arg1_discr - && match (self, other) { - (RangeValue::Exact(__self_0), RangeValue::Exact(__arg1_0)) => { - __self_0 == __arg1_0 - } - (RangeValue::LessThan(__self_0), RangeValue::LessThan(__arg1_0)) => { - __self_0 == __arg1_0 - } - ( - RangeValue::GreaterThan(__self_0), - 
RangeValue::GreaterThan(__arg1_0), - ) => __self_0 == __arg1_0, - _ => unsafe { ::core::intrinsics::unreachable() } - } - } - } - #[automatically_derived] - impl ::core::cmp::Eq for RangeValue { - #[inline] - #[doc(hidden)] - #[coverage(off)] - fn assert_receiver_is_total_eq(&self) -> () { - let _: ::core::cmp::AssertParamIsEq; - } - } - #[automatically_derived] - impl ::core::clone::Clone for RangeValue { - #[inline] - fn clone(&self) -> RangeValue { - match self { - RangeValue::Exact(__self_0) => { - RangeValue::Exact(::core::clone::Clone::clone(__self_0)) - } - RangeValue::LessThan(__self_0) => { - RangeValue::LessThan(::core::clone::Clone::clone(__self_0)) - } - RangeValue::GreaterThan(__self_0) => { - RangeValue::GreaterThan(::core::clone::Clone::clone(__self_0)) - } - } - } - } - impl RangeValue { - pub fn get_value(&self) -> u32 { - match self { - RangeValue::Exact(value) => *value, - RangeValue::LessThan(value) => *value, - RangeValue::GreaterThan(value) => *value, - } - } - } - pub enum SourceAttributes { - Start { value: RangeValue }, - Stop { value: RangeValue }, - Organism { value: String }, - MolType { value: String }, - Strain { value: String }, - CultureCollection { value: String }, - TypeMaterial { value: String }, - DbXref { value: String }, - } - #[automatically_derived] - impl ::core::fmt::Debug for SourceAttributes { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - match self { - SourceAttributes::Start { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Start", - "value", - &__self_0, - ) - } - SourceAttributes::Stop { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Stop", - "value", - &__self_0, - ) - } - SourceAttributes::Organism { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Organism", - "value", - &__self_0, - ) - } - SourceAttributes::MolType { value: __self_0 } => { - 
::core::fmt::Formatter::debug_struct_field1_finish( - f, - "MolType", - "value", - &__self_0, - ) - } - SourceAttributes::Strain { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Strain", - "value", - &__self_0, - ) - } - SourceAttributes::CultureCollection { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "CultureCollection", - "value", - &__self_0, - ) - } - SourceAttributes::TypeMaterial { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "TypeMaterial", - "value", - &__self_0, - ) - } - SourceAttributes::DbXref { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "DbXref", - "value", - &__self_0, - ) - } - } - } - } - #[automatically_derived] - impl ::core::cmp::Eq for SourceAttributes { - #[inline] - #[doc(hidden)] - #[coverage(off)] - fn assert_receiver_is_total_eq(&self) -> () { - let _: ::core::cmp::AssertParamIsEq<RangeValue>; - let _: ::core::cmp::AssertParamIsEq<String>; - } - } - #[automatically_derived] - impl ::core::marker::StructuralPartialEq for SourceAttributes {} - #[automatically_derived] - impl ::core::cmp::PartialEq for SourceAttributes { - #[inline] - fn eq(&self, other: &SourceAttributes) -> bool { - let __self_discr = ::core::intrinsics::discriminant_value(self); - let __arg1_discr = ::core::intrinsics::discriminant_value(other); - __self_discr == __arg1_discr - && match (self, other) { - ( - SourceAttributes::Start { value: __self_0 }, - SourceAttributes::Start { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SourceAttributes::Stop { value: __self_0 }, - SourceAttributes::Stop { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SourceAttributes::Organism { value: __self_0 }, - SourceAttributes::Organism { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SourceAttributes::MolType { value: __self_0 }, - SourceAttributes::MolType { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SourceAttributes::Strain 
{ value: __self_0 }, - SourceAttributes::Strain { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SourceAttributes::CultureCollection { value: __self_0 }, - SourceAttributes::CultureCollection { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SourceAttributes::TypeMaterial { value: __self_0 }, - SourceAttributes::TypeMaterial { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SourceAttributes::DbXref { value: __self_0 }, - SourceAttributes::DbXref { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - _ => unsafe { ::core::intrinsics::unreachable() } - } - } - } - #[automatically_derived] - impl ::core::hash::Hash for SourceAttributes { - #[inline] - fn hash<__H: ::core::hash::Hasher>(&self, state: &mut __H) -> () { - let __self_discr = ::core::intrinsics::discriminant_value(self); - ::core::hash::Hash::hash(&__self_discr, state); - match self { - SourceAttributes::Start { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SourceAttributes::Stop { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SourceAttributes::Organism { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SourceAttributes::MolType { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SourceAttributes::Strain { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SourceAttributes::CultureCollection { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SourceAttributes::TypeMaterial { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SourceAttributes::DbXref { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - } - } - } - #[automatically_derived] - impl ::core::clone::Clone for SourceAttributes { - #[inline] - fn clone(&self) -> SourceAttributes { - match self { - SourceAttributes::Start { value: __self_0 } => { - SourceAttributes::Start { - value: ::core::clone::Clone::clone(__self_0), - } - } - 
SourceAttributes::Stop { value: __self_0 } => { - SourceAttributes::Stop { - value: ::core::clone::Clone::clone(__self_0), - } - } - SourceAttributes::Organism { value: __self_0 } => { - SourceAttributes::Organism { - value: ::core::clone::Clone::clone(__self_0), - } - } - SourceAttributes::MolType { value: __self_0 } => { - SourceAttributes::MolType { - value: ::core::clone::Clone::clone(__self_0), - } - } - SourceAttributes::Strain { value: __self_0 } => { - SourceAttributes::Strain { - value: ::core::clone::Clone::clone(__self_0), - } - } - SourceAttributes::CultureCollection { value: __self_0 } => { - SourceAttributes::CultureCollection { - value: ::core::clone::Clone::clone(__self_0), - } - } - SourceAttributes::TypeMaterial { value: __self_0 } => { - SourceAttributes::TypeMaterial { - value: ::core::clone::Clone::clone(__self_0), - } - } - SourceAttributes::DbXref { value: __self_0 } => { - SourceAttributes::DbXref { - value: ::core::clone::Clone::clone(__self_0), - } - } - } - } - } - impl SourceAttributeBuilder { - pub fn get_start(&self, key: &str) -> Option<&RangeValue> { - self.source_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SourceAttributes::Start { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_stop(&self, key: &str) -> Option<&RangeValue> { - self.source_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SourceAttributes::Stop { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_organism(&self, key: &str) -> Option<&String> { - self.source_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SourceAttributes::Organism { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_mol_type(&self, key: &str) -> Option<&String> { - self.source_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SourceAttributes::MolType { value } 
= attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_strain(&self, key: &str) -> Option<&String> { - self.source_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SourceAttributes::Strain { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_type_material(&self, key: &str) -> Option<&String> { - self.source_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SourceAttributes::TypeMaterial { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_db_xref(&self, key: &str) -> Option<&String> { - self.source_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SourceAttributes::DbXref { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - } - ///builder for the source information on a per record basis - pub struct SourceAttributeBuilder { - pub source_attributes: BTreeMap<String, HashSet<SourceAttributes>>, - pub source_name: Option<String>, - } - #[automatically_derived] - impl ::core::fmt::Debug for SourceAttributeBuilder { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - ::core::fmt::Formatter::debug_struct_field2_finish( - f, - "SourceAttributeBuilder", - "source_attributes", - &self.source_attributes, - "source_name", - &&self.source_name, - ) - } - } - #[automatically_derived] - impl ::core::default::Default for SourceAttributeBuilder { - #[inline] - fn default() -> SourceAttributeBuilder { - SourceAttributeBuilder { - source_attributes: ::core::default::Default::default(), - source_name: ::core::default::Default::default(), - } - } - } - #[automatically_derived] - impl ::core::clone::Clone for SourceAttributeBuilder { - #[inline] - fn clone(&self) -> SourceAttributeBuilder { - SourceAttributeBuilder { - source_attributes: ::core::clone::Clone::clone(&self.source_attributes), - source_name: ::core::clone::Clone::clone(&self.source_name), - } - } - } - impl SourceAttributeBuilder 
{ - pub fn set_source_name(&mut self, name: String) { - self.source_name = Some(name); - } - pub fn get_source_name(&self) -> Option<&String> { - self.source_name.as_ref() - } - pub fn add_source_attribute( - &mut self, - key: String, - attribute: SourceAttributes, - ) { - self.source_attributes - .entry(key) - .or_insert_with(HashSet::new) - .insert(attribute); - } - pub fn get_source_attributes( - &self, - key: &str, - ) -> Option<&HashSet<SourceAttributes>> { - self.source_attributes.get(key) - } - } - impl SourceAttributeBuilder { - pub fn new() -> Self { - SourceAttributeBuilder { - source_attributes: BTreeMap::new(), - source_name: None, - } - } - pub fn set_counter(&mut self, counter: String) -> &mut Self { - self.source_name = Some(counter); - self - } - pub fn insert_to(&mut self, value: SourceAttributes) { - if let Some(counter) = &self.source_name { - self.source_attributes - .entry(counter.to_string()) - .or_insert_with(HashSet::new) - .insert(value); - } else { - { - ::core::panicking::panic_fmt(format_args!("Counter key not set")); - }; - } - } - pub fn set_start(&mut self, value: RangeValue) -> &mut Self { - self.insert_to(SourceAttributes::Start { value }); - self - } - pub fn set_stop(&mut self, value: RangeValue) -> &mut Self { - self.insert_to(SourceAttributes::Stop { value }); - self - } - pub fn set_organism(&mut self, value: String) -> &mut Self { - self.insert_to(SourceAttributes::Organism { - value, - }); - self - } - pub fn set_mol_type(&mut self, value: String) -> &mut Self { - self.insert_to(SourceAttributes::MolType { value }); - self - } - pub fn set_strain(&mut self, value: String) -> &mut Self { - self.insert_to(SourceAttributes::Strain { value }); - self - } - pub fn set_type_material(&mut self, value: String) -> &mut Self { - self.insert_to(SourceAttributes::TypeMaterial { - value, - }); - self - } - pub fn set_db_xref(&mut self, value: String) -> &mut Self { - self.insert_to(SourceAttributes::DbXref { value }); - self - } - pub fn build(self) -> 
BTreeMap<String, HashSet<SourceAttributes>> { - self.source_attributes - } - pub fn iter_sorted( - &self, - ) -> std::collections::btree_map::Iter<String, HashSet<SourceAttributes>> { - self.source_attributes.iter() - } - pub fn default() -> Self { - SourceAttributeBuilder { - source_attributes: BTreeMap::new(), - source_name: None, - } - } - } - ///attributes for each feature, cds or gene - pub enum FeatureAttributes { - Start { value: RangeValue }, - Stop { value: RangeValue }, - Gene { value: String }, - Product { value: String }, - CodonStart { value: u8 }, - Strand { value: i8 }, - } - #[automatically_derived] - impl ::core::fmt::Debug for FeatureAttributes { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - match self { - FeatureAttributes::Start { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Start", - "value", - &__self_0, - ) - } - FeatureAttributes::Stop { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Stop", - "value", - &__self_0, - ) - } - FeatureAttributes::Gene { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Gene", - "value", - &__self_0, - ) - } - FeatureAttributes::Product { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Product", - "value", - &__self_0, - ) - } - FeatureAttributes::CodonStart { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "CodonStart", - "value", - &__self_0, - ) - } - FeatureAttributes::Strand { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Strand", - "value", - &__self_0, - ) - } - } - } - } - #[automatically_derived] - impl ::core::cmp::Eq for FeatureAttributes { - #[inline] - #[doc(hidden)] - #[coverage(off)] - fn assert_receiver_is_total_eq(&self) -> () { - let _: ::core::cmp::AssertParamIsEq<RangeValue>; - let _: ::core::cmp::AssertParamIsEq<String>; - let _: ::core::cmp::AssertParamIsEq<u8>; - let _: ::core::cmp::AssertParamIsEq<i8>; - } - } - 
#[automatically_derived] - impl ::core::hash::Hash for FeatureAttributes { - #[inline] - fn hash<__H: ::core::hash::Hasher>(&self, state: &mut __H) -> () { - let __self_discr = ::core::intrinsics::discriminant_value(self); - ::core::hash::Hash::hash(&__self_discr, state); - match self { - FeatureAttributes::Start { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - FeatureAttributes::Stop { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - FeatureAttributes::Gene { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - FeatureAttributes::Product { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - FeatureAttributes::CodonStart { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - FeatureAttributes::Strand { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - } - } - } - #[automatically_derived] - impl ::core::marker::StructuralPartialEq for FeatureAttributes {} - #[automatically_derived] - impl ::core::cmp::PartialEq for FeatureAttributes { - #[inline] - fn eq(&self, other: &FeatureAttributes) -> bool { - let __self_discr = ::core::intrinsics::discriminant_value(self); - let __arg1_discr = ::core::intrinsics::discriminant_value(other); - __self_discr == __arg1_discr - && match (self, other) { - ( - FeatureAttributes::Start { value: __self_0 }, - FeatureAttributes::Start { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - FeatureAttributes::Stop { value: __self_0 }, - FeatureAttributes::Stop { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - FeatureAttributes::Gene { value: __self_0 }, - FeatureAttributes::Gene { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - FeatureAttributes::Product { value: __self_0 }, - FeatureAttributes::Product { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - FeatureAttributes::CodonStart { value: __self_0 }, - FeatureAttributes::CodonStart { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - 
FeatureAttributes::Strand { value: __self_0 }, - FeatureAttributes::Strand { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - _ => unsafe { ::core::intrinsics::unreachable() } - } - } - } - #[automatically_derived] - impl ::core::clone::Clone for FeatureAttributes { - #[inline] - fn clone(&self) -> FeatureAttributes { - match self { - FeatureAttributes::Start { value: __self_0 } => { - FeatureAttributes::Start { - value: ::core::clone::Clone::clone(__self_0), - } - } - FeatureAttributes::Stop { value: __self_0 } => { - FeatureAttributes::Stop { - value: ::core::clone::Clone::clone(__self_0), - } - } - FeatureAttributes::Gene { value: __self_0 } => { - FeatureAttributes::Gene { - value: ::core::clone::Clone::clone(__self_0), - } - } - FeatureAttributes::Product { value: __self_0 } => { - FeatureAttributes::Product { - value: ::core::clone::Clone::clone(__self_0), - } - } - FeatureAttributes::CodonStart { value: __self_0 } => { - FeatureAttributes::CodonStart { - value: ::core::clone::Clone::clone(__self_0), - } - } - FeatureAttributes::Strand { value: __self_0 } => { - FeatureAttributes::Strand { - value: ::core::clone::Clone::clone(__self_0), - } - } - } - } - } - impl FeatureAttributeBuilder { - pub fn get_start(&self, key: &str) -> Option<&RangeValue> { - self.attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let FeatureAttributes::Start { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_stop(&self, key: &str) -> Option<&RangeValue> { - self.attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let FeatureAttributes::Stop { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_gene(&self, key: &str) -> Option<&String> { - self.attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let FeatureAttributes::Gene { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_product(&self, key: &str) -> 
Option<&String> { - self.attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let FeatureAttributes::Product { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_codon_start(&self, key: &str) -> Option<&u8> { - self.attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let FeatureAttributes::CodonStart { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_strand(&self, key: &str) -> Option<&i8> { - self.attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let FeatureAttributes::Strand { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - } - ///builder for the feature information on a per coding sequence (CDS) basis - pub struct FeatureAttributeBuilder { - pub attributes: BTreeMap<String, HashSet<FeatureAttributes>>, - locus_tag: Option<String>, - } - #[automatically_derived] - impl ::core::fmt::Debug for FeatureAttributeBuilder { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - ::core::fmt::Formatter::debug_struct_field2_finish( - f, - "FeatureAttributeBuilder", - "attributes", - &self.attributes, - "locus_tag", - &&self.locus_tag, - ) - } - } - #[automatically_derived] - impl ::core::default::Default for FeatureAttributeBuilder { - #[inline] - fn default() -> FeatureAttributeBuilder { - FeatureAttributeBuilder { - attributes: ::core::default::Default::default(), - locus_tag: ::core::default::Default::default(), - } - } - } - #[automatically_derived] - impl ::core::clone::Clone for FeatureAttributeBuilder { - #[inline] - fn clone(&self) -> FeatureAttributeBuilder { - FeatureAttributeBuilder { - attributes: ::core::clone::Clone::clone(&self.attributes), - locus_tag: ::core::clone::Clone::clone(&self.locus_tag), - } - } - } - impl FeatureAttributeBuilder { - pub fn new() -> Self { - FeatureAttributeBuilder { - attributes: BTreeMap::new(), - locus_tag: None, - } - } - pub fn set_counter(&mut self, counter: String) 
-> &mut Self { - self.locus_tag = Some(counter); - self - } - pub fn insert_to(&mut self, value: FeatureAttributes) { - if let Some(counter) = &self.locus_tag { - self.attributes - .entry(counter.to_string()) - .or_insert_with(HashSet::new) - .insert(value); - } else { - { - ::core::panicking::panic_fmt(format_args!("Counter key not set")); - }; - } - } - pub fn set_start(&mut self, value: RangeValue) -> &mut Self { - self.insert_to(FeatureAttributes::Start { value }); - self - } - pub fn set_stop(&mut self, value: RangeValue) -> &mut Self { - self.insert_to(FeatureAttributes::Stop { value }); - self - } - pub fn set_gene(&mut self, value: String) -> &mut Self { - self.insert_to(FeatureAttributes::Gene { value }); - self - } - pub fn set_product(&mut self, value: String) -> &mut Self { - self.insert_to(FeatureAttributes::Product { - value, - }); - self - } - pub fn set_codon_start(&mut self, value: u8) -> &mut Self { - self.insert_to(FeatureAttributes::CodonStart { - value, - }); - self - } - pub fn set_strand(&mut self, value: i8) -> &mut Self { - self.insert_to(FeatureAttributes::Strand { value }); - self - } - pub fn build(self) -> BTreeMap<String, HashSet<FeatureAttributes>> { - self.attributes - } - pub fn iter_sorted( - &self, - ) -> std::collections::btree_map::Iter<String, HashSet<FeatureAttributes>> { - self.attributes.iter() - } - pub fn default() -> Self { - FeatureAttributeBuilder { - attributes: BTreeMap::new(), - locus_tag: None, - } - } - } - ///stores the sequences of the coding sequences (genes) and proteins. 
Also stores start, stop, codon_start and strand information - pub enum SequenceAttributes { - Start { value: RangeValue }, - Stop { value: RangeValue }, - SequenceFfn { value: String }, - SequenceFaa { value: String }, - CodonStart { value: u8 }, - Strand { value: i8 }, - } - #[automatically_derived] - impl ::core::fmt::Debug for SequenceAttributes { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - match self { - SequenceAttributes::Start { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Start", - "value", - &__self_0, - ) - } - SequenceAttributes::Stop { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Stop", - "value", - &__self_0, - ) - } - SequenceAttributes::SequenceFfn { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "SequenceFfn", - "value", - &__self_0, - ) - } - SequenceAttributes::SequenceFaa { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "SequenceFaa", - "value", - &__self_0, - ) - } - SequenceAttributes::CodonStart { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "CodonStart", - "value", - &__self_0, - ) - } - SequenceAttributes::Strand { value: __self_0 } => { - ::core::fmt::Formatter::debug_struct_field1_finish( - f, - "Strand", - "value", - &__self_0, - ) - } - } - } - } - #[automatically_derived] - impl ::core::cmp::Eq for SequenceAttributes { - #[inline] - #[doc(hidden)] - #[coverage(off)] - fn assert_receiver_is_total_eq(&self) -> () { - let _: ::core::cmp::AssertParamIsEq<RangeValue>; - let _: ::core::cmp::AssertParamIsEq<String>; - let _: ::core::cmp::AssertParamIsEq<u8>; - let _: ::core::cmp::AssertParamIsEq<i8>; - } - } - #[automatically_derived] - impl ::core::marker::StructuralPartialEq for SequenceAttributes {} - #[automatically_derived] - impl ::core::cmp::PartialEq for SequenceAttributes { - #[inline] - fn eq(&self, other: &SequenceAttributes) -> 
bool { - let __self_discr = ::core::intrinsics::discriminant_value(self); - let __arg1_discr = ::core::intrinsics::discriminant_value(other); - __self_discr == __arg1_discr - && match (self, other) { - ( - SequenceAttributes::Start { value: __self_0 }, - SequenceAttributes::Start { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SequenceAttributes::Stop { value: __self_0 }, - SequenceAttributes::Stop { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SequenceAttributes::SequenceFfn { value: __self_0 }, - SequenceAttributes::SequenceFfn { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SequenceAttributes::SequenceFaa { value: __self_0 }, - SequenceAttributes::SequenceFaa { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SequenceAttributes::CodonStart { value: __self_0 }, - SequenceAttributes::CodonStart { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - ( - SequenceAttributes::Strand { value: __self_0 }, - SequenceAttributes::Strand { value: __arg1_0 }, - ) => __self_0 == __arg1_0, - _ => unsafe { ::core::intrinsics::unreachable() } - } - } - } - #[automatically_derived] - impl ::core::hash::Hash for SequenceAttributes { - #[inline] - fn hash<__H: ::core::hash::Hasher>(&self, state: &mut __H) -> () { - let __self_discr = ::core::intrinsics::discriminant_value(self); - ::core::hash::Hash::hash(&__self_discr, state); - match self { - SequenceAttributes::Start { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SequenceAttributes::Stop { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SequenceAttributes::SequenceFfn { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SequenceAttributes::SequenceFaa { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SequenceAttributes::CodonStart { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - SequenceAttributes::Strand { value: __self_0 } => { - ::core::hash::Hash::hash(__self_0, state) - } - 
} - } - } - #[automatically_derived] - impl ::core::clone::Clone for SequenceAttributes { - #[inline] - fn clone(&self) -> SequenceAttributes { - match self { - SequenceAttributes::Start { value: __self_0 } => { - SequenceAttributes::Start { - value: ::core::clone::Clone::clone(__self_0), - } - } - SequenceAttributes::Stop { value: __self_0 } => { - SequenceAttributes::Stop { - value: ::core::clone::Clone::clone(__self_0), - } - } - SequenceAttributes::SequenceFfn { value: __self_0 } => { - SequenceAttributes::SequenceFfn { - value: ::core::clone::Clone::clone(__self_0), - } - } - SequenceAttributes::SequenceFaa { value: __self_0 } => { - SequenceAttributes::SequenceFaa { - value: ::core::clone::Clone::clone(__self_0), - } - } - SequenceAttributes::CodonStart { value: __self_0 } => { - SequenceAttributes::CodonStart { - value: ::core::clone::Clone::clone(__self_0), - } - } - SequenceAttributes::Strand { value: __self_0 } => { - SequenceAttributes::Strand { - value: ::core::clone::Clone::clone(__self_0), - } - } - } - } - } - impl SequenceAttributeBuilder { - pub fn get_start(&self, key: &str) -> Option<&RangeValue> { - self.seq_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SequenceAttributes::Start { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_stop(&self, key: &str) -> Option<&RangeValue> { - self.seq_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SequenceAttributes::Stop { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_sequence_ffn(&self, key: &str) -> Option<&String> { - self.seq_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SequenceAttributes::SequenceFfn { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_sequence_faa(&self, key: &str) -> Option<&String> { - self.seq_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if 
let SequenceAttributes::SequenceFaa { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_codon_start(&self, key: &str) -> Option<&u8> { - self.seq_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SequenceAttributes::CodonStart { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - pub fn get_strand(&self, key: &str) -> Option<&i8> { - self.seq_attributes - .get(key) - .and_then(|set| { - set.iter() - .find_map(|attr| { - if let SequenceAttributes::Strand { value } = attr { - Some(value) - } else { - None - } - }) - }) - } - } - ///builder for the sequence information on a per coding sequence (CDS) basis - pub struct SequenceAttributeBuilder { - pub seq_attributes: BTreeMap<String, HashSet<SequenceAttributes>>, - pub locus_tag: Option<String>, - } - #[automatically_derived] - impl ::core::fmt::Debug for SequenceAttributeBuilder { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - ::core::fmt::Formatter::debug_struct_field2_finish( - f, - "SequenceAttributeBuilder", - "seq_attributes", - &self.seq_attributes, - "locus_tag", - &&self.locus_tag, - ) - } - } - #[automatically_derived] - impl ::core::default::Default for SequenceAttributeBuilder { - #[inline] - fn default() -> SequenceAttributeBuilder { - SequenceAttributeBuilder { - seq_attributes: ::core::default::Default::default(), - locus_tag: ::core::default::Default::default(), - } - } - } - #[automatically_derived] - impl ::core::clone::Clone for SequenceAttributeBuilder { - #[inline] - fn clone(&self) -> SequenceAttributeBuilder { - SequenceAttributeBuilder { - seq_attributes: ::core::clone::Clone::clone(&self.seq_attributes), - locus_tag: ::core::clone::Clone::clone(&self.locus_tag), - } - } - } - impl SequenceAttributeBuilder { - pub fn new() -> Self { - SequenceAttributeBuilder { - seq_attributes: BTreeMap::new(), - locus_tag: None, - } - } - pub fn set_counter(&mut self, counter: String) -> &mut Self { - self.locus_tag = Some(counter); 
- self - } - pub fn insert_to(&mut self, value: SequenceAttributes) { - if let Some(counter) = &self.locus_tag { - self.seq_attributes - .entry(counter.to_string()) - .or_insert_with(HashSet::new) - .insert(value); - } else { - { - ::core::panicking::panic_fmt(format_args!("Counter key not set")); - }; - } - } - pub fn set_start(&mut self, value: RangeValue) -> &mut Self { - self.insert_to(SequenceAttributes::Start { value }); - self - } - pub fn set_stop(&mut self, value: RangeValue) -> &mut Self { - self.insert_to(SequenceAttributes::Stop { value }); - self - } - pub fn set_sequence_ffn(&mut self, value: String) -> &mut Self { - self.insert_to(SequenceAttributes::SequenceFfn { - value, - }); - self - } - pub fn set_sequence_faa(&mut self, value: String) -> &mut Self { - self.insert_to(SequenceAttributes::SequenceFaa { - value, - }); - self - } - pub fn set_codon_start(&mut self, value: u8) -> &mut Self { - self.insert_to(SequenceAttributes::CodonStart { - value, - }); - self - } - pub fn set_strand(&mut self, value: i8) -> &mut Self { - self.insert_to(SequenceAttributes::Strand { - value, - }); - self - } - pub fn build(self) -> BTreeMap<String, HashSet<SequenceAttributes>> { - self.seq_attributes - } - pub fn iter_sorted( - &self, - ) -> std::collections::btree_map::Iter<String, HashSet<SequenceAttributes>> { - self.seq_attributes.iter() - } - pub fn default() -> Self { - SequenceAttributeBuilder { - seq_attributes: BTreeMap::new(), - locus_tag: None, - } - } - } - ///product lines can contain difficult to parse punctuation such as biochemical symbols like unclosed single quotes, superscripts, single and double brackets etc. 
- ///here we substitute these for an underscore - pub fn substitute_odd_punctuation(input: String) -> Result { - let re = Regex::new(r"[/?()',`]|[α-ωΑ-Ω]")?; - let cleaned = input.trim_end_matches(&['\r', '\n'][..]); - Ok(re.replace_all(cleaned, "_").to_string()) - } - ///GFF3 field9 construct - pub struct GFFInner { - pub id: String, - pub name: String, - pub locus_tag: String, - pub gene: String, - pub product: String, - } - #[automatically_derived] - impl ::core::fmt::Debug for GFFInner { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - ::core::fmt::Formatter::debug_struct_field5_finish( - f, - "GFFInner", - "id", - &self.id, - "name", - &self.name, - "locus_tag", - &self.locus_tag, - "gene", - &self.gene, - "product", - &&self.product, - ) - } - } - impl GFFInner { - pub fn new( - id: String, - name: String, - locus_tag: String, - gene: String, - product: String, - ) -> Self { - GFFInner { - id, - name, - locus_tag, - gene, - product, - } - } - } - ///The main GFF3 construct - pub struct GFFOuter<'a> { - pub seqid: String, - pub source: String, - pub type_val: String, - pub start: u32, - pub end: u32, - pub score: f64, - pub strand: String, - pub phase: u8, - pub attributes: &'a GFFInner, - } - #[automatically_derived] - impl<'a> ::core::fmt::Debug for GFFOuter<'a> { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - let names: &'static _ = &[ - "seqid", - "source", - "type_val", - "start", - "end", - "score", - "strand", - "phase", - "attributes", - ]; - let values: &[&dyn ::core::fmt::Debug] = &[ - &self.seqid, - &self.source, - &self.type_val, - &self.start, - &self.end, - &self.score, - &self.strand, - &self.phase, - &&self.attributes, - ]; - ::core::fmt::Formatter::debug_struct_fields_finish( - f, - "GFFOuter", - names, - values, - ) - } - } - impl<'a> GFFOuter<'a> { - pub fn new( - seqid: String, - source: String, - type_val: String, - start: u32, - end: u32, - score: f64, - strand: 
String, - phase: u8, - attributes: &'a GFFInner, - ) -> Self { - GFFOuter { - seqid, - source, - type_val, - start, - end, - score, - strand, - phase, - attributes, - } - } - pub fn field9_attributes_build(&self) -> String { - let mut full_field9 = Vec::new(); - if !self.attributes.id.is_empty() { - full_field9 - .push( - ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("id={0}", self.attributes.id), - ); - res - }), - ); - } - if !self.attributes.name.is_empty() { - full_field9 - .push( - ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("name={0}", self.attributes.name), - ); - res - }), - ); - } - if !self.attributes.gene.is_empty() { - full_field9 - .push( - ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("gene={0}", self.attributes.gene), - ); - res - }), - ); - } - if !self.attributes.locus_tag.is_empty() { - full_field9 - .push( - ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("locus_tag={0}", self.attributes.locus_tag), - ); - res - }), - ); - } - if !self.attributes.product.is_empty() { - full_field9 - .push( - ::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("product={0}", self.attributes.product), - ); - res - }), - ); - } - full_field9.join(";") - } - } - ///formats the translation string which can be multiple lines, for gbk - pub fn format_translation(translation: &str) -> String { - let mut formatted = String::new(); - let cleaned_translation = translation.replace("\n", ""); - formatted.push_str(" /translation=\""); - let line_length: usize = 60; - let final_num = line_length - 15; - formatted - .push_str( - &::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!("{0}\n", &cleaned_translation[0..final_num]), - ); - res - }), - ); - for i in (47..translation.len()).step_by(60) { - let end = i + 60 - 1; - let valid_end = if end >= translation.len() { - &cleaned_translation.len() - 
1 - } else { - end - }; - formatted - .push_str( - &::alloc::__export::must_use({ - let res = ::alloc::fmt::format( - format_args!( - " {0}", - &cleaned_translation[i..valid_end], - ), - ); - res - }), - ); - { - ::std::io::_print( - format_args!( - "cleaned translation leng is {0:?}\n", - &cleaned_translation[i..valid_end].len(), - ), - ); - }; - if *&cleaned_translation[i..valid_end].len() < 59 { - formatted.push('\"'); - } else { - formatted.push('\n'); - } - } - formatted - } - ///writes the DNA sequence in gbk format with numbering - pub fn write_gbk_format_sequence(sequence: &str, file: &mut File) -> io::Result<()> { - file.write_fmt(format_args!("ORIGIN\n"))?; - let mut formatted = String::new(); - let cleaned_input = sequence.replace("\n", ""); - let mut index = 1; - for (_i, chunk) in cleaned_input.as_bytes().chunks(60).enumerate() { - formatted - .push_str( - &::alloc::__export::must_use({ - let res = ::alloc::fmt::format(format_args!("{0:>5} ", index)); - res - }), - ); - for (j, sub_chunk) in chunk.chunks(10).enumerate() { - if j > 0 { - formatted.push(' '); - } - formatted.push_str(&String::from_utf8_lossy(sub_chunk)); - } - formatted.push('\n'); - index += 60; - } - file.write_fmt(format_args!("{0:>6}\n", &formatted))?; - file.write_fmt(format_args!("//\n"))?; - Ok(()) - } - ///saves the parsed data in genbank format - pub fn gbk_write( - seq_region: BTreeMap, - record_vec: Vec, - filename: &str, - ) -> io::Result<()> { - let now = Local::now(); - let formatted_date = now.format("%d-%b-%Y").to_string().to_uppercase(); - let mut file = OpenOptions::new() - .write(true) - .append(true) - .create(true) - .open(filename)?; - for (i, (key, _val)) in seq_region.iter().enumerate() { - let strain = match &record_vec[i].source_map.get_strain(key) { - Some(value) => value.to_string(), - None => "Unknown".to_string(), - }; - let organism = match &record_vec[i].source_map.get_organism(key) { - Some(value) => value.to_string(), - None => "Unknown".to_string(), - 
}; - let mol_type = match &record_vec[i].source_map.get_mol_type(key) { - Some(value) => value.to_string(), - None => "Unknown".to_string(), - }; - let type_material = match &record_vec[i].source_map.get_type_material(&key) { - Some(value) => value.to_string(), - None => "Unknown".to_string(), - }; - let db_xref = match &record_vec[i].source_map.get_db_xref(key) { - Some(value) => value.to_string(), - None => "Unknown".to_string(), - }; - let source_stop = match &record_vec[i].source_map.get_stop(key) { - Some(value) => value.get_value(), - None => { - { - { - ::std::io::_print(format_args!("stop value not found\n")); - }; - None - } - .expect("stop value not received") - } - }; - file.write_fmt( - format_args!( - "LOCUS {0} {1} bp DNA linear CON {2}\n", - &key, - &record_vec[i].sequence.len(), - &formatted_date, - ), - )?; - file.write_fmt(format_args!("DEFINITION {0} {1}.\n", &organism, &strain))?; - file.write_fmt(format_args!("ACCESSION {0}\n", &key))?; - file.write_fmt(format_args!("KEYWORDS .\n"))?; - file.write_fmt(format_args!("SOURCE {0} {1}\n", &organism, &strain))?; - file.write_fmt(format_args!(" ORGANISM {0} {1}\n", &organism, &strain))?; - file.write_fmt(format_args!("FEATURES Location/Qualifiers\n"))?; - file.write_fmt(format_args!(" source 1..{0}\n", &source_stop))?; - file.write_fmt( - format_args!(" /organism=\"{0}\"\n", &strain), - )?; - file.write_fmt( - format_args!(" /mol_type=\"{0}\"\n", &mol_type), - )?; - file.write_fmt( - format_args!(" /strain=\"{0}\"\n", &strain), - )?; - if type_material != *"Unknown".to_string() { - file.write_fmt( - format_args!( - " /type_material=\"{0}\"\n", - &type_material, - ), - )?; - } - file.write_fmt( - format_args!(" /db_xref=\"{0}\"\n", &db_xref), - )?; - for (locus_tag, _value) in &record_vec[i].cds.attributes { - let start = match &record_vec[i].cds.get_start(locus_tag) { - Some(value) => value.get_value(), - None => { - { - { - ::std::io::_print(format_args!("start value not found\n")); - }; - None - } - 
.expect("start value not received") - } - }; - let stop = match &record_vec[i].cds.get_stop(locus_tag) { - Some(value) => value.get_value(), - None => { - { - { - ::std::io::_print(format_args!("stop value not found\n")); - }; - None - } - .expect("stop value not received") - } - }; - let product = match &record_vec[i].cds.get_product(locus_tag) { - Some(value) => value.to_string(), - None => "unknown product".to_string(), - }; - let strand = match &record_vec[i].cds.get_strand(locus_tag) { - Some(value) => **value, - None => 0, - }; - let codon_start = match &record_vec[i].cds.get_codon_start(locus_tag) { - Some(value) => **value, - None => 0, - }; - let gene = match &record_vec[i].cds.get_gene(locus_tag) { - Some(value) => value.to_string(), - None => "unknown".to_string(), - }; - let translation = match &record_vec[i] - .seq_features - .get_sequence_faa(locus_tag) - { - Some(value) => value.to_string(), - None => "unknown".to_string(), - }; - if strand == 1 { - file.write_fmt( - format_args!(" gene {0}..{1}\n", &start, &stop), - )?; - } else { - file.write_fmt( - format_args!( - " gene complement({0}..{1})\n", - &start, - &stop, - ), - )?; - } - file.write_fmt( - format_args!(" /locus_tag=\"{0}\"\n", &locus_tag), - )?; - if strand == 1 { - file.write_fmt( - format_args!(" CDS {0}..{1}\n", &start, &stop), - )?; - } else { - file.write_fmt( - format_args!( - " CDS complement({0}..{1})\n", - &start, - &stop, - ), - )?; - } - file.write_fmt( - format_args!(" /locus_tag=\"{0}\"\n", &locus_tag), - )?; - file.write_fmt( - format_args!( - " /codon_start=\"{0}\"\n", - &codon_start, - ), - )?; - if gene != "unknown" { - file.write_fmt( - format_args!(" /gene=\"{0}\"\n", &gene), - )?; - } - if translation != "unknown" { - let formatted_translation = format_translation(&translation); - file.write_fmt(format_args!("{0}\n", &formatted_translation))?; - } - file.write_fmt( - format_args!(" /product=\"{0}\"\n", &product), - )?; - } - 
write_gbk_format_sequence(&record_vec[i].sequence, &mut file)?; - } - Ok(()) - } - ///saves the parsed data in gff3 format - #[allow(unused_assignments)] - #[allow(unused_variables)] - pub fn gff_write( - seq_region: BTreeMap, - mut record_vec: Vec, - filename: &str, - dna: bool, - ) -> io::Result<()> { - let mut file = OpenOptions::new().append(true).create(true).open(filename)?; - if file.metadata()?.len() == 0 { - file.write_fmt(format_args!("##gff-version 3\n"))?; - } - let mut full_seq = String::new(); - let mut prev_end: u32 = 0; - for (k, v) in seq_region.iter() { - file.write_fmt( - format_args!("##sequence-region\t{0}\t{1}\t{2}\n", &k, v.0, v.1), - )?; - } - for ((source_name, (seq_start, seq_end)), record) in seq_region - .iter() - .zip(record_vec.drain(..)) - { - if dna == true { - full_seq.push_str(&record.sequence); - } - for (locus_tag, _valu) in &record.cds.attributes { - let start = match record.cds.get_start(locus_tag) { - Some(value) => value.get_value(), - None => { - { - { - ::std::io::_print(format_args!("start value not found\n")); - }; - None - } - .expect("start value not received") - } - }; - let stop = match record.cds.get_stop(locus_tag) { - Some(value) => value.get_value(), - None => { - { - { - ::std::io::_print(format_args!("stop value not found\n")); - }; - None - } - .expect("stop value not received") - } - }; - let gene = match record.cds.get_gene(locus_tag) { - Some(value) => value.to_string(), - None => "unknown".to_string(), - }; - let product = match record.cds.get_product(locus_tag) { - Some(value) => value.to_string(), - None => "unknown product".to_string(), - }; - let strand = match record.cds.get_strand(locus_tag) { - Some(valu) => { - match valu { - 1 => "+".to_string(), - -1 => "-".to_string(), - _ => { - { - ::std::io::_print( - format_args!( - "unexpected strand value {0} for locus_tag {1}\n", - valu, - locus_tag, - ), - ); - }; - "unknownstrand".to_string() - } - } - } - None => "unknownvalue".to_string(), - }; - let 
phase = match record.cds.get_codon_start(locus_tag) { - Some(valuer) => { - match valuer { - 1 => 0, - 2 => 1, - 3 => 2, - _ => { - { - ::std::io::_print( - format_args!( - "unexpected phase value {0} in the bagging area for locus_tag {1}\n", - valuer, - locus_tag, - ), - ); - }; - 1 - } - } - } - None => 1, - }; - let gff_inner = GFFInner::new( - locus_tag.to_string(), - source_name.clone(), - locus_tag.to_string(), - gene, - product, - ); - let gff_outer = GFFOuter::new( - source_name.clone(), - ".".to_string(), - "CDS".to_string(), - start + prev_end, - stop + prev_end, - 0.0, - strand, - phase, - &gff_inner, - ); - let field9_attributes = gff_outer.field9_attributes_build(); - file.write_fmt( - format_args!( - "{0}\t{1}\t{2}\t{3:?}\t{4:?}\t{5}\t{6}\t{7}\t{8}\n", - gff_outer.seqid, - gff_outer.source, - gff_outer.type_val, - gff_outer.start, - gff_outer.end, - gff_outer.score, - gff_outer.strand, - gff_outer.phase, - field9_attributes, - ), - )?; - } - prev_end = *seq_end; - } - if dna { - file.write_fmt(format_args!("##FASTA\n"))?; - file.write_fmt(format_args!("{0}\n", full_seq))?; - } - Ok(()) - } - ///saves the parsed data in gff3 format - #[allow(unused_assignments)] - pub fn orig_gff_write( - seq_region: BTreeMap, - record_vec: Vec, - filename: &str, - dna: bool, - ) -> io::Result<()> { - let mut file = OpenOptions::new().append(true).create(true).open(filename)?; - if file.metadata()?.len() == 0 { - file.write_fmt(format_args!("##gff-version 3\n"))?; - } - let mut source_name = String::new(); - let mut full_seq = String::new(); - let mut prev_end: u32 = 0; - for (k, v) in seq_region.iter() { - file.write_fmt( - format_args!("##sequence-region\t{0}\t{1}\t{2}\n", &k, v.0, v.1), - )?; - } - for (i, (key, val)) in seq_region.iter().enumerate() { - source_name = key.to_string(); - if dna == true { - full_seq.push_str(&record_vec[i].sequence); - } - for (locus_tag, _valu) in &record_vec[i].cds.attributes { - let start = match 
record_vec[i].cds.get_start(locus_tag) { - Some(value) => value.get_value(), - None => { - { - { - ::std::io::_print(format_args!("start value not found\n")); - }; - None - } - .expect("start value not received") - } - }; - let stop = match record_vec[i].cds.get_stop(locus_tag) { - Some(value) => value.get_value(), - None => { - { - { - ::std::io::_print(format_args!("stop value not found\n")); - }; - None - } - .expect("stop value not received") - } - }; - let gene = match record_vec[i].cds.get_gene(locus_tag) { - Some(value) => value.to_string(), - None => "unknown".to_string(), - }; - let product = match record_vec[i].cds.get_product(locus_tag) { - Some(value) => value.to_string(), - None => "unknown product".to_string(), - }; - let strand = match record_vec[i].cds.get_strand(locus_tag) { - Some(valu) => { - match valu { - 1 => "+".to_string(), - -1 => "-".to_string(), - _ => { - { - ::std::io::_print( - format_args!( - "unexpected strand value {0} for locus_tag {1}\n", - valu, - locus_tag, - ), - ); - }; - "unknownstrand".to_string() - } - } - } - None => "unknownvalue".to_string(), - }; - let phase = match record_vec[i].cds.get_codon_start(locus_tag) { - Some(valuer) => { - match valuer { - 1 => 0, - 2 => 1, - 3 => 2, - _ => { - { - ::std::io::_print( - format_args!( - "unexpected phase value {0} in the bagging area for locus_tag {1}\n", - valuer, - locus_tag, - ), - ); - }; - 1 - } - } - } - None => 1, - }; - let gff_inner = GFFInner::new( - locus_tag.to_string(), - source_name.clone(), - locus_tag.to_string(), - gene, - product, - ); - let gff_outer = GFFOuter::new( - source_name.clone(), - ".".to_string(), - "CDS".to_string(), - start + prev_end, - stop + prev_end, - 0.0, - strand, - phase, - &gff_inner, - ); - let field9_attributes = gff_outer.field9_attributes_build(); - file.write_fmt( - format_args!( - "{0}\t{1}\t{2}\t{3:?}\t{4:?}\t{5}\t{6}\t{7}\t{8}\n", - gff_outer.seqid, - gff_outer.source, - gff_outer.type_val, - gff_outer.start, - gff_outer.end, - 
gff_outer.score, - gff_outer.strand, - gff_outer.phase, - field9_attributes, - ), - )?; - } - prev_end = val.1; - } - if dna { - file.write_fmt(format_args!("##FASTA\n"))?; - file.write_fmt(format_args!("{0}\n", full_seq))?; - } - Ok(()) - } - ///internal record containing data from a single source or contig. Has multiple features. - pub struct Record { - pub id: String, - pub length: u32, - pub sequence: String, - pub start: usize, - pub end: usize, - pub strand: i32, - pub cds: FeatureAttributeBuilder, - pub source_map: SourceAttributeBuilder, - pub seq_features: SequenceAttributeBuilder, - } - #[automatically_derived] - impl ::core::fmt::Debug for Record { - #[inline] - fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result { - let names: &'static _ = &[ - "id", - "length", - "sequence", - "start", - "end", - "strand", - "cds", - "source_map", - "seq_features", - ]; - let values: &[&dyn ::core::fmt::Debug] = &[ - &self.id, - &self.length, - &self.sequence, - &self.start, - &self.end, - &self.strand, - &self.cds, - &self.source_map, - &&self.seq_features, - ]; - ::core::fmt::Formatter::debug_struct_fields_finish( - f, - "Record", - names, - values, - ) - } - } - #[automatically_derived] - impl ::core::clone::Clone for Record { - #[inline] - fn clone(&self) -> Record { - Record { - id: ::core::clone::Clone::clone(&self.id), - length: ::core::clone::Clone::clone(&self.length), - sequence: ::core::clone::Clone::clone(&self.sequence), - start: ::core::clone::Clone::clone(&self.start), - end: ::core::clone::Clone::clone(&self.end), - strand: ::core::clone::Clone::clone(&self.strand), - cds: ::core::clone::Clone::clone(&self.cds), - source_map: ::core::clone::Clone::clone(&self.source_map), - seq_features: ::core::clone::Clone::clone(&self.seq_features), - } - } - } - impl Record { - /// Create a new instance. 
- pub fn new() -> Self { - Record { - id: "".to_owned(), - length: 0, - sequence: "".to_owned(), - start: 0, - end: 0, - strand: 0, - source_map: SourceAttributeBuilder::new(), - cds: FeatureAttributeBuilder::new(), - seq_features: SequenceAttributeBuilder::new(), - } - } - pub fn is_empty(&mut self) -> bool { - self.id.is_empty() && self.length == 0 - } - pub fn check(&mut self) -> Result<(), &str> { - if self.id().is_empty() { - return Err("Expecting id for Gbk record."); - } - Ok(()) - } - pub fn id(&mut self) -> &str { - &self.id - } - pub fn length(&mut self) -> u32 { - self.length - } - pub fn sequence(&mut self) -> &str { - &self.sequence - } - pub fn start(&mut self) -> u32 { - self.start.try_into().unwrap() - } - pub fn end(&mut self) -> u32 { - self.end.try_into().unwrap() - } - pub fn strand(&mut self) -> i32 { - self.strand - } - pub fn cds(&mut self) -> FeatureAttributeBuilder { - self.cds.clone() - } - pub fn source_map(&mut self) -> SourceAttributeBuilder { - self.source_map.clone() - } - pub fn seq_features(&mut self) -> SequenceAttributeBuilder { - self.seq_features.clone() - } - fn rec_clear(&mut self) { - self.id.clear(); - self.length = 0; - self.sequence.clear(); - self.start = 0; - self.end = 0; - self.strand = 0; - self.source_map = SourceAttributeBuilder::new(); - self.cds = FeatureAttributeBuilder::new(); - self.seq_features = SequenceAttributeBuilder::new(); - } - } - impl Default for Record { - fn default() -> Self { - Self::new() - } - } - #[allow(dead_code)] - pub struct Config { - filename: String, - } - impl Config { - pub fn new(args: &[String]) -> Result { - if args.len() < 2 { - { - ::core::panicking::panic_fmt( - format_args!("not enough arguments, please provide filename"), - ); - }; - } - let filename = args[1].clone(); - Ok(Config { filename }) - } - } -} diff --git a/mkdocs.yml b/mkdocs.yml deleted file mode 100644 index f4d9a86..0000000 --- a/mkdocs.yml +++ /dev/null @@ -1,39 +0,0 @@ -site_name: microBioRust docs -site_url: 
https://lcrossman.github.io/microBioRust/ -theme: - name: material - # logo: images/pc_specs.png - # favicon: images/pc_specs.png - font: - text: "Open Sans" - palette: - - scheme: default - toggle: - icon: material/weather-night - name: Switch to dark mode - primary: custom - accent: custom - - scheme: slate - toggle: - icon: material/weather-sunny - name: Switch to light mode - primary: custom - accent: custom -nav: - - Index: index.md - - Installation: installation.md - - Windows Install: windows_install.md - - Usage: usage.md - - Formats & Parsing: formats_and_parsing.md -features: - - navigation.tabs -markdown_extensions: - - attr_list - - md_in_html - - pymdownx.blocks.caption - - pymdownx.emoji: - emoji_index: !!python/name:material.extensions.emoji.twemoji - emoji_generator: !!python/name:material.extensions.emoji.to_svg -custom_dir: overrides -extra_css: - - stylesheets/extra.css diff --git a/seqmetrics/heatmap/Cargo.toml b/seqmetrics/heatmap/Cargo.toml new file mode 100644 index 0000000..5a1f1ea --- /dev/null +++ b/seqmetrics/heatmap/Cargo.toml @@ -0,0 +1,25 @@ +[package] +name = "microBioRust-heatmap" +license = "MIT" +keywords = ["bioinformatics","micro","bio","genomics","sequence-analysis"] +description = "Microbiology friendly bioinformatics Rust functions" +categories = ["science::bioinformatics::sequence-analysis", "science::bioinformatics::genomics", "science::bioinformatics","visualization","data-structures"] +readme = "README.md" +exclude = [".git",".gitignore"] +repository = "https://github.com/LCrossman/microBioRust" +version = "0.1.1-alpha" +edition = "2021" + +[lib] +crate-type = ["cdylib"] + +[package.metadata.wasm-pack.profile.release] +opt-level = "z" + +[dependencies] +serde-wasm-bindgen = "0.6.5" +serde = { version = "1.0.213", features = ["derive"] } +serde_derive = "1.0" +wasm-bindgen = "0.2.100" +web-sys = { version = "0.3.77", features = ["console","CanvasRenderingContext2d", "HtmlCanvasElement", "Document", "Window"] } +csv = "1.1" 
diff --git a/seqmetrics/heatmap/README.md b/seqmetrics/heatmap/README.md new file mode 100644 index 0000000..cbe8bdd --- /dev/null +++ b/seqmetrics/heatmap/README.md @@ -0,0 +1,23 @@ +# `heatmap` + +This crate provides a heatmap data visualisation in Rust WebAssembly, calling d3.js. + +D3.js (D3 is short for Data-Driven Documents) is a JavaScript library for dynamic, interactive data visualisation in browsers. +At the moment the heatmap data is hard-coded into lib.rs as an example, so it currently works only with fixed data +and a rusty colour scheme. + +To build, use wasm-pack 📦✨ + +```shell +wasm-pack build --target web +``` + +And serve it locally, for example with: + +```shell +http-server . +``` + +## Installation +You can install http-server via Homebrew on macOS +or with npm. diff --git a/seqmetrics/heatmap/src/canvas/drawing.rs b/seqmetrics/heatmap/src/canvas/drawing.rs new file mode 100644 index 0000000..4f11d1b --- /dev/null +++ b/seqmetrics/heatmap/src/canvas/drawing.rs @@ -0,0 +1,141 @@ +use wasm_bindgen::JsValue; +use web_sys::{console, CanvasRenderingContext2d}; + +pub fn draw_responsive_heatmap( + context: &CanvasRenderingContext2d, + values: Vec>, + x_labels: Vec<String>, + y_labels: Vec<String>, + canvas_width: f64, + canvas_height: f64, + device_pixel_ratio: f64, +) -> Result<(), JsValue> { + let rows = values.len(); + let cols = values[0].len(); + console::log_1(&JsValue::from_str(&format!("up in the draw function"))); + // Get canvas dimensions + // Calculate dynamic padding and box size + let adj_canvas_width = canvas_width * device_pixel_ratio; + let adj_canvas_height = canvas_height * device_pixel_ratio; + let padding_left = adj_canvas_width * 0.05; + let padding_top = adj_canvas_height * 0.05; + let padding_bottom = adj_canvas_height * 0.05; + let _padding_right = adj_canvas_width * 0.05; + + // let box_width = (adj_canvas_width - padding_left - padding_right) / (cols as f64 * 1.1); + // let box_height = (adj_canvas_height - padding_top -
padding_bottom) / (rows as f64 * 1.1); + + let box_width = 30.0; + let box_height = 30.0; + // Clear the canvas + console::log_1(&JsValue::from_str(&format!( + "pad left {} pad bottom {}", + &padding_left, &padding_bottom + ))); + context.clear_rect(0.0, 0.0, adj_canvas_width, adj_canvas_height); + console::log_1(&JsValue::from_str("cleared rect")); + // Draw the heatmap + for row in 0..rows { + for col in 0..cols { + let value = values[row][col]; + + // Set color based on value + let color = match value { + 0 => "#fee0d2", + 1 => "#fc9272", + 2 => "#de2d26", + _ => "#FFFFFF", + }; + //context.set_fill_style_str(&JsValue::from(color)); + context.set_fill_style_str(color); + + let x = padding_left + (col as f64 * box_width); + let y = padding_top + (row as f64 * box_height); + context.fill_rect(x, y, box_width, box_height); + + // Draw box borders + //context.set_stroke_style(&JsValue::from("#FFFFFF")); + context.set_stroke_style_str("#FFFFFF"); + context.set_line_width(2.0 / device_pixel_ratio); + + if row < rows - 1 { + context.begin_path(); + context.move_to(x, y + box_height); + context.line_to(x + box_width, y + box_height); + context.stroke(); + } + + if col < cols - 1 { + context.begin_path(); + context.move_to(x + box_width, y); + context.line_to(x + box_width, y + box_height); + context.stroke(); + } + } + } + console::log_1(&JsValue::from_str(&format!( + "after the rows and cols padding bottom: {}, height: {}", + &padding_bottom, + &(box_height * rows as f64), + ))); + + // Draw X-axis + context.begin_path(); + //context.set_stroke_style_str(&JsValue::from("#000000")); + context.set_stroke_style_str("#000000"); + context.move_to(padding_left, (box_height * rows as f64) + padding_bottom); + context.line_to( + padding_left + (box_width * cols as f64), + (box_height * rows as f64) + padding_bottom, + ); + context.stroke(); + + // Draw Y-axis + context.begin_path(); + context.move_to(padding_left, padding_top); + context.line_to(padding_left, (box_height * rows as f64) +
padding_bottom); + context.stroke(); + + // Draw X-axis ticks and labels + let label_font_size = (box_height * 0.3).min(box_width * 0.3).max(12.0); + context.set_font(&format!("{}px Arial", label_font_size)); + context.set_text_align("center"); + context.set_text_baseline("top"); + + for col in 0..cols { + let x = padding_left + col as f64 * box_width + box_width / 2.0; + let y = (box_height * rows as f64) + padding_bottom + 5.0; // Position below the heatmap + context + .fill_text(&x_labels[col], x, y) + .map_err(|_| JsValue::from_str(&format!("Failed to draw text at column {}", col)))?; + + // Draw ticks + context.begin_path(); + context.move_to(x, (box_height * rows as f64) + padding_bottom); + context.line_to(x, (box_height * rows as f64) + padding_bottom + 5.0); + context.stroke(); + } + + // Draw Y-axis ticks and labels + context.set_text_align("right"); + context.set_text_baseline("middle"); + + for row in 0..rows { + let x = padding_left - 10.0; // Position to the left of the heatmap + let y = padding_top + row as f64 * box_height + box_height / 2.0; + context + .fill_text(&y_labels[row], x, y) + .map_err(|_| JsValue::from_str(&format!("Failed to draw text at row {}", row)))?; + + // Draw ticks + context.begin_path(); + context.move_to(padding_left, y); + context.line_to(padding_left - 5.0, y); + context.stroke(); + } + console::log_1(&JsValue::from_str(&format!( + "at the end of draw funct Canvas width: {}, height: {}", + &adj_canvas_width, &adj_canvas_height + ))); + Ok(()) +} diff --git a/seqmetrics/heatmap/src/canvas/mod.rs b/seqmetrics/heatmap/src/canvas/mod.rs new file mode 100644 index 0000000..05e4e43 --- /dev/null +++ b/seqmetrics/heatmap/src/canvas/mod.rs @@ -0,0 +1 @@ +pub mod drawing; diff --git a/seqmetrics/heatmap/src/heatmap_data.rs b/seqmetrics/heatmap/src/heatmap_data.rs new file mode 100644 index 0000000..5032dd0 --- /dev/null +++ b/seqmetrics/heatmap/src/heatmap_data.rs @@ -0,0 +1,34 @@ +//! 
This module contains the data structure for the heatmap + +use serde::{Deserialize, Serialize}; + +#[derive(Serialize, Deserialize, Clone, Debug)] +pub struct HeatmapData { + pub values: Vec>, + pub x_labels: Vec<String>, + pub y_labels: Vec<String>, +} + +impl HeatmapData { + // Constructor method + pub fn new() -> Self { + HeatmapData { + values: vec![vec![0]], + x_labels: Vec::new(), + y_labels: Vec::new(), + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_new() { + let heatmap_data = HeatmapData::new(); + assert_eq!(heatmap_data.values, vec![vec![0]]); + assert_eq!(heatmap_data.x_labels, Vec::<String>::new()); + assert_eq!(heatmap_data.y_labels, Vec::<String>::new()); + } +} diff --git a/seqmetrics/heatmap/src/lib.rs b/seqmetrics/heatmap/src/lib.rs new file mode 100644 index 0000000..a3ca5ec --- /dev/null +++ b/seqmetrics/heatmap/src/lib.rs @@ -0,0 +1,153 @@ +//! # A Heatmap in Rust WebAssembly calling d3.js +//! +//! You will need to use wasm-pack to build instead of cargo: +//! wasm-pack build --target web +//! and some way of serving it locally, for example: +//! http-server . +//! It requires the index.html file in the static directory. +//! It currently works with fixed data +//!
and a rusty colour theme +#![allow(non_snake_case)] +pub mod canvas; +pub mod heatmap_data; + +// internal imports +use canvas::drawing::draw_responsive_heatmap; +use heatmap_data::HeatmapData; + +// external imports +use std::rc::Rc; +use wasm_bindgen::prelude::*; +use wasm_bindgen::JsValue; +use web_sys::console; +use web_sys::{window, CanvasRenderingContext2d, HtmlCanvasElement}; + +//returns a JsValue to javascript +#[wasm_bindgen(start)] +pub fn start() -> Result<(), JsValue> { + // Get the window and document + console::log_1(&JsValue::from_str(&format!("literal start"))); + let window = window().ok_or(JsValue::from_str("should have a window in this context"))?; + let window = Rc::new(window); + let window_clone = Rc::clone(&window); + let document = window.document().ok_or(JsValue::from_str("no document"))?; + console::log_1(&JsValue::from_str(&format!( + "up in the start of the function" + ))); + // Get the canvas element + let canvas = document + .get_element_by_id("heatmap") + .ok_or(JsValue::from_str("Canvas element not found"))? 
+ .dyn_into::<HtmlCanvasElement>()?; + console::log_1(&JsValue::from_str(&format!("called the canvas"))); + let heatmap_values = vec![ + vec![2, 1, 0, 1, 0], // row 1 + vec![1, 2, 0, 0, 1], // row 2 + vec![2, 0, 1, 2, 1], // row 3 + vec![0, 0, 0, 2, 0], // row 4 + vec![1, 2, 0, 1, 1], // row 5 + ]; + console::log_1(&JsValue::from_str(&format!("called the heatmap vals"))); + let x_labels: Vec<String> = vec!["A", "B", "C", "D", "E"] + .iter() + .map(|s| s.to_string()) + .collect(); + let y_labels: Vec<String> = vec!["R1", "R2", "R3", "R4", "R5"] + .iter() + .map(|s| s.to_string()) + .collect(); + + let num_rows = heatmap_values.len(); // Should be 5 + let num_cols = heatmap_values[0].len(); // Should be 5 + let mut heatmap_data = HeatmapData::new(); + heatmap_data.values = heatmap_values.clone(); + heatmap_data.x_labels = x_labels.clone(); + heatmap_data.y_labels = y_labels.clone(); + let box_size = 100.0; + let device_pixel_ratio = window.device_pixel_ratio(); + console::log_1(&JsValue::from_str(&format!( + "num rows are {:?} num cols are {:?}", + &num_rows, &num_cols + ))); + + // Dynamically set canvas size based on number of rows and columns + let canvas_width = num_cols as f64 * box_size; // 5 columns * 100px + let canvas_height = num_rows as f64 * box_size; // 5 rows * 100px + canvas.set_width(canvas_width as u32); + canvas.set_height(canvas_height as u32); + console::log_1(&JsValue::from_str(&format!( + "Canvas width: {}, height: {}", + canvas.width(), + canvas.height() + ))); + + let context = canvas + .get_context("2d")? + .ok_or(JsValue::from_str("Context not found"))?
+ .dyn_into::<CanvasRenderingContext2d>()?; + + // Scale the context by the device pixel ratio for sharper rendering on high-DPI displays + context.scale(device_pixel_ratio, device_pixel_ratio)?; + + draw_responsive_heatmap( + &context, + heatmap_values.clone(), + x_labels.clone(), + y_labels.clone(), + canvas_width, + canvas_height, + device_pixel_ratio, + )?; + + let handle_heatmap_resize = move || -> Result<(), JsValue> { + let new_width = window_clone + .inner_width() + .map_err(|_| JsValue::from_str("error getting inner width"))? + .as_f64() + .ok_or(JsValue::from_str("error converting width to f64"))?; + + let new_height = window_clone + .inner_height() + .map_err(|_| JsValue::from_str("error getting inner height"))? + .as_f64() + .ok_or(JsValue::from_str("error converting height to f64"))?; + + let canvas_new_width = (num_cols as f64 * box_size).min(new_width); + let canvas_new_height = (num_rows as f64 * box_size).min(new_height); + + canvas.set_width(canvas_new_width as u32); + canvas.set_height(canvas_new_height as u32); + + context + .set_transform(1.0, 0.0, 0.0, 1.0, 0.0, 0.0) + .map_err(|_| JsValue::from_str("error setting transform"))?; + context + .scale(device_pixel_ratio, device_pixel_ratio) + .map_err(|_| JsValue::from_str("error scaling context"))?; + + draw_responsive_heatmap( + &context, + heatmap_values.clone(), + x_labels.clone(), + y_labels.clone(), + canvas_new_width, + canvas_new_height, + device_pixel_ratio, + )?; + Ok(()) + }; + + // Wrap the resize handler so that any error is logged to the console + let error_handled_heatmap_resize = move || { + if let Err(e) = handle_heatmap_resize() { + console::error_1(&e); + } + }; + + let closure = Closure::wrap(Box::new(error_handled_heatmap_resize) as Box); + + window.add_event_listener_with_callback("resize", closure.as_ref().unchecked_ref())?; + closure.forget(); + + Ok(()) +} diff --git a/seqmetrics/microBioRust/.dribble.example.embl b/seqmetrics/microBioRust/.dribble.example.embl new file mode 100644 index 0000000..6257828 --- /dev/null +++
b/seqmetrics/microBioRust/.dribble.example.embl @@ -0,0 +1,33 @@ +FT CDS 1..6114 +FT misc_feature 1..6114 +FT /colour=12 +FT misc_feature 1..6666 +FT /colour=12 +FT CDS 3811..6666 +FT /transl_table=11 +FT /locus_tag="pRL80004" +FT /product="hypothetical protein" +FT /note="no significant database hits" +FT /db_xref="EnsemblGenomes-Gn:pRL80004" +FT /db_xref="EnsemblGenomes-Tr:CAK02804" +FT /db_xref="InterPro:IPR003593" +FT /db_xref="InterPro:IPR027417" +FT /db_xref="UniProtKB/TrEMBL:Q1M9K2" +FT /protein_id="CAK02804.1" +FT /translation="MTEIVLPTENTIIAAAKKLDAAASQLVAETFFAIRHGMSINPIGR +FT NPDGQTIKGYPDITGRVPGEKKYLIEVTKDDWRTHLQSDLSKLSRLQKGAYAGFLLLCF +FT RKSESELTQSNRKKARETVQQAESRIEKLLGVQAGQVEFVFLGEFAREVRSAKYHRVLL +FT ALGLELVPAPFYTDLRFVQGLADFVPTAEEYEAESVVPRDEVSRTYERVFKNRLTLIEG +FT EGGSGKTSLALAVATEHRKQGEIFLFLDASVADWKSGSERARLVDVAAMFAESNVLIIL +FT DNVHLGDASGISELITNVQASGYDFRFLMTTRSSDEVEQWKRLGNIELLRRVPSGADVN +FT SAYHRLLTQKFPGSSFNDIPPAVTTRWSNQIPNLVILTLALEGLTKRGGYDRDWAIKVE +FT DAGTYLQAKFISKLSSDDVKQVGKIAALSLLEIPTSLRSLDHRVPKSAVDLGFVRLNSS +FT STTQRYELVHHELGKLITSFKDPDIKARLGEVMSADPFQATYIGLKLIGNGEASLAKEL +FT LSSVLSQSLTLSPDFSMGNSGGVFGILVQSNVTTYPEIERILLPDIGAFFDTKPDIVTG +FT LSSFLGAASENMERVYNAIVEKLAEQETIRRIEELLPSVGPTTFATLYRCANSRNLPFL +FT STLRKYLNRGKRIDSFAYRCRSESPSKVEICWGLIDEFFPHHKARFEVVLRSALAEGYI +FT ERLIPEELIESRSSRAVQTAIRCANSEVFKRYITFRDCSDATLLLLAHTMHDMGRNDLS +FT EVAADRVAGRTTSSIWYHRRTGGRALLTILRRASISAEGDVQKILMRLEAEGKMRAIVN +FT GMRPYRLANFIFVIWDRHEQFTSFISKTDLQEITNRRFKARAAEFSEERQASIYIAGIY +FT ALVGLDIPRDEWSAVDVTEDDFIGNQNNPVFWIGLKALEENGMIRLAHRSRFPTSVAAL +FT DTHSENTSRIMNDLKNWAATR" diff --git a/seqmetrics/microBioRust/Cargo.toml b/seqmetrics/microBioRust/Cargo.toml new file mode 100644 index 0000000..6da5838 --- /dev/null +++ b/seqmetrics/microBioRust/Cargo.toml @@ -0,0 +1,36 @@ +[package] +name = "microBioRust" +version = "0.1.2" +edition = "2021" +license = "MIT" +keywords = ["bioinformatics", "micro", "bio", "genomics", "sequence-analysis"] +description = "Microbiology friendly bioinformatics 
Rust functions" +categories = [ + "science::bioinformatics::sequence-analysis", + "science::bioinformatics::genomics", + "science::bioinformatics", + "science", + "data-structures", +] +readme = "README.md" +exclude = [".git", ".gitignore"] +repository = "https://github.com/LCrossman/microBioRust" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html +[lints.rust] +unsafe_code = "forbid" + +[lib] +path = "src/lib.rs" + +[dependencies] +paste = "1.0" +itertools = "0.14.0" +protein-translate = "0.2.0" +bio = "2.3.0" +anyhow = "1.0" +thiserror = "2.0.12" +regex = "1.5" +chrono = "0.4.38" +clap = { version = "4.5.19", features = ["derive"] } + diff --git a/seqmetrics/microBioRust/K12_ribo.gbk b/seqmetrics/microBioRust/K12_ribo.gbk new file mode 100644 index 0000000..7b6113a --- /dev/null +++ b/seqmetrics/microBioRust/K12_ribo.gbk @@ -0,0 +1,81 @@ +LOCUS NC_000913 913 bp DNA linear CON 01-Sep-2025 +DEFINITION Escherichia coli +ACCESSION NC_000913 +KEYWORDS . +SOURCE Escherichia coli str. K-12 substr. MG1655 + ORGANISM Escherichia coli str. K-12 substr. MG1655 + Bacteria; Pseudomonadati; Pseudomonadota; Gammaproteobacteria; + Enterobacterales; Enterobacteriaceae; Escherichia. +FEATURES Location/Qualifiers + source <1..>913 + /id="source_1" + /organism="Escherichia coli str. K-12 substr. 
MG1655" + /mol_type="genomic DNA" + /strain="K-12" + /sub_strain="MG1655" + /db_xref="taxon:511145" + source complement(1..913) + gene complement(10..363) + /gene="rplR" + /locus_tag="b3304" + /gene_synonym="ECK3291" + /db_xref="ASAP:ABE-0010825" + /db_xref="ECOCYC:EG10879" + /db_xref="GeneID:947804" + CDS complement(10..363) + /gene="rplR" + /locus_tag="b3304" + /gene_synonym="ECK3291" + /codon_start=1 + /transl_table=11 + /product="50S ribosomal subunit protein L18" + /protein_id="NP_417763.1" + /db_xref="UniProtKB/Swiss-Prot:P0C018" + /db_xref="ASAP:ABE-0010825" + /db_xref="ECOCYC:EG10879" + /db_xref="GeneID:947804" + /translation="MDKKSARIRRATRARRKLQELGATRLVVHRTPRHIYAQVIAPNG + SEVLVAASTVEKAIAEQLKYTGNKDAAAAVGKAVAERALEKGIKDVSFDRSGFQYHGR + VQALADAAREAGLQF" + gene complement(373..906) + /gene="rplF" + /locus_tag="b3305" + /gene_synonym="ECK3292" + /db_xref="ASAP:ABE-0010827" + /db_xref="ECOCYC:EG10869" + /db_xref="GeneID:947803" + CDS complement(373..906) + /gene="rplF" + /locus_tag="b3305" + /gene_synonym="ECK3292" + /codon_start=1 + /transl_table=11 + /product="50S ribosomal subunit protein L6" + /protein_id="NP_417764.1" + /db_xref="UniProtKB/Swiss-Prot:P0AG55" + /db_xref="ASAP:ABE-0010827" + /db_xref="ECOCYC:EG10869" + /db_xref="GeneID:947803" + /translation="MSRVAKAPVVVPAGVDVKINGQVITIKGKNGELTRTLNDAVEVK + HADNTLTFGPRDGYADGWAQAGTARALLNSMVIGVTEGFTKKLQLVGVGYRAAVKGNV + INLSLGFSHPVDHQLPAGITAECPTQTEIVLKGADKQVIGQVAADLRAYRRPEPYKGK + GVRYADEVVRTKEAKKK" +BASE COUNT 214 a 256 c 223 g 220 t +ORIGIN + 1 acctctacct tagaactgaa ggccagcttc acgggcagca tctgccagtg cctggacacg + 61 accatgatat tggaacccgg aacggtcaaa ggatacatct ttgatgcctt tttccagagc + 121 gcgttcagcg acagctttac ccacagctgc agccgcgtct ttgttaccgg tgtacttcag + 181 ttgttcagcg atagcttttt ctacagtaga agcagctacc agaacttcag aaccgttcgg + 241 tgcaattacc tgtgcgtaaa tgtgacgcgg ggtacgatgt accaccaggc gagttgcgcc + 301 cagctcctgg agcttgcggc gtgcgcgggt cgcacgacgg atacgagcag atttcttatc + 361 catagtgtta ccttacttct tcttagcctc tttggtacgc 
acgacttcgt cggcgtaacg + 421 aacacccttg cctttataag gctcaggacg acggtaggcg cgcagatccg ctgcaacctg + 481 gccgatcacc tgcttatcag cgcctttcag cacgatttca gtctgagtcg gacattcagc + 541 agtgataccc gcaggcagct gatggtcaac aggatgagag aaacccagag acaggttaat + 601 cacattgcct ttaaccgctg cacggtaacc tacaccaacc agctgcagct tcttagtgaa + 661 gccttcggta acaccgataa ccattgagtt cagcagggca cgcgcggtac cagcctgtgc + 721 ccaaccgtct gcgtaaccat cacgcggacc gaaggtcagg gtattatctg catgtttaac + 781 ttcaacagca tcgttgagag tacgagtcag ctcgccgttt ttacctttga tcgtaataac + 841 ctgaccgttg atttttacgt caacgccggc aggaacaacg accggtgctt tagcaacacg + 901 agacattttt tcc +// diff --git a/seqmetrics/microBioRust/README.md b/seqmetrics/microBioRust/README.md new file mode 100644 index 0000000..17d050d --- /dev/null +++ b/seqmetrics/microBioRust/README.md @@ -0,0 +1,229 @@ +# `microBioRust` + +## A Rust bioinformatics crate aimed at Microbial genomics
+ +The aim of this crate is to provide Microbiology friendly Rust functions for bioinformatics.
+ +To use a specific workspace, clone the project from GitHub, cd into the specific directory required and build the project from there. + + You can parse genbank files and convert them to GFF (gff3) format, as well as extract DNA sequences, gene DNA sequences (ffn) and protein fasta sequences (faa). + You can also parse embl files and convert them to GFF (gff3) format, as well as extract the DNA sequences, gene DNA sequences (ffn) and protein fasta sequences (faa). You can also convert embl to gbk format. + There is now pyo3-based Python interop, so the crate can be imported into Python as a PyModule. + +The simple way: parse your genbank or embl file using the genbank! or embl! macros + +```rust + +pub fn genbank_to_faa(filename: &str) -> Result<(), anyhow::Error> { + let records = genbank!(&filename); + for record in records.iter() { + for (k, _v) in &record.cds.attributes { + if let Some(seq) = record.seq_features.get_sequence_faa(k) { + println!(">{}|{}\n{}", &record.id, &k, seq); + } + } + } + return Ok(()); +} +``` + +A more explicit version, better for debugging: + +```rust + +pub fn genbank_to_faa() -> Result<(), anyhow::Error> { + let args: Vec<String> = env::args().collect(); + let config = Config::new(&args).unwrap_or_else(|err| { + println!("Problem with parsing file arguments: {}", err); + process::exit(1); + }); + let file_gbk = fs::File::open(config.filename)?; + let mut reader = Reader::new(file_gbk); + let mut records = reader.records(); + let mut cds_counter: u32 = 0; + loop { + //collect from each record advancing on a next record basis, count cds records + match records.next() { + Some(Ok(mut record)) => { + for (k, _v) in &record.cds.attributes { + match record.seq_features.get_sequence_faa(&k) { + Some(value) => { + let seq_faa = value.to_string(); + println!(">{}|{}\n{}", &record.id, &k, seq_faa); + } + _ => (), + }; + } + cds_counter += 1; + } + Some(Err(e)) => { + println!("Error encountered - an err {:?}", e); + } + None => { + println!("finished iteration"); + break; + } + } + }
 println!("Total records processed: {}", cds_counter); + return Ok(()); +} +``` + + Example: save a multi- or single-record genbank file as a GFF file (the records of a multi-genbank file are joined) + +```rust +pub fn genbank_to_gff() -> io::Result<()> { + let args: Vec<String> = env::args().collect(); + let config = Config::new(&args).unwrap_or_else(|err| { + println!("Problem with parsing file arguments: {}", err); + process::exit(1); + }); + let file_gbk = fs::File::open(&config.filename)?; + let mut prev_end: u32 = 0; + let mut reader = Reader::new(file_gbk); + let mut records = reader.records(); + let mut read_counter: u32 = 0; + let mut seq_region: BTreeMap<String, (u32, u32)> = BTreeMap::new(); + let mut record_vec: Vec<Record> = Vec::new(); + loop { + match records.next() { + Some(Ok(mut record)) => { + //println!("next record"); + //println!("Record id: {:?}", record.id); + let source = record + .source_map + .source_name + .clone() + .expect("issue collecting source name"); + let beginning = match record.source_map.get_start(&source) { + Some(value) => value.get_value(), + _ => 0, + }; + let ending = match record.source_map.get_stop(&source) { + Some(value) => value.get_value(), + _ => 0, + }; + // offset coordinates by the cumulative length of the previous records so joined regions do not overlap + seq_region.insert(source, (beginning + prev_end, ending + prev_end)); + record_vec.push(record); + // Add additional fields to print if needed + read_counter += 1; + prev_end += ending; // create the joined record if there are multiple + } + Some(Err(e)) => { + println!("there's an err {:?}", e); + } + None => { + println!("finished iteration"); + break; + } + } + } + let output_file = format!("{}.gff", &config.filename); + gff_write(seq_region.clone(), record_vec, &output_file, true); + println!("Total records processed: {}", read_counter); + return Ok(()); +} +``` + + Example: create a completely new record using the set_ setter methods + + Writing GFF format requires gff_write(seq_region, record_vec, filename, true or
 false) + + The seq_region is the region of interest to save, given as a name and DNA coordinates, such as `seq_region.insert("source_1".to_string(), (1,897))` + This makes it possible to save the whole file or a subset of it. + + record_vec is a list of the records. If there is only one record, wrap it in a vec using `vec![record]` + + The boolean true/false sets whether the DNA sequence is included in the GFF3 file. + + Writing genbank format requires gbk_write(seq_region, record_vec, filename); there is no true/false flag because genbank format always includes the DNA sequence. + + ```rust + pub fn create_new_record() -> Result<(), anyhow::Error> { + let filename = "new_record.gff".to_string(); + let mut record = Record::new(); + let mut seq_region: BTreeMap<String, (u32, u32)> = BTreeMap::new(); + //example from E. coli K-12 + seq_region.insert("source_1".to_string(), (1,897)); + //Add the source into SourceAttributes + record.source_map + .set_counter("source_1".to_string()) + .set_start(RangeValue::Exact(1)) + .set_stop(RangeValue::Exact(897)) + .set_organism("Escherichia coli".to_string()) + .set_mol_type("DNA".to_string()) + .set_strain("K-12 substr. MG1655".to_string()) + .set_type_material("type strain of Escherichia coli K12".to_string()) + .set_db_xref("PRJNA57779".to_string()); + //Add the features into FeatureAttributes, here we are setting two features, i.e.
 coding sequences or genes + record.cds + .set_counter("b3304".to_string()) + .set_start(RangeValue::Exact(1)) + .set_stop(RangeValue::Exact(354)) + .set_gene("rplR".to_string()) + .set_product("50S ribosomal subunit protein L18".to_string()) + .set_codon_start(1) + .set_strand(-1); + record.cds + .set_counter("b3305".to_string()) + .set_start(RangeValue::Exact(364)) + .set_stop(RangeValue::Exact(897)) + .set_gene("rplF".to_string()) + .set_product("50S ribosomal subunit protein L6".to_string()) + .set_codon_start(1) + .set_strand(-1); + //Add the sequences for the coding sequence (CDS) into SequenceAttributes + record.seq_features + .set_counter("b3304".to_string()) + .set_start(RangeValue::Exact(1)) + .set_stop(RangeValue::Exact(354)) + .set_sequence_ffn("ATGGATAAGAAATCTGCTCGTATCCGTCGTGCGACCCGCGCACGCCGCAAGCTCCAGGAG +CTGGGCGCAACTCGCCTGGTGGTACATCGTACCCCGCGTCACATTTACGCACAGGTAATT +GCACCGAACGGTTCTGAAGTTCTGGTAGCTGCTTCTACTGTAGAAAAAGCTATCGCTGAA +CAACTGAAGTACACCGGTAACAAAGACGCGGCTGCAGCTGTGGGTAAAGCTGTCGCTGAA +CGCGCTCTGGAAAAAGGCATCAAAGATGTATCCTTTGACCGTTCCGGGTTCCAATATCAT +GGTCGTGTCCAGGCACTGGCAGATGCTGCCCGTGAAGCTGGCCTTCAGTTCTAA".to_string()) + .set_sequence_faa("MDKKSARIRRATRARRKLQELGATRLVVHRTPRHIYAQVIAPNGSEVLVAASTVEKAIAE +QLKYTGNKDAAAAVGKAVAERALEKGIKDVSFDRSGFQYHGRVQALADAAREAGLQF".to_string()) + .set_codon_start(1) + .set_strand(-1); + record.seq_features + .set_counter("b3305".to_string()) + .set_start(RangeValue::Exact(364)) + .set_stop(RangeValue::Exact(897)) + .set_sequence_ffn("ATGTCTCGTGTTGCTAAAGCACCGGTCGTTGTTCCTGCCGGCGTTGACGTAAAAATCAAC +GGTCAGGTTATTACGATCAAAGGTAAAAACGGCGAGCTGACTCGTACTCTCAACGATGCT +GTTGAAGTTAAACATGCAGATAATACCCTGACCTTCGGTCCGCGTGATGGTTACGCAGAC +GGTTGGGCACAGGCTGGTACCGCGCGTGCCCTGCTGAACTCAATGGTTATCGGTGTTACC +GAAGGCTTCACTAAGAAGCTGCAGCTGGTTGGTGTAGGTTACCGTGCAGCGGTTAAAGGC +AATGTGATTAACCTGTCTCTGGGTTTCTCTCATCCTGTTGACCATCAGCTGCCTGCGGGT +ATCACTGCTGAATGTCCGACTCAGACTGAAATCGTGCTGAAAGGCGCTGATAAGCAGGTG +ATCGGCCAGGTTGCAGCGGATCTGCGCGCCTACCGTCGTCCTGAGCCTTATAAAGGCAAG
+GGTGTTCGTTACGCCGACGAAGTCGTGCGTACCAAAGAGGCTAAGAAGAAGTAA".to_string()) + .set_sequence_faa("MSRVAKAPVVVPAGVDVKINGQVITIKGKNGELTRTLNDAVEVKHADNTLTFGPRDGYAD +GWAQAGTARALLNSMVIGVTEGFTKKLQLVGVGYRAAVKGNVINLSLGFSHPVDHQLPAG +ITAECPTQTEIVLKGADKQVIGQVAADLRAYRRPEPYKGKGVRYADEVVRTKEAKKK".to_string()) + .set_codon_start(1) + .set_strand(-1); + //Add the full sequence of the entire record into the record.sequence + record.sequence = "TTAGAACTGAAGGCCAGCTTCACGGGCAGCATCTGCCAGTGCCTGGACACGACCATGATA +TTGGAACCCGGAACGGTCAAAGGATACATCTTTGATGCCTTTTTCCAGAGCGCGTTCAGC +GACAGCTTTACCCACAGCTGCAGCCGCGTCTTTGTTACCGGTGTACTTCAGTTGTTCAGC +GATAGCTTTTTCTACAGTAGAAGCAGCTACCAGAACTTCAGAACCGTTCGGTGCAATTAC +CTGTGCGTAAATGTGACGCGGGGTACGATGTACCACCAGGCGAGTTGCGCCCAGCTCCTG +GAGCTTGCGGCGTGCGCGGGTCGCACGACGGATACGAGCAGATTTCTTATCCATAGTGTT +ACCTTACTTCTTCTTAGCCTCTTTGGTACGCACGACTTCGTCGGCGTAACGAACACCCTT +GCCTTTATAAGGCTCAGGACGACGGTAGGCGCGCAGATCCGCTGCAACCTGGCCGATCAC +CTGCTTATCAGCGCCTTTCAGCACGATTTCAGTCTGAGTCGGACATTCAGCAGTGATACC +CGCAGGCAGCTGATGGTCAACAGGATGAGAGAAACCCAGAGACAGGTTAATCACATTGCC +TTTAACCGCTGCACGGTAACCTACACCAACCAGCTGCAGCTTCTTAGTGAAGCCTTCGGT +AACACCGATAACCATTGAGTTCAGCAGGGCACGCGCGGTACCAGCCTGTGCCCAACCGTC +TGCGTAACCATCACGCGGACCGAAGGTCAGGGTATTATCTGCATGTTTAACTTCAACAGC +ATCGTTGAGAGTACGAGTCAGCTCGCCGTTTTTACCTTTGATCGTAATAACCTGACCGTT +GATTTTTACGTCAACGCCGGCAGGAACAACGACCGGTGCTTTAGCAACACGAGACAT".to_string(); + gff_write(seq_region, vec![record], &filename, true); + return Ok(()); + } +``` diff --git a/seqmetrics/microBioRust/example.embl b/seqmetrics/microBioRust/example.embl new file mode 100644 index 0000000..db04b56 --- /dev/null +++ b/seqmetrics/microBioRust/example.embl @@ -0,0 +1,281 @@ +ID AM236082; SV 1; linear; genomic DNA; STD; PRO; 6666 BP. +XX +AC AM236082; +XX +PR Project:PRJNA344; +XX +DT 04-MAY-2006 (Rel. 87, Created) +DT 06-FEB-2015 (Rel. 123, Last updated, Version 9) +XX +DE Rhizobium leguminosarum bv. viciae plasmid pRL8 complete genome, strain +DE 3841 +XX +KW complete genome. +XX +OS Rhizobium leguminosarum bv. 
viciae 3841 +OC Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Rhizobiaceae; +OC Rhizobium/Agrobacterium group; Rhizobium. +OG Plasmid pRL8 +XX +RN [1] +RP 1-147463 +RA Crossman L.C.; +RT ; +RL Submitted (21-FEB-2006) to the INSDC. +RL Crossman L.C., Pathogen Sequencing Unit, The Wellcome Trust Sanger +RL Institute, Hinxton, Cambridge, Cambridgeshire, CB10 1SA, UNITED KINGDOM. +XX +RN [2] +RX DOI; 10.1186/gb-2006-7-4-r34. +RX PUBMED; 16640791. +RA Young J.W., Crossman L.C., Johnston A.W.B., Thomson N.R., Ghazoui Z.F., +RA Hull K.H., Wexler M., Curson A.R.J., Todd J.D., Poole P.S., Mauchline T.H., +RA East A.K., Quail M.A., Churcher C., Arrowsmith C., Cherevach A., +RA Chillingworth T., Clarke K., Cronin A., Davis P., Fraser A., Hance Z., +RA Hauser H., Jagels K., Moule S., Mungall K., Norbertczak H., +RA Rabbinowitsch E., Sanders M., Simmonds M., Whitehead S., Parkhill J.; +RT "The genome of Rhizobium leguminosarum has recognizable core and accessory +RT components"; +RL Genome Biol. 7(4):R34-R34(2006). +XX +DR MD5; 8fe097fb2b9f874c5d043fe59cea066c. +DR BioSample; SAMEA1705944. +DR EnsemblGenomes-Gn; EBG00001182864. +DR EnsemblGenomes-Gn; pRL80017. +DR EnsemblGenomes-Gn; pRL80039. +DR EnsemblGenomes-Gn; pRL80039A. +DR EnsemblGenomes-Gn; pRL80050. +DR EnsemblGenomes-Gn; pRL80051. +DR EnsemblGenomes-Gn; pRL80055. +DR EnsemblGenomes-Gn; pRL80058. +DR EnsemblGenomes-Gn; pRL80089. +DR EnsemblGenomes-Gn; pRL80091. +DR EnsemblGenomes-Gn; pRL80106. +DR EnsemblGenomes-Tr; EBT00001761573. +DR EnsemblGenomes-Tr; pRL80017. +DR EnsemblGenomes-Tr; pRL80039. +DR EnsemblGenomes-Tr; pRL80039A. +DR EnsemblGenomes-Tr; pRL80050. +DR EnsemblGenomes-Tr; pRL80051. +DR EnsemblGenomes-Tr; pRL80055. +DR EnsemblGenomes-Tr; pRL80058. +DR EnsemblGenomes-Tr; pRL80089. +DR EnsemblGenomes-Tr; pRL80091. +DR EnsemblGenomes-Tr; pRL80106. +DR RFAM; RF00490; S-element. +XX +FH Key Location/Qualifiers +FH +FT source 1..>6666 +FT /organism="Rhizobium leguminosarum bv. 
viciae 3841" +FT /plasmid="pRL8" +FT /strain="3841" +FT /mol_type="genomic DNA" +FT /country="United Kingdom" +FT /db_xref="taxon:216596" +FT CDS 1..1197 +FT /transl_table=11 +FT /gene="repAp8" +FT /locus_tag="pRL80001" +FT /product="replication protein RepA" +FT /db_xref="EnsemblGenomes-Gn:pRL80001" +FT /db_xref="EnsemblGenomes-Tr:CAK02801" +FT /db_xref="GOA:Q1M9K5" +FT /db_xref="InterPro:IPR000551" +FT /db_xref="InterPro:IPR017818" +FT /db_xref="InterPro:IPR025669" +FT /db_xref="InterPro:IPR027417" +FT /db_xref="UniProtKB/TrEMBL:Q1M9K5" +FT /protein_id="CAK02801.1" +FT /translation="MENPAQLQKAIHKLIAAHARDLSGALHEHRVKLYPPEARKTLRSF +FT SSIEAAKLIGVNDGYLRHLSLEGKGPQPEIGNNNRRSYSVETIQALREYLDENGKGDRR +FT YSPRRSGREHLQVITAVNFKGGSGKTTTAAHLAQYLALNGYRVLAIDLDPQASMSALHG +FT FQPEFDVGDNETLYGAVRYDEERRPLKDIIKKTYFANLDLVPGNLELMEFEHDTAKVLG +FT SNDRKNIFFTRMDDAIASVADDYDVVVVDCPPQLGFLTISALCAATAVLVTVHPQMLDV +FT MSMCQFLLMTSELLSVVADAGGSMNYDWMRYLVTRYEPGDGPQNQMVSFMRTMFGDHVL +FT NHPMLKSTAISDAGITKQTLYEVSRDQFTRATYDRAMESLDNVNSEIEQLIQSSWGRK" +FT misc_feature 1..6666 +FT /colour=12 +FT CDS 1321..2280 +FT /transl_table=11 +FT /gene="repBp8" +FT /locus_tag="pRL80002" +FT /product="replication protein RepB" +FT /db_xref="EnsemblGenomes-Gn:pRL80002" +FT /db_xref="EnsemblGenomes-Tr:CAK02802" +FT /db_xref="GOA:Q1M9K4" +FT /db_xref="InterPro:IPR003115" +FT /db_xref="InterPro:IPR004437" +FT /db_xref="InterPro:IPR011111" +FT /db_xref="InterPro:IPR017819" +FT /db_xref="InterPro:IPR036086" +FT /db_xref="InterPro:IPR037972" +FT /db_xref="UniProtKB/TrEMBL:Q1M9K4" +FT /protein_id="CAK02802.1" +FT /translation="MARKHLLSDLKAPASSSTEFDEARAADVPTPQYAPRGAIGAVSRS +FT IEALKSQGLSELDPELIDAPSVTDRLDEDGAQFEEFARNIRENGQQVPILVRPHPTVEG +FT RYQIAYGRRRLRAVKAAGLKVKAAIRNLTDDELVLAQGQENSARQDLSFIERALYAAQL +FT EASGYQRPVIMAALAVDKSNLSRLIQAATQLPDDVIRLIGAAPKTGRDRWYELSSRLAA +FT EGAAEKARALLSTSEVGSLGSDERFVRVFDAVAPKKSKKEKVQADVWQADDGVKAASFR +FT QDKRTLTLMIDKKAAPEFGEYLMSALPEIYASFKKSKQ" +FT CDS 2455..3672 +FT /transl_table=11 +FT /gene="repCp8" +FT 
/locus_tag="pRL80003" +FT /product="replication RepC protein" +FT /db_xref="EnsemblGenomes-Gn:pRL80003" +FT /db_xref="EnsemblGenomes-Tr:CAK02803" +FT /db_xref="InterPro:IPR005090" +FT /db_xref="InterPro:IPR021760" +FT /db_xref="UniProtKB/TrEMBL:Q1M9K3" +FT /protein_id="CAK02803.1" +FT /translation="METGYITTPFGRRPMTLALVKRQVKTEQAIADGSVDKWRVFRDIS +FT DARSRLGLQDRALAVLNALLTFFPVAELSNERNLVVFPSNAQLSARTNGIAGTTLRKCL +FT GSLVEAGVIIRKDSPNGKRYARKGKEGNIEDAYGFSLAPLLARAGEFASLAQDVAAEQR +FT RFRITKDRLTIVRRDVRKLITVGMEENLAGDWIAAETCFVEIVGRFVRHPTLQDLISSL +FT DEMSLLHEEVSRMLEIKEETAKSDGNAIPDGCHIQNSNTESCHELEPRSEKKQGEKSEP +FT NKKTERKDEPEAFPLSMVLRACPEINAFGPGGSIGSWREMMSAAVTVRSMLGVSPSAYQ +FT EACEVMGQAGAAIAIACIYQRGGHINSAGGYLRDLTGKARRGEFSLGPMLFTQLRANSG +FT TVKASA" +FT CDS 3811..6666 +FT /transl_table=11 +FT /locus_tag="pRL80004" +FT /product="hypothetical protein" +FT /note="no significant database hits" +FT /db_xref="EnsemblGenomes-Gn:pRL80004" +FT /db_xref="EnsemblGenomes-Tr:CAK02804" +FT /db_xref="InterPro:IPR003593" +FT /db_xref="InterPro:IPR027417" +FT /db_xref="UniProtKB/TrEMBL:Q1M9K2" +FT /protein_id="CAK02804.1" +FT /translation="MTEIVLPTENTIIAAAKKLDAAASQLVAETFFAIRHGMSINPIGR +FT NPDGQTIKGYPDITGRVPGEKKYLIEVTKDDWRTHLQSDLSKLSRLQKGAYAGFLLLCF +FT RKSESELTQSNRKKARETVQQAESRIEKLLGVQAGQVEFVFLGEFAREVRSAKYHRVLL +FT ALGLELVPAPFYTDLRFVQGLADFVPTAEEYEAESVVPRDEVSRTYERVFKNRLTLIEG +FT EGGSGKTSLALAVATEHRKQGEIFLFLDASVADWKSGSERARLVDVAAMFAESNVLIIL +FT DNVHLGDASGISELITNVQASGYDFRFLMTTRSSDEVEQWKRLGNIELLRRVPSGADVN +FT SAYHRLLTQKFPGSSFNDIPPAVTTRWSNQIPNLVILTLALEGLTKRGGYDRDWAIKVE +FT DAGTYLQAKFISKLSSDDVKQVGKIAALSLLEIPTSLRSLDHRVPKSAVDLGFVRLNSS +FT STTQRYELVHHELGKLITSFKDPDIKARLGEVMSADPFQATYIGLKLIGNGEASLAKEL +FT LSSVLSQSLTLSPDFSMGNSGGVFGILVQSNVTTYPEIERILLPDIGAFFDTKPDIVTG +FT LSSFLGAASENMERVYNAIVEKLAEQETIRRIEELLPSVGPTTFATLYRCANSRNLPFL +FT STLRKYLNRGKRIDSFAYRCRSESPSKVEICWGLIDEFFPHHKARFEVVLRSALAEGYI +FT ERLIPEELIESRSSRAVQTAIRCANSEVFKRYITFRDCSDATLLLLAHTMHDMGRNDLS +FT 
EVAADRVAGRTTSSIWYHRRTGGRALLTILRRASISAEGDVQKILMRLEAEGKMRAIVN +FT GMRPYRLANFIFVIWDRHEQFTSFISKTDLQEITNRRFKARAAEFSEERQASIYIAGIY +FT ALVGLDIPRDEWSAVDVTEDDFIGNQNNPVFWIGLKALEENGMIRLAHRSRFPTSVAAL +FT DTHSENTSRIMNDLKNWAATR" +SQ Sequence 6666 BP; 1576 A; 1743 C; 1876 G; 1471 T; 0 other; + gtggagaatc ccgctcagct tcagaaggct attcataaac tgatagcggc ccacgcgcga 60 + gatctctcgg gcgcgcttca cgagcatcgt gtgaagcttt atccgcctga agctcgaaag 120 + acgcttcggt cattttcgtc gatagaggct gcgaagctca ttggcgtcaa cgatggctat 180 + ctccgccatc tttcgctcga gggtaagggg ccgcagcctg agatcggaaa taacaatcgc 240 + cgttcgtatt cggtcgagac tattcaggcg ctccgcgagt atctcgacga gaacggcaag 300 + ggtgaccgtc ggtactcacc acgccggagc ggtcgtgagc atttgcaggt tataaccgca 360 + gtgaacttca agggaggcag cggtaagacc acgacggctg ctcatcttgc tcagtatctt 420 + gcgcttaatg gataccgggt tcttgcgatt gatcttgatc cgcaggccag catgtccgct 480 + ttgcacggat tccagcctga gtttgacgtt ggcgacaacg aaacgctcta cggcgccgtt 540 + cgttatgatg aagagcggcg cccgctgaag gatataatca agaaaaccta ctttgcgaac 600 + cttgatctcg ttccgggcaa cctcgagctt atggaattcg agcacgacac cgctaaagtg 660 + ctcggctcta acgaccgcaa gaacatcttc ttcacgcgaa tggatgacgc aatcgcgtca 720 + gtggcggacg actatgacgt tgtcgtcgtc gactgccctc cccagctcgg ctttctgacg 780 + atctcggctc tatgcgcggc aaccgccgtt cttgttactg tacatcctca gatgctcgat 840 + gtgatgtcga tgtgccagtt tctgctgatg acctcagaac ttctgagcgt cgttgcggat 900 + gctggcggga gcatgaacta cgattggatg cgttatctcg ttacgcgcta cgagccggga 960 + gacggaccgc aaaaccagat ggtgtcgttc atgcgcacga tgtttggcga ccatgtcctg 1020 + aaccacccga tgctcaagag cacagccatt tcagacgcgg ggattactaa gcagactctc 1080 + tatgaggtga gccgcgacca gttcacgcga gcaacatacg accgagccat ggaatcgctc 1140 + gacaacgtga acagcgaaat cgaacaactc attcaatcat cttggggtcg caaatgatgg 1200 + ctctagagat ctcagaaaac gcgacattga tggagaagtt gccagccgga aacttttcgg 1260 + aatttgcact ctctatgtcg aggaatccgg cttgtcacga gtacctcagg ggaaagcaag 1320 + atggctagaa aacacctcct ttcagatttg aaagctcctg cttcatcatc tacggagttc 1380 + gatgaagcta gggctgcaga cgtccctact ccgcagtatg cgcctcgagg 
tgcaatcggt 1440 + gccgtctcgc gatcgattga agctttgaag tcgcagggac tgagtgaact cgatcccgaa 1500 + ctgatagatg cgccgtccgt tactgatcgc cttgatgagg atggggctca gtttgaggag 1560 + ttcgctcgca acatccgtga gaatgggcag caggttccga ttcttgtccg gcctcacccg 1620 + accgtggaag gacggtatca gattgcctac ggccggagac ggttgagagc ggtcaaggcg 1680 + gccggcctca aggtcaaagc cgcaatcaga aatctgacag atgacgagct tgtactggcg 1740 + caaggtcagg aaaacagcgc gcgtcaggat ctgtcgttta tcgagcgggc gctctatgca 1800 + gcccagctcg aagcgagtgg ctaccagcgt cccgtcatca tggcagcgct ggctgtcgac 1860 + aaaagtaacc tttcgcggtt gattcaggct gcgacccaat tgccggacga cgtcatccga 1920 + ctaattggtg ctgcgcctaa gaccggccgt gatcgctggt acgagctatc atcgcggttg 1980 + gctgcagaag gtgctgcgga gaaggcgcgc gctcttcttt cgactagcga ggttggctcc 2040 + ctgggttctg atgagcgatt tgttcgcgtt ttcgacgcgg ttgcgccgaa gaaatctaag 2100 + aaggaaaaag ttcaggcgga tgtctggcaa gctgacgatg gggtcaaggc tgcgagtttc 2160 + cgccaggaca aacgaacact gacattgatg atcgacaaga aggcagcgcc ggaattcggt 2220 + gagtacctga tgtcggctct ccccgagatc tacgcttcgt tcaagaagtc gaagcaatag 2280 + atgagtcgta acgaagaaag gtgccgatag cgcaaagaaa aagccctccg aaacggtgtt 2340 + ccagaaggcc tctctcagtt tggtcgctta gagaatcgca tttcccggaa tcacagtcaa 2400 + gagtcaacgc cacaccggcg tagccttttc tttgccttgc gaaaggtgaa ggacatggaa 2460 + acgggttata tcacgacgcc ctttgggcgg cggccgatga cgcttgctct ggtgaagcgt 2520 + caggttaaga ccgagcaggc aatagcggat ggctcggtcg acaagtggcg cgtgtttcgc 2580 + gacataagcg acgcccgctc acgccttggc cttcaagatc gagccttggc ggtcttgaat 2640 + gcacttttaa cattcttccc agttgctgaa ctcagcaatg agaggaacct ggtcgtcttt 2700 + ccatcaaatg ctcagctatc agcccgcaca aacggtatcg ctgggacaac tctgcgcaag 2760 + tgcctcggtt cgctggtgga ggccggtgta atcatccgca aggatagccc taacggtaag 2820 + cgatatgctc gaaaaggcaa agaaggaaac atagaggacg cctacggctt cagtctggca 2880 + ccgcttcttg cgcgcgccgg cgagtttgct agcctcgccc aagacgtggc tgctgaacag 2940 + cgccgcttcc gcatcacgaa agaccgcctc acgatcgttc ggcgagatgt ccgcaagctg 3000 + atcaccgtcg ggatggaaga gaaccttgcc ggcgattgga ttgccgcgga aacgtgcttt 3060 + gtcgagattg 
tgggaaggtt cgttcggcac ccgacgctcc aggacctgat ttcgagcctc 3120 + gacgagatga gccttcttca cgaagaagtc tccaggatgc tggaaattaa agaagaaacc 3180 + gcaaaaagtg atggcaatgc catcccggac ggatgccaca tacagaattc aaataccgaa 3240 + tcctgccatg aacttgaacc ccgctccgaa aagaagcagg gcgaaaagtc cgagccaaac 3300 + aagaaaacgg agcggaaaga cgaaccggaa gcgtttccgt tgtccatggt gttgcgtgcc 3360 + tgcccggaga tcaacgcatt tggccctggt ggatcgattg gaagctggcg cgaaatgatg 3420 + tcagcggcgg taacggttcg gtccatgctt ggcgtcagcc cctctgccta tcaggaggca 3480 + tgcgaggtga tggggcaggc cggagcggcg atagcaatag cttgcattta ccagcgtggc 3540 + gggcacatca actcggcggg gggatatctt cgggatctaa cggggaaggc gcggcgaggg 3600 + gagttttcac ttgggccaat gctgtttacg caattgcggg cgaactcggg caccgtcaag 3660 + gcgtcagcgt aggtcaaagt atcatgattg tttagcctaa ccggttgaac taattaacct 3720 + attttgacta gtttccggct ggcaacttta tctcgatcta aagcgtcgag tgaatggcag 3780 + aagataatct tcctgatggg cgtccgtata atgaccgaaa ttgtgcttcc gaccgaaaac 3840 + acgatcatcg cggcagccaa aaaacttgac gcggccgcat cgcagctggt ggcagagacg 3900 + ttctttgcca ttcggcatgg gatgtcaatc aatccaattg gtcgcaaccc ggatgggcag 3960 + accatcaagg gataccctga cattactggg cgggtgccgg gtgagaagaa gtacctgatc 4020 + gaagtcacga aggacgactg gcgcacacat cttcagagcg atctatcaaa actgtcccgc 4080 + ctgcagaaag gagcctacgc gggtttccta cttctctgct tccgaaagtc cgagtccgaa 4140 + ctcactcaaa gcaacaggaa gaaggcacgg gaaaccgtcc agcaggccga gagccggatt 4200 + gaaaagcttt tgggtgtcca ggcaggacag gtagaattcg tctttcttgg cgagttcgcg 4260 + cgtgaggtca gatcggcgaa ataccaccgc gtattgctgg ctctgggtct cgagcttgtg 4320 + ccagcgccat tctacacgga tttgcgcttc gtgcagggct tagccgattt cgtaccgacc 4380 + gctgaggaat atgaggctga gagtgttgtt cctcgcgatg aggtaagccg gacctatgag 4440 + cgggtcttca aaaacagact aacgttgatc gaaggcgagg gcggtagcgg caaaacaagc 4500 + ctggccctag ccgttgcgac ggagcatcgg aagcaaggcg agatctttct gttcttagac 4560 + gcctctgtcg ctgactggaa gagcggttcg gagcgagctc gcctcgttga cgtagcggcg 4620 + atgttcgcgg aatcgaatgt cctgattata ttggacaacg tacatctggg cgatgcgtcc 4680 + ggcatttctg aactgattac aaatgtccag 
gcgtccggtt atgatttccg ctttttgatg 4740 + acgacgcgca gcagcgacga agttgaacaa tggaagcgcc tgggaaatat cgagcttctc 4800 + cgcagagttc cgtctggagc cgatgtcaac tctgcctatc accgcctgct cactcaaaag 4860 + tttcccggaa gcagtttcaa cgatattccc ccagcggtga ccacacgatg gtcaaatcaa 4920 + attcccaatc tggttattct cacgcttgct cttgaaggtc tcacaaagag aggcggctat 4980 + gatcgcgatt gggcgatcaa ggttgaggac gcaggcacat accttcaagc taagttcatc 5040 + tcgaagctgt cgtccgacga cgtcaaacag gtgggcaaga tcgctgcgct ctcacttctg 5100 + gaaattccca cctcgctcag gtcgctcgac caccgggttc caaagtctgc tgtggatctg 5160 + ggcttcgttc gtctgaactc gagttcaaca actcagcgat atgagctcgt tcaccacgaa 5220 + ctgggcaagc tgatcacgtc cttcaaagat ccggatatca aggcgcggct gggagaggtg 5280 + atgtccgctg atcccttcca ggcaacatat atcgggctga agcttatcgg aaacggagaa 5340 + gccagcctgg caaaggaatt gttgtcgtca gtcctttctc aatcactcac actctcgcca 5400 + gatttctcga tgggaaactc cggcggagtc ttcggtatcc tggtccagtc caacgtgact 5460 + acctatcccg aaattgagcg tatccttctt cctgatatcg gcgccttttt cgatacaaag 5520 + ccggatattg taaccggcct tagctccttc ctcggggctg cctccgaaaa catggagcgc 5580 + gtatacaatg ccattgtgga aaaacttgcc gaacaggaaa cgattcgacg gatcgaagag 5640 + cttctcccat ccgtcggccc gacgactttc gcgacacttt accgatgcgc gaactcacgg 5700 + aacctcccgt ttctttcaac gcttcgaaaa tatctcaaca gagggaagcg tatagattcc 5760 + tttgcctatc gatgcaggtc tgaaagtccg agtaaggtcg agatctgctg gggcctgatt 5820 + gatgagttct ttccacacca caaggcccgg tttgaagttg tgcttcgctc tgccctcgcc 5880 + gagggataca tcgagcgcct tatcccggaa gagcttattg agtctcgctc ttcaagggct 5940 + gttcagacgg cgatccgatg cgcaaatagc gaagttttca aacggtacat cacgttccgt 6000 + gactgcagcg acgcgacgct gttgcttctg gcccacacga tgcacgacat gggcaggaat 6060 + gatctctcgg aggtcgcagc tgaccgagtt gcaggcagga cgacctcttc aatctggtat 6120 + catcgtcgca ccggtggcag ggcgttgctg actattttgc ggagagcatc gatatctgca 6180 + gaaggagatg ttcagaaaat tctgatgcgg cttgaggctg aaggaaaaat gagggccatt 6240 + gtgaatggaa tgcggcctta tcgcctagcg aattttattt tcgtgatctg ggatcggcac 6300 + gagcaattta cttcattcat ctcgaagaca gatcttcagg aaattacaaa 
ccgccggttc 6360 + aaagcgcgag cggcagagtt ctctgaagag cgacaagcgt ccatctacat tgcaggaatc 6420 + tatgcgctgg taggcctcga cataccgcgg gacgagtgga gcgcggtcga cgtcactgaa 6480 + gacgatttca ttggaaacca gaacaacccg gtcttctgga tcggtctcaa ggctctggaa 6540 + gaaaatggca tgatacgcct tgcccatcga agcagatttc cgacatctgt cgcggcgcta 6600 + gatactcatt cggaaaacac cagccggatc atgaacgatt tgaaaaactg ggctgcgacc 6660 + aggtaa 6666 +// diff --git a/seqmetrics/microBioRust/rhizexample.gbk b/seqmetrics/microBioRust/rhizexample.gbk new file mode 100644 index 0000000..29e7c40 --- /dev/null +++ b/seqmetrics/microBioRust/rhizexample.gbk @@ -0,0 +1,5439 @@ +LOCUS AM236082 147463 bp DNA circular BCT 14-JUL-2016 +DEFINITION Rhizobium leguminosarum bv. viciae plasmid pRL8 complete genome, + strain 3841. +ACCESSION AM236082 +VERSION AM236082.1 +DBLINK BioProject: PRJNA344 + BioSample: SAMEA1705944 +KEYWORDS complete genome. +SOURCE Rhizobium johnstonii 3841 + ORGANISM Rhizobium johnstonii 3841 + Bacteria; Pseudomonadati; Pseudomonadota; Alphaproteobacteria; + Hyphomicrobiales; Rhizobiaceae; Rhizobium/Agrobacterium group; + Rhizobium; Rhizobium johnstonii. +REFERENCE 1 + AUTHORS Young,J.P., Crossman,L.C., Johnston,A.W., Thomson,N.R., + Ghazoui,Z.F., Hull,K.H., Wexler,M., Curson,A.R., Todd,J.D., + Poole,P.S., Mauchline,T.H., East,A.K., Quail,M.A., Churcher,C., + Arrowsmith,C., Cherevach,I., Chillingworth,T., Clarke,K., + Cronin,A., Davis,P., Fraser,A., Hance,Z., Hauser,H., Jagels,K., + Moule,S., Mungall,K., Norbertczak,H., Rabbinowitsch,E., Sanders,M., + Simmonds,M., Whitehead,S. and Parkhill,J. + TITLE The genome of Rhizobium leguminosarum has recognizable core and + accessory components + JOURNAL Genome Biol. 7 (4), R34 (2006) + PUBMED 16640791 +REFERENCE 2 (bases 1 to 147463) + AUTHORS Crossman,L.C. 
+ TITLE Direct Submission + JOURNAL Submitted (21-FEB-2006) Crossman L.C., Pathogen Sequencing Unit, + The Wellcome Trust Sanger Institute, Hinxton, Cambridge, + Cambridgeshire, CB10 1SA, UNITED KINGDOM +FEATURES Location/Qualifiers + source 1..147463 + /organism="Rhizobium johnstonii 3841" + /mol_type="genomic DNA" + /strain="3841" + /type_material="type strain of Rhizobium johnstonii" + /db_xref="taxon:216596" + /plasmid="pRL8" + /geo_loc_name="United Kingdom" + /note="biovar: viciae 3841" + gene 1..1197 + /gene="repAp8" + /locus_tag="pRL80001" + CDS 1..1197 + /gene="repAp8" + /locus_tag="pRL80001" + /codon_start=1 + /transl_table=11 + /product="replication protein RepA" + /protein_id="CAK02801.1" + /db_xref="EnsemblGenomes-Gn:pRL80001" + /db_xref="EnsemblGenomes-Tr:CAK02801" + /db_xref="InterPro:IPR002586" + /db_xref="InterPro:IPR017818" + /db_xref="InterPro:IPR027417" + /db_xref="UniProtKB/TrEMBL:Q1M9K5" + /translation="MENPAQLQKAIHKLIAAHARDLSGALHEHRVKLYPPEARKTLRS + FSSIEAAKLIGVNDGYLRHLSLEGKGPQPEIGNNNRRSYSVETIQALREYLDENGKGD + RRYSPRRSGREHLQVITAVNFKGGSGKTTTAAHLAQYLALNGYRVLAIDLDPQASMSA + LHGFQPEFDVGDNETLYGAVRYDEERRPLKDIIKKTYFANLDLVPGNLELMEFEHDTA + KVLGSNDRKNIFFTRMDDAIASVADDYDVVVVDCPPQLGFLTISALCAATAVLVTVHP + QMLDVMSMCQFLLMTSELLSVVADAGGSMNYDWMRYLVTRYEPGDGPQNQMVSFMRTM + FGDHVLNHPMLKSTAISDAGITKQTLYEVSRDQFTRATYDRAMESLDNVNSEIEQLIQ + SSWGRK" + Source 1..147463 + /id="source_1" + gene 1321..2280 + /gene="repBp8" + /locus_tag="pRL80002" + CDS 1321..2280 + /gene="repBp8" + /locus_tag="pRL80002" + /codon_start=1 + /transl_table=11 + /product="replication protein RepB" + /protein_id="CAK02802.1" + /db_xref="EnsemblGenomes-Gn:pRL80002" + /db_xref="EnsemblGenomes-Tr:CAK02802" + /db_xref="GOA:Q1M9K4" + /db_xref="InterPro:IPR003115" + /db_xref="InterPro:IPR004437" + /db_xref="InterPro:IPR011111" + /db_xref="InterPro:IPR017819" + /db_xref="UniProtKB/TrEMBL:Q1M9K4" + /translation="MARKHLLSDLKAPASSSTEFDEARAADVPTPQYAPRGAIGAVSR + 
SIEALKSQGLSELDPELIDAPSVTDRLDEDGAQFEEFARNIRENGQQVPILVRPHPTV + EGRYQIAYGRRRLRAVKAAGLKVKAAIRNLTDDELVLAQGQENSARQDLSFIERALYA + AQLEASGYQRPVIMAALAVDKSNLSRLIQAATQLPDDVIRLIGAAPKTGRDRWYELSS + RLAAEGAAEKARALLSTSEVGSLGSDERFVRVFDAVAPKKSKKEKVQADVWQADDGVK + AASFRQDKRTLTLMIDKKAAPEFGEYLMSALPEIYASFKKSKQ" + gene 2455..3672 + /gene="repCp8" + /locus_tag="pRL80003" + CDS 2455..3672 + /gene="repCp8" + /locus_tag="pRL80003" + /codon_start=1 + /transl_table=11 + /product="replication RepC protein" + /protein_id="CAK02803.1" + /db_xref="EnsemblGenomes-Gn:pRL80003" + /db_xref="EnsemblGenomes-Tr:CAK02803" + /db_xref="InterPro:IPR005090" + /db_xref="InterPro:IPR021760" + /db_xref="UniProtKB/TrEMBL:Q1M9K3" + /translation="METGYITTPFGRRPMTLALVKRQVKTEQAIADGSVDKWRVFRDI + SDARSRLGLQDRALAVLNALLTFFPVAELSNERNLVVFPSNAQLSARTNGIAGTTLRK + CLGSLVEAGVIIRKDSPNGKRYARKGKEGNIEDAYGFSLAPLLARAGEFASLAQDVAA + EQRRFRITKDRLTIVRRDVRKLITVGMEENLAGDWIAAETCFVEIVGRFVRHPTLQDL + ISSLDEMSLLHEEVSRMLEIKEETAKSDGNAIPDGCHIQNSNTESCHELEPRSEKKQG + EKSEPNKKTERKDEPEAFPLSMVLRACPEINAFGPGGSIGSWREMMSAAVTVRSMLGV + SPSAYQEACEVMGQAGAAIAIACIYQRGGHINSAGGYLRDLTGKARRGEFSLGPMLFT + QLRANSGTVKASA" + gene 3811..6666 + /locus_tag="pRL80004" + CDS 3811..6666 + /locus_tag="pRL80004" + /note="no significant database hits" + /codon_start=1 + /transl_table=11 + /product="hypothetical protein" + /protein_id="CAK02804.1" + /db_xref="EnsemblGenomes-Gn:pRL80004" + /db_xref="EnsemblGenomes-Tr:CAK02804" + /db_xref="InterPro:IPR003593" + /db_xref="InterPro:IPR027417" + /db_xref="UniProtKB/TrEMBL:Q1M9K2" + /translation="MTEIVLPTENTIIAAAKKLDAAASQLVAETFFAIRHGMSINPIG + RNPDGQTIKGYPDITGRVPGEKKYLIEVTKDDWRTHLQSDLSKLSRLQKGAYAGFLLL + CFRKSESELTQSNRKKARETVQQAESRIEKLLGVQAGQVEFVFLGEFAREVRSAKYHR + VLLALGLELVPAPFYTDLRFVQGLADFVPTAEEYEAESVVPRDEVSRTYERVFKNRLT + LIEGEGGSGKTSLALAVATEHRKQGEIFLFLDASVADWKSGSERARLVDVAAMFAESN + VLIILDNVHLGDASGISELITNVQASGYDFRFLMTTRSSDEVEQWKRLGNIELLRRVP + SGADVNSAYHRLLTQKFPGSSFNDIPPAVTTRWSNQIPNLVILTLALEGLTKRGGYDR + 
DWAIKVEDAGTYLQAKFISKLSSDDVKQVGKIAALSLLEIPTSLRSLDHRVPKSAVDL + GFVRLNSSSTTQRYELVHHELGKLITSFKDPDIKARLGEVMSADPFQATYIGLKLIGN + GEASLAKELLSSVLSQSLTLSPDFSMGNSGGVFGILVQSNVTTYPEIERILLPDIGAF + FDTKPDIVTGLSSFLGAASENMERVYNAIVEKLAEQETIRRIEELLPSVGPTTFATLY + RCANSRNLPFLSTLRKYLNRGKRIDSFAYRCRSESPSKVEICWGLIDEFFPHHKARFE + VVLRSALAEGYIERLIPEELIESRSSRAVQTAIRCANSEVFKRYITFRDCSDATLLLL + AHTMHDMGRNDLSEVAADRVAGRTTSSIWYHRRTGGRALLTILRRASISAEGDVQKIL + MRLEAEGKMRAIVNGMRPYRLANFIFVIWDRHEQFTSFISKTDLQEITNRRFKARAAE + FSEERQASIYIAGIYALVGLDIPRDEWSAVDVTEDDFIGNQNNPVFWIGLKALEENGM + IRLAHRSRFPTSVAALDTHSENTSRIMNDLKNWAATR" + gene complement(6817..8418) + /locus_tag="pRL80005" + CDS complement(6817..8418) + /locus_tag="pRL80005" + /note="no significant database hits" + /codon_start=1 + /transl_table=11 + /product="hypothetical protein" + /protein_id="CAK02805.1" + /db_xref="EnsemblGenomes-Gn:pRL80005" + /db_xref="EnsemblGenomes-Tr:CAK02805" + /db_xref="UniProtKB/TrEMBL:Q1M9K1" + /translation="MPVQEAYFHLQDVPYLQAGSWVPLRQITFSTIASDDPDILRLEE + WIGIGSAAVMAERRKDVEALAWDDLGIQPHRSVVENDGYRSADVFRNWQGGDLGVNLV + IVQLIDEVDHEIWHLHTDLVVALHLVREGDVWKRPEEDWIDVARLKRDSEGKPTVLEI + RKEYLGDYLAARGMDLYCSSYRERTMITATKPSYAFEDTPSETRGRDSWDRTTNEAGY + PFPAGSFFTRGAIWRTEWVQSGGQSVRVRGDKDNHAAAFALDTDGNRVLGPQLIGSST + WLFFNPSVADALLSRRGSKLHWYSAETGALGANEPVHFGLNNLGLITVFAKDIGLLPQ + WEQRLWSAYNVTPEGGVSSELFAAQMMVQPAATVAPEREMPSVLEAVEAAFKTKYGRS + LLRENEAVPSLLRRLHRFQAVKTDGILELAKDLTRLFIERIDADAIASQLTLSKNEKR + PGSLKLLERLLSTRLTAAEAGNLMSPLFGIYDLRLADAHLGSARIESGRTRTGIDETL + PAPMQGRQLLHSFNETLKKVNAALV" + gene 8597..9706 + /locus_tag="pRL80006" + CDS 8597..9706 + /locus_tag="pRL80006" + /note="no significant database hits" + /codon_start=1 + /transl_table=11 + /product="hypothetical protein" + /protein_id="CAK02806.1" + /db_xref="EnsemblGenomes-Gn:pRL80006" + /db_xref="EnsemblGenomes-Tr:CAK02806" + /db_xref="UniProtKB/TrEMBL:Q1M9K0" + /translation="MMAVHSIAYRIFPASACPPALLHLSPAGHLCEVDTDQVGLSALS + EIIQTAPGSNPLPLLFDELLSPGTAQLVWKGSGPGRGVEIRRAFSGLFGRFFARAYLE + 
RYYGFTWFSPISGSPYNLSNRLRVVRQPGREFDLPDWIMAGPGVLAIGEAKGSHAKGP + APTSGLPGPLRTAYEQISRVWVQKVDPAGVWVNRQVKGWGVMSRWGVESPARRAYHCV + LDPDTEGEPLSGEELEEAIQDVARSHVALLLDGLGRPDLVDKRASPGFSPQQVSATIE + GLGERTFIGGIVNNFGFLPMSIDDARAVQASLPERLRPTVRFLGLETDVVEQYRSGSV + IKAQPFRIDASGPSLSSDGMMLAPLERIDPVPSTI" + gene 9775..10689 + /locus_tag="pRL80007" + CDS 9775..10689 + /locus_tag="pRL80007" + /note="no significant database hits" + /codon_start=1 + /transl_table=11 + /product="hypothetical protein" + /protein_id="CAK02807.1" + /db_xref="EnsemblGenomes-Gn:pRL80007" + /db_xref="EnsemblGenomes-Tr:CAK02807" + /db_xref="UniProtKB/TrEMBL:Q1M9J9" + /translation="MKKLFKQLGLHKIEVCLKLQATEIIQLERKSELLPDFDFVSYGG + LSEKDYPIWHEFEFGNEFLFIHRLRMSDYGYFVDDAPEDDDNFELPKYTSGRAGYGEA + GILKLELHTVDRFGCRAMLRGLLSGSSIEMNMDVRFKFIAAGYREGETRLRHAAEAIA + EGWAFEEEGKLKQAFFSYYAALDSFIDAERIKLNGGLDDDEIDEEIHENIIAPDIRLN + EKLRHVVKRNLPTNLNGLNGLRIWGEVFGRFNRITETRNAIAHNTKIAVITHDDVNVC + FSTLAIIIAIVQEQCFDEPAILECYGLS" + gene complement(10741..11883) + /locus_tag="pRL80008" + CDS complement(10741..11883) + /locus_tag="pRL80008" + /note="N-terminus is truncated relative to the A. + tumefaciens and R. 
etli putative integrase matches" + /codon_start=1 + /transl_table=11 + /product="putative phage integrase" + /protein_id="CAK02808.1" + /db_xref="EnsemblGenomes-Gn:pRL80008" + /db_xref="EnsemblGenomes-Tr:CAK02808" + /db_xref="GOA:Q1M9J8" + /db_xref="InterPro:IPR002104" + /db_xref="InterPro:IPR010998" + /db_xref="InterPro:IPR011010" + /db_xref="InterPro:IPR013762" + /db_xref="InterPro:IPR023109" + /db_xref="UniProtKB/TrEMBL:Q1M9J8" + /translation="MPPLQKSAVDRRAEELDTIAAVLPLERRDELAELLTDHDVETLR + HLVNQGMGDNTLRALTSDLTYLEAWGLAATGRSLPWPAPEALLLKFVAHHLWDPRHRE + TDADHGMPADVDESLRSQGFLKSVGPHAPATVRRRLANWSTLTKWRGLDGAFASPALK + SAIRLAIRAAPRQRLRKSAKAVTGDVLARLLATCATDSLRDLRDRAILMVAFASGGRR + RSEIAGLRREQLTVEPPIPVEGGSPLPSLAIHLGRTKTTSGDEDDVVYLTGRPVDALN + AWMVAAKIDSGSVFRAIGRWGTVSRRAIDPQSVNAIIKQRVELAGLERGEFSAHGIRS + GYLTEAANRGIPLPEAMEQSRHRSVQQASSYYNNATRRSGRAARML" + gene 12082..13221 + /locus_tag="pRL80009" + CDS 12082..13221 + /locus_tag="pRL80009" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02809.1" + /db_xref="EnsemblGenomes-Gn:pRL80009" + /db_xref="EnsemblGenomes-Tr:CAK02809" + /db_xref="InterPro:IPR011670" + /db_xref="InterPro:IPR021068" + /db_xref="UniProtKB/TrEMBL:Q1M9J7" + /translation="MAYDLAKISMTALMQPAFDAAIALTRLDERIARSPVGAGWIERT + HFADACASLWVDGELVHLEDLVLHDATRDIRTPTHELTIARDVLRTRRRIAAQSPDWA + LSTEGIRNLRQTSDSNPAGAEAGQPSDVIRPAVAIDPEGEGDDFDDIENLPGVDYAAI + DAVLARSEAAIESATRPGDAGGNRAAEKDPMIYDLDWDEDERLEEWRTVLRQTENLPA + VFRAIVALDAWNEIAVLQHSPWLGRLFSASILRQAGATSGAHLAAVNLGLKTIPVDRR + RHRDRETRLLAIAHGFLATAEIGMKEHDRLALAKKMMERKLEGRRTSSKLPDLVELVM + AKPLVSAGMVAKTLDVTPQAARRIVLELGLREMTGRGRFRAWGII" + gene complement(13298..13591) + /locus_tag="pRL80010" + CDS complement(13298..13591) + /locus_tag="pRL80010" + /note="no significant database hits" + /codon_start=1 + /transl_table=11 + /product="hypothetical protein" + /protein_id="CAK02810.1" + /db_xref="EnsemblGenomes-Gn:pRL80010" + /db_xref="EnsemblGenomes-Tr:CAK02810" + 
/db_xref="UniProtKB/TrEMBL:Q1M9J6" + /translation="MRRPLHQRREEMQFEGALVREQGVTFAIVIVKPHVINSGPQADE + VAYSFQPAFPGVPIVLMAQNSHGVPTYRGRRDLVDFLSRVPTQAIPWKRFTLN" + gene complement(13610..13834) + /locus_tag="pRL80011" + CDS complement(13610..13834) + /locus_tag="pRL80011" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02811.1" + /db_xref="EnsemblGenomes-Gn:pRL80011" + /db_xref="EnsemblGenomes-Tr:CAK02811" + /db_xref="InterPro:IPR018691" + /db_xref="UniProtKB/TrEMBL:Q1M9J5" + /translation="MTNKNQHVVPHNGEWAVRGAGNQRVTSTHGTQADAAAAARKIAI + NQQSEVVIHRPNGQIRDKDSYGRDPFPPRG" + gene 13951..15075 + /locus_tag="pRL80012" + CDS 13951..15075 + /locus_tag="pRL80012" + /codon_start=1 + /transl_table=11 + /product="putative AAA-family ATPase protein" + /protein_id="CAK02812.1" + /db_xref="EnsemblGenomes-Gn:pRL80012" + /db_xref="EnsemblGenomes-Tr:CAK02812" + /db_xref="GOA:Q1M9J4" + /db_xref="InterPro:IPR003593" + /db_xref="InterPro:IPR003959" + /db_xref="InterPro:IPR027417" + /db_xref="UniProtKB/TrEMBL:Q1M9J4" + /translation="MTIEMKNVLELADAALSADYTRVRRAANALARDLDKNGETSIAK + ELKALVRKRGVPLKASGYVESLPVDSKSRLPLVEEQTWPDTPIFLNEGGWHVFSDFIA + DARRIDDLSAKGLASRLGLLLSGPPGTGKSLLAGHIAAQLSRPLYVVRLDSVISSLLG + DTAKNIRSVFDFVPARNAVLFLDEMDAVAKLRDDRHELGELKRVVNTVIQALDGLDPS + SIVVAATNHAHLLDPAIWRRFPYKIELGLPDESVRADLWRHFLFEDKDEEGRAELFGV + VSEGLSGADIETMSLSARRHAVHESRNIDFGAVVAALLEPRSGRTVPVQRQPLDAEQK + RQVAIALKEKYAIGGADTARILGVSRQAIYAYLKQQEGEV" + gene 15080..17335 + /locus_tag="pRL80013" + CDS 15080..17335 + /locus_tag="pRL80013" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02813.1" + /db_xref="EnsemblGenomes-Gn:pRL80013" + /db_xref="EnsemblGenomes-Tr:CAK02813" + /db_xref="GOA:Q1M9J3" + /db_xref="InterPro:IPR000209" + /db_xref="UniProtKB/TrEMBL:Q1M9J3" + /translation="MEPRDQPLLYPVLSLQMDPALRSPTGRGKGIDSIVKERLGRQQD + VLASETRDIYEHRTELPTYSGLTHLVVRMFSEDSLAPTHTPDDLFSQRHGCRLVAPLP + 
GGYLIEADVKELPRLLHAIEHPIGYAVQADISRVSSLGQFDAKSRLRGRSVNELWNSA + PEDDDGRLFVVWLAPFRDRDAKAEVLERIQGFANERLVMPTFTSVRLTLGTSEETEEP + RSLTTPRQSSIARAMRDYRNTGVGRATVRIPNKEGLRQLIASGASYRIDPVRPIRVAA + PGEGAEPPAPVIDENAPIVAVVDGGLHARSYTAAEAFRATPFVTNAQADKPHGNSVSS + LVIHGHAWNKNRSLPELNCRIGTVQAVPHRNANRRFDERELVDYLAEVARLYPEARVW + NISANQDGAGLDPSEVSVLGHEISLLARSAGFLPVISVGNVSPDNNSRPNPPADCEAA + IVVGGRQALPDGTPGDRCPACLPGPGPDGMMKPDLSWFSNLRMLGGVVDTGSSYATPL + VSSLAAHTFDSLREPTPDLVKALLINSAERSEHDPNLGWGTPYQGHLPWTCVPGSVTL + AWRAQLEPGTAYYWNDIPIPPELVRDGKLFGRASLTAVLRPLVSPFGGANYFASRLET + SLAYQSGADKWPSLLGSMKESTLPENDARDELRKWQPIRRHCRDFSKGSGLGFSGPHL + RLYARVFMRDLYQFGWTHHSQAGAQEVAFVLTLSSADGEKSIYDSTARALGNFVESAV + LNQDIEVSNEL" + gene 17643..18071 + /locus_tag="pRL80014" + CDS 17643..18071 + /locus_tag="pRL80014" + /note="C-terminus is similar to the C-terminus of + Rhodopseudomonas palustris Duf81 rpa3560 SWALL:CAE29001 + (EMBL:BX572604) (261 aa) fasta scores: E(): 3.2e-10, + 60.93% id in 64 aa" + /codon_start=1 + /transl_table=11 + /product="putative transmembrane protein" + /protein_id="CAK02814.1" + /db_xref="EnsemblGenomes-Gn:pRL80014" + /db_xref="EnsemblGenomes-Tr:CAK02814" + /db_xref="GOA:Q1M9J2" + /db_xref="UniProtKB/TrEMBL:Q1M9J2" + /translation="MLGTAPAITLECGKCEAGDLTARGQSAWPMLTAAKAAADHCDID + TVCIDLNPFEHKEVAIFRNRLAALSPHDRNYWRGTLYSRDQSFAVIRSNRVFLLVMVA + GSVVGAFIGGQLPGIVPSAVLLPGLTLILVISAIKIWRHS" + gene complement(18236..19204) + /locus_tag="pRL80015" + CDS complement(18236..19204) + /locus_tag="pRL80015" + /codon_start=1 + /transl_table=11 + /product="putative AraC-family transcriptional regulator" + /protein_id="CAK02815.1" + /db_xref="EnsemblGenomes-Gn:pRL80015" + /db_xref="EnsemblGenomes-Tr:CAK02815" + /db_xref="GOA:Q1M9J1" + /db_xref="InterPro:IPR009057" + /db_xref="InterPro:IPR018060" + /db_xref="InterPro:IPR025628" + /db_xref="InterPro:IPR029062" + /db_xref="UniProtKB/TrEMBL:Q1M9J1" + /translation="MKIVVLAHNNVFDTGLAAVLDTLATANELAELESLRIPRFECVM + AGVRETVQTSLGLSVPVQPAATIRQPDWVIVPALGTKMPGPLLELLDGRESHDAADQL + 
QTWHADGARIGAACIGTFLLAESGLLAGHEATTAWWLGPLFRQQYPDVRLDERRMLVP + SGDVVTAGAAMGHLELALWLVRQTSPALADMVARYLLVDTRPSQAPYMIPMHLANADP + LIQHFERWARNRLHEGFSLDVAADALHVSKRTLQRRMEAVLGKSPLSYFQDLRVERAV + HLLRTSRKDIESIATEIGYADGVTLRTLLRRRLGRGVRELRAVYNP" + gene 19312..19623 + /locus_tag="pRL80017" + /pseudo + CDS join(19312..19383,19399..19623) + /locus_tag="pRL80017" + /pseudo + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein, pseudogene" + /db_xref="PSEUDO:CAK02816.3" + gene 19675..20304 + /locus_tag="pRL80018" + CDS 19675..20304 + /locus_tag="pRL80018" + /note="putative alternative start site at codon 9" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02817.1" + /db_xref="EnsemblGenomes-Gn:pRL80018" + /db_xref="EnsemblGenomes-Tr:CAK02817" + /db_xref="InterPro:IPR005247" + /db_xref="InterPro:IPR008914" + /db_xref="UniProtKB/TrEMBL:Q1M9J0" + /translation="MTLNVSTRLSLLVALLASATGLASAPAVAGSTDKFELSSPDIAP + GSKIDDKFVLNGFGCKGGNISPALQWKNAPAGTKSFVLQVYDPDAPTGSGFWHWTVNN + IPANVTQLTQGAGNAPANLPAGAYGGVNDFQDTGATGGNGNYGGPCPPAGDKPHRYEF + SLFALAVDDIDAAAGVPKTGTAALHGFVLNKGLGDKLLGKASFTAAYGH" + gene 20331..20705 + /locus_tag="pRL80019" + CDS 20331..20705 + /locus_tag="pRL80019" + /codon_start=1 + /transl_table=11 + /product="putative lipoprotein" + /protein_id="CAK02818.1" + /db_xref="EnsemblGenomes-Gn:pRL80019" + /db_xref="EnsemblGenomes-Tr:CAK02818" + /db_xref="InterPro:IPR005297" + /db_xref="InterPro:IPR014558" + /db_xref="UniProtKB/TrEMBL:Q1M9I9" + /translation="MKKLILVPILLASMTGVVFAATPFKTVKTEKGVVLSGEKGLTLY + TFKKDEAGASNCYDECAQNWPSAIAAGNAKANGAYSIVTRKDGTKQWAKDGKPLYYWV + KDAKQGDVTGDGVGGVWDAAKP" + gene 21088..21396 + /locus_tag="pRL80020" + CDS 21088..21396 + /locus_tag="pRL80020" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02819.1" + /db_xref="EnsemblGenomes-Gn:pRL80020" + /db_xref="EnsemblGenomes-Tr:CAK02819" + /db_xref="InterPro:IPR011008" + /db_xref="UniProtKB/TrEMBL:Q1M9I8" 
+ /translation="MIAALVLFPVPAGTTMEQIKEAYELSAPRFTGMPGLLSKHYLFD + GAGQGGAFYVWSTRADAEALYTEEWRQSLTQRYGAPPTLSIYEVPVAIDNAAASRTFS + " + gene 21653..22129 + /gene="cutL" + /locus_tag="pRL80021" + /gene_synonym="coxG" + CDS 21653..22129 + /gene="cutL" + /locus_tag="pRL80021" + /gene_synonym="coxG" + /EC_number="1.2.99.2" + /codon_start=1 + /transl_table=11 + /product="putative carbon monoxide dehydrogenase subunit G + protein" + /protein_id="CAK02820.1" + /db_xref="EnsemblGenomes-Gn:pRL80021" + /db_xref="EnsemblGenomes-Tr:CAK02820" + /db_xref="GOA:Q1M9I7" + /db_xref="InterPro:IPR010419" + /db_xref="InterPro:IPR023393" + /db_xref="UniProtKB/TrEMBL:Q1M9I7" + /translation="MMAVLLSGNYSLPASQAEVYAALNDADVLRECIPGCEELEARAD + GIFAAVVRLELGPLKTRFRGKVRLEDLDPPNGYRIIGEGDGGIAGFAKGGAALKLAPD + GEGGTLLSYEAEANVNGKIAQLGQRLIASTSKKIADRFFETLVKRLQNETVTAEAK" + gene 22184..23116 + /locus_tag="pRL80022" + CDS 22184..23116 + /locus_tag="pRL80022" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02821.1" + /db_xref="EnsemblGenomes-Gn:pRL80022" + /db_xref="EnsemblGenomes-Tr:CAK02821" + /db_xref="InterPro:IPR029058" + /db_xref="InterPro:IPR029059" + /db_xref="UniProtKB/TrEMBL:Q1M9I6" + /translation="MPKVTELTFFSEGLKLKGLLYEPDDLKPGEKRPTVVCCHGYTGM + KDVYLLPVPERLAVHGYVAFAFDHRGFGKSEGVRARLIPPEQVEDIRNAITFVSTLPS + VDTDRIALYGTSFGGGNVVVATATDDRVRCVVSVVPVGNGERWLKSLRKHWEWLKFQD + VLAEDRRQRVLTGESRRVDVTELMPGDPHSRQVIQEKVKAAETYTQGYPLENAEATLR + WKPEDFAHAIAPRPILFMHTECDGLVPIDECYALHSKAKEPKKLVTIPNADHYDVYQF + VNPDVFEKVIAESIKWYDRYLKADAPQERIAEIA" + gene 23132..23998 + /gene="cutM" + /locus_tag="pRL80023" + /gene_synonym="coxM" + CDS 23132..23998 + /gene="cutM" + /locus_tag="pRL80023" + /gene_synonym="coxM" + /EC_number="1.2.99.2" + /codon_start=1 + /transl_table=11 + /product="putative carbon monoxide dehydrogenase subunit M + protein" + /protein_id="CAK02822.1" + /db_xref="EnsemblGenomes-Gn:pRL80023" + /db_xref="EnsemblGenomes-Tr:CAK02822" + /db_xref="GOA:Q1M9I5" + 
/db_xref="InterPro:IPR002346" + /db_xref="InterPro:IPR005107" + /db_xref="InterPro:IPR016166" + /db_xref="InterPro:IPR016167" + /db_xref="InterPro:IPR016169" + /db_xref="UniProtKB/TrEMBL:Q1M9I5" + /translation="MALPVFDYFAPKSIEEACAALASNPDGAKLLAGGQSILRVMKFR + IMAPELLVDVKAIPGLRYIEGDADTLRIGATSTQSDVLRNDVVRKEFPLLAEAIARIA + TTAVRNTATIVGNICVGHTASDPSAALLALDAELVVVSLEGERILPISEFFVGHMSTS + LDAAELVREVRIRRRNDKPGMAYLAHAGRAAMETPLVAAGAIVSTRNGICSSATIALA + GADETPVRISRAEEALIGCKLDDVAILKAAAIAAEDCSPDTDVYASGEYRRRLVGVYV + RDALRAAASRVA" + gene 24022..24504 + /gene="cutC" + /locus_tag="pRL80024" + /gene_synonym="coxS" + /gene_synonym="cutS" + CDS 24022..24504 + /gene="cutC" + /locus_tag="pRL80024" + /gene_synonym="coxS" + /gene_synonym="cutS" + /EC_number="1.2.99.2" + /codon_start=1 + /transl_table=11 + /product="putative iron-sulphur cluster carbon monooxide + dehydrogenase subunit S protein" + /protein_id="CAK02823.1" + /db_xref="EnsemblGenomes-Gn:pRL80024" + /db_xref="EnsemblGenomes-Tr:CAK02823" + /db_xref="GOA:Q1M9I4" + /db_xref="InterPro:IPR001041" + /db_xref="InterPro:IPR002888" + /db_xref="InterPro:IPR006058" + /db_xref="InterPro:IPR012675" + /db_xref="UniProtKB/TrEMBL:Q1M9I4" + /translation="MRKNITLVINGASHSLDVPANTLLLDLLRWEVGLTGTKEGCGEG + VCGSCTVNVNGDLVRSCLTLAVQVDGKSITTIEGMADGDTLHPLQRKFLELGAVQCGF + CSPGLIVTADALLKSNPDPTEAEVRDALRGNLCRCTGYVKIIDAVLAAASEMRSHAHE + " + gene 24497..26875 + /gene="coxL" + /locus_tag="pRL80025" + CDS 24497..26875 + /gene="coxL" + /locus_tag="pRL80025" + /codon_start=1 + /transl_table=11 + /product="putative carbon monoxide dehydrogenase subunit L + protein" + /protein_id="CAK02824.1" + /db_xref="EnsemblGenomes-Gn:pRL80025" + /db_xref="EnsemblGenomes-Tr:CAK02824" + /db_xref="GOA:Q1M9I3" + /db_xref="InterPro:IPR000674" + /db_xref="InterPro:IPR008274" + /db_xref="UniProtKB/TrEMBL:Q1M9I3" + /translation="MNNPDQEFNVIGKNVIREEGPGKVTGLGKYAIDLEFPRMLWAKI + KRSTRPHAKIINIDISRAQALPGVHAVIVDKDCPQTLFGFGCYDEPLLARGKVRYIGE + 
PVAAVAAESEAIAEQACDLIEIDYEDLPAIFDPWEAFEADPKVIIHEDQANYRRVPIG + PAQYDPKHPNAFGYYRIRTGEVSQGFAEADVVLEKTYSNAMMAHATMERHNSISLWDA + DGKVTAWSSAQAAYPLLNQISEALDIPHSRVRVIIPKYVGGGFGGKIEMKAEGLCAVL + SRAAGHRHIKIIYTREESLCWAGVQHPFEMRIKSGVRKDGVITACEMFVLVNGGAYAQ + HGFLVTRQASYGPLGSYRFPHFKLDNYVVYTNNPPGVAYRGFGNTQIHFGLESHIDEL + AHAIGMDPYEIRRKNVLKENEINAAGEIQHSVAGAELLDEIKAGLERHGPLQREDGPW + RRGRGIAFANKDSVAPSASSAIVKIHNDETVEIRHSAGNIGQGSSTTLIQITAEFFKV + GPERVRTAEVDTWVTPYDQLTGSSRLTFAAGNAVLMACEDVKNQILTMAAQMMQATPE + ELDLADMVVFVKENPGRSMRVKDLFRTVFFTGSFLPTGGELLGKATFTVPSSKIDPET + GHAANDGMRKIFSFCTRAAQAVEVAVNIETGQVKLEKIAIANDLGKAINPMSCEGQMH + AAISMGLGQAISEELQISEGSVANGDFSSYRFLTAKDAPSNDHVSTHIVEIPQFDGPY + KAKGFSEATTSPTAPAIANAIFDAVGLRLRHMPMTPERVLEGLDRLTSADRD" + gene 26894..28153 + /gene="livJ" + /locus_tag="pRL80026" + CDS 26894..28153 + /gene="livJ" + /locus_tag="pRL80026" + /codon_start=1 + /transl_table=11 + /product="putative branched-chain amino acid ABC + transporter binding component" + /protein_id="CAK02825.1" + /db_xref="EnsemblGenomes-Gn:pRL80026" + /db_xref="EnsemblGenomes-Tr:CAK02825" + /db_xref="GOA:Q1M9I2" + /db_xref="InterPro:IPR000709" + /db_xref="InterPro:IPR028081" + /db_xref="InterPro:IPR028082" + /db_xref="UniProtKB/TrEMBL:Q1M9I2" + /translation="MSDFTKRELLRVMALTMGALALTQPAWSEDQPIKIGSSMALSGP + LAGGGRQSQLALQMWVEDVNSRGGLLGRKVELVTYDDQGSPAQSPGIFSKLIDLDNAD + LLIAPYGTVPAAAVMPLVKERGRLLIGQIGYQINSKVHHDMWFNNSPWNDAESWVGGF + FKLGETVGVKKVAFLAADQEFSQNILAGAKALAGKAGFETVYEQTYPPTTVDFSAMIR + AIRAASPDMVFVASYPADSTAIIRAVNEIGVGSSVKLFGGGMVGLQYASVMQALGSQL + NGVVNYHTYVPEKTMAFPGVKEFLDRYAEKAKAAKVETLGYYVAPFSYASGQILEQAV + KATGSLDNAELAKYLRTNEVQTIVGPIRWGTDGEWSQPRVVMVQFRDVKDGDAEQFRQ + EGKQVIVYPDKYKTGDLVSPLSGAQGR" + gene 28175..29053 + /gene="livM" + /locus_tag="pRL80027" + CDS 28175..29053 + /gene="livM" + /locus_tag="pRL80027" + /codon_start=1 + /transl_table=11 + /product="putative branched-chain amino acid ABC + transporter permease component" + /protein_id="CAK02826.1" + /db_xref="EnsemblGenomes-Gn:pRL80027" + 
/db_xref="EnsemblGenomes-Tr:CAK02826" + /db_xref="GOA:Q1M9I1" + /db_xref="InterPro:IPR001851" + /db_xref="UniProtKB/TrEMBL:Q1M9I1" + /translation="MTMVSIDLLIEGLVFGVLVGCFYAAVSIGLSIAFGLLDVPHIAH + ASFLVLAAYMTFLLGSFGIDPLLAGALILIPFFFLGAAVYRFYYEAFEKRGTDAGVRG + IAFFFGIAFIVQVVLSLVFGLDQQSVSAPYIGSSLALGEMRIPWRLIVALVVAVGLVL + SLNLYLSRTFRGRAIRAVAQDPWALKVIGANPVLTKQWAFGLATAATAVGGALLIIVS + PVEPSLDRVYIGRTFCVVVLAGLGSMNGTLIAGILLGVMESLVLTAFGASWAPAVAFG + LLLLVLGLRPQGLFGR" + gene 29050..30042 + /gene="livH" + /locus_tag="pRL80028" + CDS 29050..30042 + /gene="livH" + /locus_tag="pRL80028" + /note="Similar to codons 320 to 640 of Brucella melitensis + high-affinity branched-chain amino acid transport system + permease protein LivH / high-affinity branched-chain amino + acid transport ATP-binding protein LivG BmeII0874 + SWALL:Q8YBM5 (EMBL:AE009721) (916 aa) fasta scores: E(): + 3.3e-26, 35.12% id in 316 aa, and entire protein is + similar to entire protein of Rhodopseudomonas palustris + putative branched-chain amino acid transport system + permease protein precursor rpa1750 SWALL:CAE27191 + (EMBL:BX572598) (328 aa) fasta scores: E(): 3.9e-24, + 31.03% id in 319 aa" + /codon_start=1 + /transl_table=11 + /product="putative branched chain amino acid ABC + transporter permease component" + /protein_id="CAK02827.1" + /db_xref="EnsemblGenomes-Gn:pRL80028" + /db_xref="EnsemblGenomes-Tr:CAK02827" + /db_xref="GOA:Q1M9I0" + /db_xref="InterPro:IPR001851" + /db_xref="UniProtKB/TrEMBL:Q1M9I0" + /translation="MRNNSIPFWLLALALPVLAFILPKLGLNEYYLYVGYVILQYVVL + ATAWNILGGYAGYVNFGTGAFFGLGAYTALVLMKAFGAPLPIQIAGAAIVGALLGIGA + GLLTLRLKGIFFSIATIAAAIVIETFILNWRFVGGATGMQIIRPEVPLGFDTYTRLLL + FVMTVLTVIAIIVARYIERSWLGRGLHAVRDAEAAAECSGVPTLRLRLIACAISGALM + AAAGAPFPLYTSFVEPSSTFSLNYSVMALSMAVVGGMSRWWGPVLGAILIASSQQLAA + SASPELHLLVVGLLMVIFVIMAPEGLVGLAKLARKSLQPGAKSIRLVEGAKAHD" + gene 30035..30766 + /gene="livG" + /locus_tag="pRL80029" + CDS 30035..30766 + /gene="livG" + /locus_tag="pRL80029" + /codon_start=1 + /transl_table=11 + 
/product="putative branched-chain amino acid ABC + transporter ATP-binding component" + /protein_id="CAK02828.1" + /db_xref="EnsemblGenomes-Gn:pRL80029" + /db_xref="EnsemblGenomes-Tr:CAK02828" + /db_xref="GOA:Q1M9H9" + /db_xref="InterPro:IPR003439" + /db_xref="InterPro:IPR003593" + /db_xref="InterPro:IPR027417" + /db_xref="UniProtKB/TrEMBL:Q1M9H9" + /translation="MTSLLKVDNVTKRFGGFTALTDVNLDIAKGERLGLIGPNGSGKT + TLINCISGVLPIEAGAIAFDGADISKLATYRRAKAGLARTFQIPKPFHSMTVIENLMV + PLEYIVHRLVDAKNQNAAHSEASDLLRRVRLADRMSAPAGQLSQVELRKLELARAVAA + RPKLLICDEAMAGLATKEVHEILDILMDLNSSGITIVMVEHILQAVMRFSQRIVCLTA + GRIICDGAPADVMANPEVRRAYLGS" + gene 30771..31487 + /gene="livF" + /locus_tag="pRL80030" + CDS 30771..31487 + /gene="livF" + /locus_tag="pRL80030" + /codon_start=1 + /transl_table=11 + /product="putative high-affinity branched-chain amino acid + transport ATP-binding protein LivF" + /protein_id="CAK02829.1" + /db_xref="EnsemblGenomes-Gn:pRL80030" + /db_xref="EnsemblGenomes-Tr:CAK02829" + /db_xref="GOA:Q1M9H8" + /db_xref="InterPro:IPR003439" + /db_xref="InterPro:IPR003593" + /db_xref="InterPro:IPR017871" + /db_xref="InterPro:IPR027417" + /db_xref="UniProtKB/TrEMBL:Q1M9H8" + /translation="MVRDLHAGYGTVQVLHGLSIEAREGETVVLLGTNGNGKSTLMKC + LIGDVRPTQGSITLELDGQAIDLTHLGTDQIVEYGISIVPEGRRLFPQLTVEENLLLG + AYRKAARSKIAANLEFCYGAFPILKERRRQLSGSMSGGQQQMLALGRAIMSSPRILLV + DEPSVGLAPIMVSQAIAKIGELKEQFGLTVVMAEQNFQEAMRIADRGYVLVHGEVAFS + AETAAELRDSELISQLYLGG" + gene 31566..33389 + /gene="mcpS" + /locus_tag="pRL80031" + CDS 31566..33389 + /gene="mcpS" + /locus_tag="pRL80031" + /codon_start=1 + /transl_table=11 + /product="probable MCP type chemoreceptor." 
+ /protein_id="CAK02830.1" + /db_xref="EnsemblGenomes-Gn:pRL80031" + /db_xref="EnsemblGenomes-Tr:CAK02830" + /db_xref="GOA:Q1M9H7" + /db_xref="InterPro:IPR003660" + /db_xref="InterPro:IPR004010" + /db_xref="InterPro:IPR004089" + /db_xref="UniProtKB/TrEMBL:Q1M9H7" + /translation="MSNFRISSKLILLTTGLVIVFALAAVFLIEAATETIYSERKDAL + KTQVDIAYSIVTTLHSDETAGKISREEAIAQATALVSQIHYEPNGVIFGYDYSGVRVI + NPGNAGVGKNFMALTDKNGTPLIKNIIDAGRAGGGFSEYLWPKPGAGDDATSVKVSYS + KAFDPWQLVLGTGAYLDDIDEKINQVYVQALGIVAAVLVISLIGALAVVRGITRPLTR + IHSSLSAVSNDDVSIAIPHTNLTNEIGMMARATKILQDKVRDRLTMEQREADQQRLIE + QERSEASRIQEEEAAGQAHVVKQLSQALAALSEGDLTVRCSDLGSRYDVLRANFNSAI + SRLLQAMQAVARNAGAITAGSEQIRSASDELSKRTEQQAASVEETAAALEEITTTVAQ + ASRRAEEAGRLVRRTRENAEVSGTVVGRAIEAMSKIEASSAEISGIIGVIDEIAFQTN + LLALNAGVEAARAGDAGKGFAVVAQEVRELAQRSAKAAQQINQLIAVSNVHVQTGVAL + VGETGSALSTIVSQVKQVSDNVEGIVEAAKEQSLGIAEINQAINVVDRGTQQNAAMVE + ESAAAAHSLAAEAAALLRLLAQFNVGGGATSHVTPKMAVAG" + gene complement(33393..34373) + /locus_tag="pRL80032" + CDS complement(33393..34373) + /locus_tag="pRL80032" + /codon_start=1 + /transl_table=11 + /product="putative LysR family transcriptional regulator" + /protein_id="CAK02831.1" + /db_xref="EnsemblGenomes-Gn:pRL80032" + /db_xref="EnsemblGenomes-Tr:CAK02831" + /db_xref="GOA:Q1M9H6" + /db_xref="InterPro:IPR000847" + /db_xref="InterPro:IPR005119" + /db_xref="InterPro:IPR011991" + /db_xref="UniProtKB/TrEMBL:Q1M9H6" + /translation="MNLRQIRYFLEIAEHGNFTRASEVLNIAQPALSRQIRLLEEDLG + ETLFIRSGHGVALTAAGKALRERATVLVYEFDKLRDVVSDQSEPRGHLTVGLPPAISH + MVSIELIDAYCRQYADVHLHIREGISLDLIAGIQQSKIDCAIVVSDENQGLRSEFLFR + ETLFLVAPASEKYDLNQLLTLDSAAGKPLILTNRMNNFRVAIEDAFGRNQLPMRVLAD + SNSTSMIAALVAKGTAYSILPYCAIDAALSHGLISASPIEGLSVDWAFVSPPQTELTI + PAKLFHQMLLDQIHRKVEGNNWLGAVLTTTPAPISADNQQASIDMLGNGP" + gene complement(34591..35262) + /locus_tag="pRL80033" + CDS complement(34591..35262) + /locus_tag="pRL80033" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02832.1" + 
/db_xref="EnsemblGenomes-Gn:pRL80033" + /db_xref="EnsemblGenomes-Tr:CAK02832" + /db_xref="InterPro:IPR025877" + /db_xref="InterPro:IPR029044" + /db_xref="UniProtKB/TrEMBL:Q1M9H5" + /translation="MSNTPRADHPAGKRRTADQSAIPRVAILLLAAGQAARMGPTGGH + KLLAEFDGIPLVRRMAAVALGSNAAAVILVTGHRRTEIETAVTGLNLEMVENPHYLTG + MASSLVAGVSYLEDRQIDGALVMLADMPGLLSSHLNQLIATFQLSGQDCIVRAACQGR + PGNPVILPTSLNQQILQLKGDIGARQIIENASLPVFQVEIGEAALLDLDTPEAVKGAG + GIIKY" + gene complement(35280..36332) + /gene="moaA1" + /locus_tag="pRL80034" + CDS complement(35280..36332) + /gene="moaA1" + /locus_tag="pRL80034" + /codon_start=1 + /transl_table=11 + /product="putative molybdenum cofactor biosynthesis + protein A" + /protein_id="CAK02833.1" + /db_xref="EnsemblGenomes-Gn:pRL80034" + /db_xref="EnsemblGenomes-Tr:CAK02833" + /db_xref="GOA:Q1M9H4" + /db_xref="InterPro:IPR000385" + /db_xref="InterPro:IPR006638" + /db_xref="InterPro:IPR007197" + /db_xref="InterPro:IPR010505" + /db_xref="InterPro:IPR013483" + /db_xref="InterPro:IPR013785" + /db_xref="UniProtKB/TrEMBL:Q1M9H4" + /translation="MTFGERHMLAFSGSRSVLRTPPMIDPFGRAVTYLRVSVTDRCDF + RCTYCMAENMTFLPKKDLLTLEELNRLCSAFIAKGARKIRLTGGEPLVRKNIMFLVRE + LGEKIGSGLDELTLTTNGSQLARHADELYDCGVRRINVSLDTLDPDKFRAITRWGDFA + KVTEGIDAAQKAGLKIKLNAVALKDFNEAEMPDLLRFAHGRGMDLTVIETMPMGEIEE + DRTDRYLPLSKLRADLEQQFTFADILYKTGGPARYVDVAETGGRLGFITPMTHNFCES + CNRVRLTCTGTLYMCLGQNDAADLRVALRATEDDGLLYAAIDEAITRKPKGHDFIIDR + TYKRPAVARHMSVTGG" + gene complement(37172..37888) + /locus_tag="pRL80035" + CDS complement(37172..37888) + /locus_tag="pRL80035" + /codon_start=1 + /transl_table=11 + /product="putative TetR family transcriptional regulator" + /protein_id="CAK02834.1" + /db_xref="EnsemblGenomes-Gn:pRL80035" + /db_xref="EnsemblGenomes-Tr:CAK02834" + /db_xref="GOA:Q1M9H3" + /db_xref="InterPro:IPR001647" + /db_xref="InterPro:IPR009057" + /db_xref="InterPro:IPR015893" + /db_xref="InterPro:IPR023772" + /db_xref="UniProtKB/TrEMBL:Q1M9H3" + /translation="MIEGSHQPHPDFRRTMVPKHPTKRRRLPPTERREEILQKAINLF + 
SEHGFESSTREIARQLGITQPLLYRYFPSKEDLIREAYRSVYLERWDVEWDRLLCDRT + KPLDWRLKAFYDSYTHAIFTRDWMRIYLFSGLKGADINRWYIGLLEERILQRIIKEYR + HLAGLDGERQPEAEELELAWALHSGIFYLGVREHIFSLPPPKDRQKIIGNVVDVFHHG + IQAYFSKLSAMKPAKLAAQS" + gene 37962..38912 + /locus_tag="pRL80036" + CDS 37962..38912 + /locus_tag="pRL80036" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02835.1" + /db_xref="EnsemblGenomes-Gn:pRL80036" + /db_xref="EnsemblGenomes-Tr:CAK02835" + /db_xref="GOA:Q1M9H2" + /db_xref="InterPro:IPR007325" + /db_xref="UniProtKB/TrEMBL:Q1M9H2" + /translation="MTFTRQAIYDAAKKVSNWGRWGDDDQIGTLNNIEPTDIVAAASL + VRKGKTFSLGLSLKEPIQSGLFGGRWNPIHTMLATGTDAAAGNQDEPAPYLRYADDAI + NMPCQASTQWDALCHIFLDDKMYNGYDARLVDVKGAKKLGIEHYRDKMVGRGVLLDIA + RWKSVASLDDGYAITPADLDGCAASQGVEIRKGDFVIVRTGHQERCLAKGSWEGYAGG + DAPGMGFDTCFWLRDKDVAGICTDTWGCEVRPNQTKEANQPWHWVVIPAMGIAMGEIF + YLKELAEDCAGDKVYEFFFLAPPLHLPGGAGSPINPQAIK" + gene 38920..39687 + /locus_tag="pRL80037" + CDS 38920..39687 + /locus_tag="pRL80037" + /codon_start=1 + /transl_table=11 + /product="Putative short-chain dehydrogenase" + /protein_id="CAK02836.1" + /db_xref="EnsemblGenomes-Gn:pRL80037" + /db_xref="EnsemblGenomes-Tr:CAK02836" + /db_xref="GOA:Q1M9H1" + /db_xref="InterPro:IPR002198" + /db_xref="InterPro:IPR002347" + /db_xref="InterPro:IPR016040" + /db_xref="InterPro:IPR020904" + /db_xref="UniProtKB/TrEMBL:Q1M9H1" + /translation="MSGFNILERFSLSGRRALVTGAGRGLGRSIAEGLASAGAEVTLC + ARTESEVEEGARCIRDHGFKAEALVADVSDIAGFRATVDAMHAHDIFVNNAGTNRPKP + LSDVTIEDFDAVIGLNLRAAVFAAQAVTARMANLGIQGSVINMSSQMGHVGAANRTIY + CASKWALEGFTKALAVELGPVGIRVNTVAPTFIETPMTTPFLEDPAARNAIVSKIKLG + RLGTPEDVVGAVLFLASDASALVTGSALLVDGGWTAD" + gene 39952..40956 + /locus_tag="pRL80038" + CDS 39952..40956 + /locus_tag="pRL80038" + /codon_start=1 + /transl_table=11 + /product="putative exported xanthine dehydrogenase/CoxI + family protein" + /protein_id="CAK02837.1" + /db_xref="EnsemblGenomes-Gn:pRL80038" + /db_xref="EnsemblGenomes-Tr:CAK02837" + 
/db_xref="InterPro:IPR003777" + /db_xref="InterPro:IPR027051" + /db_xref="UniProtKB/TrEMBL:Q1M9H0" + /translation="MSMQCELQAAPLLSAAHPALADPWELAINASGDVVMAVLTETRG + PAYRLPGAAMAILPDGSFSGAITSGCVEADLILNASDVRNTGDVRVLRYGEGSTFIDI + RLPCGGGIEVMLFPLLDVEVLGKLAKARKLRRPVSLQISKSGRLTLGPITETKNDAHG + FALGFEPPLQFLTFGAGPEASVFAALVEGLGYEQQLVSHDAMTLASTRASGNKCRELT + NLSELFALQIDARTAAVLFYHDHDYEPEIIKHLLSTPAFYIGAQGSRATQRSRLQRLE + EIGVSLNRSSRVRGPIGVIPSSRDPKSLAVSVLAEIMAAGSQVSKASEQPNLKKECCL + " + gene 41369..42216 + /locus_tag="pRL80039" + /pseudo + CDS join(41369..42067,42067..42216) + /locus_tag="pRL80039" + /pseudo + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein, pseudogene" + gene complement(42495..42740) + /locus_tag="pRL80039A" + /pseudo + CDS complement(42495..42740) + /locus_tag="pRL80039A" + /pseudo + /codon_start=1 + /transl_table=11 + gene complement(42920..43900) + /locus_tag="pRL80040" + CDS complement(42920..43900) + /locus_tag="pRL80040" + /codon_start=1 + /transl_table=11 + /product="putative HTH transcriptional regulatory protein" + /protein_id="CAK02840.1" + /db_xref="EnsemblGenomes-Gn:pRL80040" + /db_xref="EnsemblGenomes-Tr:CAK02840" + /db_xref="GOA:Q1M9G9" + /db_xref="InterPro:IPR000843" + /db_xref="InterPro:IPR010982" + /db_xref="InterPro:IPR028082" + /db_xref="UniProtKB/TrEMBL:Q1M9G9" + /translation="MKPTAKQVAHVAGVSIAAVSRAFTPGAPINPEKKKKIFAVAEEI + GYISPARRTAKAVAASTITLVAGDLHNPFYPLVLDTLARHLQESGRQLLVYALPSDCN + IDAVTDQVLAARPSGIIVTSASLTSNMARACRQNQIKAVLLNRIQRDMRINAVSCDNY + QGGRDIGRLLLDRGCQRISLIGGVANTSTHAERVRGFRDVLAEAGRTIHAQASGNYQY + EIGKQAAVHLLSAVQPPDAIFCCNDIMALAVIDAAKERGLRIPQDLAVAGFDDIPMAS + WSSYQLTTIRQPVERMVQEAVSLIDDPNIKASDNGFTRILSGMLVLRSSA" + gene 43991..45301 + /gene="hisD2" + /locus_tag="pRL80041" + CDS 43991..45301 + /gene="hisD2" + /locus_tag="pRL80041" + /EC_number="1.1.1.23" + /codon_start=1 + /transl_table=11 + /product="putative histidinol dehydrogenase" + /protein_id="CAK02841.1" + /db_xref="EnsemblGenomes-Gn:pRL80041" + 
/db_xref="EnsemblGenomes-Tr:CAK02841" + /db_xref="GOA:Q1M9G8" + /db_xref="InterPro:IPR001692" + /db_xref="InterPro:IPR012131" + /db_xref="InterPro:IPR016161" + /db_xref="InterPro:IPR022695" + /db_xref="UniProtKB/TrEMBL:Q1M9G8" + /translation="MATTYIKRGKPENERSEDDQKVRFTVENILKDIEARGDAAVREL + SEKFDKYSPASFKLSASEIEALMNRVSARDMEDIKFAQAQVRNFAQAQRDSMLDIEVE + TLPGVILGHKNIPVQSVGCYIPGGKFPMVASAHMSVATASVAGVPRIAAATPAFKGEP + NPAVIAAMYLGGAHEIYVLGGIQAIGALAIGTETIEPVHMLVGPGNAFVAEAKRQLYG + RVGIDLFAGPTETMVIADETVDAEICATDLLGQAEHGYNSPAVLVTNSHKLAQATLTE + IDRILKILPTADTASKSWADYGEIIVCDTYEEMLDVANDIASEHVQVMTDRDDWFLAN + MHSYGALFLGPRTNVANGDKVIGTNHTLPTKKAGRYTGGLWVGKFIKTHSYQKVLTDE + AAAMIGEYCSRLCMLEGFIGHAEQANVRVRRYGGRNVGYGTAAE" + gene 45465..47039 + /gene="pntA" + /locus_tag="pRL80042" + CDS 45465..47039 + /gene="pntA" + /locus_tag="pRL80042" + /EC_number="1.6.1.2" + /codon_start=1 + /transl_table=11 + /product="putative NAD(P) transhydrogenase subunit A" + /protein_id="CAK02842.1" + /db_xref="EnsemblGenomes-Gn:pRL80042" + /db_xref="EnsemblGenomes-Tr:CAK02842" + /db_xref="GOA:Q1M9L0" + /db_xref="InterPro:IPR007698" + /db_xref="InterPro:IPR007886" + /db_xref="InterPro:IPR024605" + /db_xref="InterPro:IPR026255" + /db_xref="UniProtKB/TrEMBL:Q1M9L0" + /translation="MRIGTPREICADEARVAMTPDSAAHLQKLGHTCLIESGAGLLAG + FTDEAYRDAGVEIVESATELFRSADVIAKVRPPELIEIDSIDAGKTMISFFYPAQNDV + LLARAIDRGVNVIAMDMVPRISRAQKMDALSSMANIAGYRAVIEAGSNFGRFFTGQVT + AAGKVPPAKVLVIGAGVAGLAAIGTATSLGAITYAFDVRPEVAEQIESMGAEFVFLDF + GDQQQDGAASGGYATPSSPEFRQKQLDTFRSLTPEIDIVITTALISGRDAPKLWLADM + VAMMKPGSVVVDLAAERGGNCELTVEGTRIVSGNGVIVIGYTDFPSRMATQSSTLYAT + NIRHMLADLTPARDGILVHNMEDDVIRGATVAFESAVTFPPPPPKVQAIAVQKVKQKP + KEPSREERRQREAAAFRAQTRSQVALLAFATLLLLVAGAYAPASFMNHLIVFALSCFI + GFQVIWNVSHALHTPLMAVTNAVSGIVILGALLQIGSASIPVTILASIAVLISTINIV + GGFFVTRRMLAMFQKS" + gene 47061..48500 + /gene="pntB2" + /locus_tag="pRL80043" + CDS 47061..48500 + /gene="pntB2" + /locus_tag="pRL80043" + /EC_number="1.6.1.2" + /codon_start=1 + /transl_table=11 + /product="NAD(P)(+) 
transhydrogenase subunit B" + /protein_id="CAK02843.1" + /db_xref="EnsemblGenomes-Gn:pRL80043" + /db_xref="EnsemblGenomes-Tr:CAK02843" + /db_xref="GOA:Q1M9K9" + /db_xref="InterPro:IPR012136" + /db_xref="InterPro:IPR029035" + /db_xref="UniProtKB/TrEMBL:Q1M9K9" + /translation="MTIGIISAANIAAAILFILSLGGLSGQESAKRAVWYGMAGMGLA + LVAAIFGAAGGSWLNLILMIGGGALIGYALARRVQMTEMPQLVAALHSFVGLAAVFIS + FNTHLEAARVAALDETSRTGLAGYPAILAQKDAVELLIMKAEIFIGVFIGAVTFTGSV + IAFGKLAGKVDGKARKLAGGHGLNAAAALLSVALLIIYYQSGGLLPLAVMTALALFIG + FHLIMGIGGADMPVVVSMLNSYSGWAAAAIGFTLGNDLLIVTGALVGSSGAILSYIMC + KAMNRSFISVILGGFGTKAGPLLEITGEQVAIDSAGVAAALNDAQSIIIVPGYGMAVA + QAQQAVSDLTRRLRATGKTVRFAIHPVAGRLPGHMNVLLAEAKVPYDIVLEMDEINED + FASTDVVIVIGSNDIVNPAAQEDANSPIAGMPVLEVWKAKQVFVSKRGQGTGYSGIEN + PLFFRENTRMFYGDARRSLEELMPKVVSV" + gene 48652..50592 + /gene="acsA2" + /locus_tag="pRL80044" + CDS 48652..50592 + /gene="acsA2" + /locus_tag="pRL80044" + /EC_number="6.2.1.1" + /codon_start=1 + /transl_table=11 + /product="putative acetyl-coenzyme A synthetase" + /protein_id="CAK02844.1" + /db_xref="EnsemblGenomes-Gn:pRL80044" + /db_xref="EnsemblGenomes-Tr:CAK02844" + /db_xref="GOA:Q1M9K8" + /db_xref="InterPro:IPR000873" + /db_xref="InterPro:IPR011904" + /db_xref="InterPro:IPR020845" + /db_xref="InterPro:IPR025110" + /db_xref="UniProtKB/TrEMBL:Q1M9K8" + /translation="MSDIVRYPPAKSTVARALIDKEKYLKWYEESVENPDKFWGKHGR + RIDWFKPYTKVKNTSFTGKVSIKWFEDGQTNVSYNCIDRHLKTNGDQVAIIWEGDNPY + IDKKVTYNELYEHVCRMANVLKKHGVKKGDRVTIYMPMIPEAAYAMLACARIGAVHSV + VFGGFSPEALAGRIVDCESTFVITCDEGVRGGKPVPLKDNTDTAIDIAARQHVRVSKV + LVVRRTGGKTGWAPGRDLWHHQEIATVKPECPPVKMKAEDPLFILYTSGSTGKPKGVL + HTTGGYLVYAAMTHEYVFDYHDGDVYWCTADVGWVTGHSYIVYGPLANCATTLMFEGV + PNFPDQGRFWEVIDKHKVNIFYTAPTAIRSLMGAGDDFVTRSSVRLLGTVGEPINPEA + WEWYYNVVGDKRCPVIDTWWQTETGGHMITPLPGAIDLKPGSATVPFFGIKPQLVDNE + GKVLEGAADGNLCITDSWPGQMRTVYGDHDRFIQTYFSTYKGKYFTGDGCRRDADGYY + WITGRVDDVLNVSGHRLGTAEVESALVSHNLVSEAAVVGYPHPIKGQGIYCYVTLMAG + HEGTDTLRQELVKHVRAEIGPIAAPDKIQFAPGLPKTRSGKIMRRILRKIAEDDFGAL + 
GDTSTLADPAVVDDLIANRQNR" + gene complement(50779..51627) + /locus_tag="pRL80045" + CDS complement(50779..51627) + /locus_tag="pRL80045" + /codon_start=1 + /transl_table=11 + /product="putative FAA hydrolase family protein" + /protein_id="CAK02845.1" + /db_xref="EnsemblGenomes-Gn:pRL80045" + /db_xref="EnsemblGenomes-Tr:CAK02845" + /db_xref="GOA:Q1M9K7" + /db_xref="InterPro:IPR002529" + /db_xref="InterPro:IPR011234" + /db_xref="UniProtKB/TrEMBL:Q1M9K7" + /translation="MFRLVHKMKLATIIVDGQRKVVVIDPSGERYTPAAEAFPDLAPS + THEDMVSLIAEIGRQKRSAPLEGSTAIAIDDLLPPITNPPHNVFCVGKNYHAHAHEFT + KSGFDAGATAAEAIPEHPIIFTKPSSSLARPFGDIPLWPGLDEAVDYEAELAVVIGKA + GRFITAERALDHVFGYTVFNDVTARDLQKKHKQWFLGKGIDGFGPIGPWIVTKDELDI + ANVEITCTVNGEERQKTSTKDLIFDIPTLIEVISRSVTLLPGDIIATGTPAGVGSGSA + LSHPGS" + gene complement(51679..52368) + /locus_tag="pRL80046" + CDS complement(51679..52368) + /locus_tag="pRL80046" + /codon_start=1 + /transl_table=11 + /product="putative TetR family transcriptional regulatory + protein" + /protein_id="CAK02846.1" + /db_xref="EnsemblGenomes-Gn:pRL80046" + /db_xref="EnsemblGenomes-Tr:CAK02846" + /db_xref="GOA:Q1M9K6" + /db_xref="InterPro:IPR001647" + /db_xref="InterPro:IPR009057" + /db_xref="InterPro:IPR011075" + /db_xref="InterPro:IPR015893" + /db_xref="UniProtKB/TrEMBL:Q1M9K6" + /translation="MPSRPPAALSPKGVVSIFEAVATPTAALSKRDAILAAATRVFLR + QGYEGTSMDLVAQESGAARRTLYNQFPDGKEELFRAVVERVWSAFPVLDIVAEAETQA + DPNVGLRRIAAAVAAFWEPPLAIAFLRMIIAEGTRFPDLTESFFKHGKTPAMGAVRAY + IEMQAERGLLTIKDGERAARQFLGLIDEPLLWIRVLGRDEKFSQSERQAVIDEAVDIF + LGHYRTRRAGR" + gene 52474..53688 + /locus_tag="pRL80047" + CDS 52474..53688 + /locus_tag="pRL80047" + /codon_start=1 + /transl_table=11 + /product="putative major facilitator superfamily (MFS) + transporter transmembrane component" + /protein_id="CAK02847.1" + /db_xref="EnsemblGenomes-Gn:pRL80047" + /db_xref="EnsemblGenomes-Tr:CAK02847" + /db_xref="GOA:Q1M9G7" + /db_xref="InterPro:IPR011701" + /db_xref="InterPro:IPR016196" + /db_xref="InterPro:IPR020846" + 
/db_xref="UniProtKB/TrEMBL:Q1M9G7" + /translation="MAGESTPHSQLTRGTLPMLAAACGITVGNVYLCQPLLDQMAVSL + RVPEQTAGLVAVGAQVGYALGILFVLPLADVIASRRLVRTLLVLTSLFLLAAAFSSRT + SLLAAASVALTASTVVPQILIPIVSGMTAPEHRGRTIGALQTGLILGILLSRTASGSL + AQVTGTWRSPYLLAAVLTGLLVLIVPRLIPERETKPRHTGYLSLLRSLPPLLQHRPLR + LSMTLGFLVFGAFSALWATLAFYLAGPDFGFGPATAGLFGLYGAPGAILAPMAGRLSD + RVGSSKINLVSLAASGIALALAGWLGGGSLLILVVAVNLLDFGLQSGQIANQTRILGL + GDDIRARLNTLYMAATFGGGAAGSFAGMLAWSFGGWTSACGLSLALIAAAASTLVLNW + KNEYRSSHMKGE" + gene 53690..54421 + /locus_tag="pRL80048" + CDS 53690..54421 + /locus_tag="pRL80048" + /codon_start=1 + /transl_table=11 + /product="putative short-chain dehydrogenase/reductase" + /protein_id="CAK02848.1" + /db_xref="EnsemblGenomes-Gn:pRL80048" + /db_xref="EnsemblGenomes-Tr:CAK02848" + /db_xref="GOA:Q1M9G6" + /db_xref="InterPro:IPR002198" + /db_xref="InterPro:IPR002347" + /db_xref="InterPro:IPR016040" + /db_xref="UniProtKB/TrEMBL:Q1M9G6" + /translation="MSDLTGRVALVTGASRGIGRDIAYALSSAGASVAVGYHSDRTGA + EAVAETIRQEGGRAVAVGGDVSDPQIAVDLVRETEAQLGPLGIVVNNAGINPSRPLDQ + ITAADWDETIRVNLTSAFHVTQAAVPGLRERKWGRIITISSVAAQLGGVIGPHYAASK + AGLIGLAHYYAAALAKEGITSNAIAPALIETEMLKSNSAIQPTLIPVGRFGQTHEVSS + VVVLLAGNGYITGQTISVNGGWYMS" + gene 54437..55252 + /locus_tag="pRL80049" + CDS 54437..55252 + /locus_tag="pRL80049" + /codon_start=1 + /transl_table=11 + /product="putative aldo/keto reductase family protein" + /protein_id="CAK02849.1" + /db_xref="EnsemblGenomes-Gn:pRL80049" + /db_xref="EnsemblGenomes-Tr:CAK02849" + /db_xref="InterPro:IPR001395" + /db_xref="InterPro:IPR023210" + /db_xref="UniProtKB/TrEMBL:Q1M9G5" + /translation="MLTRIIPATKEALPVIGLGTYRGFDVTLNAPGEERLSNVLDTLF + AAGGTLLDSSPMYGRAEEVVGALLTRQPRADSPFLATKVWTSGREAGVRQIEQSFRLL + RSDVIDLIQVHNLQDWQTHLQTLRGLKEAGRIRYIGITHYTRSGYAEVERVLNTTPVD + FLQINYSVEEREAEKRLLPLAEDKGVAVLCNRPFGGGDLLRRLKAKPLPDWAEEVGAT + SWAQLALKFVLGHRAITCAIPGTGNPASMIDNTKAASGSVLTPKQRAELIETV" + gene 56210..57219 + /locus_tag="pRL80050" + /pseudo + CDS join(56210..57043,57043..57219) + 
/locus_tag="pRL80050" + /pseudo + /codon_start=1 + /transl_table=11 + /product="putative exported lipase, pseudogene" + gene 57389..58106 + /locus_tag="pRL80051" + /pseudo + CDS join(57389..57706,57768..58106) + /locus_tag="pRL80051" + /pseudo + /codon_start=1 + /transl_table=11 + /product="putative glutathione-S-transferase, pseudogene" + gene complement(58223..58624) + /locus_tag="pRL80052" + CDS complement(58223..58624) + /locus_tag="pRL80052" + /note="Codons 35 to the C-terminus are similar to entire + protein of Streptomyces avermitilis hypothetical protein + sav442 SWALL:Q82QR3 (EMBL:AP005022) (101 aa) fasta scores: + E(): 2.4e-19, 62.24% id in 98 aa, and codons 35 to the + C-terminus are similar to Nitrosomonas europaea + hypothetical protein Ne2512 SWALL:Q82S47 (EMBL:BX321864) + (100 aa) fasta scores: E(): 3e-11, 44.68% id in 94 aa" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical exported protein" + /protein_id="CAK02852.1" + /db_xref="EnsemblGenomes-Gn:pRL80052" + /db_xref="EnsemblGenomes-Tr:CAK02852" + /db_xref="InterPro:IPR011008" + /db_xref="UniProtKB/TrEMBL:Q1M9G4" + /translation="MVSLKYSATAMVVALNISAALILGAPAHAAEPTSTAQANVGLWV + VLDAKAGKEEDVAQFLRGGRAIVQDEPATIAWYAVRLSKTQFAIFDTFPDEAGRSAHL + AGKVAAALMAKAPELLDHAPTISKIDIMATK" + gene complement(58664..59185) + /locus_tag="pRL80053" + CDS complement(58664..59185) + /locus_tag="pRL80053" + /note="no significant database hits" + /codon_start=1 + /transl_table=11 + /product="putative transmembrane protein" + /protein_id="CAK02853.1" + /db_xref="EnsemblGenomes-Gn:pRL80053" + /db_xref="EnsemblGenomes-Tr:CAK02853" + /db_xref="GOA:Q1M9G3" + /db_xref="UniProtKB/TrEMBL:Q1M9G3" + /translation="MWPVGPIDILEICSDCPIVMSRQRLDLLSGTSPFSRFYKIGRAS + TLLFLKCCDEFWWSPQLRRSGRRGCIDRPAGHVNRRSCELDAFFRERLLETSLAWFDT + LGSIEVIQTIGASPILGDLFIAAWLGSVLIALIFVIWLRTRVKDRPLADAKLGDNDIK + DVIIATRAASSLA" + gene complement(59434..59883) + /locus_tag="pRL80054" + CDS complement(59434..59883) + /locus_tag="pRL80054" + 
/codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02854.1" + /db_xref="EnsemblGenomes-Gn:pRL80054" + /db_xref="EnsemblGenomes-Tr:CAK02854" + /db_xref="InterPro:IPR024311" + /db_xref="UniProtKB/TrEMBL:Q1M9G2" + /translation="MTMVAHSIVHPAFADTKTLTGTWKLKRWVLEDVDTKEQKPSPFG + ERPAGCVFFSSGRILVLITAEDRKPAEGADGQSAAFRSLYTYSGKYRLENDRFITKID + IAGDQNWVGSEQQRTYRVDGDTLIIESVPATQAGKTLRGILEWEREP" + gene complement(60050..60301) + /locus_tag="pRL80055" + /pseudo + CDS complement(60050..60301) + /locus_tag="pRL80055" + /pseudo + /codon_start=1 + /transl_table=11 + gene 60290..60745 + /locus_tag="pRL80056" + CDS 60290..60745 + /locus_tag="pRL80056" + /note="Similar, but truncated 75 residues at the + N-terminus, to Rhizobium etli hypothetical protein + yi10a-iI SWALL:Q8KL99 (EMBL:U80928) (221 aa) fasta scores: + E(): 2.1e-45, 87.21% id in 133 aa, and similar, but + truncated at the N-terminus, to Rhizobium etli + hypothetical protein yi10b-iI SWALL:Q8KKR5 (EMBL:U80928) + (187 aa) fasta scores: E(): 3e-44, 86.46% id in 133 aa" + /codon_start=1 + /transl_table=11 + /product="putative transposase-related protein" + /protein_id="CAK02856.1" + /db_xref="EnsemblGenomes-Gn:pRL80056" + /db_xref="EnsemblGenomes-Tr:CAK02856" + /db_xref="GOA:Q1M9G1" + /db_xref="InterPro:IPR001584" + /db_xref="InterPro:IPR012337" + /db_xref="UniProtKB/TrEMBL:Q1M9G1" + /translation="MLRHMATPSVTTRTRSQQPVTARRPSPWAYGEDVVQTLERVCRN + VGYPKTIRVDQGTEFVSRDLDLWAYAKGATLDFSRPGKPTDNAFIEAFNGRFRAECLN + LHWFLTLADAREKMEDWRRYYNEERPHGAIGNKPPISLMNSGGATSSPP" + gene complement(61029..61217) + /locus_tag="pRL80056A" + CDS complement(61029..61217) + /locus_tag="pRL80056A" + /codon_start=1 + /transl_table=11 + /product="putative transcriptional regulator" + /protein_id="CAK02857.1" + /db_xref="EnsemblGenomes-Gn:pRL80056A" + /db_xref="EnsemblGenomes-Tr:CAK02857" + /db_xref="GOA:Q1M9G0" + /db_xref="InterPro:IPR001387" + /db_xref="InterPro:IPR010982" + /db_xref="UniProtKB/TrEMBL:Q1M9G0" + 
/translation="MRALRQARKMSQEELAHRASVDRTYISSLERCVYSPSIEVLDRF + AAVLGVEPADLLRKPNKE" + gene 61298..62716 + /locus_tag="pRL80057" + CDS 61298..62716 + /locus_tag="pRL80057" + /note="Similar, but truncated at the N-terminus and + extended at the C-terminus, to Escherichia coli DNA + polymerase IV DinB or DinP or b0231 SWALL:DPO4_ECOLI + (SWALL:Q47155) (351 aa) fasta scores: E(): 0.0017, 24.84% + id in 330 aa" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02858.1" + /db_xref="EnsemblGenomes-Gn:pRL80057" + /db_xref="EnsemblGenomes-Tr:CAK02858" + /db_xref="GOA:Q1M9F9" + /db_xref="InterPro:IPR001126" + /db_xref="UniProtKB/TrEMBL:Q1M9F9" + /translation="MIARSGSKRWVSAADAAARKAGVHVGMPAAKAQALFRGLMLVDA + DPVKDAAALERITLWALTLYSPIVAVDGIDGIVMDTEGADHLQGGELPMVTKIANQFL + AKKLTPRVAIADTWGAAHACARAISRETVIVPIGETVRAVEKLPISLLRLPGKVVSDL + RTLGFQTIGELANTPRAPLTLRFGPEIGRRLDQMFGRVSEPIDPIRTAELIEVSRAFA + EPIGAAETINKYVGRLVVQLIEELQKRGLGVRRADLIVEKVDGARQAIRAGAVKPVRD + VAWLTKLFRDRTEKIEPGFGIEKLTLVAVIVEPLEERQRSSSLVEEEVKDVTPLIDIY + GNRGQRVYRVAPVASDVPERSVQRISPAADPVEVTWVSHWRRPVRLLARPELIEAIAL + LPDRPPVSITWRGKRRKVKRADGPERIFGEWWRRDAEMEAVRDYFVIEDEAGERLWVF + RSGDGIDPETGNHRWFCHGIFA" + gene 62713..63579 + /locus_tag="pRL80058" + /pseudo + CDS 62713..63579 + /locus_tag="pRL80058" + /pseudo + /codon_start=1 + /transl_table=11 + gene complement(63550..64737) + /locus_tag="pRL80059" + CDS complement(63550..64737) + /locus_tag="pRL80059" + /codon_start=1 + /transl_table=11 + /product="putative NifS-like cysteine + desulfurase/selenocysteine lyase" + /protein_id="CAK02860.1" + /db_xref="EnsemblGenomes-Gn:pRL80059" + /db_xref="EnsemblGenomes-Tr:CAK02860" + /db_xref="GOA:Q1M9F8" + /db_xref="InterPro:IPR000192" + /db_xref="InterPro:IPR015421" + /db_xref="InterPro:IPR015422" + /db_xref="InterPro:IPR015424" + /db_xref="InterPro:IPR020578" + /db_xref="UniProtKB/TrEMBL:Q1M9F8" + /translation="MNAHASTRHQTYLNAAELAVIRSQYPIVKECIYWNNAAVSPISI + 
GVRDAIARQATLHATDTSGIMVASAPTCDKGRSLAAKLVGSTAERIAYIQNTSHGLSL + VALGIDWKPGDNLVVPELEFPSNFLIWETLSQQGVEIRTMKARNGALAPDDLRFLVDS + RTKLVAVSHVQFYSGFRVDLAGFSQICADHDALLVVDGTQSVGVLSVDMEGEGVDVLV + VSAHKWMLGPVGIGFAAFSERAFERIKPRIVGWLSVNDAFSFHRVLDFLPDAKRFEPG + TENGAGIFGLAKRLEEIDSIGMEKIESYVMELGARIRTQAKSAGYEIMSDWPEESQSG + IILLKNPKMATSVLFADLDAADVKCSVRNDAVRFSPHYYSNEDELALVSDVLRGAASR + ANA" + gene complement(64863..65702) + /locus_tag="pRL80060" + CDS complement(64863..65702) + /locus_tag="pRL80060" + /codon_start=1 + /transl_table=11 + /product="putative exported solute-binding protein" + /protein_id="CAK02861.1" + /db_xref="EnsemblGenomes-Gn:pRL80060" + /db_xref="EnsemblGenomes-Tr:CAK02861" + /db_xref="GOA:Q1M9F7" + /db_xref="InterPro:IPR001638" + /db_xref="UniProtKB/TrEMBL:Q1M9F7" + /translation="MKATHLGIFVALTLAQNAIGAEVPVTESVKSSGTLTIANGLDYA + PFEFVDANGQPAGLDVDLAHEAAKLITAKLDLQRIPFASQIPSLSAGRVKVAWATFTV + KEDRLKQVDFVTFLQSGTVAMVLPDKKDSISDAKSLCGKRVAVQTGSAADFTTDKLSE + DCAKSNLPKIDKVIYPEQKDTIQAVLTGRADARFDDSTAAGYYEQTSNGKLVVAPGVY + DVLPLGVAVQKGDKASAEMMQAIFQELIANGKYETVLDKYGMSLAAVKKSRIITSVDQ + IAE" + gene 65868..66821 + /locus_tag="pRL80061" + CDS 65868..66821 + /locus_tag="pRL80061" + /codon_start=1 + /transl_table=11 + /product="putative LysR-family transcriptional regulator" + /protein_id="CAK02862.1" + /db_xref="EnsemblGenomes-Gn:pRL80061" + /db_xref="EnsemblGenomes-Tr:CAK02862" + /db_xref="GOA:Q1M9F6" + /db_xref="InterPro:IPR000847" + /db_xref="InterPro:IPR005119" + /db_xref="InterPro:IPR011991" + /db_xref="UniProtKB/TrEMBL:Q1M9F6" + /translation="MAEFSLNQIDLNLLRTFDVLMRERSVTRAADRLGRTQSAISHSL + GRLRDVFKDDLFTREAGIMEPTARAKELAEVISQALHEIRVAVDRHLNFDPTTTSRNF + RIGLSDYTAVTYLPELIENFSMLAPNASLNVVHAREPDALGSLKNREVECAVLGNPKL + DAEHFEVVELSRDRMVCAGWTGNPAMADMSLDRYLASPHLQISADGIAAGVADITLQK + LGLHRKVVATIPHYLVAPWVIKGTELISAFGDGVLLALSEESETAIVPPPLELPDVTI + SLIFDRSNELDPGHVWFRNLIKDVSDRQRTLKQGVYERLEL" + gene complement(66886..67359) + /locus_tag="pRL80062" + CDS complement(66886..67359) + /locus_tag="pRL80062" + 
/codon_start=1 + /transl_table=11 + /product="putative endoribonuclease" + /protein_id="CAK02863.1" + /db_xref="EnsemblGenomes-Gn:pRL80062" + /db_xref="EnsemblGenomes-Tr:CAK02863" + /db_xref="InterPro:IPR006175" + /db_xref="InterPro:IPR013813" + /db_xref="UniProtKB/TrEMBL:Q1M9F5" + /translation="MPMNSSFEQRIIELAIDLPKPPGSVANYVATHQVGNLLFISGQL + CVSPGGTLVALGSLGENVSVEDGIRAARAAAINVIAQARASLGSLDKIKRVVRLGGFI + AATSAFSEHARVMNGASDVIVEIFGEQGRHARSTVGVSSLPLQAAVEVEALFELD" + gene complement(67431..68234) + /locus_tag="pRL80063" + CDS complement(67431..68234) + /locus_tag="pRL80063" + /codon_start=1 + /transl_table=11 + /product="putative ATP-binding component of ABC + transporter" + /protein_id="CAK02864.1" + /db_xref="EnsemblGenomes-Gn:pRL80063" + /db_xref="EnsemblGenomes-Tr:CAK02864" + /db_xref="GOA:Q1M9F4" + /db_xref="InterPro:IPR003439" + /db_xref="InterPro:IPR003593" + /db_xref="InterPro:IPR017871" + /db_xref="InterPro:IPR027417" + /db_xref="UniProtKB/TrEMBL:Q1M9F4" + /translation="MTAPGTHHERFGIASTTMVHAKGIRKSYGHLEVLKGVDLTVPSG + SVACIIGPSGSGKSTFLRCINHLEEINGGLMLVDGDFVGYRLEGNKLYELPPSAICQR + RAEIGMVFQQFNLFPHMTVIENLMEAPLRVKREPVAQATAKAIELLKRVGLAEKRDAY + PRQLSGGQQQRVAIARALAMNPKVLLFDEPTSALDPELVGEVLEVMKSLAREGITMVV + VTHEIGFAREVADQLIFMDGGLVVESGNPREMIANPQSPRTREFLARVL" + gene complement(68231..69172) + /locus_tag="pRL80064" + CDS complement(68231..69172) + /locus_tag="pRL80064" + /codon_start=1 + /transl_table=11 + /product="putative permease component of ABC transporter" + /protein_id="CAK02865.1" + /db_xref="EnsemblGenomes-Gn:pRL80064" + /db_xref="EnsemblGenomes-Tr:CAK02865" + /db_xref="GOA:Q1M9F3" + /db_xref="InterPro:IPR000515" + /db_xref="InterPro:IPR010065" + /db_xref="UniProtKB/TrEMBL:Q1M9F3" + /translation="MTGSQSTRRSADKIDPEQLTIVPLRHPWRWVAVALVLVSVAGMI + RSAISNPEFQWPIVAQYLFNPLILDGLWQTVLITAVVMALATLGGTIAALMMLSPSKL + LSVPAAAFVWFFRGTPALVQLIIWYNLSIVVKDITLWLPGVGTVYSVSTNDIMTPLVS + AIVALSLHEAGYMAEIVRSGLKSVNKGQYEASACLGMNPSLALRRIVLPQAMRIIIPP + 
TGNETINLLKTTSLVSIIAVGDLLYSAQSIYARTFETIPLLLVVSFWYLAVVSIMSGG + QFYLEKHFSRDEHRINPGITRAILGNLAKIGRKEALA" + gene 69345..69542 + /locus_tag="pRL80065" + CDS 69345..69542 + /locus_tag="pRL80065" + /note="no significant database hits" + /codon_start=1 + /transl_table=11 + /product="hypothetical protein" + /protein_id="CAK02866.1" + /db_xref="EnsemblGenomes-Gn:pRL80065" + /db_xref="EnsemblGenomes-Tr:CAK02866" + /db_xref="UniProtKB/TrEMBL:Q1M9F2" + /translation="MRTRNVAGPPAGQSSKARGLALKMGEKLRRRVLRVRYDSGIPSG + HHVLAQNTRTNCQIGAHEKSI" + gene complement(69825..70028) + /locus_tag="pRL80066" + CDS complement(69825..70028) + /locus_tag="pRL80066" + /note="no significant database hits" + /codon_start=1 + /transl_table=11 + /product="hypothetical protein" + /protein_id="CAK02867.1" + /db_xref="EnsemblGenomes-Gn:pRL80066" + /db_xref="EnsemblGenomes-Tr:CAK02867" + /db_xref="UniProtKB/TrEMBL:Q1M9F1" + /translation="MPRAVVYTTRPPARSTCGTGRPRQQPVADRSTITTGLREVDKRK + HQKAAAENTSLLRQRLMDVNEPP" + gene complement(70284..70775) + /locus_tag="pRL80067" + CDS complement(70284..70775) + /locus_tag="pRL80067" + /codon_start=1 + /transl_table=11 + /product="putative AsnC family transcriptional regulatory + protein" + /protein_id="CAK02868.1" + /db_xref="EnsemblGenomes-Gn:pRL80067" + /db_xref="EnsemblGenomes-Tr:CAK02868" + /db_xref="GOA:Q1M9F0" + /db_xref="InterPro:IPR000485" + /db_xref="InterPro:IPR011008" + /db_xref="InterPro:IPR011991" + /db_xref="InterPro:IPR019885" + /db_xref="InterPro:IPR019887" + /db_xref="InterPro:IPR019888" + /db_xref="UniProtKB/TrEMBL:Q1M9F0" + /translation="MRYLLDRIDEQILSALRTNARASHAELSGKVNLSRNAVRVRIER + LEREGFIKGYTIVTGDGGNDSPITTALMFVYRQDRMRGGEIIQALRTIPEVVACDVMT + GDFDLVVRVESPRADRIRQIWQLISELPGVRDTLTAFALSSVVRRDEVRRGQNPAASN + EPL" + gene complement(70772..71578) + /locus_tag="pRL80068" + CDS complement(70772..71578) + /locus_tag="pRL80068" + /codon_start=1 + /transl_table=11 + /product="putative kinase/phosphotransferase" + /protein_id="CAK02869.1" + 
/db_xref="EnsemblGenomes-Gn:pRL80068" + /db_xref="EnsemblGenomes-Tr:CAK02869" + /db_xref="GOA:Q1M9E9" + /db_xref="InterPro:IPR002575" + /db_xref="InterPro:IPR011009" + /db_xref="UniProtKB/TrEMBL:Q1M9E9" + /translation="MAVFTEISDEDRNSIAAAYGMTSLSSVIGIADGDRETTYLFRTA + GGEFIVTLFENGAEPLDLERAFATMEKLNNRGIPCPKPTRTVDGDATFQAAGRLVAIV + SFVAGSSTNNPTPEKSENLGRLMARIHVILQGGRKHSLDELPTGALHGALVPSNVFFL + GENVSGVINFRLRHDDVLISEIADVLISWASQPAGELDEQKARAILAGYESVRQLTEA + EKKALPAFVLASAARHYASKKEKIYMLEAAVRAFESSGFGQVARQPACRA" + gene complement(71562..72614) + /locus_tag="pRL80069" + CDS complement(71562..72614) + /locus_tag="pRL80069" + /note="C-terminus from codon 192 is similar to Pseudomonas + fluorescens Aspartate ammonia-lyase AspA SWALL:ASPA_PSEFL + (SWALL:P07346) (478 aa) fasta scores: E(): 4.9e-42, 50.18% + id in 273 aa, and C-terminus from codon 192 is similar to + Brucella suis aspartate ammonia-lyase AspA or br1958 + SWALL:Q8FYC6 (EMBL:AE014485) (483 aa) fasta scores: E(): + 5e-49, 55.55% id in 270 aa" + /codon_start=1 + /transl_table=11 + /product="putative fumarate lyase-family protein" + /protein_id="CAK02870.1" + /db_xref="EnsemblGenomes-Gn:pRL80069" + /db_xref="EnsemblGenomes-Tr:CAK02870" + /db_xref="GOA:Q1M9E8" + /db_xref="InterPro:IPR000362" + /db_xref="InterPro:IPR002220" + /db_xref="InterPro:IPR008948" + /db_xref="InterPro:IPR013785" + /db_xref="InterPro:IPR018951" + /db_xref="InterPro:IPR020557" + /db_xref="InterPro:IPR022761" + /db_xref="UniProtKB/TrEMBL:Q1M9E8" + /translation="MFKGSITAPVTPFADGHVDEGALRDLIEWQIDEGSFGLVPCGTT + GESPTLSHAASSVKVGRTQLQDAAPVTAGKEFEAFAEFIDEDIVQVHYASKLLKEINL + CGSASGTAINVPSGFAQAVCDHLSQKSGYRLIPARNFIEATSDTGGFVSFVLEGIAFN + LTKICNDLRLLSSGPRGGLGEIRLPPVQAGSSIMPGKGNPFIPEMMNQISFQVIGNDL + TVTLAASAGQLQLNAMEPVIVLNILQSMRMLTRGMVILMERCIDGIEVDVDRCQALLD + QSIVLAKPLAKLIGYSKAADLTKKALVEKRNLRDVVEEERGLTESQVHQIYGDTAEFA + NFSRSGWTEGKDGRFH" + gene complement(72660..74075) + /locus_tag="pRL80070" + CDS complement(72660..74075) + /locus_tag="pRL80070" + /codon_start=1 + 
/transl_table=11 + /product="putative aldehyde dehydrogenase" + /protein_id="CAK02871.1" + /db_xref="EnsemblGenomes-Gn:pRL80070" + /db_xref="EnsemblGenomes-Tr:CAK02871" + /db_xref="GOA:Q1M9E7" + /db_xref="InterPro:IPR015590" + /db_xref="InterPro:IPR016161" + /db_xref="InterPro:IPR016162" + /db_xref="InterPro:IPR016163" + /db_xref="UniProtKB/TrEMBL:Q1M9E7" + /translation="MHNLALTQATGLEEIAVTSPFDGTLVGTVVETCAASVDALLERA + RRGAQIARTLPRHKRSAILETAAAAIEASREEFALLIAKEAGKTITQARKETIRCVNT + LKLSADEAKRNAGEVIPFDAYAGSESRQGWYTREPLGIIAAITPYNDPLNLVAHKLGP + AIAGGNAVLLKPSEFTPLSAVKLVEVMIESGLPEEIITVAIGGPELGKALVAARDVRM + ISFTGGFATGEAIAKTAGLKKLAMDLGGNAPVIVMENCNFDAAVEACVSGAYWAAGQN + CIGTQRILIQRPIYQQFKAKFVADTKKLKTGNPLDADTDVGPLISEKAVSRAIAMVER + ALAAGATLLCGHRPAGNLYPPTVLEDVPISCDVWGEEVFAPIVILQPFDGLADAIGLA + NGPDYSLHAAIFTNDLEAALETASKIEAGGVMVNDSSDYRFDAMPFGGFKYGSMGREG + VRFAYEDMTQPKVVCINRLKP" + gene complement(74075..75136) + /gene="hom2" + /locus_tag="pRL80071" + CDS complement(74075..75136) + /gene="hom2" + /locus_tag="pRL80071" + /EC_number="1.1.1.3" + /codon_start=1 + /transl_table=11 + /product="putative homoserine dehydrogenase" + /protein_id="CAK02872.1" + /db_xref="EnsemblGenomes-Gn:pRL80071" + /db_xref="EnsemblGenomes-Tr:CAK02872" + /db_xref="GOA:Q1M9E6" + /db_xref="InterPro:IPR001342" + /db_xref="InterPro:IPR005106" + /db_xref="InterPro:IPR016040" + /db_xref="InterPro:IPR019811" + /db_xref="InterPro:IPR022697" + /db_xref="UniProtKB/TrEMBL:Q1M9E6" + /translation="MTIYNIALIGFGGVNRALTELIAAKNQLWERDLGFRLNIVAVSD + LYLGSVISPNGLDAKTLVDAKFEKGGFGQLSGGSAEADNETIIKTAPADIVVEATYTN + PKDGEPAVSHCRWALETGKHVVTTNKGPVAIAAPALKAFAKTNGVRFEYEGAVMSGTP + VIRMAERTLAGAELKGFEGILNGTSNFVLGRMESGLDFASAVKEAQKLGYAEADPTAD + VEGFDVRLKVVILANELLGANLKPEDISCKGVSGLSLSDIEEAAKANSRWKLIGAAVR + NDDGSVTGSVSPKRLGLDHPLAGVNGATNAVSLDTELLGAVTITGPGAGRIETAYALL + SDIVAIHSAGASSTAKEAA" + gene complement(75248..75613) + /locus_tag="pRL80072" + CDS complement(75248..75613) + /locus_tag="pRL80072" + /note="no significant database hits" + 
/codon_start=1 + /transl_table=11 + /product="hypothetical protein" + /protein_id="CAK02873.1" + /db_xref="EnsemblGenomes-Gn:pRL80072" + /db_xref="EnsemblGenomes-Tr:CAK02873" + /db_xref="UniProtKB/TrEMBL:Q1M9E5" + /translation="MMVGKSGPTASSMAATATLGKRGRWVEEMATRTAACPDSSSPDG + FLPAIQVPSPFSEPNIQRKFAAPFGRLHRVRHCRLQLQPARSCLLGTTILGEQASPMR + LADGALIVSGMPVMKLSRT" + gene complement(76316..77521) + /locus_tag="pRL80073" + CDS complement(76316..77521) + /locus_tag="pRL80073" + /note="Similar to Synechococcus sp. SufS SWALL:AAQ82450 + (EMBL:AY375041) (420 aa) fasta scores: E(): 3.7e-26, + 28.39% id in 398 aa, and to Bradyrhizobium japonicum + Blr4730 protein blr4730 SWALL:Q89L18 (EMBL:AP005952) (580 + aa) fasta scores: E(): 2.6e-86, 57.53% id in 398 aa" + /codon_start=1 + /transl_table=11 + /product="putative cysteine desulfurase" + /protein_id="CAK02874.1" + /db_xref="EnsemblGenomes-Gn:pRL80073" + /db_xref="EnsemblGenomes-Tr:CAK02874" + /db_xref="GOA:Q1M9E4" + /db_xref="InterPro:IPR000192" + /db_xref="InterPro:IPR015421" + /db_xref="InterPro:IPR015422" + /db_xref="InterPro:IPR015424" + /db_xref="InterPro:IPR020578" + /db_xref="UniProtKB/TrEMBL:Q1M9E4" + /translation="MSQINRDARKALDLPRLRADTPGTRNRNHLNNAGAALMPSPVID + AVIGYLSREGEIGGYEAAAEANSLLEGAYDSLATFVNCARDEIAIAENATIAWQRAFY + SLSFGPGDRILTASAEFAANYVAFLQVAKRTGVSIEVIPNDASGVLDPDALTKMIDER + VRLIAVTWIPTNGGLINPAAAIGRIARDNGILYLLDACQAAGQTPIDVNALGCDILTA + TGRKFLRAPRGTGFMYMRKSVLEKIEPAMIDLYGAPWTAPDRYELRPDARRFETWEKN + CSVRLGLRAAVDYALEIGLENIEARCSHLSSRLREGLRGMRAVSVHDLGAPLASIISF + TVNGWDSPAVMAYLAGKGINVSVSPPSSTPVDAYTRQLPPVVRASPHYYNSEEEIEAF + LEAIAGIAA" + gene complement(77684..78604) + /locus_tag="pRL80074" + CDS complement(77684..78604) + /locus_tag="pRL80074" + /codon_start=1 + /transl_table=11 + /product="putative LysR-family transcriptional regulator" + /protein_id="CAK02875.1" + /db_xref="EnsemblGenomes-Gn:pRL80074" + /db_xref="EnsemblGenomes-Tr:CAK02875" + /db_xref="GOA:Q1M9E3" + /db_xref="InterPro:IPR000847" + /db_xref="InterPro:IPR005119" 
+ /db_xref="InterPro:IPR011991" + /db_xref="UniProtKB/TrEMBL:Q1M9E3" + /translation="MATAIDHLDWDDLKLFLIVVRCKSVTGAARELKVSHSTVSRRLA + RLEYTVGGALVERTRDGLLLTPAGLVTMRRAEEIENGVNALRSDVSNRDEVRGTVRLA + TMEGIATLYLSERLVELSSRYPDLDIELVTSPQTVRVARREADLFLSFFKPHGTALDS + QLIGRFKTGLFASQAYLERNGVPSQAADLREHRFVGYIEELVLLESVLWLEELVPAPT + MAFSSNSMMSQMFAASAGAGIVALPEFARSLKLGLIPVLEELSGEREIWMSAHQDLAY + LPRVRAVKQFVKALVRRDEQRLLGNVPWSR" + gene 78784..79167 + /locus_tag="pRL80075" + CDS 78784..79167 + /locus_tag="pRL80075" + /codon_start=1 + /transl_table=11 + /product="putative endoribonuclease L-PSP family protein" + /protein_id="CAK02876.1" + /db_xref="EnsemblGenomes-Gn:pRL80075" + /db_xref="EnsemblGenomes-Tr:CAK02876" + /db_xref="GOA:Q1M9E2" + /db_xref="InterPro:IPR006056" + /db_xref="InterPro:IPR006175" + /db_xref="InterPro:IPR013813" + /db_xref="UniProtKB/TrEMBL:Q1M9E2" + /translation="MSRRTVNASNAAAVGPYSHATWAGNLLFCSGQTPLDSSTGKLVD + GTVADQTRQCFDNLFEVLEAAGLGSDDVVSVNVYLTDMDDFGQMNEIYATRFSSPYPA + RTTIGCASLPLGARIEIGLTAKRQS" + gene 79231..80253 + /locus_tag="pRL80076" + CDS 79231..80253 + /locus_tag="pRL80076" + /EC_number="3.5.5.7" + /codon_start=1 + /transl_table=11 + /product="putative aliphatic nitrilase" + /protein_id="CAK02877.1" + /db_xref="EnsemblGenomes-Gn:pRL80076" + /db_xref="EnsemblGenomes-Tr:CAK02877" + /db_xref="GOA:Q1M9E1" + /db_xref="InterPro:IPR003010" + /db_xref="UniProtKB/TrEMBL:Q1M9E1" + /translation="MGNMKFWAAAAHIAPVYLDPGASAEKACSVIAEAARNGASLVVF + SESFLPGFPVWAALYPPIQSHEHFKRFLTASVYIDGPEIERVRKAASDNGVFVSIGFS + ERNPASVGGLWNSNVLISDTGQILIHHRKLVATFFEKLVWDPGDGAGLVVANTRIGRI + GGLICGENTNPLARYSLMTQGEQVHISSYPPIWPTRVPTESDNYDNRAANRIRASAHC + FEAKCFGIIVAGHLDEVARKSIALDDPAIEAIIDASPRATSFFLGPTGAATGDEMIDE + GIGYAQIDLDDCVEPKRFHDVVAGYNRFDIFDVTVNRVRRNPIRFLEGRAEDALTSPE + AVAVPE" + gene 80256..81245 + /locus_tag="pRL80077" + CDS 80256..81245 + /locus_tag="pRL80077" + /codon_start=1 + /transl_table=11 + /product="putative molybdenum-binding oxidoreductase" + /protein_id="CAK02878.1" + 
/db_xref="EnsemblGenomes-Gn:pRL80077" + /db_xref="EnsemblGenomes-Tr:CAK02878" + /db_xref="GOA:Q1M9E0" + /db_xref="InterPro:IPR000572" + /db_xref="InterPro:IPR005066" + /db_xref="InterPro:IPR008335" + /db_xref="InterPro:IPR014756" + /db_xref="UniProtKB/TrEMBL:Q1M9E0" + /translation="MMARTGMHMTKKGFSFQGPADGVHALKSWITPEDDLFLVTHMGF + LEIDPEHWHLDVDGLVGNPTRLHLSDLQAMPQREYMSFHECAGSPLAPTVAKRRIGNV + VWKGVPLSLVLERAKISTDASYVWTSGLEWGEYAEIEEAYQKDLPIEKALAEEVLLAL + EINGRPLTPERGGPVRLVVPGWYGTNSVKWVGSITAANRRASGAYTTRFYNDPTASGT + KPVWDVTPESVIVSPSPNDLLSADMPTKIWGWAWGDCQISSVEVSVDGGGSWRTASVG + PREGRSWQRFELTWSPEPGPHVLLCRCKNELGEEQPVSDARNAVHSVQVQVDF" + gene 81245..81457 + /locus_tag="pRL80078" + CDS 81245..81457 + /locus_tag="pRL80078" + /note="no significant database hits" + /codon_start=1 + /transl_table=11 + /product="hypothetical protein" + /protein_id="CAK02879.1" + /db_xref="EnsemblGenomes-Gn:pRL80078" + /db_xref="EnsemblGenomes-Tr:CAK02879" + /db_xref="UniProtKB/TrEMBL:Q1M9D9" + /translation="MPRFYSNQEVTRHHDFSDATDADLAIYRGEISSVVQLLIAPTIA + SLAIGNRHAFWVSAATRSQASPGTKN" + gene complement(81899..82882) + /locus_tag="pRL80079" + CDS complement(81899..82882) + /locus_tag="pRL80079" + /codon_start=1 + /transl_table=11 + /product="putative transcriptional regulator" + /protein_id="CAK02880.1" + /db_xref="EnsemblGenomes-Gn:pRL80079" + /db_xref="EnsemblGenomes-Tr:CAK02880" + /db_xref="GOA:Q1M9D8" + /db_xref="InterPro:IPR007324" + /db_xref="InterPro:IPR007630" + /db_xref="InterPro:IPR013324" + /db_xref="UniProtKB/TrEMBL:Q1M9D8" + /translation="MTKLTRMPPTSLLDAESLRLKAAFLYYNQKLTQNEVAAKLGVSR + STIVKLLDEALKRGEIQIWVKQAASELELASELEAALNLDEVIVTPPAKDVDGTARAV + GQALGQFLSDTIPNNATIGVGWGRTLSAALSSFRPLRREGVKIVSLLGGTVEAQHENP + IDFTWQLANQLGAQCFLLMAPLLVDSPDTKERLIEKCGLNRIMKLSADLDIALVSVGD + IGTHSTSLSVASLAPEELETLIGKGAMCDVLCNFLDRDGRTVDHPVNDRVMSVDLDTV + RRARHVVIASGGEQRAAAILAAIRRIGCNTLVTDESAARQMLSLLRSPRVD" + gene complement(83064..84020) + /locus_tag="pRL80080" + CDS complement(83064..84020) + /locus_tag="pRL80080" + 
/codon_start=1 + /transl_table=11 + /product="putative fructokinase" + /protein_id="CAK02881.1" + /db_xref="EnsemblGenomes-Gn:pRL80080" + /db_xref="EnsemblGenomes-Tr:CAK02881" + /db_xref="GOA:Q1M9D7" + /db_xref="InterPro:IPR002173" + /db_xref="InterPro:IPR011611" + /db_xref="InterPro:IPR029056" + /db_xref="UniProtKB/TrEMBL:Q1M9D7" + /translation="MIVVCGDALIDFLPVALPEGGSGYIPVCGGSCCNIATAIGRLGG + KVGFMGGLSEDFFGAMLVQQFNEAGIDLRYATRLPFDTTLAFVRLGDDEPEYAFYDSG + SAARHWTLKGAPSLGTEVDVLHIGSVTLIHPPVSSACESLFENEQGKRVLSIDPNCRP + GLAQDPEAYRQRLNRLCGMADIVKLSVTDLGFMQPGVGPHSAAESWLSNRAKIVLVSR + GAGGATVYLAGGRVVEVPARPARVVDTVGAGDALIAGFLTHLQQSGDLHRDSIGALTG + DRARKALEFAAHVASLACEHRGSDPPWRREIIVAGYDESSQM" + gene complement(84025..84819) + /locus_tag="pRL80081" + CDS complement(84025..84819) + /locus_tag="pRL80081" + /note="Similar to C-terminus from codon 127 of Pseudomonas + putida dihydrolipoyllysine-residue acetyltransferase + component of acetoin cleaving system acoC SWALL:ACOC_PSEPU + (SWALL:Q59695) (370 aa) fasta scores: E(): 2.5e-17, 31.98% + id in 247 aa, and similar to the C-terminus from codon 133 + of Acetobacter pasteurianus esterase2; est2 SWALL:O66382 + (EMBL:AB013096) (406 aa) fasta scores: E(): 9.5e-17, + 32.65% id in 245 aa. Upstream CDS is similar to the + N-terminus of putative biotin-binding hydrolases. 
It is + possible that these two CDS constitute a pseudogene rather + than two separate CDS features" + /codon_start=1 + /transl_table=11 + /product="putative hydrolase" + /protein_id="CAK02882.1" + /db_xref="EnsemblGenomes-Gn:pRL80081" + /db_xref="EnsemblGenomes-Tr:CAK02882" + /db_xref="GOA:Q1M9D6" + /db_xref="InterPro:IPR000073" + /db_xref="InterPro:IPR029058" + /db_xref="UniProtKB/TrEMBL:Q1M9D6" + /translation="MTNATANLPLPVFRLGGSGPNLLLLHGFGADRMSWIANQDALMQ + SFAVFSCDLPAHGGQPPGRNGMKISEMADELIDALARQNDRFVVVGHSLGGAIAIELA + ARRPDLVAGLGLIAPAGLGKEVGREFLSELPELNELGSALALLQRLVSRPRLITPPIA + QRLLDHLERPGIRAALRALASELSHVETSIEPHVASIAVSRVPRIVIWGEEDTINPID + RPRLSRFNAQVLTLPDTGHLPHVEASRAVSRHLCEFLKSAVLELGG" + gene complement(84816..85052) + /locus_tag="pRL80082" + CDS complement(84816..85052) + /locus_tag="pRL80082" + /note="Similar to N-terminal 80 residues of + Methylobacterium extorquens dihydrolipoamide + succinyltransferase SWALL:Q8KTE4 (EMBL:AF497852) (442 aa) + fasta scores: E(): 7.3e-07, 44.87% id in 78 aa, and to + N-terminal 80 residues of Staphylococcus epidermidis + dihydrolipoamide S-acetyltransferase se0256 SWALL:Q8CTW0 + (EMBL:AE016744) (425 aa) fasta scores: E(): 4.5e-09, + 48.68% id in 76 aa. Downstream CDS is similar to the + C-terminus of putative biotin-binding hydrolases. 
It is + possible that these two CDS constitute a single pseudogene + rather than two separate CDS features" + /codon_start=1 + /transl_table=11 + /product="putative biotin-binding protein" + /protein_id="CAK02883.1" + /db_xref="EnsemblGenomes-Gn:pRL80082" + /db_xref="EnsemblGenomes-Tr:CAK02883" + /db_xref="InterPro:IPR000089" + /db_xref="InterPro:IPR003016" + /db_xref="InterPro:IPR011053" + /db_xref="UniProtKB/TrEMBL:Q1M9D5" + /translation="MDIPIIMPNLGNEIDEAQIDEWFKTEGDMVTEGEQLVLITTPKV + TMEIEAPATGILKKILIPADELAAVGSTLGIIET" + gene complement(85082..87496) + /locus_tag="pRL80083" + CDS complement(85082..87496) + /locus_tag="pRL80083" + /note="CDS is a fusion protein of which the C-terminus + from codon 456 is similar to Sulfolobus solfataricus + pyruvate dehydrogenase, beta subunit PdhB-2 or sso1526 + SWALL:Q97Y22 (EMBL:AE006767) (324 aa) fasta scores: E(): + 1.3e-45, 43.76% id in 329 aa, and the N-terminus to codon + 456 is similar to Chlamydia pneumoniae pyruvate PdhA/PdhB + or PdhA_PdhB or cpn0033 or cp0743 or cpb0037 SWALL:Q9Z9E8 + (EMBL:AE001588) (678 aa) fasta scores: E(): 2.5e-34, + 29.55% id in 653 aa" + /codon_start=1 + /transl_table=11 + /product="putative dehydrogenase, fusion" + /protein_id="CAK02884.1" + /db_xref="EnsemblGenomes-Gn:pRL80083" + /db_xref="EnsemblGenomes-Tr:CAK02884" + /db_xref="GOA:Q1M9D4" + /db_xref="InterPro:IPR001017" + /db_xref="InterPro:IPR005475" + /db_xref="InterPro:IPR005476" + /db_xref="InterPro:IPR009014" + /db_xref="InterPro:IPR029061" + /db_xref="UniProtKB/TrEMBL:Q1M9D4" + /translation="MKAPSIPIHAYDENIEAERARYGDEGLLQILRDMIIIREFETIL + ASLKGKGAYCGIEFNYKGPAHLSIGQEAAAVGAAASLEPDDHIFGSHRSHGEFIAKGL + SAIRKLPDDALRLIMESHERGTLLRTVETSLQGPKTSETAENFLLLGLLSEIFMRSTG + FNRGMGGSMHAFFPPFGTYPNNAIVGASAGIATGAALRKKLAGASGITVANAGDGSTG + CGPVWEAMNFAAMAQFETLWADAFKGGLPVLFFFTNNFYAMGGQTIGETMGWDRLSRI + GLAVNQQAIHAETVDGTNPLAVADAVARKRELLVQGRGPALLDVECYRSSGHSTTDIN + SYRTKDEMQAWEQHDPIILFSNRLQEAGIVTAGQVAELREQTTDRMRSITAIAVDPAL + 
TPPVDIHADPTLIGKLMFSNTNIELPSQEVSLLKPVDEVSRIRQDAKKSRYGFAEDGA + KLSPMRAITLRDALFESVLHHMTHDGSLVAYGEECREWGGAFGVYRGLAEILPHDRLF + NSPISEAAIVATAVGFALEGGRALVELMYGDFLGRAGDEVFNQMAKWQSMSGGELKVP + VVLRCSIGSKYGAQHSQDWTALCAHIPGLKVVYPATPYDAKGLLASALSGNDPVVFFE + SQRLYDTVEEFRNEGVPTGYYQLPIGEPDCKRAGEDVTILTVGPSLYSALAAAEELES + TFGISVEVIDARSLVPFNYEPVLASIRKTGRIVLVSEASERGSFLMTLAANITRFGYE + TLHAAPRVIGSPNWIVPGAEMESTYFPQKDDIIDVITSELFPGQRSNRRSIRDWDDRE + LARLGL" + gene 87873..88718 + /locus_tag="pRL80084" + CDS 87873..88718 + /locus_tag="pRL80084" + /codon_start=1 + /transl_table=11 + /product="putative epimerase/isomerase" + /protein_id="CAK02885.1" + /db_xref="EnsemblGenomes-Gn:pRL80084" + /db_xref="EnsemblGenomes-Tr:CAK02885" + /db_xref="GOA:Q1M9D3" + /db_xref="InterPro:IPR013022" + /db_xref="UniProtKB/TrEMBL:Q1M9D3" + /translation="MAKFGMHFSLWAPEWTTEAANAAIPEAARYGLEIIEIPLFEPAK + IDLDHAKSIIRDHGLQATASLCLPEDKMAHLAPEACTQYLFQVLDAAHHIGCSMLTGV + TYSALGYKTGVPPTSSEYEAVVRALKPVARRAAGLGMTFGVEPCTRFDTHILNTAAQG + IWLLEQIDEPNTFVHLDTYHMNVEESGFDDGIRQAAGRSPYIHLSESHRGVPGTGTVD + WELVFRTLRDTGFDGDLVIESFVSVPPQLAAALCMWRPAAPNAGAVLDQGLPYLRGLA + TRYGL" + gene 88797..89795 + /locus_tag="pRL80085" + CDS 88797..89795 + /locus_tag="pRL80085" + /codon_start=1 + /transl_table=11 + /product="putative substrate-binding component of ABC + transporter" + /protein_id="CAK02886.1" + /db_xref="EnsemblGenomes-Gn:pRL80085" + /db_xref="EnsemblGenomes-Tr:CAK02886" + /db_xref="InterPro:IPR025997" + /db_xref="InterPro:IPR028082" + /db_xref="UniProtKB/TrEMBL:Q1M9D2" + /translation="MKRRDILKFSLAAGVAWLIATPNLAMAADPVMVTVVKIAGIPYF + GALERGLQEAGKQFNIDVSMTGPANIDPAQQVKLLEDLIAKKVDVIGLVPLDVKACEP + VLKRAQAAGIKVIVHEGPEQEGRDWDVELIDSTKFGEVQMQSLAKEMGEEGDYVVYVG + TLTTPLHNKWADAAIAYQKAHYPKMNLVADRFPGADEIDSAYRTTIDVLKAYPKLKGI + LAFGSNGPIAAGNAVKEKHLSKRVAVIGTVLPSQAKDLIMDGVIREGFMWNPREAGSA + MVAVARLVLDGTKIEDGMDVPGLGKATVDVPGKLIKVDKITHINKETVDGLIAQGL" + gene 89878..91362 + /locus_tag="pRL80086" + CDS 89878..91362 + /locus_tag="pRL80086" + /codon_start=1 + 
/transl_table=11 + /product="putative ATP-binding component of ABC + transporter" + /protein_id="CAK02887.1" + /db_xref="EnsemblGenomes-Gn:pRL80086" + /db_xref="EnsemblGenomes-Tr:CAK02887" + /db_xref="GOA:Q1M9D1" + /db_xref="InterPro:IPR003439" + /db_xref="InterPro:IPR003593" + /db_xref="InterPro:IPR017871" + /db_xref="InterPro:IPR027417" + /db_xref="UniProtKB/TrEMBL:Q1M9D1" + /translation="MTTFLELTHVSKHFGGVRALRDVDLSLEAGEVHCLVGENGSGKS + TLIKIIAGVQAPDPGGSIVLEGREHARLDPILSTKSGIQVIYQDLSLFPNMSVAENIA + IGSHMGLPRLANWNRINDIAAKAMARINVNLDLETMVSDLSIANRQLVAICRAMAADA + KLVIMDEPTASLTRHEVDSLLRVVNDLKSRDICTVFVSHRLDEVMEIAERVTVLRDGG + KVGTFDASEITSRRLETLMTGHEFHYAPPRPGGEAAEVVLAVRNLSRPGHYEDISFDI + RKGEIVGLTGLLGSGRTELALSIFGMNPPSRGTIEVSGKPLIASSNRVAIASGVAYVP + EDRLMLGLALGQPISANILATVLDSLAGKFGLINPAKRVAAADDWIVRLNTKVSDLEN + PVGTLSGGNQQRVVLGKWMATKPRVLILDSPTVGVDIKAKDGIYEIVHRLAAEGVGVL + LISDEAQEVFYHTHRVLVMRQGRLVSEVDPLSSTERNLQEEIYA" + gene 91355..92344 + /locus_tag="pRL80087" + CDS 91355..92344 + /locus_tag="pRL80087" + /codon_start=1 + /transl_table=11 + /product="putative permease component of ABC transporter" + /protein_id="CAK02888.1" + /db_xref="EnsemblGenomes-Gn:pRL80087" + /db_xref="EnsemblGenomes-Tr:CAK02888" + /db_xref="GOA:Q1M9D0" + /db_xref="InterPro:IPR001851" + /db_xref="UniProtKB/TrEMBL:Q1M9D0" + /translation="MPKTIQRWTRSHEFWLLAVVIVLSLFLTAATDSFLTLQNLFDLL + TSTSFAGILAAGLLVVLVFGGIDISFTAIASVAQYVALMIAKTYPIGWFGVFLVACCT + GILCGLFNAAIIHKVRISSVIVTISTLNIFYGLLIYITRGDYITSLPSYFREGIWWFE + FTDSNGFPYAINFQALLLVVAFFMTWVLLNKTNIGRQIYAMGGNEIAAERLGFHVFGL + RCLVYGYMGFMAAIASISQAQLAQSVTPTTLIGKELEVLAAVVLGGASLAGGNGSVFG + AVLGVMLIAILQNGLILLGVSSYWNQFFVGCVILLAVSATALSQRRRHAGLAS" + gene 92358..93416 + /locus_tag="pRL80088" + CDS 92358..93416 + /locus_tag="pRL80088" + /codon_start=1 + /transl_table=11 + /product="putative permease component of ABC transporter" + /protein_id="CAK02889.1" + /db_xref="EnsemblGenomes-Gn:pRL80088" + /db_xref="EnsemblGenomes-Tr:CAK02889" + /db_xref="GOA:Q1M9C9" + 
/db_xref="InterPro:IPR001851" + /db_xref="UniProtKB/TrEMBL:Q1M9C9" + /translation="MKPSANRSLISRFVRENATTMTLATIFMAVLAVFGLILGDRLLS + VGTFQSIAFQTPELGILGLAMMLALLSGGLNLSIISTANLCALTIASVLQFTIPWGDA + GSVLWLTWQVGAVAAGLAVAILIGLLNGFIIAYLGVSPILATLGTMIACKGLAIGLTR + GNVLSGFSDPIVAIGNGTYLGVPLAFLLFVALCVFVSVVLRRSSFGQKVYLVGANEKA + AQFSGIHVKRVLLLTYALSGALAGCGGLVMMARFNSANASYGESFLLISILAAVLGGI + DPYGGTGKVSGLFAALLLLQLISSAFNLMNFSQFLTIAIWGALLIGVSALRSGTGIFD + RLHLFEFWRAKSAVSENP" + gene 93578..93829 + /locus_tag="pRL80089" + /pseudo + CDS 93578..93829 + /locus_tag="pRL80089" + /pseudo + /codon_start=1 + /transl_table=11 + gene 93990..94445 + /locus_tag="pRL80090" + CDS 93990..94445 + /locus_tag="pRL80090" + /codon_start=1 + /transl_table=11 + /product="putative fucose operon protein" + /protein_id="CAK02891.1" + /db_xref="EnsemblGenomes-Gn:pRL80090" + /db_xref="EnsemblGenomes-Tr:CAK02891" + /db_xref="GOA:Q1M9C8" + /db_xref="InterPro:IPR007721" + /db_xref="InterPro:IPR023750" + /db_xref="UniProtKB/TrEMBL:Q1M9C8" + /translation="MLKNIDPALNADVLHALRSMGHGDTVVVSDTNFPSDSIARQTVL + GKLLRIDNVSAARAIKAILSVMPLDTPLQPSAGRMEIMGAPDEIPPVQQEVQAVVDGA + EGKPALMYGIERFAFYEEAKKAYCVITTGENRFYGCFLFTKGVIPPETV" + gene 94458..94652 + /locus_tag="pRL80091" + /pseudo + CDS 94458..94652 + /locus_tag="pRL80091" + /pseudo + /codon_start=1 + /transl_table=11 + gene 94652..95527 + /locus_tag="pRL80092" + CDS 94652..95527 + /locus_tag="pRL80092" + /codon_start=1 + /transl_table=11 + /product="putative permease component of ABC transporter" + /protein_id="CAK02893.1" + /db_xref="EnsemblGenomes-Gn:pRL80092" + /db_xref="EnsemblGenomes-Tr:CAK02893" + /db_xref="GOA:Q1M9C7" + /db_xref="InterPro:IPR000515" + /db_xref="UniProtKB/TrEMBL:Q1M9C7" + /translation="MTRNDPVKHFFIWPALLIVLVISIFPLIYSLTTSFMSLRLVPPI + PAHFVGFGNYAELLQNPRFWSVTWTTTIIAFVAVSLQYVIGFSVALALSRRVPGEGLF + RVSFLVPMLVAPVAVALIARQILNPTMGPLNELMTAFGFPNLPFLTQTRWAIGAIISV + EVWQWTPFVILMLLAGLQTLPEDVYEAAALENASPWQQFWGITFPMMLPISVAVVFIR + 
LIESYKIIDTVFVMTGGGPGISTETLTLFAYQEGFKKFNLGYTSALSFLFLIVITVIG + LVYLAILKPYLEKHK" + gene 95524..96552 + /locus_tag="pRL80093" + CDS 95524..96552 + /locus_tag="pRL80093" + /codon_start=1 + /transl_table=11 + /product="putative permease component of ABC transporter" + /protein_id="CAK02894.1" + /db_xref="EnsemblGenomes-Gn:pRL80093" + /db_xref="EnsemblGenomes-Tr:CAK02894" + /db_xref="GOA:Q1M9C6" + /db_xref="InterPro:IPR000515" + /db_xref="UniProtKB/TrEMBL:Q1M9C6" + /translation="MSVRDLKGSGRWWALAGCLLWLAFTFFPLYWVAITSFKSPLGVV + GGPTYVPFVDFDPTLTAWSELLSGARGQFYNTFIASTIVGLSASVLATFIGSMAAYAL + VRFTFEVRLLSGVIFVVVAFGGYLLGRHVLGFGQAISLIYAFVAALALAVGSSRIKLP + GPVLGNDDIVFWFVSQRMFPPIVAAFALFLMYTEMGKMGIKLVDTYTGLTFAYVAFSL + PIVIWLMRDFFAALPVEVEEAAMVDNVPTWRIFFGIVLPMSKPGLIATFMITLAFVWN + EFLFALFLTNSKWQTLPILVAGQNSQRGDEWWAISAAALVAIIPMVVMAGILSRLMRS + GLLLGAIK" + gene 96597..98051 + /locus_tag="pRL80094" + CDS 96597..98051 + /locus_tag="pRL80094" + /codon_start=1 + /transl_table=11 + /product="putative solute-binding component of ABC + transporter" + /protein_id="CAK02895.1" + /db_xref="EnsemblGenomes-Gn:pRL80094" + /db_xref="EnsemblGenomes-Tr:CAK02895" + /db_xref="GOA:Q1M9C5" + /db_xref="InterPro:IPR006059" + /db_xref="UniProtKB/TrEMBL:Q1M9C5" + /translation="MRRLLLSSTAAALLAAAGTTSALACEPDYTGVTLTATTQTGPYI + ASALQLAGKGWEEKTCGKVNVVEFPWSELYPKIVTSLTSGEDTFDVVAFAPAWAPDFT + DYLSEMPKAMQSGADWEDIAPVYREQLMVWNGKVLSQTMDGDAHTYTYRIDLFENAEN + QSAFKAKYGYDLAPPKTWKQYLDIAEFFQQPDKGLWGTAEAFRRGGQQFWFLFSHVAG + YTSHPDNPGGMFFDPDTMDAQVNNPGWVRGLEEYIRASKLAPPNALNFSFGEVNAAFA + GGQVAESIGWGDTGVIAADPKQSKVAGNVGSASLPGSDEIWNYKTKKWDKQAEVVQTS + FMAFGGWQAAVPSSSKNQEAAWNYIHFLTSPAVSGQAAITGGTGVNPYRLSHTTNTKL + WSKIFSEREAKEYLGAQKDAVTAKNTALDMRLPGYFSYTEILEIELSKALAGEVTPQQ + ALDTVADGWNKLTDEFGRDKQRAAYRSSMGLPAK" + gene 98167..99258 + /locus_tag="pRL80095" + CDS 98167..99258 + /locus_tag="pRL80095" + /codon_start=1 + /transl_table=11 + /product="putative ATP-binding component of ABC + transporter" + /protein_id="CAK02896.1" + 
/db_xref="EnsemblGenomes-Gn:pRL80095" + /db_xref="EnsemblGenomes-Tr:CAK02896" + /db_xref="GOA:Q1M9C4" + /db_xref="InterPro:IPR003439" + /db_xref="InterPro:IPR003593" + /db_xref="InterPro:IPR008995" + /db_xref="InterPro:IPR012340" + /db_xref="InterPro:IPR013611" + /db_xref="InterPro:IPR017871" + /db_xref="InterPro:IPR027417" + /db_xref="UniProtKB/TrEMBL:Q1M9C4" + /translation="MSQVRLDQVTKSFGSVAVIPPLDLVIADKEFVVLVGPSGCGKTT + TLRMIAGLEQATSGEIRIGEREVTALRPGLRNCSMVFQNYALYPHMTVAENIGYGMKV + RGTPKEDIDTAVANAARILNLGAYLNRKPSALSGGQRQRVAIGRAIVRQPDVFLFDEP + LSNLDAKLRIEMRTEIKLLHRRLQTTIVYVTHDQVEAMTMADRVVVMNQGRIEQAADP + ITLYESPKNLFVAAFIGAPSMNFVQGRLEAGDGGVVFRAEGDVAIVVPARMEEHLSAG + IGQAVVLGIRPEHTMTADSTFPMIRVHVADIEPLGPHTLAIGKAGASAFTAQIHASSR + VRPEDTIDVPIDPEKMHFFLKSTGEALRR" + gene 99428..100429 + /locus_tag="pRL80096" + CDS 99428..100429 + /locus_tag="pRL80096" + /codon_start=1 + /transl_table=11 + /product="putative transposase for insertion sequence + (IS30 family) element" + /protein_id="CAK02897.1" + /db_xref="EnsemblGenomes-Gn:pRL80096" + /db_xref="EnsemblGenomes-Tr:CAK02897" + /db_xref="GOA:Q1M9C3" + /db_xref="InterPro:IPR001584" + /db_xref="InterPro:IPR012337" + /db_xref="InterPro:IPR025246" + /db_xref="UniProtKB/TrEMBL:Q1M9C3" + /translation="MSRCYMQLALADRRRLHQLVAAKVPVNEMARQLGRHRSTIYREI + KRNTFHDRELPDYNGYYSTVANDIAQDRRRRLRKLRRHPTLRTEIINQLEARWSPEQI + AGRLLSDGLSRIRVCKETIYRFIYSKEDYGLGLYQYLPEARRKRRAMRSRKPRDGAFP + ATHRISQRPDFVGDRSRFGHWEGDLLIFERPLGHANITTLVERKSRYTVLIKNPSRHS + RPIMDKIIRAFSPLPAFARQSFTLNRGTEFAGFRALEEGIGACSWFCDPSAPWQKGTV + ENTNKRIRRFLPGTTDLAVVSQRDLLHLTRHVNDQPRKCLGYRTPTEVFMAHLHEDR" + gene complement(100505..101764) + /locus_tag="pRL80097" + CDS complement(100505..101764) + /locus_tag="pRL80097" + /note="no significant database hits" + /codon_start=1 + /transl_table=11 + /product="hypothetical protein" + /protein_id="CAK02898.1" + /db_xref="EnsemblGenomes-Gn:pRL80097" + /db_xref="EnsemblGenomes-Tr:CAK02898" + /db_xref="UniProtKB/TrEMBL:Q1M9C2" + 
/translation="MTCNAEDLLARVSVDGPHRAFAGILASLVPSRSLNVTGEMEMAL + KWTALGVLPNINMMSDVFAGEHMALVGFQDDRFLDALANGPNLQTFLSKFEGNHGQRL + RPSLLVRSDAYPDRPNSEAISSFMDIVVACVVLDARVRAVTWRRNVGPFHTDAFEIYP + WMIDPQGKRLVAGNAALWAIHELDKFRGAASAAVPVQDVGNEPAHPRFFKQLLALWKS + RFIDGHNDWTSRAILRSLKMAASAMRLSSPTGSTESFYDYGRVLSLWVSAFEILVHPG + ATGKATRQRVLDLLKSIPWQSEMAREETHSADFGGGKTFDLRLANALYMKMNDLRNNF + LHGNAVEPEDFRLKNGTLILSFPAAIFRMALATILGDPDGVTSEQVIAGEKTREEYGK + HLQATRFRNDCEECLLAAVKPQTPDDE" + gene 101909..102397 + /locus_tag="pRL80098" + CDS 101909..102397 + /locus_tag="pRL80098" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02899.1" + /db_xref="EnsemblGenomes-Gn:pRL80098" + /db_xref="EnsemblGenomes-Tr:CAK02899" + /db_xref="InterPro:IPR007438" + /db_xref="InterPro:IPR014519" + /db_xref="UniProtKB/TrEMBL:Q1M9C1" + /translation="MKTVYTIGYEGTDIERFVKTLTAVGIEAVADVRAVPLSRKKGFS + KNALREHLEKAGIKYLAMQQLGDPKEGREAAKAGDYDRFRSIYSGHVDLPEVAAAIEE + LATASEEQAVCLLCFERDPKTCHRFIVGERMGAFGYEMTHLFGDDPARYIRNQDRLPK + RS" + gene complement(102406..103215) + /locus_tag="pRL80099" + CDS complement(102406..103215) + /locus_tag="pRL80099" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02900.1" + /db_xref="EnsemblGenomes-Gn:pRL80099" + /db_xref="EnsemblGenomes-Tr:CAK02900" + /db_xref="UniProtKB/TrEMBL:Q1M9C0" + /translation="MAVVDARILILCKTYPSPSGKYAETTCVAGMDESGKLVRLFPVP + FRLIATGQQFKKWQWIRAKVEKARKDHRQESLTIKVDTIEGEAVVQPGKDWAERRHLI + SPIHVYDHFDKIDAEQRASGMSLAMLKPARILGLDIEPVSNPEWTEEELAKLVQEQKQ + GGLFDDEDKPSIRTLQKLPFDFYYRYQCGEGPGAKMFRHKLVDWEVGALYLNCHRSHG + ADWEKYFRDQLENKIPAKDLMFLMGNQHRFQDQWLIISLIYPPHLAQGLLL" + gene complement(103693..105171) + /locus_tag="pRL80100" + CDS complement(103693..105171) + /locus_tag="pRL80100" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02901.1" + /db_xref="EnsemblGenomes-Gn:pRL80100" + /db_xref="EnsemblGenomes-Tr:CAK02901" + /db_xref="InterPro:IPR027299" + 
/db_xref="UniProtKB/TrEMBL:Q1M9B9" + /translation="MLAFGNMLDLAGIERSTVRLLRHQDNRHAGFPTPYALWRDHRER + FEAYQATQSFASEPKLRATYWASFVGIPERETLFVGLYAARLLGPLPADRQHPITGGI + EPAGSCNVYEVDRLPSLSEYAGRLWIDWGDSYRSWIQRGDGKAKRLIELRRTLGDPPF + PGFAAFIANLSDIESLPATWQAPLSATSGIYLLTCPRTREQYVGMASGVDGFIGRWRE + YFATGHGGNVGLKSRDASDYQLSILKRSARPRRSRTSGSSNDAGRTSCRAGKWALTGT + DMTKADQIRSLAADGLKAADIAARLGIRYQHAYNVINAGPRRTLEAATKPSPIGRPET + RPQSKPALTTDILISGGFTRVGRWIISGDSLAVETPAPKSKGVYAFVKAGVALYVGVA + TKGLAGRLYSYGRPGISQRTNQRLEAIIMAELSGPEAIEISIAMPPDLEWNGLPVNGS + AGLELGLIEKYSLPWNIRGATALEAIAGPGQPALGRDPGQKM" + gene 105548..105724 + /locus_tag="pRL80101" + CDS 105548..105724 + /locus_tag="pRL80101" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02902.1" + /db_xref="EnsemblGenomes-Gn:pRL80101" + /db_xref="EnsemblGenomes-Tr:CAK02902" + /db_xref="UniProtKB/TrEMBL:Q1M9B8" + /translation="MSHDLSLAQSHAFQLSRDLMVPVTVFEVDGEYGVLPSDEIDADD + DLSVIHEFHPWPAH" + gene 105915..106883 + /locus_tag="pRL80102" + CDS 105915..106883 + /locus_tag="pRL80102" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02903.1" + /db_xref="EnsemblGenomes-Gn:pRL80102" + /db_xref="EnsemblGenomes-Tr:CAK02903" + /db_xref="InterPro:IPR017041" + /db_xref="InterPro:IPR025054" + /db_xref="UniProtKB/TrEMBL:Q1M9B7" + /translation="MKREEIEKLREAVSCAAVLEQAGFAVDVKESTRRAVKFRRGAEI + IIVTHEGRGWFDPLSDDKGDVFALTCLLQHLGFSEAVDRVGDLIGFTAAPVIWKKPPS + KVEPADILTRWQGRGLPATGSGVWRYLCWSRAIPISILRSAINQGIVREGPFGSMWAA + HSDGAGLVVGWEERGPDWRGFSTGGSKVLFRLGAPDALRLCVTEAAIDAMSLATIEDL + QDGSLYLSTGGGWSPRTEAALVDLLACPGTHLVCATDANSQGDAFARRLQALAAQVDR + PSVRLRPPAEDWNEVLQERRREKLKSEGRERRAASPPTASREASPG" + gene 107000..107602 + /locus_tag="pRL80103" + CDS 107000..107602 + /locus_tag="pRL80103" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02904.1" + /db_xref="EnsemblGenomes-Gn:pRL80103" + /db_xref="EnsemblGenomes-Tr:CAK02904" + 
/db_xref="InterPro:IPR009862" + /db_xref="UniProtKB/TrEMBL:Q1M9B6" + /translation="MNTPAPIRKIFEGVATRPQMFRLFDRHSQRPDRWRSDAAPLYSG + EWFELDEALYDYMLNILPPLWMCGPIFALREFLTGSTTSIFLALRIDGKPRYFHGYCD + LSDPTSVETMRATIFERETQPVHAMSREELLEHIWSSTTNAYRGYAGDRFPPVMQGQR + MVMLWSGTNGTLLKLLDDLTDDETAAKLPVHMRHLPDIAA" + gene 107693..112783 + /locus_tag="pRL80104" + CDS 107693..112783 + /locus_tag="pRL80104" + /codon_start=1 + /transl_table=11 + /product="putative methylase" + /protein_id="CAK02905.1" + /db_xref="EnsemblGenomes-Gn:pRL80104" + /db_xref="EnsemblGenomes-Tr:CAK02905" + /db_xref="GOA:Q1M9B5" + /db_xref="InterPro:IPR000330" + /db_xref="InterPro:IPR002296" + /db_xref="InterPro:IPR014001" + /db_xref="InterPro:IPR027417" + /db_xref="InterPro:IPR029063" + /db_xref="UniProtKB/TrEMBL:Q1M9B5" + /translation="MSNDPFTLDMFGSSALSSGLALGVTAFGGFDTVAANDDDPDPTP + PAPAPALPVVTSAACPNSQRQNFYLDGDRGLGASWKDRARVNVAAILVTEGIVKQERP + ATAKEQAQMVRFTGFGAGELANGMFRRPGEVDFCDGWDALGSSLETAVSEADYASLAR + CTQYAHFTPELIVRAIWAGIQRLGWRGGRVLEPGIGTGLFPALIPPEYRDTAYVTGIE + LDPVTARIVRLLQPRSRIIEGDFARTDLAPIYDLAIGNPPFSDRTVRSDRAYRSLGLR + LHDYFIARSIDLLKPGALAAFVTSHGTLDKAATTAREHIAKTADLIAAIRLPEGSFRR + DAGTDVVVDILFFRKRKAGEPEGDQIWLDVDEVRPAVDDEGAIRVNRWFARHPDFVLG + THALTSGPFGETYTCVARDGADLDTILDAAIELLPADVYDGEPTPIDIDLEDELAEIV + DLRPKDSPVREGSFFVDRAKGLMQMLDGTAVAVTVRKGRPGDGISEKHVRIISKLVPI + RDAVREILKAQETDRPWRDLQVRLRLAWSAFVRDFGPINHTTVSIQEDPETGEVKETH + RQPNLLPFRDDPDCWLVASIEDYDLETDTAKPGPIFATRVIAPPMSPVITNAADALAV + VLNERGHVDVDHIAELLHREISAVIDDLRDTVFQDPADGSWKTADAYLSGSVRTKLAA + AQAAAELDPVYERNVRALQAVQPADLRPSDITARLGAPWIPAADVVAFVKERMESDIR + IHHMPELSSWTVEARQLGYSAAGTSEWGTGRRHAGELLADALNSRVPQIFDVFKDVDG + ERRVLNVVDTEAARDKLQKIKQAFQDWVWTDPDRTDRLARDYNDRFNNIAPRKFDGSH + LKLPGASGAFVLYGHQKRGIWRIIADGSTYLAHAVGAGKTMTMAAAIMEQRRLGLIAK + AMLVVPGHCLAQAAREFLALYPNARILVADETNFTKDKRARFLSRAATATWDAIIITH + SAFRFIAVPSAFEQEMIQDELQLYEDLLTKVDSEDRVSRKRLERLKEGMKERLEGLAT + RKDDLLTISEIGVDQIVVDEAQEFRKLSFATNMSTLKGIDPNGSQRAWDLYVKSRYIE + 
TKNPGRALVLASGTPITNTLGEMFSIQRLLGHAALFERGLHEFDAWASCFGDTTTELE + IQPSGKYKPVSRFASFVNVPELIAMFRSFADVVMPDDLRQYVRVPDISTGRRQIMTAK + PTALFKTYQQTLGSRIKMIEQREGPAKPGDDILLSVITDGRHAAIDLRFVMPAAGNED + DNKLNLLVRNAHRIWKETGDAVYRRPDGKDFELPGAAQMIFSDLGTMNVEKTRGFSAY + RFIRDELIRLGVPAAEIAFMQDYKKTEAKQRLFGDVRAGKVRFLIGSSETMGTGVNAQ + LRLKALHHLDVPWLPSQIEQREGRIVRQGNQHDEVDIFAYATEGSLDASMWQNNERKA + RFIAAALSGDTSIRRLEDVGEGAANQFAMAKAIASGDERLMQKAGLEADIARLERLRA + AHEDDQYAVRGQMRDAEREIEISTRRIGEVGQDLERLQPTSGDAFTMTVLGESHTERK + EAGRSLMKEILTLLQLQHEGEVHLATIGGFDLVYEGERFGRGDGYRYKTLIQRSGADY + EIELAITVTPLGAISRLEHGLDGFEEEQRRYRQRLDDAERRLTSYRSRTGGTFQFADE + LSEKRRLLLGIEDELAAAAVDDGAQEAA" + gene 113368..113685 + /locus_tag="pRL80105" + CDS 113368..113685 + /locus_tag="pRL80105" + /note="no significant database hits" + /codon_start=1 + /transl_table=11 + /product="hypothetical protein" + /protein_id="CAK02906.1" + /db_xref="EnsemblGenomes-Gn:pRL80105" + /db_xref="EnsemblGenomes-Tr:CAK02906" + /db_xref="UniProtKB/TrEMBL:Q1M9B4" + /translation="MPFYLVIQTSLIEADDEEAAARIAVDQIRSGNKVAVTVKSDETT + VSHIVVAAKPAISLADPVADPEDGGPVPATHPAPTSAVEADRKAMLKRIVADAFSLLK + RRP" + gene 113859..115771 + /locus_tag="pRL80106" + /pseudo + CDS join(113859..114026,114026..115771) + /locus_tag="pRL80106" + /note="Putative pseudogene, frameshift after codon 56, + similar to Agrobacterium tumefaciens chromosome + partitioning protein ParB or atu6103 or agr_pti_191 + SWALL:Q8U632 (EMBL:AE009429) (653 aa) fasta scores: E(): + 3.5e-151, 65.94% id in 643 aa" + /pseudo + /codon_start=1 + /transl_table=11 + /product="puative ParB-like partitioning protein, + pseudogene" + gene 115866..116762 + /locus_tag="pRL80107" + CDS 115866..116762 + /locus_tag="pRL80107" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02908.1" + /db_xref="EnsemblGenomes-Gn:pRL80107" + /db_xref="EnsemblGenomes-Tr:CAK02908" + /db_xref="UniProtKB/TrEMBL:Q1M9B3" + /translation="MNGASFTSATDPVSISSAADGVEFGTSADGFPVARIGEILLGLI + 
SNGSGDFFLASAWRITKPLAEVRRHHFYRHDGRVKDEAAFRLRAIETAEHMRELSAFS + RIQTRMSASTPWGGSQLATIYAEGIVSHSTSGHGGFHLSPDRNLQVDASVRSAGGWYE + EDSEWAIVALTFPDLFTGYERQCANEAARNTFPDYWEKLRGRQLSAGESWLKDSAEFD + RVHADDWIVISAIISSHHSGMTEVFAKRGGNREPQREERRFLVPHEEYGRRGRFGFVI + DLARHAAYDGPSSFVGWSARAA" + gene 116741..117100 + /locus_tag="pRL80108" + CDS 116741..117100 + /locus_tag="pRL80108" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02909.1" + /db_xref="EnsemblGenomes-Gn:pRL80108" + /db_xref="EnsemblGenomes-Tr:CAK02909" + /db_xref="UniProtKB/TrEMBL:Q1M9B2" + /translation="MEREGGMMAPVMSPETQLSRMEDARRQTQRQLELIDRQITRRMT + AILPKLARRQTGYHRGKAPDGRTLLERYRANLAGLTAERQPEAEALSRKLARQDAAIA + ALRDRLSSAGPHSSPEG" + gene 117105..117557 + /locus_tag="pRL80109" + CDS 117105..117557 + /locus_tag="pRL80109" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02910.1" + /db_xref="EnsemblGenomes-Gn:pRL80109" + /db_xref="EnsemblGenomes-Tr:CAK02910" + /db_xref="UniProtKB/TrEMBL:Q1M9B1" + /translation="MATIGDLERNAGIGSSNAERTAFWLRFHHLEGKACLDAGVAELK + RMIAERNGTELRAAKHRRQQWPAPSDDQEAALQAYAARHGRRWKSIFSDVWMGGGPPY + DDGGILRGLRNTHGPTWLQSYRLPKAVLRSQSDGNAAVSHVGIGKSEE" + gene 117875..118459 + /locus_tag="pRL80110" + CDS 117875..118459 + /locus_tag="pRL80110" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02911.1" + /db_xref="EnsemblGenomes-Gn:pRL80110" + /db_xref="EnsemblGenomes-Tr:CAK02911" + /db_xref="UniProtKB/TrEMBL:Q1M9B0" + /translation="MAKPVTTRQTARVVQLRKGATVEMVRLTCPDSAQAIKIAESFGT + AVIDSEGIRDLHERLITETADALSEGLGDRAMQIHLQRIVGAYVGSAHGAGQFYSNAV + TQARDATAKAANDARDEDLDGPVGYDSAAHRKREFAADMGIQAHALRMAAEGAVAAYK + HIVGESWKPFDRPVENPGHSVDRKAAAAQMSAFD" + gene 118748..119674 + /gene="ardC8" + /locus_tag="pRL80111" + CDS 118748..119674 + /gene="ardC8" + /locus_tag="pRL80111" + /codon_start=1 + /transl_table=11 + /product="putative ArdC antirestriction protein" + /protein_id="CAK02912.1" + 
/db_xref="EnsemblGenomes-Gn:pRL80111" + /db_xref="EnsemblGenomes-Tr:CAK02912" + /db_xref="InterPro:IPR013610" + /db_xref="InterPro:IPR017113" + /db_xref="UniProtKB/TrEMBL:Q1M9A9" + /translation="MSKKVESQRTDIYSRITDRILEDLASGVRPWMKPWNAANTDGRI + TRPLRHNGQPYSGMNVLLLWSEQMSRGFASSMWMTFKQALELEAAVRKGETGSTIVFA + SRFTKSEADGKGGEVDREIPFLKAYSVFNVEQIDGLPDHYYYRPAPAQDHVERIEQAD + RFFRNTGAVIRHGGNQAFYAPGPDLIQMPPFETFKDAASFYATLSHEATHWTAAENRV + GRDLSRYAKDRSERAREELIAELGSCFLCADLGIAPELEPRPDHASYLQSWLKVLADD + KRAIFQAAAHAQRATVFLHGLQPEAANFRDAA" + gene complement(119698..119928) + /locus_tag="pRL80112" + CDS complement(119698..119928) + /locus_tag="pRL80112" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02913.1" + /db_xref="EnsemblGenomes-Gn:pRL80112" + /db_xref="EnsemblGenomes-Tr:CAK02913" + /db_xref="UniProtKB/TrEMBL:Q1M9A8" + /translation="MRDELAAIELPRESSPGGFMAKQTKPFIVEIKQSRKLKPSAPKP + SIWGRLDLSTTEDLAPADPLTEPATTESGDRP" + gene 120357..120668 + /locus_tag="pRL80113" + CDS 120357..120668 + /locus_tag="pRL80113" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02914.1" + /db_xref="EnsemblGenomes-Gn:pRL80113" + /db_xref="EnsemblGenomes-Tr:CAK02914" + /db_xref="InterPro:IPR007948" + /db_xref="UniProtKB/TrEMBL:Q1M9A7" + /translation="MATIGTFTSTENGFTGSIRTLALNVKARIARIENPSDKGPQFRI + FAGAVELGAAWQKRSEQTDRDYLSVKLDDPSFPAPIYATLSEVEGEDGYQLIWSRPNR + D" + gene complement(120786..121355) + /locus_tag="pRL80114" + CDS complement(120786..121355) + /locus_tag="pRL80114" + /note="no significant database hits" + /codon_start=1 + /transl_table=11 + /product="hypothetical protein" + /protein_id="CAK02915.1" + /db_xref="EnsemblGenomes-Gn:pRL80114" + /db_xref="EnsemblGenomes-Tr:CAK02915" + /db_xref="UniProtKB/TrEMBL:Q1M9A6" + /translation="MGETAPSLTPDLLEQNRTSEREETRGPEGDATDSGVSRRGVSRG + EIPDSLVGRAIRLTKLTDAAKAPSEGGLRLEPAILLGNDDGQLGARQLEFDLDLATFV + GIDALPELGPQFFDLDFNVVGHRLISLDVKDDDSGREGMAGSGIAQATGRAGEWGSRF + 
SEGATGGCRKLGEPLILEAGMRERHTPGF" + gene 121486..121950 + /locus_tag="pRL80115" + CDS 121486..121950 + /locus_tag="pRL80115" + /note="Similar to the N-terminus of Rhizobium etli + hypothetical protein yh026 SWALL:Q8KL65 (EMBL:U80928) (106 + aa) fasta scores: E(): 4.7e-13, 72.3% id in 65 aa" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein, pseudogene" + /protein_id="CAK02916.1" + /db_xref="EnsemblGenomes-Gn:pRL80115" + /db_xref="EnsemblGenomes-Tr:CAK02916" + /db_xref="UniProtKB/TrEMBL:Q1M9A5" + /translation="MTSDQNLMLYAKLVGFRLVVLADRVGCDTDFLQELHDRLVEGLE + AAIARIQTIMALERSVLTGDEAAYQLDGETEIFGRCAISLLDDLEIDFDTHEYRINGS + DWINALTADYSGVDIDCPELVALTEDELGSLAQIVKDITRETGIPVHAARAV" + gene 121940..122221 + /locus_tag="pRL80116" + CDS 121940..122221 + /locus_tag="pRL80116" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02917.1" + /db_xref="EnsemblGenomes-Gn:pRL80116" + /db_xref="EnsemblGenomes-Tr:CAK02917" + /db_xref="UniProtKB/TrEMBL:Q1M9A4" + /translation="MPSRADGIARTGSNLRRGEVAKAASQLGMVQADRVSDKFAGSAL + DCKDNGASVHNACDAGERRQIQAELDVAQAELAAAVDLQGNIFDWEPPH" + gene 122326..122652 + /locus_tag="pRL80117" + CDS 122326..122652 + /locus_tag="pRL80117" + /note="no significant database hits" + /codon_start=1 + /transl_table=11 + /product="hypothetical protein" + /protein_id="CAK02918.1" + /db_xref="EnsemblGenomes-Gn:pRL80117" + /db_xref="EnsemblGenomes-Tr:CAK02918" + /db_xref="UniProtKB/TrEMBL:Q1M9A3" + /translation="MPPEFSPPSASSRSKIQSFSGPPLRCGRADAALLPEAHLIPAMK + RNHRRKIIQQLAKFLAATGRRLRTLGKVVGHFVRKGKLGLKVAIKIPFFVDIEVNFET + DWNRRP" + gene complement(122685..123125) + /locus_tag="pRL80118" + CDS complement(122685..123125) + /locus_tag="pRL80118" + /codon_start=1 + /transl_table=11 + /product="Putative nuclease" + /protein_id="CAK02919.1" + /db_xref="EnsemblGenomes-Gn:pRL80118" + /db_xref="EnsemblGenomes-Tr:CAK02919" + /db_xref="GOA:Q1M9A2" + /db_xref="InterPro:IPR016071" + /db_xref="UniProtKB/TrEMBL:Q1M9A2" + 
/translation="MKSILTGVISAAVLLGAILIVPHVEAAQSGLPATYPKCASGARY + NCVVDGDTLWIYGQKVRVADIDAPEISTPKCASELTLGNKATERLIELVNEGPFQLQA + WPGRDTDRYGRKLRVLVREGRSLGDRLVSEGLARTWSGRREPWC" + gene complement(123122..123724) + /locus_tag="pRL80119" + CDS complement(123122..123724) + /locus_tag="pRL80119" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02920.1" + /db_xref="EnsemblGenomes-Gn:pRL80119" + /db_xref="EnsemblGenomes-Tr:CAK02920" + /db_xref="InterPro:IPR008893" + /db_xref="UniProtKB/TrEMBL:Q1M9A1" + /translation="MKLLVRSLLMPTRRGNIMVRISSAISGTDQCRGDREKIVTEMGE + DRLRASFAVANRAVRRRRIAQDLTFRTLHQIESITRNRFIIVVGPRTPYARFSVMIAQ + PYQLYVERKDRAKNMARYYAMSIEANLFGELCLTRRWGRIGSKGQTLTHHFEREQDAV + ALFLDLTRQKRARGYRTRSAATRESDSCAPSTVIELGAIR" + gene complement(123855..125807) + /gene="traGp8" + /locus_tag="pRL80120" + CDS complement(123855..125807) + /gene="traGp8" + /locus_tag="pRL80120" + /note="This CDS overlaps 8 nt at the N-terminus with + pRL80120A, alternative start site at codon 16" + /codon_start=1 + /transl_table=11 + /product="putative conjugal transfer protein TraG" + /protein_id="CAK02921.1" + /db_xref="EnsemblGenomes-Gn:pRL80120" + /db_xref="EnsemblGenomes-Tr:CAK02921" + /db_xref="GOA:Q1M9A0" + /db_xref="InterPro:IPR003688" + /db_xref="InterPro:IPR014135" + /db_xref="InterPro:IPR027417" + /db_xref="UniProtKB/TrEMBL:Q1M9A0" + /translation="MTPKRILVAGIPAVAMVAIALTFPGIERWLSAFGTTDQAKLMLG + RIGLALPYALAGACGVMFLFGTKGSINVKTSGWSVAAGGLGVIAVAALREGSRLLTFA + SQVPARRTLLSYADPSTIIGAGTALLVTFFALRVARMGNAAFTRSEPRRIRGKRALHG + EADWMTMQEAEKLFPETGGIVIGERYRVDRDDPAATSFRPGEPTSWGRGGSPPLLCFD + GSFGSSHGIVFAGSGGFKTTSVTIPTALKWGGSLVVLDPSNEVAPMVMEHRRKAGRRV + IVLDPKNAQSGFNALDWIGRHGGTKEEDIAAVASWIMSDSGRATGVRDDFFRASGLQL + LTAMIADVCLSGHTDEKHQTLRQVRANLSEPEPKLRARLQEIYDNSASDFVKENVAAF + VNMTPETFSGVYANAIKETHWLSYANYGALVSGSSFSTEALADGETDVFINIDLKALE + THAGLARVIIGSFLNAIYHRDGAIKGRALFLLDEVARLGYMRVLETARDAGRKYGITL + TMIYQSIGQLRETYGGRDASSKWFESASWISFAAINDPETADYISRRCGTTTVEIDQV + 
SRSFQSRGSSRTRSKQLASRQLIQPHEVLRMRADEQIVFTAGNAPLRCGRAIWFRRSD + MKSCVGENRFQQRGDRPEVSASAEPN" + gene complement(125797..126009) + /gene="traDp8" + /locus_tag="pRL80120A" + CDS complement(125797..126009) + /gene="traDp8" + /locus_tag="pRL80120A" + /note="This CDS overlaps 8 nt at the C-terminus with + pRL80120" + /codon_start=1 + /transl_table=11 + /product="conjugal transfer protein TraD" + /protein_id="CAK02922.1" + /db_xref="EnsemblGenomes-Gn:pRL80120A" + /db_xref="EnsemblGenomes-Tr:CAK02922" + /db_xref="GOA:Q1M999" + /db_xref="InterPro:IPR009444" + /db_xref="UniProtKB/TrEMBL:Q1M999" + /translation="MQKMSTAEARKKDAREKIELGGLIVKAGLRYEKRALLLGLLIDA + SARIKADEAERTRLSELGAKAFADDA" + gene complement(126014..126307) + /gene="traCp8" + /locus_tag="pRL80121" + CDS complement(126014..126307) + /gene="traCp8" + /locus_tag="pRL80121" + /note="putative alternative start site at codon 8" + /codon_start=1 + /transl_table=11 + /product="putative conjugal transfer protein TraC" + /protein_id="CAK02923.1" + /db_xref="EnsemblGenomes-Gn:pRL80121" + /db_xref="EnsemblGenomes-Tr:CAK02923" + /db_xref="GOA:Q1M998" + /db_xref="InterPro:IPR012930" + /db_xref="UniProtKB/TrEMBL:Q1M998" + /translation="MNKEATLMKKPTAKIREEIARLQEQLKQAETRDAERIGRIALRA + GLGEIEIEEGKLLSAFEEVAARFRAQARPAQGRKPATTPPNGAGEAAGRSAEA" + gene 126539..130132 + /gene="traAp8" + /locus_tag="pRL80122" + CDS 126539..130132 + /gene="traAp8" + /locus_tag="pRL80122" + /codon_start=1 + /transl_table=11 + /product="putative conjugal transfer protein TraA" + /protein_id="CAK02924.1" + /db_xref="EnsemblGenomes-Gn:pRL80122" + /db_xref="EnsemblGenomes-Tr:CAK02924" + /db_xref="InterPro:IPR005053" + /db_xref="InterPro:IPR014136" + /db_xref="InterPro:IPR027417" + /db_xref="InterPro:IPR027785" + /db_xref="UniProtKB/TrEMBL:Q1M997" + /translation="MAVPHFSVSIVARGSGRSAVLSAAYRHCAKMEFEREARTIDYSR + KQGLLHEEFVIPETAPDWLRSMIADRSVSGASEAFWNKVEAFEKRADAQLAKDVTIAL + PVELSNDQNIALVRDFVERHITAKGMVADWVYHDALGNPHIHLMTTLRPLTEDGFGAK + 
KVAVLAPAGKPVRNDAGKIVYELWAGSTDDFNAFRDGWFACQNRHLALAGLDIRIDGR + SFEKQGIELTPTIHLGVGTKAIERKGDNKTGWGEEKVALERLELQEERRAENARRIQR + NPEIVLDLITREKSVFDERDIAKILYRYIDDAALFQNLMARILQSPQTLRLDRERMDL + VTGVRAPSKYTTRELIRLEAQMANQAIWLSQRSSHGVRHAVLSGVFSRHDRLSDEQKT + AIEHVAGPERIAAVIGRAGAGKTTMMKAARQAWEAAGYRVVGGALAGKAAEGLEKEAG + IASRTLSAWELRWDQERDRLDEKSVFVLDEAGMVSSRQMARFVEAVTVSGAKLVLVGD + PEQLQPIEAGAAFRAISGRIGYAELETIYRQREQWMRDASLDLARGNVSAALDAYAQR + DMVRTGWTRDEAITALIADWDHEYDPAKSTLILAHRRIDVRLLNEMARSKLVERGLIE + AGHAFKTEDGTRQLAAGDQIVFLKNEGSLGVKNGMLARVVDAQPGRIVAEIGNGEDRR + RVVVEQRFYANVDHGYATTVHKSQGATVDRVKVLASSTLDRHLSYVAMTRHRETAELY + VGLEEFAQRRGGVLIAHGEAPYEHKPGNRDSYYVTLGFADGQERTVWGVDLARAMDAS + GARIGDRIGLKHVGSQRVTLPDGTEVDRNSWKVVPVEELAMARLHERLSRPGSKETTL + DYQDASHYRAALRFAEARSLHLMNVARTIAHDQLQWTIRQSSKLAELGARLVAVAAKL + GLGGAKSTVSTTSAIKEAKPMVSGTTTFPRSIGQAAEDKLSADPGLKASWQEVSARFH + HVFADPQAAFKTVNVDAMLANGTVAATTIVQIAEQPESFGALKGKTGLFAGSAEKRAH + DTALVNAPALARDLQGFIAKRAEAASRYEDEERAVRTKLSLDIPALSASAKQVLERVR + DAIDRNDIPAGLEFALADKMVKAELEGFAKAVSERFGERTFLPLAAKTADGKAFEVAS + AGMQPAQKNELRSAWDTIRTVQQLAAHERTAVALKQAEAIRQTQTKGLSLK" + gene 130129..130695 + /gene="traFp8" + /locus_tag="pRL80123" + CDS 130129..130695 + /gene="traFp8" + /locus_tag="pRL80123" + /note="This CDS overlaps 8 nt at the C-terminus with + pRL80124" + /codon_start=1 + /transl_table=11 + /product="putative conjugal transfer protein TraF" + /protein_id="CAK02925.1" + /db_xref="EnsemblGenomes-Gn:pRL80123" + /db_xref="EnsemblGenomes-Tr:CAK02925" + /db_xref="GOA:Q1M996" + /db_xref="InterPro:IPR014139" + /db_xref="InterPro:IPR015927" + /db_xref="InterPro:IPR019533" + /db_xref="UniProtKB/TrEMBL:Q1M996" + /translation="MMSHISPTSAVVRQKRPVLALLAVSCGIALVVIIGGFIGGLRIN + TTPSEPLGLWRVAPLEHPIQVGEMVFVCPPETDAVSEGFERGYLRSGLCPGGFGPLIK + TVAAVGGKRIEIAGNVTIDGRPIANSSLVSQDGQGRPLRPYAGGTIPAGFLFLHSPFP + GSWDSRYFGPVPGSGVLGLAEQVLTYAP" + gene 130685..131851 + /gene="traBp8" + /locus_tag="pRL80124" + CDS 130685..131851 + /gene="traBp8" + /locus_tag="pRL80124" + /note="This CDS 
overlaps 8 nt at the N-terminus with + pRL80123" + /codon_start=1 + /transl_table=11 + /product="putative transmembrane conjugal transfer protein + TraB" + /protein_id="CAK02926.1" + /db_xref="EnsemblGenomes-Gn:pRL80124" + /db_xref="EnsemblGenomes-Tr:CAK02926" + /db_xref="GOA:Q1M995" + /db_xref="InterPro:IPR003010" + /db_xref="InterPro:IPR016707" + /db_xref="UniProtKB/TrEMBL:Q1M995" + /translation="MRPDWLQPFTLALAAATTGYISWSGHALALPAAIAFPALWSLAH + SRRAASVVSAAYFLAASRGLPQGVAAFYQSDIWPGLILWLVASSGFVFVHVALWSRQS + GGWKALRYTIAMVLMALPPFGIVGWAHPITAAGVLFPGWGWAGLAAVTAGLALMTTQY + RPAVAITLAGFWLWSAAFWTAPDIGRHWQGVDLQLGNRLGRDNSLARHSDLVATLRSE + RRPGSTFMLLPESALGFWTPSVERVWRQQLAEADLSVIAGAAVVDREGYDNVLVRVSA + TDSEILYRERMPVPGSMWQPWLAPIGKSGGARADFFANPVVSVGGQRVAPLICYEQLI + IWPVLQSMLHDPDLVVAVGNDWWTKGTAIIGIQRASAEAWARLFNKPLVMSFNT" + gene 131867..132484 + /gene="traHp8" + /locus_tag="pRL80125" + CDS 131867..132484 + /gene="traHp8" + /locus_tag="pRL80125" + /codon_start=1 + /transl_table=11 + /product="putative conjugative transfer TraH protein" + /protein_id="CAK02927.1" + /db_xref="EnsemblGenomes-Gn:pRL80125" + /db_xref="EnsemblGenomes-Tr:CAK02927" + /db_xref="InterPro:IPR010680" + /db_xref="UniProtKB/TrEMBL:Q1M994" + /translation="MDAALIAKCADPSLPPAIVEQFISAVGSDDPLAVTVNADGRLVL + IPKPRSPDEAMGVVKDYVGHAIVRVGITQFPADVGVDDASQLQSDMFEACANLRTGTG + IFAKVARIVAKWYGRPTNKELLPQLVDDTIYAWKTGSFEGDNVFRASDPGGPTFFGTR + SEKRAEGTDPVAPPMESEDASQSAEPDKATEAGMRIDLSRIGGQK" + gene complement(132508..133125) + /locus_tag="pRL80126" + CDS complement(132508..133125) + /locus_tag="pRL80126" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02928.1" + /db_xref="EnsemblGenomes-Gn:pRL80126" + /db_xref="EnsemblGenomes-Tr:CAK02928" + /db_xref="UniProtKB/TrEMBL:Q1M993" + /translation="MSSNYPAYVLGYHGCDKAVGMAALTGASPLLPSEKAYDWLGSGI + YFWENDPERALEWATLKAESGAYKEPFVLGAIIDLGNCLDLITRKYVPLIQTSYRMLK + SQIEATGGKMPVNSDAKGDKNSDKLVRKLDCAVINYVHEIAKEAALPAFDTVRGLFPE + 
GNEIYDGARFHERTHTQIAVRNDPCIKGFFLPRGETPALTSPVSP" + gene complement(133444..134439) + /locus_tag="pRL80128" + CDS complement(133444..134439) + /locus_tag="pRL80128" + /codon_start=1 + /transl_table=11 + /product="conserved hypothetical protein" + /protein_id="CAK02929.1" + /db_xref="EnsemblGenomes-Gn:pRL80128" + /db_xref="EnsemblGenomes-Tr:CAK02929" + /db_xref="InterPro:IPR029058" + /db_xref="InterPro:IPR029059" + /db_xref="UniProtKB/TrEMBL:Q1M992" + /translation="MYLSWLDRWDEQRARRGEDGKETTSFILDADRAFPGERIDSIRE + FCARADEASIDPTFFDEQNGGDQRFEVREHWVKFPSDISTDVAENNIVWAKITKSGSL + DKALVIFHHWNARARNAQIAGFLSRRGITVIEIAMPYHFERNRPGSLHADYMLSANLG + RTIQAVRQAVCDGGKLIRWLKSEGYREISVLGMSLGSWVAGLIAAHDPNVSKASLFLT + GGSLADMVWTGRATRTIRSSLEPEIELADLRRAWSPLNLENYAHRLARPDLDIQMVLA + RRDTVVMPELSESLLRSLKNAGGRPQIMKLSCGHYSLGKLPYILYAGLSLKRFLS" + gene complement(134429..135988) + /locus_tag="pRL80130" + CDS complement(134429..135988) + /locus_tag="pRL80130" + /note="no significant database hits" + /codon_start=1 + /transl_table=11 + /product="putative transmembrane protein" + /protein_id="CAK02930.1" + /db_xref="EnsemblGenomes-Gn:pRL80130" + /db_xref="EnsemblGenomes-Tr:CAK02930" + /db_xref="GOA:Q1M991" + /db_xref="UniProtKB/TrEMBL:Q1M991" + /translation="MFKKLAAAGAGRAVGEDGFPAGPWTPELLAAAISQIDSNRSGVD + LRTVQLWFQENDRGISATNIRWLARVFGCDDPEATAAWQVELSAAQARLQAKRREAKK + ADSLAPGTRDVPPDPTVDHEQWSPVKFTPDAEPEADRKKQRFSLARKSETLFSGGSPL + NLPASVFAGASALGFLSYIVGIHSAVYLRADNVAKEVGFLWAPNWTFLFMVLLPLFFA + LVTELLAFWTKEGRVRLWAWNNTTTSEDDWARNVEASSSSYWAVFLICVLFAGVFQWI + GVCLIPLLEGGADYATSWGTLAIVRPEVISVPVSIAFTALAYLYMCLCFYLFFAGLIL + LHTMIHDLRKIDLEAKILQQVGSQGEVHELCLRLMRGIFRCTILGLLVALCMKAQSSY + LTSNAKNILDWLIGDLSSALAGRDGASNGFHYRMPTHYSGLLVALSTIVVFVYGSICL + RAIGTRFHVPLWKMAAVVTLLFATYLLVDAFVGFSIVLGIGVFVALFSLVDPGLGSRR + SSEIGSKQIVS" + gene 136336..137040 + /gene="traRp8" + /locus_tag="pRL80131" + CDS 136336..137040 + /gene="traRp8" + /locus_tag="pRL80131" + /codon_start=1 + /transl_table=11 + /product="putative LuxR-type transcriptional regulator + 
protein TraR" + /protein_id="CAK02931.1" + /db_xref="EnsemblGenomes-Gn:pRL80131" + /db_xref="EnsemblGenomes-Tr:CAK02931" + /db_xref="GOA:Q1M990" + /db_xref="InterPro:IPR000792" + /db_xref="InterPro:IPR005143" + /db_xref="InterPro:IPR011991" + /db_xref="InterPro:IPR016032" + /db_xref="UniProtKB/TrEMBL:Q1M990" + /translation="MNQLVAGLLEISAVAHDDATLKAALADLAERFEFSGYDYAKLLP + GDFYVISNLHPDWLKRSRKLDLDRRNPVMKRAQQTRRAFIWSGTPQTGTPLEEDQTFY + ETAAQFGIRSGITIPIAISSGAISVLSFVSPKSILTAQDEIDPIAASSAVGQLHARIG + QLKVTPSIQESFYLSPKEGTYTRWLSLGKTVEDTADIEQVKYNTVRIALAEARRRYDL + CNNTQLVALAIRRGLI" + gene complement(137041..137340) + /gene="traMp8" + /locus_tag="pRL80132" + CDS complement(137041..137340) + /gene="traMp8" + /locus_tag="pRL80132" + /codon_start=1 + /transl_table=11 + /product="putative transcriptional regulator TraM" + /protein_id="CAK02932.1" + /db_xref="EnsemblGenomes-Gn:pRL80132" + /db_xref="EnsemblGenomes-Tr:CAK02932" + /db_xref="GOA:Q1M989" + /db_xref="InterPro:IPR015309" + /db_xref="UniProtKB/TrEMBL:Q1M989" + /translation="MGYERNDTATDIDLRPIIGLLSNEPEQVVEILTVGAIKKHRTLV + DRAERMFQVAHAGDRSDEKEPSDAHLAYLEATIEMHAQMSALTTLLNILGRTPKV" + gene complement(137542..138840) + /gene="trbLp8" + /locus_tag="pRL80133" + CDS complement(137542..138840) + /gene="trbLp8" + /locus_tag="pRL80133" + /codon_start=1 + /transl_table=11 + /product="putative conjugal transfer protein TrbI" + /protein_id="CAK02933.1" + /db_xref="EnsemblGenomes-Gn:pRL80133" + /db_xref="EnsemblGenomes-Tr:CAK02933" + /db_xref="InterPro:IPR005498" + /db_xref="UniProtKB/TrEMBL:Q1M988" + /translation="MVQSLQLGTSNQSADEKGMKRLNRVPLFVGIGILVLFLAVLVYG + LSSRGLRFGQHDPNESGSGTPASTFADQLKRGVKDGIIGDREEQQPVFQPTPVPQQQG + DTRPVQTEPKIEPREPRRSRMESDEEWNARMLREQREQIFREQQRQRMASLQARGAAL + DSPLKVDITDVSAQAPGAASAAPRQSNTAPNGSQDLYAAALRAGAAGQVDQNGQASKE + DFFNADIKDLGYLPNQVVPQQSRYELKRGSVIPATLITGINSDLPGRITAQVSQNVYD + SATGHFMLVPQGTKLFGRYDSKVSFGQNRALVVWTDIIFPNGSTLQIGGMAGTDSEGY + SGFSDKVDNHYLRTFGSAALVALIGTGIDMSMPQSSTLATQDTASDAARRNFAETFGR + 
VAEQTISKNLIVQPTIKIRPGFRFNVLVDQDIIFPGNYSN" + gene complement(138856..139302) + /gene="trbHp8" + /locus_tag="pRL80134" + CDS complement(138856..139302) + /gene="trbHp8" + /locus_tag="pRL80134" + /codon_start=1 + /transl_table=11 + /product="putative conjugal transfer protein TrbH" + /protein_id="CAK02934.1" + /db_xref="EnsemblGenomes-Gn:pRL80134" + /db_xref="EnsemblGenomes-Tr:CAK02934" + /db_xref="InterPro:IPR010837" + /db_xref="UniProtKB/TrEMBL:Q1M987" + /translation="MQLYRLIPIVLTLSLAGCQTATDGLSTSGAPAEVTGPAAGAIAG + DMAGRFAEQAGSTTTPIKLHKDTSEFSVALEAALKGWGFAIVTDDKSASVKDAPKPVE + LAYSIATVDGQVLARLSTDTMELGRAYSVNNGLATPASPLSLMKRN" + gene complement(139306..140136) + /gene="trbGp8" + /locus_tag="pRL80135" + CDS complement(139306..140136) + /gene="trbGp8" + /locus_tag="pRL80135" + /codon_start=1 + /transl_table=11 + /product="putative conjugal transfer protein TrbG" + /protein_id="CAK02935.1" + /db_xref="EnsemblGenomes-Gn:pRL80135" + /db_xref="EnsemblGenomes-Tr:CAK02935" + /db_xref="InterPro:IPR010258" + /db_xref="InterPro:IPR014142" + /db_xref="UniProtKB/TrEMBL:Q1M986" + /translation="MRIKHKRVALRCMLALAVSTAVSPMAIAQSLTGNEAKGTNLSGK + WRGQTGLVTRGPDGKVIFLFGETQPSVVCSPLQVCDIELQGGEIVRDVLVGDTVRWKV + EPATSGAAGGQAIHLIVKPSEPGLVTSMVVTTSRRTYHIQLKSHPTQYMARVGFEYPE + DAATKLSDINARIQAMTGPDGGVAPEQLAFSYSLSGSAPWRPKRVYSDGQKTYIQFPR + AISGQDAPVLFVVSGGQNRIVNYRMKKDMMIVDYNIDKAILISGVGWKQQKITIRRGG + " + gene complement(140152..140814) + /gene="trbFp8" + /locus_tag="pRL80136" + CDS complement(140152..140814) + /gene="trbFp8" + /locus_tag="pRL80136" + /codon_start=1 + /transl_table=11 + /product="putative transmembrane conjugal transfer protein + TrbF" + /protein_id="CAK02936.1" + /db_xref="EnsemblGenomes-Gn:pRL80136" + /db_xref="EnsemblGenomes-Tr:CAK02936" + /db_xref="GOA:Q1M985" + /db_xref="InterPro:IPR007430" + /db_xref="UniProtKB/TrEMBL:Q1M985" + /translation="MAGHPAPENPYLAARQEWSERYGSYVRAASAWKVVGILSLGMAV + IGFGYALYLSTQVKLVPYIVEVDKLGNTVSGGFPQQIEYADTRVVRATLGNFVTSFRS + 
ITPDAVVQKQYIDRTYALLRTSDPSTQKVNAWFRGNSPFEKAVNATVAIEVNNIVALS + NQTFQIDWTEYERDRKGKEVATRRFRGIATVSITSPQDEATIRLNPIGVYVTDFDWTA + QL" + gene complement(140836..142017) + /gene="trbLp8" + /locus_tag="pRL80137" + CDS complement(140836..142017) + /gene="trbLp8" + /locus_tag="pRL80137" + /codon_start=1 + /transl_table=11 + /product="putative conjugal transfer protein TrbL" + /protein_id="CAK02937.1" + /db_xref="EnsemblGenomes-Gn:pRL80137" + /db_xref="EnsemblGenomes-Tr:CAK02937" + /db_xref="GOA:Q1M984" + /db_xref="InterPro:IPR007688" + /db_xref="InterPro:IPR014150" + /db_xref="UniProtKB/TrEMBL:Q1M984" + /translation="MVKRNRRSTILVAGIAFLALAAPAFAQQGQVLTELENQVSTAAK + GWETTVMDAAKSLFWILAGIEVGIAAVWLAIQAASLDSWFAELVRRIMFIGFFAFALT + QGPTFARAVVDSLFQIGAGGGSASPAEVFDAGIRVASQMSEQAKFGVFEDNALAIAAV + LAMGIVVICFSLVAAIFVSVMVEMYVGLLAGMIMLGLGGSSFTKDFAIRYLVYAFGVG + MKLMALVMIAKIGSNVLLGLAQAPTAESDQFITTLAIAGISVVVFIIAVYVPNIIQGV + VQGASVSGGMETIRHGGQAASFAAGGAFLGAGAVGAGFAAAQAARAGGSSAAASVLRG + MGASFSSGAMAAGSAAKEKAIGSPGAYAGSLLGLANAKLDQARGNSSAPTPPPEKPEK + P" + gene complement(142208..143017) + /gene="trbJp8" + /locus_tag="pRL80138" + CDS complement(142208..143017) + /gene="trbJp8" + /locus_tag="pRL80138" + /note="This CDS overlaps 5 nt at the N-terminus with + pRL80139" + /codon_start=1 + /transl_table=11 + /product="putative conjugal transfer protein TrbJ" + /protein_id="CAK02938.1" + /db_xref="EnsemblGenomes-Gn:pRL80138" + /db_xref="EnsemblGenomes-Tr:CAK02938" + /db_xref="InterPro:IPR014147" + /db_xref="InterPro:IPR023220" + /db_xref="InterPro:IPR024475" + /db_xref="UniProtKB/TrEMBL:Q1M983" + /translation="MSEAIQFLKLGKSLVVGLVAAGLTATPTVSYAGGAVTGATEMTQ + LLNNGELISLVGQSSEQIANQITQITQLAEQIQNQLNIYQNMLQNTAQLPNHIWGQVE + GDLNQLRDIVNQGQGIAFSMGNADDLLKQRFKSYADLKTSLPNAESFSSTYQTWSNTN + RDTISSTLKAASLTADQFDSEEDTMGQLQSMSQSADGQMKALQVGHQIAAQQVAQIQK + LRGIVSQQTTMMGTWLQSEQTDKDLAQARREKFFNADVQSIPSGQKMEPRW" + gene complement(143010..145445) + /gene="trbEp8" + /locus_tag="pRL80139" + CDS complement(143010..145445) + /gene="trbEp8" + 
/locus_tag="pRL80139" + /note="This CDS overlaps 5 nt at the C-terminus with + pRL80138" + /codon_start=1 + /transl_table=11 + /product="putative conjugal transfer protein TrbE" + /protein_id="CAK02939.1" + /db_xref="EnsemblGenomes-Gn:pRL80139" + /db_xref="EnsemblGenomes-Tr:CAK02939" + /db_xref="GOA:Q1M982" + /db_xref="InterPro:IPR003593" + /db_xref="InterPro:IPR004346" + /db_xref="InterPro:IPR018145" + /db_xref="InterPro:IPR027417" + /db_xref="UniProtKB/TrEMBL:Q1M982" + /translation="MVALRTFRSTGPSFADLVPYAGLVDNGILLLKDGSLMAGWYFAG + PDSESATDFERNELSRQINSILSRLGTGWMIQVEAVRVPTTEYPSAERSHFPDAVTLL + IDHERRRHFGQERGHFESRHALILTYRPPERRRSGLTRYIYSDNESRSAKYADTALDA + FRRSIREIEQYLGNVLSVQRMQTREVSERDGARVARYDELFQFIRFAITGENHPVRLP + EIPMYLDWLATAELEHGLTPRVEGRFLGVIAIDGLPAESWPGILNSLNLMPLTYRWSS + RFIFLDAEEAKQRLERARKKWQQKVRPFFDQLFQTQSRSVDQDAMAMVAETEDAIAEA + SSQLVAYGYYTPVIVLFDESRSALQEKAEAVRRLIQAEGFGARIETLNATDAFLGSLP + GNWYCNIREPLINTRNLADLVPLNSVWSGQPFAPCPFYPPDAPPLMQVASGSTPFRLN + LHVDDVGHTLIFGPTGSGKSTLLALIAAQFCRYEKAQVFAFDKGNSMLTLTLGVGGDH + YEIGGESGEGASLAFCPLSELSTDADRAWASEWIETLVSLQGVTIAPDHRNAISRQIG + LMASAPGRSLSDFVSGVQMREIKDALHHYTVDGPMGLLLDAEEDGLTLGRFQCFEIEQ + LMNMGERNLVPVLTYLFRRIEKRLDGSPSLIVLDEAWLMLGHPVFRDKIREWLKVLRK + ANCAVILATQSISDAERSGIIDVLKESCPTKICLPNGAARETGTREFYERIGFNARQI + EIVANAIPKREYYVTSPDGRRLFDMSLGPVTLSFVGASGKADLARIRALSSTHGLEWP + GQWLIERGINRNE" + gene complement(145456..145755) + /gene="trbDp8" + /locus_tag="pRL80140" + CDS complement(145456..145755) + /gene="trbDp8" + /locus_tag="pRL80140" + /note="This CDS overlaps 5 nt at the N-terminus with + pRL80141" + /codon_start=1 + /transl_table=11 + /product="putative conjugal transfer protein TrbD" + /protein_id="CAK02940.1" + /db_xref="EnsemblGenomes-Gn:pRL80140" + /db_xref="EnsemblGenomes-Tr:CAK02940" + /db_xref="InterPro:IPR007792" + /db_xref="InterPro:IPR016704" + /db_xref="UniProtKB/TrEMBL:Q1M981" + /translation="MADAGVGLHRNRIHRALSRPNLLMGADRELVLLTGLAAIILIFV + VLTIYSALFGIAIWIVVVGALRMMAKADPMMRKVYARHMRYRAHYLPTSAPWRRY" + gene 
complement(145748..146140) + /gene="trbCp8" + /locus_tag="pRL80141" + CDS complement(145748..146140) + /gene="trbCp8" + /locus_tag="pRL80141" + /note="This CDS overlaps 5 nt at the C-terminus with + pRL80142 and 9 nt at the N-terminus with pRL80142" + /codon_start=1 + /transl_table=11 + /product="putative conjugal transfer protein TrbC" + /protein_id="CAK02941.1" + /db_xref="EnsemblGenomes-Gn:pRL80141" + /db_xref="EnsemblGenomes-Tr:CAK02941" + /db_xref="InterPro:IPR007039" + /db_xref="UniProtKB/TrEMBL:Q1M980" + /translation="MQHNRFLRLGIIGALLCASLAGPALAGSGGSLPWEGPLEQIQQS + ITGPVAGYIALAAVAIAGGMLIFGGELNDFARRLMYVVLVAGILLGATTIVGLFGSTG + ASIGSSFVAEQPPTPSSTPLGGGEGRHG" + gene complement(146130..147095) + /gene="trbBp8" + /locus_tag="pRL80142" + CDS complement(146130..147095) + /gene="trbBp8" + /locus_tag="pRL80142" + /note="This CDS overlaps 5 nt at the C-terminus with + pRL80141" + /codon_start=1 + /transl_table=11 + /product="putative conjugal transfer protein TrbB" + /protein_id="CAK02942.1" + /db_xref="EnsemblGenomes-Gn:pRL80142" + /db_xref="EnsemblGenomes-Tr:CAK02942" + /db_xref="GOA:Q1M979" + /db_xref="InterPro:IPR001482" + /db_xref="InterPro:IPR014149" + /db_xref="InterPro:IPR027417" + /db_xref="UniProtKB/TrEMBL:Q1M979" + /translation="MLQSHSRLVRKLQDALGEHLCIALEDPTVVEIMLNPDGKLFIER + LGHGVAPAGEMQATAAETVIGSVAHALQSEADGERPIISGELPIGGHRFEGLLPPVVN + SPTFTIRRRASRLIPLDDYVTAKIMTEAQASIIRSAITNRLNIVIAGGTGSGKTTLAN + AVIAEIVSSAPEDRMVILEDTSEIQCAAENAVCLHTSDAVDMARLLKSTMRLRPDRII + VGEVRDGAALTLLKAWNTGHPGGVTTIHSNSAMSALRRLEQLTSEASQQPMQAVIGEA + VDLVISIERAGRGRRVREVLHVEGFNGSRYQTEHYPQIDEDSHAA" +BASE COUNT 30119 a 43224 c 43403 g 30717 t +ORIGIN + 1 gtggagaatc ccgctcagct tcagaaggct attcataaac tgatagcggc ccacgcgcga + 61 gatctctcgg gcgcgcttca cgagcatcgt gtgaagcttt atccgcctga agctcgaaag + 121 acgcttcggt cattttcgtc gatagaggct gcgaagctca ttggcgtcaa cgatggctat + 181 ctccgccatc tttcgctcga gggtaagggg ccgcagcctg agatcggaaa taacaatcgc + 241 cgttcgtatt cggtcgagac tattcaggcg ctccgcgagt atctcgacga 
gaacggcaag + 301 ggtgaccgtc ggtactcacc acgccggagc ggtcgtgagc atttgcaggt tataaccgca + 361 gtgaacttca agggaggcag cggtaagacc acgacggctg ctcatcttgc tcagtatctt + 421 gcgcttaatg gataccgggt tcttgcgatt gatcttgatc cgcaggccag catgtccgct + 481 ttgcacggat tccagcctga gtttgacgtt ggcgacaacg aaacgctcta cggcgccgtt + 541 cgttatgatg aagagcggcg cccgctgaag gatataatca agaaaaccta ctttgcgaac + 601 cttgatctcg ttccgggcaa cctcgagctt atggaattcg agcacgacac cgctaaagtg + 661 ctcggctcta acgaccgcaa gaacatcttc ttcacgcgaa tggatgacgc aatcgcgtca + 721 gtggcggacg actatgacgt tgtcgtcgtc gactgccctc cccagctcgg ctttctgacg + 781 atctcggctc tatgcgcggc aaccgccgtt cttgttactg tacatcctca gatgctcgat + 841 gtgatgtcga tgtgccagtt tctgctgatg acctcagaac ttctgagcgt cgttgcggat + 901 gctggcggga gcatgaacta cgattggatg cgttatctcg ttacgcgcta cgagccggga + 961 gacggaccgc aaaaccagat ggtgtcgttc atgcgcacga tgtttggcga ccatgtcctg + 1021 aaccacccga tgctcaagag cacagccatt tcagacgcgg ggattactaa gcagactctc + 1081 tatgaggtga gccgcgacca gttcacgcga gcaacatacg accgagccat ggaatcgctc + 1141 gacaacgtga acagcgaaat cgaacaactc attcaatcat cttggggtcg caaatgatgg + 1201 ctctagagat ctcagaaaac gcgacattga tggagaagtt gccagccgga aacttttcgg + 1261 aatttgcact ctctatgtcg aggaatccgg cttgtcacga gtacctcagg ggaaagcaag + 1321 atggctagaa aacacctcct ttcagatttg aaagctcctg cttcatcatc tacggagttc + 1381 gatgaagcta gggctgcaga cgtccctact ccgcagtatg cgcctcgagg tgcaatcggt + 1441 gccgtctcgc gatcgattga agctttgaag tcgcagggac tgagtgaact cgatcccgaa + 1501 ctgatagatg cgccgtccgt tactgatcgc cttgatgagg atggggctca gtttgaggag + 1561 ttcgctcgca acatccgtga gaatgggcag caggttccga ttcttgtccg gcctcacccg + 1621 accgtggaag gacggtatca gattgcctac ggccggagac ggttgagagc ggtcaaggcg + 1681 gccggcctca aggtcaaagc cgcaatcaga aatctgacag atgacgagct tgtactggcg + 1741 caaggtcagg aaaacagcgc gcgtcaggat ctgtcgttta tcgagcgggc gctctatgca + 1801 gcccagctcg aagcgagtgg ctaccagcgt cccgtcatca tggcagcgct ggctgtcgac + 1861 aaaagtaacc tttcgcggtt gattcaggct gcgacccaat tgccggacga cgtcatccga + 1921 ctaattggtg ctgcgcctaa 
gaccggccgt gatcgctggt acgagctatc atcgcggttg + 1981 gctgcagaag gtgctgcgga gaaggcgcgc gctcttcttt cgactagcga ggttggctcc + 2041 ctgggttctg atgagcgatt tgttcgcgtt ttcgacgcgg ttgcgccgaa gaaatctaag + 2101 aaggaaaaag ttcaggcgga tgtctggcaa gctgacgatg gggtcaaggc tgcgagtttc + 2161 cgccaggaca aacgaacact gacattgatg atcgacaaga aggcagcgcc ggaattcggt + 2221 gagtacctga tgtcggctct ccccgagatc tacgcttcgt tcaagaagtc gaagcaatag + 2281 atgagtcgta acgaagaaag gtgccgatag cgcaaagaaa aagccctccg aaacggtgtt + 2341 ccagaaggcc tctctcagtt tggtcgctta gagaatcgca tttcccggaa tcacagtcaa + 2401 gagtcaacgc cacaccggcg tagccttttc tttgccttgc gaaaggtgaa ggacatggaa + 2461 acgggttata tcacgacgcc ctttgggcgg cggccgatga cgcttgctct ggtgaagcgt + 2521 caggttaaga ccgagcaggc aatagcggat ggctcggtcg acaagtggcg cgtgtttcgc + 2581 gacataagcg acgcccgctc acgccttggc cttcaagatc gagccttggc ggtcttgaat + 2641 gcacttttaa cattcttccc agttgctgaa ctcagcaatg agaggaacct ggtcgtcttt + 2701 ccatcaaatg ctcagctatc agcccgcaca aacggtatcg ctgggacaac tctgcgcaag + 2761 tgcctcggtt cgctggtgga ggccggtgta atcatccgca aggatagccc taacggtaag + 2821 cgatatgctc gaaaaggcaa agaaggaaac atagaggacg cctacggctt cagtctggca + 2881 ccgcttcttg cgcgcgccgg cgagtttgct agcctcgccc aagacgtggc tgctgaacag + 2941 cgccgcttcc gcatcacgaa agaccgcctc acgatcgttc ggcgagatgt ccgcaagctg + 3001 atcaccgtcg ggatggaaga gaaccttgcc ggcgattgga ttgccgcgga aacgtgcttt + 3061 gtcgagattg tgggaaggtt cgttcggcac ccgacgctcc aggacctgat ttcgagcctc + 3121 gacgagatga gccttcttca cgaagaagtc tccaggatgc tggaaattaa agaagaaacc + 3181 gcaaaaagtg atggcaatgc catcccggac ggatgccaca tacagaattc aaataccgaa + 3241 tcctgccatg aacttgaacc ccgctccgaa aagaagcagg gcgaaaagtc cgagccaaac + 3301 aagaaaacgg agcggaaaga cgaaccggaa gcgtttccgt tgtccatggt gttgcgtgcc + 3361 tgcccggaga tcaacgcatt tggccctggt ggatcgattg gaagctggcg cgaaatgatg + 3421 tcagcggcgg taacggttcg gtccatgctt ggcgtcagcc cctctgccta tcaggaggca + 3481 tgcgaggtga tggggcaggc cggagcggcg atagcaatag cttgcattta ccagcgtggc + 3541 gggcacatca actcggcggg gggatatctt cgggatctaa 
cggggaaggc gcggcgaggg + 3601 gagttttcac ttgggccaat gctgtttacg caattgcggg cgaactcggg caccgtcaag + 3661 gcgtcagcgt aggtcaaagt atcatgattg tttagcctaa ccggttgaac taattaacct + 3721 attttgacta gtttccggct ggcaacttta tctcgatcta aagcgtcgag tgaatggcag + 3781 aagataatct tcctgatggg cgtccgtata atgaccgaaa ttgtgcttcc gaccgaaaac + 3841 acgatcatcg cggcagccaa aaaacttgac gcggccgcat cgcagctggt ggcagagacg + 3901 ttctttgcca ttcggcatgg gatgtcaatc aatccaattg gtcgcaaccc ggatgggcag + 3961 accatcaagg gataccctga cattactggg cgggtgccgg gtgagaagaa gtacctgatc + 4021 gaagtcacga aggacgactg gcgcacacat cttcagagcg atctatcaaa actgtcccgc + 4081 ctgcagaaag gagcctacgc gggtttccta cttctctgct tccgaaagtc cgagtccgaa + 4141 ctcactcaaa gcaacaggaa gaaggcacgg gaaaccgtcc agcaggccga gagccggatt + 4201 gaaaagcttt tgggtgtcca ggcaggacag gtagaattcg tctttcttgg cgagttcgcg + 4261 cgtgaggtca gatcggcgaa ataccaccgc gtattgctgg ctctgggtct cgagcttgtg + 4321 ccagcgccat tctacacgga tttgcgcttc gtgcagggct tagccgattt cgtaccgacc + 4381 gctgaggaat atgaggctga gagtgttgtt cctcgcgatg aggtaagccg gacctatgag + 4441 cgggtcttca aaaacagact aacgttgatc gaaggcgagg gcggtagcgg caaaacaagc + 4501 ctggccctag ccgttgcgac ggagcatcgg aagcaaggcg agatctttct gttcttagac + 4561 gcctctgtcg ctgactggaa gagcggttcg gagcgagctc gcctcgttga cgtagcggcg + 4621 atgttcgcgg aatcgaatgt cctgattata ttggacaacg tacatctggg cgatgcgtcc + 4681 ggcatttctg aactgattac aaatgtccag gcgtccggtt atgatttccg ctttttgatg + 4741 acgacgcgca gcagcgacga agttgaacaa tggaagcgcc tgggaaatat cgagcttctc + 4801 cgcagagttc cgtctggagc cgatgtcaac tctgcctatc accgcctgct cactcaaaag + 4861 tttcccggaa gcagtttcaa cgatattccc ccagcggtga ccacacgatg gtcaaatcaa + 4921 attcccaatc tggttattct cacgcttgct cttgaaggtc tcacaaagag aggcggctat + 4981 gatcgcgatt gggcgatcaa ggttgaggac gcaggcacat accttcaagc taagttcatc + 5041 tcgaagctgt cgtccgacga cgtcaaacag gtgggcaaga tcgctgcgct ctcacttctg + 5101 gaaattccca cctcgctcag gtcgctcgac caccgggttc caaagtctgc tgtggatctg + 5161 ggcttcgttc gtctgaactc gagttcaaca actcagcgat atgagctcgt tcaccacgaa + 5221 
ctgggcaagc tgatcacgtc cttcaaagat ccggatatca aggcgcggct gggagaggtg + 5281 atgtccgctg atcccttcca ggcaacatat atcgggctga agcttatcgg aaacggagaa + 5341 gccagcctgg caaaggaatt gttgtcgtca gtcctttctc aatcactcac actctcgcca + 5401 gatttctcga tgggaaactc cggcggagtc ttcggtatcc tggtccagtc caacgtgact + 5461 acctatcccg aaattgagcg tatccttctt cctgatatcg gcgccttttt cgatacaaag + 5521 ccggatattg taaccggcct tagctccttc ctcggggctg cctccgaaaa catggagcgc + 5581 gtatacaatg ccattgtgga aaaacttgcc gaacaggaaa cgattcgacg gatcgaagag + 5641 cttctcccat ccgtcggccc gacgactttc gcgacacttt accgatgcgc gaactcacgg + 5701 aacctcccgt ttctttcaac gcttcgaaaa tatctcaaca gagggaagcg tatagattcc + 5761 tttgcctatc gatgcaggtc tgaaagtccg agtaaggtcg agatctgctg gggcctgatt + 5821 gatgagttct ttccacacca caaggcccgg tttgaagttg tgcttcgctc tgccctcgcc + 5881 gagggataca tcgagcgcct tatcccggaa gagcttattg agtctcgctc ttcaagggct + 5941 gttcagacgg cgatccgatg cgcaaatagc gaagttttca aacggtacat cacgttccgt + 6001 gactgcagcg acgcgacgct gttgcttctg gcccacacga tgcacgacat gggcaggaat + 6061 gatctctcgg aggtcgcagc tgaccgagtt gcaggcagga cgacctcttc aatctggtat + 6121 catcgtcgca ccggtggcag ggcgttgctg actattttgc ggagagcatc gatatctgca + 6181 gaaggagatg ttcagaaaat tctgatgcgg cttgaggctg aaggaaaaat gagggccatt + 6241 gtgaatggaa tgcggcctta tcgcctagcg aattttattt tcgtgatctg ggatcggcac + 6301 gagcaattta cttcattcat ctcgaagaca gatcttcagg aaattacaaa ccgccggttc + 6361 aaagcgcgag cggcagagtt ctctgaagag cgacaagcgt ccatctacat tgcaggaatc + 6421 tatgcgctgg taggcctcga cataccgcgg gacgagtgga gcgcggtcga cgtcactgaa + 6481 gacgatttca ttggaaacca gaacaacccg gtcttctgga tcggtctcaa ggctctggaa + 6541 gaaaatggca tgatacgcct tgcccatcga agcagatttc cgacatctgt cgcggcgcta + 6601 gatactcatt cggaaaacac cagccggatc atgaacgatt tgaaaaactg ggctgcgacc + 6661 aggtaattca tagttgaaga gagtctctga accatacagc cattgcgact atcagcgaac + 6721 cgtaaaacat ctcgcgctgc gcaacgatct taaccattcg ttaacttgtc gggatccgct + 6781 cgtgaatacc agcgtcgctt agggagatcg acgggatcac acgagcgcag cgttcacctt + 6841 tttaagggtc tcattgaaac 
tgtgcagcag ctgcctccct tgcatcgggg caggaagagt + 6901 ttcgtctatg ccggtacgcg tccttccgct ctcgatgcgg gcggacccta gatgtgcgtc + 6961 agcgagcctg aggtcgtaga ttccgaaaag cggtgacatc aggtttccag cttccgctgc + 7021 agtgagcctc gtggaaagta gccgctccaa gagcttgaga gatccgggcc gcttttcgtt + 7081 cttagaaagg gtaagttggg aagcgatcgc atcagcgtcg attctttcta tgaacagccg + 7141 ggtcaggtcc tttgcaagct cgagaatccc atctgttttc actgcctgaa agcggtgcaa + 7201 cctgcgcagc agcgagggga ccgcttcgtt ttctcgaaga agagatctgc cgtacttagt + 7261 tttgaacgcc gcctcgaccg cctccaaaac acttggcatc tcccgttcag gcgcgacggt + 7321 tgcagccggc tgcaccatca tttgcgccgc aaagagttca gaagacacgc ctccttccgg + 7381 cgtgacgtta tatgcgctcc acaaccgctg ttcccattga gggagtaggc ctatatcctt + 7441 ggcgaacaca gtgatcagac caaggttgtt caagccaaaa tgcacgggct cgttggctcc + 7501 tagggcgcca gtttcggcgc tataccaatg cagttttgag ccacggcggc tcagcaacgc + 7561 atccgcgacg gatggattga aaaagagcca ggtgctcgag cctatgagct gaggaccaag + 7621 gactcggtta ccatccgtgt cgagcgcgaa ggccgcggcg tgattatcct tgtcgcctcg + 7681 cacacgcacg ctctggccac cactttgaac ccactcagtc cgccagatgg cccctcgcgt + 7741 aaagaagctg ccggcaggga aggggtagcc agcctcgttg gttgttctgt cccagctgtc + 7801 tcgccctcgc gtttccgaag gcgtgtcttc aaacgcgtac gacggtttag tggcggtgat + 7861 catagtccgt tcgcgatagc tggagcagta aagatccatg ccgcgggcgg cgagatagtc + 7921 gccgagatat tccttacgga tttcaagaac cgtaggcttc ccctcgctat ctcgcttcaa + 7981 tctggcgaca tcgatccagt cttcctcagg acgcttccaa acgtctccct cacgaacgag + 8041 atgcagggcc acgacaaggt ccgtatgaag atgccagatc tcatggtcga cttcgtcgat + 8101 gagctggacg attaccaggt ttacccccag gtcaccgccc tgccagttgc gaaatacgtc + 8161 ggctgatcgg tagccgtcat tttcaacgac ggagcggtgg ggctggatgc caagatcgtc + 8221 ccatgcgagc gcttccacgt ccttgcgccg ttcggccatc acggccgcgc ttccgatccc + 8281 gatccactct tctaaacgaa gaatatccgg gtcatcggag gcaatcgttg aaaaggtaat + 8341 ttgacgaagt ggcacccagg aacctgcctg gagataagga acgtcttgaa ggtggaagta + 8401 ggcttcctga actggcatga actctctcgg tcacatcatc acggacaaca cgactatgga + 8461 ctgtaaaacc tagccgtaaa atcaattgga atcgctgtcc 
cacattacag gttgtgaaca + 8521 tcgcaaaaaa tggcagcttt acaacaggtc tggtgattcg cgattttcgc cggaatcaac + 8581 cagcaatacg agcagtgtga tggcagttca ttcaattgca taccgaatat ttccggcttc + 8641 agcctgtccg ccggctctct tgcatctcag tcccgctgga cacctctgcg aggttgacac + 8701 cgatcaggtc gggttgagtg cgctgagtga aatcattcag acggcacccg gatcaaatcc + 8761 cctacctctg ctgttcgacg agctcttgtc gccgggcact gcgcagctcg tctggaaggg + 8821 gtcgggaccg gggcgaggtg ttgaaattag gagggcgttt tccggccttt ttgggcggtt + 8881 ctttgcccgc gcatatctgg aacgctacta tggtttcacc tggttttctc cgatcagtgg + 8941 ttcgccatat aatttgagca acaggctgag ggtcgtacgc cagccaggca gagaattcga + 9001 tttgcctgac tggatcatgg ccggtccggg tgtgcttgcc attggtgaag cgaaggggtc + 9061 tcacgcaaaa ggaccggccc cgacttctgg cctgccaggt ccactgcgca ctgcgtacga + 9121 gcagatttca agagtctggg ttcagaaggt ggaccccgcc ggcgtttggg taaaccgaca + 9181 ggtgaaaggc tggggcgtga tgtcgcgatg gggtgtcgag agccctgctc gcagggcata + 9241 tcattgtgtt ctggacccgg acactgaagg ggagccgctg agcggagagg agctcgaaga + 9301 ggccatacag gacgtcgccc ggtcgcatgt tgctcttctc ctcgatgggc tgggaaggcc + 9361 agaccttgtt gataagagag cgagcccagg cttcagtcct cagcaggttt cggcaacgat + 9421 agaagggcta ggagaacgca cgttcatcgg aggtatcgtc aacaacttcg gttttctccc + 9481 gatgtccatt gatgatgccc gcgcggtgca agcgagcctg ccagagcgcc tgcgccccac + 9541 ggtccggttc ctcggattgg aaacagacgt tgtggaacaa tatcgctcgg gctctgtcat + 9601 caaggctcaa ccttttcgta tcgatgcaag tggacccagt ttgagttccg acggaatgat + 9661 gctggcgccg cttgagcgga tcgacccagt tccgtcgaca atctgatcat aacctccact + 9721 accggtgctc cgatcaaatc ggagcgacct ctcgacctaa tttcggaatc gcaagtgaag + 9781 aaacttttta agcagctcgg cctgcacaaa atcgaagtgt gcctgaagct ccaagcgacg + 9841 gaaataatac agcttgagcg aaagtcggaa cttttgccgg attttgactt cgtgtcctac + 9901 ggggggctaa gcgagaaaga ttatccgatc tggcacgaat tcgagttcgg taacgaattc + 9961 ctgttcattc atcggctcag gatgagtgac tatggttact tcgtcgacga cgccccggaa + 10021 gatgacgaca acttcgagct tccgaaatac acctctggaa gggcaggata cggcgaggct + 10081 ggtatcctga aactggaact ccataccgtc gataggttcg ggtgccgggc gatgctccgg + 
10141 ggcctgctat ctggttcatc catcgagatg aatatggacg tgcgtttcaa attcatcgcc + 10201 gctggatatc gtgaaggcga aacaaggctt agacatgcgg ccgaggcgat agccgaagga + 10261 tgggcgtttg aagaagaagg gaagctcaaa caagccttct tttcctatta cgctgcattg + 10321 gacagcttca tcgatgcgga acgcatcaag ctcaacggcg gcttggatga tgacgaaata + 10381 gatgaagaga tacatgaaaa tataatcgct cccgacatta ggttgaacga gaagcttcgg + 10441 catgtcgtaa agcgcaacct gcccaccaat ttgaacggac tcaatggtct gagaatttgg + 10501 ggcgaagtct ttggtcgatt taatcggatc acagagaccc gaaacgctat tgcccacaac + 10561 acgaagatcg cagtaatcac tcacgatgac gtcaacgtat gcttctcgac cctggcaatt + 10621 atcattgcga tcgttcagga gcagtgcttc gacgagcccg cgatcttgga gtgctacggt + 10681 ctaagctagg cgccgcggac aaagcgatcc ctttcaagtg gatttagccc gcgaattgta + 10741 tcaaagcatt cgagctgcgc ggccgctccg gcgcgtggca ttgttatagt agctcgatgc + 10801 ctgctgtacc gaacgatgac gcgactgctc catagcctct ggaagaggga taccgcggtt + 10861 cgccgcctcg gtcagatatc ccgatcggat cccgtgcgcg gaaaactccc cacgctctaa + 10921 tccagccagt tccacccgct gcttgatgat cgcattgacc gactgcggat cgatcgcccg + 10981 tcgcgaaaca gtcccccatc gcccgatcgc ccgaaacacg ctgccgctgt cgatcttggc + 11041 ggccaccatc caggcattca gcgcatccac cggccgaccg gtcaggtaga caacgtcgtc + 11101 ctcgtcgccg ctggtcgtct tggtgcggcc gagatggatg gcgagagagg ggagggggct + 11161 cccgccttcg accggaatgg gcggttcgac ggtcagctgc tcgcggcgca gcccggcgat + 11221 ctcgctgcgc cggcgaccgc cggaggcaaa ggcgaccatc aggatcgccc ggtcccgcag + 11281 atcgcgcaga ctgtcggtcg cgcaggtcgc cagcagtctt gccaggacgt cgccggtgac + 11341 cgccttggcg ctcttgcgaa gacgttgtct tggcgcggct cggatcgcca gccgaatggc + 11401 tgatttgagg gcaggggagg cgaacgcgcc gtcgaggccc cgccacttgg tcagcgtcga + 11461 ccagttcgcc agccgccgcc gcaccgtggc tggcgcgtgt ggaccgacgg atttcagaaa + 11521 cccttggctc ctgaggcttt cgtcgacgtc ggccggcatg ccgtgatcgg catccgtctc + 11581 gcggtgtcgt gggtcccaga gatgatgggc aacgaatttc aacaacagcg cctcgggcgc + 11641 cggccaaggc agggaacgtc cggtcgccgc cagaccccag gcttcgaggt aagtgagatc + 11701 ggaggtcagc gcccgcagcg tgttgtcgcc catgccctgg ttgaccaggt ggcgcagcgt + 
11761 ctcgacgtcg tggtcggtga gcagttcggc gagttcgtcg cggcgctcga gcggtaggac + 11821 ggcggcgatc gtgtcgagtt cttcggcgcg gcgatcgacg gcagattttt gcagtggcgg + 11881 catttcggtt cctgaaatgg ctacgcggaa gcttcacggg ccggtggcga aggccacttt + 11941 cggtgatttg ccccgcaaaa tcaatctacc atcgataatt ggtagttatc ggtggttaga + 12001 gcgccatccg acaatcatac gatgcgagga accataatat tcatatagag ttcgtttagt + 12061 aaatcgttta ccataatatc aatggcttac gatctcgcca aaatcagcat gacagccttg + 12121 atgcagccgg ctttcgacgc cgccatcgcg ctgacgcgtc tggacgaacg gatcgcccgt + 12181 tcgccggtcg gcgcggggtg gatcgaacgc acccattttg ccgacgcctg cgcctcgctg + 12241 tgggtcgacg gtgaactcgt ccatctcgaa gaccttgtcc tgcacgacgc cacccgcgac + 12301 atccgcaccc cgacgcacga actgaccatc gcccgcgacg ttctacggac ccgccggcgc + 12361 attgccgcgc agtcgccgga ttgggcgtta tcgaccgagg gtatccgaaa tttgcgacag + 12421 acgtcggaca gcaatccggc cggcgctgag gcggggcagc caagcgacgt cattcggccg + 12481 gcggtcgcca tcgatccgga aggggagggg gacgacttcg acgacatcga aaatctcccc + 12541 ggcgttgact atgccgccat cgatgcggtg ctcgcccgat cggaagccgc gatcgagagt + 12601 gcaacacggc ctggcgacgc cggaggcaac agggctgccg aaaaagatcc gatgatctat + 12661 gatctcgact gggacgagga tgagcggctg gaggaatggc gaacggtgct gcggcagacg + 12721 gaaaatctgc cggcggtgtt tcgggcgatc gtcgccctcg atgcctggaa cgagattgcg + 12781 gtgctgcagc attcgccctg gctcggccgg ctgttttccg cgtcgatcct gcgccaggcc + 12841 ggcgccacgt caggcgccca tctcgccgcc gtcaatctcg gcctcaaaac cattcccgtc + 12901 gatcggcgcc ggcaccgcga ccgggagacc cggctgctcg ccatcgccca cggttttttg + 12961 gcgaccgccg agatcggcat gaaggagcat gaccggctgg cgctcgccaa aaagatgatg + 13021 gagcggaagc tggaggggcg ccggacctct tcgaaattgc cggatctggt cgagctggtg + 13081 atggcaaaac cgttggtgtc ggcgggcatg gtggcgaaga cgctggacgt cacgccgcag + 13141 gcggcgcggc ggattgtttt ggaactcggc ctcagagaga tgacggggag ggggaggttt + 13201 cgggcatggg ggattatcta gtcggaaaaa ttcttttgcc gccttctacg acgccgtttt + 13261 cccgtcgccc aggcttcata ggctttagct ttcccgatca gttcagggtg aagcgcttcc + 13321 agggtatcgc ctgagtaggc acccgcgaca gaaaatcaac gagatcacgt cggccacgat + 
13381 atgtcgggac gccgtggctg ttttgcgcca tgagaacaat cggcacgccg gggaatgctg + 13441 gctggaagct atacgcaacc tcatcggctt gggggccgct attgataacg tgcggcttaa + 13501 caatgacgat ggcgaatgtg acgccctgct cgcgaactaa tgcgccttca aactgcattt + 13561 cttctctcct ttgatgaaga ggccgcctca ttccaggcgg cctcaggttt tagccacgcg + 13621 gcgggaacgg gtcgcgaccg taggaatcct tgtcccggat ttgtccgttg ggacggtgga + 13681 tgactacttc gctctgctgg ttgatcgcaa tcttgcgagc agcagcagcc gcgtcggcct + 13741 gcgttccatg ggtggaggtc actcgctgat ttccagcgcc ccgcacggcc cattcgccgt + 13801 tatggggaac cacatgttga tttttatttg tcatcgcatc agattccttt ttcttgagcg + 13861 ccatatcaat ttacattaat caaaatttaa ttgtcaatac ggctattatc gtttacgaat + 13921 tggaatagaa attttataca tggagtgatc atgaccattg agatgaaaaa tgtgctggag + 13981 ttggccgatg cggcactttc cgctgactac acgcgcgtac gcagagctgc gaatgcgctc + 14041 gcgcgagatc tcgacaagaa tggtgagact tccattgcca aggaactgaa ggccctggtg + 14101 cggaagaggg gggtgccttt gaaggcgtct ggctacgtgg aatcgctacc cgtggactcg + 14161 aaatcccgct tgccgcttgt cgaggagcag acctggcccg atacgcctat ttttctgaac + 14221 gaaggcgggt ggcatgtgtt cagcgatttc atcgcggatg cccggcgcat tgatgacctg + 14281 agcgccaaag gattggcgtc acgtcttggg cttctgctct ctggcccgcc aggaacaggc + 14341 aagtctctcc ttgccggcca tatcgctgcg cagttgtctc gtccgctata tgtcgtccgg + 14401 ctcgactcgg tcatctcctc tttgctcggg gacaccgcca aaaatatccg ttcggtgttt + 14461 gatttcgtgc cggcgcgaaa cgccgttctt tttctcgatg agatggatgc agtcgcaaag + 14521 ctgcgtgacg atcggcacga gttgggcgag ctcaaacgcg tcgtcaatac tgtcattcaa + 14581 gcgttggacg ggctggaccc gagctcgatc gtggttgctg caaccaatca tgcgcatctt + 14641 cttgatcctg caatatggag gcggtttcct tataaaattg aactcggtct tccggatgaa + 14701 agcgttcgcg ctgacctttg gcgccatttt ctttttgagg acaaagatga ggagggtcgc + 14761 gccgagctat tcggcgtcgt gtccgaaggg ttgtcggggg ccgatatcga aacgatgagt + 14821 ctgtcggcgc ggcgtcatgc cgttcatgag tcgcgtaaca ttgattttgg agccgtcgtt + 14881 gcggctctgc tggagccgcg ttcaggccgt actgtccctg tgcagcggca gccattggat + 14941 gcggagcaga aacgccaggt agcaatagcc ctcaaggaga agtacgccat tgggggggcg + 
15001 gataccgccc gcattctcgg agtttcacgg caagccatct atgcttacct gaagcaacag + 15061 gaaggggagg tgtagtatgg tggagccgag agaccaaccc cttctctatc ccgttctgag + 15121 ccttcagatg gatccggcgc tgcggagtcc gacggggcgt ggcaaaggca tcgacagcat + 15181 cgtcaaggag cggctcggcc ggcagcagga cgtgctcgcg agcgaaacac gcgacattta + 15241 tgaacaccgg acagagttgc ccacatactc gggtttaacg cacctagtcg ttcggatgtt + 15301 cagtgaggat tcccttgctc caacgcacac ccccgatgac ttgttctcgc agcgtcatgg + 15361 atgcaggctc gttgctcctc ttcctggcgg ctatttgatc gaggctgatg tcaaagaatt + 15421 gcctcgcctc cttcacgcaa tcgaacatcc aatcggctat gcggtgcaag ccgacatatc + 15481 ccgtgtttcg tctctcggac aatttgatgc gaaaagtcgt ttgcgcggtc gatcggtcaa + 15541 tgaactctgg aattccgcac cggaagatga tgacggacgt ttgttcgttg tttggcttgc + 15601 gccgttccgc gatcgagatg ccaaggccga agtcctcgag cgtatccagg gctttgccaa + 15661 tgagaggctg gtcatgccga catttaccag cgtgcggctc acgctcggga catctgagga + 15721 aacggaagaa ccgcgttctc tcacgacacc acgacaatcc agcattgcac gggcaatgcg + 15781 agattatcga aacaccggcg tcggacgcgc tacagttcga attccgaaca aagaagggtt + 15841 gagacagctc atcgcgtcag gcgcttctta tcgcattgat cctgtgcggc cgatcagagt + 15901 ggcagcacct ggcgaaggag cggagcctcc tgcaccagta atcgatgaga acgctccaat + 15961 cgttgccgtc gtcgatggcg gtctgcatgc gcgaagctac actgcggctg aagctttcag + 16021 ggcgacacct ttcgtcacga acgcccaagc cgacaagccg cacggaaata gcgttagctc + 16081 cctcgttatc cacgggcacg cctggaacaa aaatcgctcg ctgcctgaac taaactgccg + 16141 tatcggaact gttcaggccg tcccccatcg gaacgccaat cggcgcttcg atgagcggga + 16201 gttggtagac tatctggcgg aggttgcgcg tctttaccca gaagcgcgcg tctggaacat + 16261 ctctgccaat caggatggcg ccggtttgga tccctccgag gtcagcgttc tcggccatga + 16321 aatcagtctt ctggcgagat cggctggatt tcttccagtg atctccgttg gaaatgtcag + 16381 cccggacaac aattcccggc ccaatccgcc agctgattgc gaagccgcga ttgtcgtagg + 16441 tggccgtcag gcactcccgg atggaacgcc aggcgaccgt tgccctgcct gtctccccgg + 16501 tcccggcccc gatgggatga tgaaaccgga cctatcgtgg ttttcgaatc tcaggatgct + 16561 gggtggggtc gtcgatacgg gaagcagtta cgcaacgccg ttggtgtcgt ctttggcagc + 
16621 ccacactttc gatagcttac gggaaccaac gccggatctg gttaaggctc tcctcatcaa + 16681 ttcagccgag cgaagtgagc acgatcccaa cctcggttgg ggaaccccgt atcaagggca + 16741 ccttccttgg acctgcgtgc ccggcagtgt cacgcttgcg tggcgggcac aactcgagcc + 16801 gggaaccgca tactactgga acgatattcc catccctccc gagctggtgc gtgacggaaa + 16861 attgttcgga cgcgccagct tgaccgctgt tctgcggcct ctcgtgtcgc cgtttggcgg + 16921 cgctaactac ttcgcttccc gcttggagac atctctggcg tatcagtctg gagcggataa + 16981 gtggccgtcg ctacttggct caatgaaaga gtcgacgctt ccggaaaacg acgctcgcga + 17041 tgagcttcgc aaatggcagc ctatcagacg gcactgcagg gacttttcca aggggagcgg + 17101 gcttgggttt tccggtccac acctgcggct ttatgcccgc gtgttcatgc gtgatctcta + 17161 ccaatttggg tggacgcatc acagccaggc gggggcgcaa gaagttgcgt tcgtgcttac + 17221 gctttcaagc gcggacggag aaaagtcgat ctatgactca actgctcgtg cgctcggaaa + 17281 cttcgtcgag agcgccgtct tgaatcagga catcgaggtc tcgaatgagc tttaaggtgc + 17341 ggttaaagtc tctaaattgg taagattttc tcaccacgtc cttcagtttg ggcgtctcat + 17401 tgccttcatc agtttgcgat tgccgtctcc gacgggtttg aatttcaagg gcacacactt + 17461 tccaggaaga tactgtcaag cgcattcata caacgacacc cacagccagt agctgtctcc + 17521 agaagcgaac gcaattgcat cccgcaaaca tcgatttgac tttgacatct tcagagaaaa + 17581 gcaaatacta ctttggggtg acgggttcgt gcgactagtg cagctccgag gagacgtcac + 17641 ggttgcttgg gacggcgcca gcgataacgc tggaatgcgg gaagtgcgag gctggggatt + 17701 tgacggcgag gggacaatct gcgtggccga tgctcacggc cgcgaaagca gctgcagatc + 17761 attgtgacat cgacaccgtt tgcattgatt tgaatccttt cgagcacaag gaagtcgcga + 17821 tcttcaggaa taggctcgct gcattgtcgc ctcatgaccg caactattgg cgcggcacgc + 17881 tctacagccg cgaccagagc tttgccgtca tcagaagcaa ccgggtattc ttgcttgtca + 17941 tggtggctgg atcggtcgtt ggagcgttta tcggcggaca gctccccggc attgtcccga + 18001 gcgccgtgct gctgccaggg ctcactttga tcctcgtgat ctccgcgatc aagatatggc + 18061 ggcactcgtg actacgagcg ccgtgcctgc aaccgcatga ttttcagcgg ttctgcgacg + 18121 ggcgtttcta tggcataagg tgttggagac cagcacatgg gcaacggtag tccacaatcg + 18181 cgaagcgagc ggacacacgt cgggagaccg gacccgcagc gtcgatgctt ttcgttcatg + 
18241 ggttgtaaac agcccgcagt tcacgcaccc cgcgtccaag ccgccggcgc agcagcgtgc + 18301 gtagggtgac cccgtcggca tagcctatct ccgtggcgat cgactcgatg tccttgcggc + 18361 ttgtacgcaa gaggtgaacg gcccgctcca cgcgcaggtc ctgaaaatac gatagcggcg + 18421 acttccccag gacagcctcc atgcggcgtt gaagcgttct tttgctgaca tggagggcgt + 18481 ccgccgccac gtcaagagag aacccttcgt gcaaccgatt gcgcgcccag cgttcgaagt + 18541 gctggatcaa gggatcggca ttcgccaggt gcattggtat catatacggg gcttgtgagg + 18601 gcctcgtatc cacgagcaga tagcgtgcga ccatgtcggc aagcgcagga ctggtttggc + 18661 gcaccagcca aagggcgagt tccagatgcc ccatcgcagc gccagcggtg acgacatcgc + 18721 ccgatggcac caacattcgc cgctcgtcca ggcgaacatc aggatattgt tgacgaaata + 18781 gcggccccaa ccaccaggcc gtcgtcgcct cgtgaccagc caagaggccc gactcagcca + 18841 gcagaaacgt cccgatgcac gccgcaccga tccgcgcgcc gtcggcatgc caggtttgca + 18901 gttggtcagc tgcatcgtgg gactcgcgcc catccagtag ctctagaagc ggcccaggca + 18961 ttttggtccc cagcgctgga acgatcaccc agtcgggctg cctgattgtg gctgcgggtt + 19021 gcaccggcac gctcaggcca agcgaggtct gtacggtctc cctgacgccg gccatgacgc + 19081 attcgaaccg tgggatgcgc aaggactcaa gctcagccag ttcgttcgca gtggcgagtg + 19141 tgtcgagcac ggctgccaag ccggtgtcga acacattgtt gtgggcaagg acgacgatct + 19201 tcatggcgct tatgataccg aaaatgtcat ttgcggcaat accgttgagg aaactcggga + 19261 cctaatttgg tcttcgtcgg atgttcgacg tcagtgaagg agttcaatgc catgatccag + 19321 tatgccctgt ttgcccgtct cgaagccaag cccggcaaag aggcagaagt cgaaaacttc + 19381 ctataggcag cgctcgaaat ggcacgggac gaaggcccga ccccaatctg gtttgccttg + 19441 cgactctcgc cctcgacgtt cggcatcttt gacgcgttcc acgacgaaaa ggcgcggcag + 19501 gaccacctgt ccggcccgat cgcgcaaacc ctgatggcaa aggcaccgga tctgttcgct + 19561 acggcacccg ccatcgagtt gatccaagtg ttgggcctga agaacgaaac gactgcgaac + 19621 tgacaatcgc gtcgaggaca ggcaatgacc tcgaccaatt tagaaagcaa tcacatgacc + 19681 ctgaatgtat ctacccggct ctcgctgctt gtcgcgctcc tagccagcgc gactggtctt + 19741 gcgtccgcgc cggccgttgc cggttcgacc gacaagttcg agctttccag ccctgacatc + 19801 gcgcccggca gcaaaatcga cgataaattc gtcctcaacg gctttggctg caagggcgga + 
19861 aatatctcac cagccctgca atggaaaaat gccccggctg gcaccaagtc atttgtcctg + 19921 caggtctatg atccggatgc acccaccggc agcggtttct ggcattggac ggtcaataac + 19981 atcccggcca acgtcacgca gctgacccag ggtgccggca atgccccggc aaacctgccg + 20041 gcaggcgcct atggcggcgt caacgacttc caggacacgg gtgcaactgg cggcaacggc + 20101 aattatggcg gcccgtgccc gccggctggc gacaagccgc accggtacga attctcgctt + 20161 ttcgcactcg cggttgatga catcgacgcg gcagccggcg ttccaaaaac tggcacggca + 20221 gcgcttcatg gcttcgtcct gaacaagggc cttggcgaca aattgctggg caaggcgtcc + 20281 ttcactgccg cctacggcca ctaatccatc attaaaatct ggagattatt atgaagaaac + 20341 ttattctcgt accgatcctg ctggcctcga tgaccggcgt tgtcttcgcc gccactccct + 20401 tcaagacggt caagaccgaa aaaggcgtgg tcctttccgg cgaaaagggc ctgacgcttt + 20461 ataccttcaa gaaggatgaa gcgggagcct cgaattgcta cgacgagtgt gcccagaact + 20521 ggccctcggc tattgccgcc ggcaacgcca aggccaatgg cgcctactcg atcgtaaccc + 20581 gcaaagacgg cacgaagcag tgggcaaagg acggaaagcc gctctattat tgggtcaagg + 20641 acgccaagca aggcgatgtc accggcgacg gcgtcggtgg cgtttgggac gcagcgaagc + 20701 cttgatttcc agcagcggcg cggaagcagt agcttccgcg ccgcctttgt atgaagtgct + 20761 tttcagacga tcgatccgtc tgcccctcgg taagcttggc aagccatcgc cacctgctat + 20821 tggcccagat ggagccgctg acttacccgg cacggtctct taacctgatc attcagcggc + 20881 cacccgttca ttgcaaggcg cgcgctatct tgccgctaat ctggtctgcg tgaggcattt + 20941 gggacagcca tagtaatccg gcatcggctc gtagcaaaat gatattggac cggtatcaga + 21001 gcctggatta ccgtgacaat cattcaaggg ctcgtccctt gacaggcgga atggagccgc + 21061 ctttgggaga tataggagga ggaaatcttg atcgcagctt tggtgctgtt ccccgttccc + 21121 gctggaacga cgatggagca gataaaggaa gcgtacgagc tgtcggcgcc tcgctttacc + 21181 ggaatgcccg gtcttcttag caaacactat ctgttcgacg gcgccggtca gggcggagcc + 21241 ttctacgttt ggtcgacgcg ggctgatgcg gaagcactat acaccgaaga gtggcgccag + 21301 tcgctgacgc aacgctatgg cgcgccaccg accctttcga tctacgaagt gccggtcgcc + 21361 atcgataacg cggcggcctc gcgcaccttt tcgtagaggc aaagcacgtg tgccccgacc + 21421 gccgcgttgt aagagcttcc atcggccagg cggttgtcca tcttgtgaag ccgccgaatg + 
21481 agatttgtgg tccgcgacat tagcggcttg atggacggga gctggcgcgg agcgccggac + 21541 gggttgtcga attctatttc aattcaggtc ggggcgcgaa gcgacttcgg ccggtttggt + 21601 aaggtacggc gcatctaaga gagaggccgg ccaaacgtat ccagggaggt tgatgatggc + 21661 ggttctgcta tctggaaatt attctttgcc agcgagtcag gctgaggttt acgccgcgct + 21721 caatgatgcg gacgtccttc gcgaatgtat cccaggctgc gaggagctgg aagctcgtgc + 21781 agacggtata ttcgcggctg tggtccggct cgagcttggt cctttgaaga cgcgcttccg + 21841 aggtaaagtt cgcttggagg atttggatcc tccgaacggc taccggatca taggcgaagg + 21901 tgacggcggc atcgccgggt tcgccaaggg aggtgctgct ctgaagctgg caccagacgg + 21961 cgaaggcgga acacttctgt cctacgaggc cgaagccaac gtcaacggca agatcgctca + 22021 gctgggacag cgtttgatcg ccagcaccag caagaaaatt gccgatcgct ttttcgagac + 22081 actcgtgaag cgtctgcaaa atgagaccgt cacggctgaa gcgaaataga aattcccggt + 22141 ttcaccgacc cctcattcaa tcgacgaatt gaaaggtaga agtatgccta aggttacgga + 22201 attaacgttc tttagtgaag gcttgaagct caagggcttg ctctatgagc cggacgatct + 22261 caagccggga gaaaaacgcc ccaccgtcgt gtgctgccat ggctacaccg gcatgaagga + 22321 cgtatatctt ctccccgttc ctgagcgctt ggctgtgcac ggttacgtgg cattcgcatt + 22381 cgaccatcgc ggcttcggca agagcgaggg tgtgcgtgcg cgtctgatac cgcccgagca + 22441 ggtggaggat attcgcaacg cgatcacctt cgtctccact ttgccctcag tggataccga + 22501 ccgcatcgct ctctacggca cgtcgttcgg cgggggcaat gtcgtggtcg cgaccgcaac + 22561 cgacgatcgc gtccgctgtg tggtgtctgt tgtgcccgtt ggcaacgggg agcgctggct + 22621 gaaaagcctt cgcaagcact gggagtggct gaagtttcag gacgtcctgg ccgaagaccg + 22681 ccgccagcgg gttctgacgg gagaatcccg ccgcgttgac gtcacggagc tgatgcctgg + 22741 cgacccccac tcacggcagg tcatccagga gaaggtgaag gcggcagaaa cctatacgca + 22801 aggctaccct ctggaaaacg ccgaagcgac tctgcggtgg aaaccggagg atttcgcgca + 22861 cgccattgcg ccccgtccca tcctcttcat gcacaccgaa tgcgacggct tggtgccgat + 22921 cgacgagtgc tacgcacttc attcaaaggc caaagagccg aagaagctcg tcactattcc + 22981 gaacgccgac cattacgacg tctatcaatt cgtcaatccg gacgtgttcg agaaggtgat + 23041 cgcggaatcg atcaagtggt atgatcgcta tctcaaggcg gatgcgccgc aagagcgcat + 
23101 cgcagagatc gcttaaggga ggcaaaccaa cgtggcgcta ccggtttttg actacttcgc + 23161 acccaaaagt atcgaagagg cttgcgcagc actggcatcc aatccggatg gggcaaaatt + 23221 gctcgccggg gggcagtcga tactgcgtgt catgaaattc cgcatcatgg cgcccgagct + 23281 tctggtggat gtaaaggcca tacccggcct gcgctacatc gaaggtgacg ccgatacgct + 23341 gcgtatcggc gccacctcaa cacagagcga cgtgttgcgc aacgacgtcg tccggaagga + 23401 attcccgttg ctggccgagg ccattgccag gatcgcaacg acggcggtcc gcaacaccgc + 23461 gacgatcgtc ggcaatatct gcgtcggaca tactgcgagc gacccgtctg cggccttgct + 23521 cgccctcgac gccgaacttg tcgtcgtgag ccttgagggg gaacgcatcc ttccaatcag + 23581 cgagttcttc gtgggtcata tgtcaacttc cctcgatgcc gctgagctgg ttcgcgaagt + 23641 gcgcatacgc agacgcaatg acaagccggg aatggcctat ctggcgcatg ccggaagagc + 23701 tgccatggaa accccgcttg ttgcggcagg agcaatcgtt tcaaccagga acggcatctg + 23761 cagtagtgca accatcgcac tcgccggagc cgatgagacg cctgtccgta tttcgcgagc + 23821 ggaggaggct ctcatcgggt gcaagctcga cgatgtggcg atccttaaag cggcggccat + 23881 cgcggcggag gactgttcgc cagacactga cgtctacgcg tccggagagt accgacgcag + 23941 gcttgtggga gtttatgtgc gcgacgcctt gcgggcggca gcaagccgcg tggcgtaggc + 24001 tttaggagga ggaaccacat catgcgcaaa aacatcacgc ttgtgatcaa cggggcatca + 24061 cattcgctgg atgtgccagc gaacacactc ttgctcgacc tcctgagatg ggaagtcggc + 24121 ctgacaggca ccaaggaagg atgcggtgag ggcgtttgcg gctcctgcac cgtcaacgtc + 24181 aacggcgacc tcgtcagatc ctgcttgacg ctagcggtcc aggtggacgg caagtcgatc + 24241 acgacgatcg aaggcatggc cgatggtgat acgcttcatc cgctccagcg gaagttcctg + 24301 gagctcggcg ctgtccagtg cggtttctgt tcgccaggct tgatcgtcac tgccgacgca + 24361 ctcttgaaaa gcaatccaga cccgacggaa gccgaagtgc gcgatgcact caggggcaat + 24421 ctttgccgct gcaccggcta cgtgaagatc atcgatgccg ttttggcagc tgcaagcgaa + 24481 atgcggagcc acgcacatga ataatcccga tcaagaattc aacgtcatcg gcaagaacgt + 24541 catccgcgag gaaggtccgg gaaaagtcac cggccttggc aaatacgcca tcgacctcga + 24601 atttccacgg atgctttggg caaagatcaa gcgcagtacg cgccctcatg ccaagatcat + 24661 caacatcgat atcagccgcg cgcaagcctt gccgggtgtt catgccgtca tcgtcgacaa + 
24721 ggactgccct cagacgttgt tcgggtttgg ctgctatgac gagccgctgc tcgcccgtgg + 24781 aaaggtccgc tatatcggcg agccggtggc agcggtcgca gccgagagtg aggcgattgc + 24841 cgagcaagcc tgtgatctga tcgagatcga ctatgaggac ctgccggcaa tcttcgaccc + 24901 gtgggaagcc ttcgaagccg atcccaaggt gatcatccac gaagaccagg ccaactaccg + 24961 ccgcgtgcct atcggacccg cgcaatacga tccgaagcac cccaacgcgt tcggttacta + 25021 caggatccgc accggcgagg tgtcgcaagg ctttgcggaa gcggatgtcg tcctggagaa + 25081 gacctattcc aacgccatga tggcgcacgc caccatggag cggcacaatt cgatctcgct + 25141 gtgggacgcc gacggcaagg taacggcctg gtcgtccgct caggccgcct atccgctgct + 25201 gaaccagatc agcgaggcgc tggacatccc gcattcgcgc gtccgggtca tcatcccgaa + 25261 atatgtcggg ggcggttttg gcggaaagat cgaaatgaag gcagaaggct tatgcgccgt + 25321 gctttcccgc gccgccggcc acagacacat caagatcatc tacacccggg aggaatcact + 25381 ctgctgggca ggggtgcagc atcctttcga gatgcgtatc aaatcgggtg tgcgcaaaga + 25441 tggcgttatc acggcgtgcg agatgttcgt gctcgtcaat ggcggcgcct acgctcagca + 25501 cggcttcctc gtcacccgcc aggcgagtta cggtccactt ggctcttacc gcttccctca + 25561 tttcaagctc gacaactacg ttgtctatac caacaaccct ccgggcgtcg cctatcgcgg + 25621 tttcggcaac acccagatcc acttcggact ggagtctcac atcgatgagc tcgcgcacgc + 25681 aatcggaatg gatccttacg agattcgccg caagaatgtc ctcaaggaaa atgagatcaa + 25741 cgcagccggc gaaatccagc attcggtcgc gggcgccgag ctgcttgacg agataaaggc + 25801 tggtctggag cgtcacggcc ccttgcaacg ggaagacgga ccctggcgcc gcggcagggg + 25861 catcgccttt gccaacaagg acagcgttgc cccgtcggca tcctctgcca tcgttaagat + 25921 ccataacgac gaaacagtcg aaatccgcca cagcgccggt aatatcggcc agggcagctc + 25981 gacaacgctt atccagatca ccgcggagtt cttcaaggtc gggcctgagc gggtcaggac + 26041 ggccgaagtc gacacttggg tcactcctta cgaccagctg acggggtcga gccgccttac + 26101 attcgcggcc ggcaacgccg tcctgatggc ttgcgaggat gtgaagaacc agatcctgac + 26161 gatggcggcc caaatgatgc aggcaacccc agaagagctc gatctcgccg atatggtcgt + 26221 cttcgtgaag gagaacccgg gtcgctccat gcgggtcaag gacctcttcc gcacggtgtt + 26281 ctttaccggt tccttcctgc cgactggcgg cgagttgctg ggcaaggcga ccttcacagt + 
26341 gccctccagc aagatcgacc cggagaccgg gcatgccgcc aacgacggta tgcgcaagat + 26401 cttctccttc tgcacccgcg cggcccaggc cgtggaggtg gcggtcaata tcgaaactgg + 26461 ccaggtcaag ctggagaaga tcgcgatcgc aaacgatctc ggcaaggcga tcaacccgat + 26521 gtcgtgcgaa ggtcagatgc atgccgcaat ctcaatgggg ctaggccagg cgatctcgga + 26581 ggagctgcag atcagcgagg gaagtgtcgc caacggcgac ttctcgtcct atcgctttct + 26641 gacggcaaaa gatgcgccgt cgaacgatca cgtttcgacc cacatcgtgg agatccccca + 26701 gttcgacgga ccttacaagg caaagggctt ctcggaagca acgacgtcgc cgacggcgcc + 26761 cgccatcgcg aacgcgatct tcgacgccgt cggccttcgt ctccgacaca tgccgatgac + 26821 gccggagcgc gtcctcgagg ggctcgatcg cctgacttct gcagatagag actgacactt + 26881 cagggaggaa accatgtcag attttacgaa acgcgaattg ctccgcgtca tggctcttac + 26941 tatgggagcc ctggccttga cgcagcctgc ctggtccgag gatcagccga tcaagatcgg + 27001 ctcgagcatg gccttgtccg ggccgcttgc tggaggcggc cggcaatcgc agcttgcatt + 27061 gcagatgtgg gtggaggacg tcaattcgcg cggcgggctt cttggccgaa aggtggagct + 27121 agtcacctat gatgaccagg gaagcccagc acagagcccc ggtatttttt ccaagctgat + 27181 cgacctcgac aacgccgacc ttctgattgc gccctacggg accgtgccgg cggctgcagt + 27241 tatgccgctg gtcaaggaac gcggccggct tctgatcggg cagatcgggt atcagatcaa + 27301 ttccaaggtc caccacgaca tgtggttcaa caattcacct tggaacgacg cggaaagctg + 27361 ggtcggcggc ttcttcaagc tcggcgaaac ggtcggcgtc aagaaagtgg cgttcctggc + 27421 ggcagaccag gagttttccc agaacatcct cgcaggcgcc aaggcgctcg cgggcaaagc + 27481 cggtttcgag acggtctatg agcaaaccta tccgcccacg accgtggact tctcggcgat + 27541 gatccgcgcc attcgcgcgg cctctcccga catggtgttc gttgcttcct atcccgcaga + 27601 ctcaacggcg atcattcgcg ctgtcaatga gatcggcgtt ggatcttccg tcaagctgtt + 27661 tggtggcggg atggtcggac ttcaatatgc ctcggtgatg caggctctgg ggtcccagct + 27721 gaacggggtc gtcaactatc acacctatgt ccccgaaaag accatggcgt ttcctggcgt + 27781 gaaggaattc ctggaccggt atgctgaaaa ggccaaggca gccaaagtcg aaaccctcgg + 27841 ctactatgtc gctccattct cctatgcgtc gggacagatc ctggaacagg ccgtcaaggc + 27901 aaccggaagc ctggataatg ccgaactcgc aaaatatctg cgcaccaatg aggtccagac + 
27961 gatcgtcggc ccaatccgtt ggggcactga cggagaatgg tcgcagcccc gggtcgtgat + 28021 ggtccagttc cgcgacgtca aggacggcga tgccgagcaa tttcgccaag agggcaagca + 28081 ggtcatcgtc taccccgaca aatataaaac cggggatctg gtttcgccct taagcggggc + 28141 gcagggcagg taaacctgcg ctgccgaggg ggaattgacc atggtttcga tcgatctttt + 28201 gattgaaggg cttgtcttcg gtgttcttgt cggctgcttt tatgcagccg tcagcatcgg + 28261 gctttcaatt gcattcgggc tgcttgacgt gccccatatc gcccacgctt ccttccttgt + 28321 gctggcagct tacatgacct tcctgctcgg atctttcggg atagatccgc ttctggccgg + 28381 agcccttatc ctgatcccgt tcttcttcct gggcgccgct gtctaccgct tctactatga + 28441 ggcgttcgag aaacggggca ccgatgccgg ggtacggggc atcgccttct tcttcggcat + 28501 tgccttcatt gttcaggtcg tgctttcact cgtcttcggt ctcgaccagc aaagtgtcag + 28561 cgcaccctat atcggcagca gtctcgcgct tggggaaatg cgaataccct ggcgcctgat + 28621 tgtcgcgctt gtggtggctg tcggactggt tcttagccta aacctttacc tgtccagaac + 28681 cttccgcggc cgcgcgatcc gtgcggttgc tcaagacccc tgggcgttaa aggtgattgg + 28741 ggcgaacccg gttctgacca aacagtgggc tttcggcctt gccactgccg ctacagcggt + 28801 cggcggggcg ctcctgatca tcgttagccc tgtcgagccc agtctcgaca gggtctatat + 28861 cggcagaact ttctgcgtcg tggtcctggc aggcctggga agcatgaacg ggacattgat + 28921 cgcgggcatt ttgctggggg tgatggagtc gctcgtacta acagcgttcg gagcctcatg + 28981 ggcgcccgcg gtcgccttcg gacttctcct gctcgtattg ggcttgcggc cacaggggct + 29041 ttttggacga tgagaaacaa ctcgatccca ttctggttac tcgcactggc gctcccggtc + 29101 ctggccttca tattgccgaa gctcgggctc aatgaatatt acctctatgt cgggtacgtc + 29161 attcttcagt acgttgtgct cgctacggcc tggaacattc ttggcggtta cgcaggctat + 29221 gtcaattttg gcacgggcgc ctttttcgga cttggcgcct acacggctct ggtgctgatg + 29281 aaggcctttg gtgcaccact tccgattcag atcgcgggcg ccgccatcgt tggcgcgctg + 29341 cttggcatcg gagccgggtt actgacgttg cgcctcaagg gcatcttctt ttcgatcgcg + 29401 acgattgcag ctgcgatcgt gatcgaaacc tttattctca actggcgctt cgtcggcggc + 29461 gcaaccggta tgcagatcat ccgcccggag gtgccccttg gatttgatac ctatacacgg + 29521 ctgttgctgt tcgtgatgac ggtgctgacg gttattgcaa tcatcgttgc tcgctacatc + 
29581 gagcggtcct ggctcggcag aggcctgcat gccgtgcgcg atgcggaggc ggcagctgaa + 29641 tgctcgggcg ttccgacatt gcggctcagg ctgatcgcct gcgcgatatc gggtgccctc + 29701 atggctgcag ccggggctcc gttcccgcta tatacgagct ttgtcgagcc gagttcgacc + 29761 ttcagcctga attactcggt catggccctt tcgatggctg tggtcggcgg aatgtcgcgc + 29821 tggtggggac ccgtgctcgg cgcaatcctg atcgccagca gccagcaact cgctgcctca + 29881 gcaagcccgg agcttcacct gctggtcgtt ggactgctga tggtcatctt cgtgatcatg + 29941 gcgcccgagg ggctggtcgg attggcaaaa ctcgcccgca aaagcctgca accaggcgcc + 30001 aagtcaatcc gcctcgtcga aggagcaaaa gcgcatgact agccttttga aggtggacaa + 30061 tgtcacgaaa cggttcggcg gcttcacggc ccttaccgac gtcaatctcg acattgccaa + 30121 gggcgaacgc ctcggcctga tcggcccgaa cggctcgggc aaaacaaccc ttatcaactg + 30181 catttccggg gtattgccga tcgaagcggg agcaatcgca tttgatggcg cagacatttc + 30241 gaagcttgcg acctatcgcc gagccaaggc aggtctggcg cggacttttc agatcccgaa + 30301 gccgttccac tcgatgacgg tcatcgagaa cctgatggtg ccactcgagt acatcgttca + 30361 ccgtctggtc gatgcaaaaa accaaaacgc ggctcattcc gaagcgagcg atttgctacg + 30421 ccgtgtgcgg ctcgccgatc gcatgagcgc gccagcagga cagttgtcgc aggtggagct + 30481 gcgcaaactt gagctggcgc gggctgttgc cgctcgtccc aaattgctga tctgcgacga + 30541 ggccatggcc ggccttgcca ctaaggaggt ccacgagatc ctcgatattc taatggacct + 30601 caattcttcc ggcatcacca tcgtcatggt ggagcacatt ctgcaagcgg tcatgcggtt + 30661 ctcgcaacgc atcgtttgcc tgactgcggg acggatcata tgcgatggcg cacccgccga + 30721 tgtcatggcc aatccagaag tgcggagggc atatcttggc agttaagata gtggtgcgcg + 30781 acctgcatgc cggatacggc acggtgcagg tcctacatgg tctgtcaatc gaggctcggg + 30841 aaggcgagac cgttgtcctc ctcggtacca acgggaatgg caagagcaca ttgatgaaat + 30901 gcctgatcgg cgacgtgcgt ccaacgcaag gctcgatcac gctcgagctt gacggtcagg + 30961 cgatcgacct gacccatctc ggcaccgatc aaatcgtcga atatggcatc agcatcgttc + 31021 cggaagggcg gcggctattt ccgcagctga ccgtcgagga gaatttgctc ctcggcgcct + 31081 atcgcaaggc ggcgagaagc aagatcgccg ccaatctcga attttgctac ggcgccttcc + 31141 caatcctgaa ggaaagacga cggcaactgt cgggatcaat gtcgggcgga cagcagcaga + 
31201 tgctggcgct tggccgggcg atcatgtctt cgccccgcat cctcctcgtg gacgagcctt + 31261 ccgtcggtct ggcgccgatc atggtgtccc aggcgatcgc gaagatcggt gaattgaagg + 31321 agcagttcgg gctaacggtt gtcatggccg aacagaattt tcaggaagcg atgcggattg + 31381 ctgaccgggg ctatgttctc gtgcacggtg aggtcgcctt ttccgcagag accgcagctg + 31441 agcttcgcga cagcgagctt atcagccaac tctatctggg aggttgaggc taaaagcggt + 31501 cttaactcct gagtcatttc aaaaacccgt caactggatc agctccttcc tccgcagagg + 31561 tttctatgtc aaatttccgc atttccagca aactgatcct gctgacgacg ggtcttgtca + 31621 tcgtcttcgc cttggctgcc gtattcctga tcgaggcggc gacggaaaca atctacagcg + 31681 agcgcaagga cgcgctgaag acgcaggtcg acatcgccta ttcgatcgtg acgacgcttc + 31741 atagtgacga aactgccggg aagatcagcc gcgaagaggc gatcgcgcag gcaacggcgc + 31801 ttgtttcgca gatccactat gagcccaatg gggtcatttt cggatacgac tattcgggcg + 31861 tgcgcgtcat caaccccggc aatgccgggg tcggcaagaa tttcatggcc ctgacagaca + 31921 agaacggaac gccactcatc aagaacatca tcgatgccgg cagagccggc ggtggtttca + 31981 gcgaatattt atggccgaag cccggcgcgg gcgatgatgc gacgtccgta aaagtttcgt + 32041 actcaaaagc gtttgacccc tggcagttgg tgctcggcac gggcgcctat ctggatgaca + 32101 tcgacgaaaa gattaatcag gtttacgttc aggcgttagg gatcgtggct gccgtccttg + 32161 ttatcagctt gatcggagca cttgcggtgg tgcgtggcat tacgcgaccg ctcactcgca + 32221 tacattcatc tttatcagcg gtctcaaacg acgacgtctc gattgcgatt ccgcacacga + 32281 acctgacgaa tgaaatcggc atgatggcaa gggccacaaa gatcctgcag gacaaagttc + 32341 gggatcgtct gaccatggaa cagcgtgaag cagaccagca gagattgatc gaacaggagc + 32401 gttccgaggc ctcccggatc caagaagagg aagccgcggg acaggcacat gtcgtcaaac + 32461 aactgagcca agcactcgcg gcactttctg aaggcgatct gacggtgcgc tgctcggatc + 32521 tcggaagccg ttatgatgtc ctgcgcgcca atttcaactc tgccataagc aggctccttc + 32581 aggcgatgca ggcggtcgca cgcaatgccg gtgccatcac ggcgggctcg gagcagatcc + 32641 gcagtgcatc ggacgaactg tcgaagcgca cggaacagca agcagcctcc gtcgaggaga + 32701 cggctgcggc cttggaggaa atcacgacga cggtggcaca ggcgagccgg cgcgctgaag + 32761 aggcaggacg tcttgtgcgc cggacacggg agaacgccga ggtttccggc acggtcgttg + 
32821 gccgtgccat cgaggcaatg agcaaaatcg aagcctcttc agcagaaatt tccggcatta + 32881 tcggcgtcat cgatgagatc gcttttcaga ccaatttgct ggctctcaac gccggcgtcg + 32941 aggcagcgcg ggcgggtgat gccgggaaag gctttgcggt cgttgcccag gaggtgcgtg + 33001 aacttgcgca gcgctctgcc aaggccgcgc aacagatcaa tcagctgatt gcagtttcga + 33061 acgtccacgt ccagaccgga gtggctttgg ttggtgaaac aggcagtgcg ctcagtacca + 33121 tcgtgtcgca ggtcaagcag gtgagcgaca atgtcgaggg gatcgttgaa gctgccaagg + 33181 agcagtctct cggaatcgcc gagatcaacc aggcaatcaa cgtcgttgac cgcggcaccc + 33241 agcaaaatgc ggcgatggtc gaggaatccg ctgcagccgc ccacagtctt gcggccgaag + 33301 ccgctgcact gctccggttg ctggcccagt tcaacgttgg tggcggcgcg acatctcacg + 33361 ttacgcccaa gatggcggtc gccggctaat gctcaggggc cgttgccaag catgtcgata + 33421 gaagcttgct gattatctgc actgatcggc gcaggcgtgg ttgtcagcac ggcgcccagc + 33481 cagttatttc cttcaacctt tcgatggatc tgatcgagaa gcatttggtg aaagagcttc + 33541 gcaggaattg tcagttccgt ctgcggaggg cttacaaatg cccagtcgac agagaggccc + 33601 tcaatcgggc ttgctgaaat tagcccgtgg ctgagggcgg cgtcgatcgc gcaatagggc + 33661 aggatactgt aggccgtgcc tttcgcgacg agtgctgcaa tcatactcgt cgaattgctg + 33721 tcagccaaga cgcgcatcgg cagctggttg cgcccaaaag cgtcttcgat cgccacccgg + 33781 aaattgttca ttctgttggt caggatgagc ggcttgccag cagcgctatc cagggtaagc + 33841 agctgattta agtcgtactt ctcagaggct ggcgcgacga ggaaaagggt ttccctaaac + 33901 agaaactcgg agcgcaagcc ctggttctcg tcggagacga cgattgcgca gtcgattttg + 33961 ctctgctgga taccggcaat gagatccaaa ctgatccctt cgcgaatatg aagatgaaca + 34021 tcggcatatt gccggcaata agcgtcgatc agctcgatgc tcaccatgtg ggagatagcg + 34081 gggggcaatc cgaccgtgag atggcctcgc ggttcagatt gatctgaaac gacatcacga + 34141 agcttgtcaa attcgtagac gagcactgtt gctctttcac gcagagcctt gcccgcagcg + 34201 gtcagcgcaa caccatggcc ggatcgaatg aaaagggttt cacctagatc ttcctcgaga + 34261 agtcggatct gtctcgaaag agcgggctga gcgatattga gaacttcaga ggcacgcgta + 34321 aaattgccgt gctcggcaat ctcgagaaaa tatcggattt gccgcagatt catgctcgtg + 34381 ttcccgccct gttcaatttt gcttacatca tgccgtcaag ttgaaagagg cgtcaactgt + 
34441 cgcggcgggg aactcatgat gagcccgatc gagccgcggg ctggacgcgg ttggcgtcag + 34501 ccggccactt ggatcatgtc cgatcgcttt gaaagctaac gtggaccgca gagttctctg + 34561 gccatgatgc ctgcacgcct gcttgaacat ttagtatttg atgatgccgc cggcaccctt + 34621 caccgcctct ggcgtgtcga gatccagcag cgcggcctcg ccgatctcta cctgaaaaac + 34681 aggcaacgag gcattttcaa tgatctggcg tgctcctata tcgcctttaa gctgcaagat + 34741 ttgctgattg agcgaggtcg gaagaatgac cggatttccg ggccttccct gacacgccgc + 34801 gcgcacgatg caatcttgcc cggacaattg gaaggtggcg atcagctgat taaggtgcga + 34861 actgagcaag cctggcatat cggcgagcat aaccaaagcg ccatcgatct gccgatcttc + 34921 gaggtatgac acccctgcaa ccaaggagct cgccattcca gtgagataat gcggattttc + 34981 gaccatctct aagttcaggc ccgtaactgc cgtctcgatc tccgttcgcc ggtggccggt + 35041 cacaaggatg acggccgccg cattcgagcc gagtgcaacc gccgccatgc ggcgcaccag + 35101 tggaatcccg tcgaactccg caagcaattt gtgcccacca gttggaccca ttctggccgc + 35161 ctgaccggcc gcaagcaata agatggcgac gcggggaatg gcgctttggt ctgctgttcg + 35221 gcgcttgcca gccggatgat cggcccgagg ggtgttggac attccagtcg atcctcccgt + 35281 cagccgccgg tgacactcat atgccgcgcc accgccggcc gcttgtacgt gcggtcaatg + 35341 atgaaatcgt ggccttttgg cttgcgggtg atggcttcgt cgatcgccgc gtagagcagg + 35401 ccgtcgtctt cggtggcgcg cagtgcgacg cgcagatcgg cggcgtcgtt ctggccgagg + 35461 cacatgtaga gcgtgccggt gcaggtcagg cggacacggt tgcagctctc gcaaaaatta + 35521 tgggtcatgg gcgtgatgaa gccgagccgg ccgccggttt cggcgacgtc gacatagcgg + 35581 gccgggccgc cggtcttgta aaggatatcg gcgaatgtga attgttgctc gaggtcagcg + 35641 cgcaacttgg agagcggcag ataccggtcg gtgcggtctt cctcgatctc gcccatcggc + 35701 atcgtctcga tgacagtcag atccatgcct cgaccatggg cgaagcgcag gaggtcgggc + 35761 atctcggctt cgttgaagtc tttcagcgcc acggcattaa gcttgatttt caaaccggcc + 35821 ttctgcgcag cgtcgatgcc ctccgtcacc ttggcgaaat cgccccagcg ggtaatggcg + 35881 cgaaacttgt cggggtcgag cgtgtcgagc gagacattga tgcggcgcac gccgcaatcg + 35941 taaagttcgt cggcatggcg ggcgagctgc gacccgttag tggtcagcgt caattcgtcg + 36001 agaccggagc cgatcttttc gccgagttcg cgcaccagaa acatgatgtt cttgcgcacc + 
36061 aggggctcgc cgcccgtcag cctgatcttg cgcgcgccct tggcgatgaa ggcggaacag + 36121 agccggttca gctcttccag tgtcagcaga tccttcttcg gcaggaaggt catgttttcc + 36181 gccatgcaat aggtgcagcg gaagtcgcag cggtcggtca cggaaacgcg cagataggtg + 36241 accgcccggc cgaaggggtc gatcataggt ggcgtcctca acaccgaccg gctgccgcta + 36301 aatgcgagca tatggcgctc tccgaaagtc acgcaggaac tctgctggta tggaataaca + 36361 tcgagccgat aaacctaacg tccttgatgg tagcggcagg ctgtcatctc ggcaattgct + 36421 tgaagaggca gttcgacttt cgagccaaat tgcctcaact catcgcggta agccttcgcg + 36481 tcgatagctt ctgtggtcgc ccgcgtgttt ggctcgtcta tctcttgtgg ggctttgccg + 36541 gcgcccaaaa tggtcagcca tgcgaccaaa ccaagcagcg ccgatcaccg ctccgacccc + 36601 agaggacagc aatgaccggg gattgaccga agttcagtcc ccagggtttc gagacgaagg + 36661 tcagaaaccg atcaagaaat agtcaagaaa ttccacaacg ctcgaaggcc ccggccgcga + 36721 tcatgctctt ttgattaccg gtttgattac cggtcagcgg ttgctgacat ccagcattgt + 36781 gaacatcttt ctccctccgt cggctctcgc cagcgattgc cccaagtcaa ctccccgggg + 36841 cgtcggcttg ctcctcccga ccgacgttgt gctccgccga actttctgtc cttttccaaa + 36901 aaaatgcagg cggatgcttc aatttgtcgg agctcatatt ccacgctcct gccacctccc + 36961 gggctccggc tggatggact ctttgcccgg gaggcggcac gggggaggat ggcggtaccg + 37021 acgcaataat tatcggacga taatttattt tgtcaacagc tgcttattgt tttcggcatc + 37081 agccgaattt tcgaagccag gaaggctagc ctgacagcgg aacgccgcca tgggagccct + 37141 cttaatgatc gatcgccgag gtttacgttt gctaggattg cgccgcaagc tttgccggct + 37201 tcatcgccga aagtttcgaa aaatacgcct ggattccgtg atggaagaca tccacgacgt + 37261 ttccgatgat cttctggcgg tctttcggcg gcggcaggct gaagatgtgc tcgcgaacgc + 37321 cgagatagaa gatgccgctg tgcaatgccc aggccagttc gagttcctcg gcctccggct + 37381 gtctttcgcc gtcaagaccc gccaaatgcc ggtattcctt gatgatacgt tggaggatcc + 37441 gctcctcaag gaggccaata taccagcgat tgatgtcagc gccttttaag cctgaaaaca + 37501 gatagatgcg catccaatcg cgagtgaaaa tcgcgtgcgt gtaactgtcg tagaaggcct + 37561 tcagccgcca atccaagggc ttggtccggt cgcacagcag ccggtcccac tccacgtccc + 37621 agcgctccag gtaaaccgat cggtaggctt cgcggatgag gtcttccttg cttgggaagt + 
37681 aacgatagag caggggctgc gtaatgccca gctgccgggc aatctcgcgg gtactactct + 37741 cgaaaccatg ttccgaaaaa aggttgatgg ccttctggag gatctcttcg cgcctttcgg + 37801 tcggcggtag gcggcgtctc ttggtggggt gctttggaac cattgtcctt cggaagtcag + 37861 ggtgtggctg atgcgaccct tctatcacgg aaaaagacct cttgactacc acgccgggca + 37921 gggtttattt atctatcgat aacttaggag gagatttcat catgacgttc acccgtcagg + 37981 cgatttatga cgccgccaag aaggtttcca actgggggcg ctggggcgac gacgaccaga + 38041 tcggcaccct caacaatatc gagcctaccg acatcgtggc ggctgcttca ctggtcagga + 38101 aaggcaagac cttttccctg gggctgtcgt tgaaggagcc gatccagtca ggcctgtttg + 38161 gaggtcgctg gaacccgatc cacacgatgc tagcaacggg cacggatgcg gccgcgggca + 38221 atcaggacga accggcaccc tatctgcgct acgccgatga cgcgatcaac atgccttgcc + 38281 aggcttcgac gcaatgggat gcgctctgcc acatcttcct cgatgacaag atgtataacg + 38341 gctacgacgc aaggctggtc gacgtcaaag gcgccaagaa gctcggcatc gagcattatc + 38401 gcgacaagat ggtcgggcgc ggcgtgcttc tcgacatcgc tcgctggaaa agcgtcgcat + 38461 ctctggacga cggttacgcg atcacgcctg ccgaccttga tggctgcgcg gcctcgcagg + 38521 gcgtggagat ccgcaagggc gatttcgtga tcgttcgcac cggccatcag gagcgttgcc + 38581 tggcgaaggg aagttgggaa ggctatgccg gaggcgatgc gcctggcatg ggttttgata + 38641 cctgcttctg gcttcgcgac aaggacgttg ccggcatctg caccgacacg tggggatgcg + 38701 aagtgcgtcc aaatcaaacc aaggaagcca accagccctg gcactgggtc gtcattccgg + 38761 cgatggggat cgcgatggga gaaatcttct acctcaagga gttggccgag gattgcgccg + 38821 gggataaggt ctatgagttc ttcttcctgg ccccgccctt gcatctgcca ggcggcgctg + 38881 gctctccgat caatccgcaa gcgatcaagt gaacgagaca tgtccggttt caacatcctt + 38941 gagcggtttt cgctctccgg tcgccgcgca ctcgtcaccg gcgctggccg ggggcttggt + 39001 cgctcgatcg ccgaaggact ggcaagcgcc ggcgcagagg tcaccctctg cgcccggacc + 39061 gaaagcgagg tcgaagaggg cgcccggtgc atccgcgacc atggtttcaa ggctgaggcg + 39121 cttgttgcag atgtcagcga tatcgccggc tttcgcgcaa ccgtcgacgc gatgcacgcc + 39181 cacgacattt tcgtcaacaa tgccggcacg aacaggccga agccactctc agacgtcacc + 39241 attgaggact tcgatgcggt gatcggattg aacctgcggg ctgctgtctt cgcggctcag + 
39301 gccgtcacgg cccgtatggc aaatctcggc atacaaggtt cagtcatcaa catgtcgtcg + 39361 caaatggggc atgtcggcgc tgccaatcgg acgatctact gtgcctccaa gtgggctctg + 39421 gagggattta caaaagcgct tgccgttgag cttggccccg tcgggatccg cgttaatacg + 39481 gttgcgccca cattcatcga aacgccaatg acgacaccgt ttctggagga tcccgcggcc + 39541 cgtaacgcca tcgtctctaa aatcaagctt ggccgcctcg gtacgcccga ggatgtggtc + 39601 ggagcggtgc tgttcctggc gtcggatgct tcggcgcttg tgacgggaag cgctctcctg + 39661 gttgatggcg ggtggacggc ggattgatcg tgacgggcga ctgaccgcac gcgatagcgc + 39721 cgcctggaaa tcacgccgga ttatccaagc ataaccacct gtttcgccac gcgtactttg + 39781 attgttgcgg atgttgatac ttgacataga aggtgtttaa tgttcgactg ataaataagg + 39841 ttctttggga tgagggccgc ggagccgaaa gagtttcggg ccagcacccg cgccatggcc + 39901 gcctacgtaa gtcgcctaga gagtgcttcg cgacgagggg attggggagg attgtcgatg + 39961 caatgtgagt tgcaagccgc gcccttgctg tcagcagcgc atccggcgct ggccgatcca + 40021 tgggaacttg ccatcaatgc cagcggggat gtggtgatgg ccgtcctgac tgagacccgc + 40081 ggccccgctt atcgcttgcc gggggcagcc atggcaatcc ttccagacgg aagtttttcc + 40141 ggggcgatca catccggctg cgtcgaagca gacctcatct tgaatgcaag tgatgttcgc + 40201 aacaccggtg acgtgcgggt gcttcgctat ggcgagggat cgactttcat cgacatccgt + 40261 ctaccctgcg gcggtggcat cgaagtcatg ttgtttccgc tgctcgatgt cgaggtgctg + 40321 gggaaacttg caaaggccag gaagctccga cggccggtca gcctgcagat ttcgaaatcc + 40381 ggccggctca ccctgggccc aatcactgaa acaaagaacg atgcgcatgg tttcgcactt + 40441 gggttcgagc ctccgctgca gttcttgacg ttcggagctg gaccggaggc gtcggtcttt + 40501 gccgccttag tagaggggct cggttacgaa cagcagctgg tctctcatga cgccatgaca + 40561 cttgcatcga cgcgcgcgtc cggtaacaag tgccgcgagc tgacgaacct ttcagagctt + 40621 ttcgcattgc agatcgatgc gcgcacagca gccgtgctct tttatcacga tcacgattac + 40681 gagccggaaa tcatcaagca tctgctgtcg accccggcct tctatattgg cgcccagggc + 40741 agtcgcgcca cccaacgctc aagattgcag cggctggaag agattggagt ctcccttaac + 40801 cggtcgagcc gagttcgcgg gccgatcggt gtgatcccgt catcgcgtga tccgaagtcg + 40861 ctggcggtgt cggtgctggc cgaaatcatg gcagccggtt cgcaagtctc caaagcctcc + 
40921 gagcagccta acctgaagaa ggagtgctgc ctgtgacgga ctgcattgtt tccagatcag + 40981 cgcgaggtga gccatgacgg cagcctggtg gaccgcccat cttcataagt ggttgtcatg + 41041 gttgcgccac gaggcggcgc ggcttggcgc ggcatattgt gtttatgatt gcgacgagct + 41101 ctgggctggt atggaccggg gttcgaccgg acatcgaaag cacacgggac gagccgacaa + 41161 agaataaccc cggaaggcat gcaagacgtt agcgacaact ttccacggta agatggaact + 41221 gatcgcgata ttgagctctg cgtccaggcg atcgctgaac gacttaaggg ccgcggcaag + 41281 cgagcgccga tcatgccgcg tcgatggtgg acgatcgcga gccacgcatg gcaaacgcag + 41341 acatcaccat tgtccagagg ctacgggagt gcctgatgtg aactggaaca gcgtcgacca + 41401 cctgtccttc gccggaacgg atctgcagga gctctcggcg tcgctttcct acctgcacaa + 41461 tggagaagtg agaagggaaa gcagggggac tacgaaatac cggatcgact tggtcggcac + 41521 gcaggggatc tcgatcatag cgagcagcta tgacggagct ttcacaattc atttccccgc + 41581 gtcgatcgac acggccacga ttctgatccc gcttacaggc aacgccatag tcgcggtcac + 41641 cgaccgcaaa ataccgtcca tccaaaacct tggcgttttt gtcgatcggc tgcagaatta + 41701 tgagcttcac gttgccggcc ctcgaaggca tctttgcttc agggtgcctc acgacgagct + 41761 ggtgcggcgg atcgaagttc ggttgaatat ttcagtgcgg ggccgcctgc aattcgccaa + 41821 tgagatcgat ctttctcagg gaccgggttt ggatttggcg cgactcgcaa tgaccatgca + 41881 tgatggcctt acgggcgaga ccctgcataa aacaccggtg gcgctcggac atctcctgga + 41941 ggcttgccta gagtttctca tcgatgcggt tgggcacact acgcccgcga attttctcgc + 42001 gggtctccca ttccgcggca cgtcaagcgt gcgatccact atatgcatac caacgtctcc + 42061 cggggggtga cgctgaacga gatcgccgag gcatgcggtg tcagtatccg taccctgcaa + 42121 agtggtttcc gaaaatttcg tatgacgacg cccattgcgt acctagagca tcttcgtctc + 42181 gaatgctgtt cgccgggagc tttcatctgg cgaatctgat caatcggtaa gaggcgttgc + 42241 gcagaaatgg ggctttacgc atgtcggccg gttctccggc caatatcgta ggcgcttcgg + 42301 cgaactgcca tcgcaaaccc ttcgtaggac ggaggacaat ctggcatagc caaaacaggc + 42361 cgcatgcggc caatggcgag cgcgctagct gatgagaacc gacgttccgg ccgccttgtt + 42421 cctctaacgc aagcggtctt ccactcccac gttgtcctgg ccgcccgcct tttgattgga + 42481 acgggagttt ctcaatgtag acggatgttc tcgcgaccag gaaaatctgg tcgtgagctt + 
42541 gaagcgtcga agacgacgag ctgggtttcg tcctgcgcgt cggcaaaggg caacggccgc + 42601 cgtaacatcg caataggatg atccgggatt gagacaggga ccattcccga ctgaccgatc + 42661 agctcgagaa cgacccgata ggtattagct ttgatggtgc cgctaaatcc gacagcagct + 42721 acagctcttc cgttgatcgt tcaccagatc aatgaaaccc tcacgcgccc aaccggtcca + 42781 ttgcatcggc atcgccttaa gcgaaagaca ggttgtataa ttaccctcga ccgctctgca + 42841 cccgtgttgt tgctggcatc gatcgcgtcg atgcctggat gcgtacacgt gcgatgccgt + 42901 cgcgagaacg attgaccaat taagcgcttg aacgtaggac cagcattccc gacagtatgc + 42961 gcgtgaatcc attatcggac gccttgatgt tgggatcgtc gatcagactt actgcctcct + 43021 gcaccatccg ctcgaccggt tggcggatcg tcgtgagttg gtaggaactc cagctcgcca + 43081 tcgggatatc gtcgaaaccg gcgacggcga gatcctgcgg gatgcgcaat ccccgctctt + 43141 ttgccgcatc gatgaccgcc aaagccatga tgtcgttgca gcagaagatc gcgtccggcg + 43201 gctggacggc cgaaagaagg tggacagctg cctgctttcc gatctcatac tggtagtttc + 43261 cgcttgcttg cgcgtgaatg gttcgccccg cctcggcaag cacatcacga aaccctcgaa + 43321 cgcgctcggc atgggtcgat gtattggcga cgccaccgat caagctgatc ctctggcaac + 43381 cacggtcgag caacagccga ccgatatcgc ggccaccctg gtaattgtcg caggagaccg + 43441 cgttgatgcg catgtcgcgc tgtatccggt tgagcagaac ggccttgatc tggttttgcc + 43501 ggcacgcacg cgccatgttc gatgtcagag atgcggaagt gacgatgatg cccgacggac + 43561 gtgccgccag tacctggtcc gtgaccgcgt cgatgttgca gtcactcggc agggcgtaaa + 43621 ccagcaactg gcgcccgctc tcttgcaggt gtcttgcgag cgtatcgagc acgagcgggt + 43681 aaaaggggtt gtgaaggtct cccgcaacca gcgttatcgt gcttgccgcc acggctttgg + 43741 cggtccgtcg ggccgggctg atgtaaccga tctcctccgc gaccgcgaag atcttcttct + 43801 tcttttcggg gttgatcggt gcccctggtg tgaaggctcg ggaaacagcc gcgatcgaga + 43861 cgccggctac atgagcaact tgttttgccg tgggcttcat cgatggcgtc ttgaaaattt + 43921 ttcactacat gcatgaaatc ttgcaacgca gatgtagcag cttatgctgt cttcagaaag + 43981 ggagaagacg atggctacta cctatatcaa acgcggaaag ccggaaaacg agcgctccga + 44041 agacgatcag aaggtccgtt ttaccgttga aaatatcctg aaggatatcg aggctcgagg + 44101 cgacgccgcg gtccgcgaac tttcagaaaa attcgacaaa tactcgccgg cgtccttcaa + 
44161 gctctctgcg agcgagatcg aggcgctgat gaaccgggtt tcggctcgtg acatggaaga + 44221 catcaaattc gcccaggcgc aggttcgcaa tttcgcacag gctcagcgcg actccatgct + 44281 cgatatcgag gtcgagacat tgccgggcgt tattcttggg cacaagaaca tccccgttca + 44341 gtcggtcggc tgctacattc ccggagggaa gtttccgatg gtggcttcgg ctcacatgtc + 44401 ggttgcgacc gcctctgtgg ccggcgtgcc gcggattgct gccgcaacac cggcattcaa + 44461 gggtgaaccg aacccggccg tcatcgcggc gatgtatctt ggcggcgccc acgaaatcta + 44521 cgtgctgggc ggcatccagg ccatcggagc acttgcaatc ggcaccgaga cgattgagcc + 44581 ggtgcatatg ctcgtcggcc ccggaaatgc cttcgttgcc gaggcgaagc gtcagttgta + 44641 cggccgtgtg gggatcgacc tttttgccgg cccgacagaa acgatggtca ttgccgatga + 44701 aaccgtcgat gcggagatct gcgcgacaga cctgcttgga caggctgaac atggctataa + 44761 ctcgccggca gtcctggtca ccaattcgca caagcttgcg caggcaacgc tgacggaaat + 44821 cgaccgcatc ctgaaaatct tgcccactgc cgacaccgcg tcaaagagct gggcggacta + 44881 tggtgagatt atcgtttgtg acacctatga ggagatgctc gacgttgcca atgacatcgc + 44941 ctcggagcat gtccaggtca tgaccgaccg ggacgactgg ttccttgcca acatgcattc + 45001 gtatggcgcc ctgttccttg gcccccgcac caatgtcgcc aatggcgaca aggtcattgg + 45061 caccaatcac acgctgccga cgaagaaggc cggccgctac accggcggtc tttgggtggg + 45121 gaaattcatc aagacacatt cctaccagaa ggttctgacg gatgaggcgg ccgcaatgat + 45181 cggcgaatat tgttcgcggc tttgcatgct cgaaggtttc atcggtcacg cggaacaagc + 45241 caatgtgcgc gtgaggcgtt atggcgggcg caacgtcggc tacggaacgg cggccgaatg + 45301 attgcctgaa ttaggctagc gttcagacgc ttttgccttg gtccctcgga ccgcaaaggc + 45361 atctcatgac atcgccaaca ataagttttg gccggcggct gtcgccaccg tccatgactg + 45421 cgcgctcacg agtctgtgag aacgcgagga ggagaggagg ggcaatgagg atcggaactc + 45481 cacgtgaaat atgtgccgat gaagcgcgcg ttgctatgac gcccgacagc gcggcccatt + 45541 tgcagaagct cggccacacc tgcttgattg agtcaggcgc cggtcttctg gcgggattta + 45601 ctgacgaggc ctaccgcgat gcgggtgtcg agattgtcga aagcgcgacg gagctcttcc + 45661 ggtcagcgga cgttatcgca aaggtgcgcc cgccggaact catcgaaatc gacagtatcg + 45721 acgctggaaa gacgatgatc tcgtttttct atccagccca gaatgacgtg ctgcttgcgc + 
45781 gggcgatcga caggggcgtc aacgtcatag cgatggatat ggtgccgcgc atctcgcgcg + 45841 ctcagaagat ggatgctttg tcctcaatgg ccaacatcgc cggctaccgc gccgtgatcg + 45901 aggcgggcag caatttcggc cgcttcttca ccggccaggt gacggctgcc ggcaaggtgc + 45961 cgccggccaa ggtgctggtc atcggggctg gcgtcgccgg cctggccgcc atcggcacgg + 46021 cgacatcctt gggcgccatc acctatgcct tcgacgtgcg cccagaggtt gccgaacaga + 46081 tcgaaagcat gggtgccgag ttcgtcttcc tcgactttgg cgatcagcaa caggatggag + 46141 cggcgtccgg gggatatgcg acgccgtcgt cgccagagtt tcgtcagaag cagctcgata + 46201 cgttccgctc gctcacaccc gaaatcgaca tcgtcattac caccgccctg atctccggtc + 46261 gcgatgcgcc aaagctttgg ctggccgata tggtggcgat gatgaagcct ggttccgtgg + 46321 tcgttgacct tgccgccgaa cggggcggaa actgcgagct cacggtcgaa ggcacacgga + 46381 ttgtatcggg caatggtgtc atagtcattg gctatacgga ttttccgagc cgcatggcga + 46441 cacaatcttc caccctttac gccaccaaca tccgccacat gctggcggat ctgacccctg + 46501 caagggacgg aattctcgtc cacaacatgg aagacgacgt catacgcgga gcaaccgtcg + 46561 cctttgaaag cgccgtgaca ttcccgccgc caccgccgaa ggtgcaggca attgcggtgc + 46621 agaaggtcaa gcagaagcca aaggaaccga gccgcgaaga gcggcgccag cgggaagctg + 46681 cggcctttcg agcacagacc cgcagccaag tcgccttgct cgccttcgca accttactct + 46741 tgctggttgc cggcgcttac gccccggcaa gcttcatgaa ccatctgatc gtcttcgcgc + 46801 tgtcctgctt tatcgggttc caggtgatct ggaacgtctc gcacgctctt cacactccgc + 46861 tcatggctgt caccaacgcc gtctccggca ttgtgatcct cggcgcgctg ctgcagatcg + 46921 gctcggcatc gatcccggta acgatccttg cgtcgatcgc cgtcctcata tcgacgatca + 46981 acatcgtcgg cggcttcttc gtcacgcggc gcatgctggc aatgttccag aagtcctgaa + 47041 aaactggagg aggaatcgtt ttgacaatcg gtatcatttc agcagccaat atcgcggcgg + 47101 caattctgtt catcctctcc cttggcgggt tgtcgggtca ggagagcgcc aagcgcgcgg + 47161 tctggtatgg tatggcgggc atgggccttg cccttgtcgc cgccatcttc ggcgcggctg + 47221 gcggcagctg gctaaatctt atcctcatga tcggtggcgg cgccctcatc ggttatgctc + 47281 tcgcacgccg cgtccagatg acggagatgc cgcaacttgt ggcggcgttg cactcatttg + 47341 tcgggctggc agcggtcttc atcagcttta acacccatct ggaggcggcg cgcgttgccg + 
47401 cgctcgacga gacgagccgc actggtcttg ctggttatcc cgcgatcctg gcgcagaagg + 47461 acgctgtcga gcttctgatc atgaaggctg aaatcttcat tggcgtcttc atcggagctg + 47521 tgaccttcac gggctcggtc atcgcattcg gcaagcttgc cggcaaggtg gatggcaagg + 47581 cgaggaagct cgcgggcggg cacgggctga acgctgctgc agcacttttg tccgtcgcac + 47641 tcctcatcat ctattatcaa agcggcggtc tcttgcccct tgctgtcatg acggcgctcg + 47701 ctctgttcat tggcttccac ctgatcatgg gaatcggcgg cgccgacatg ccggtcgttg + 47761 tttcgatgct caacagctat tccggatggg ccgcggccgc gatcggcttc acgctcggca + 47821 atgatctcct gatcgtcacc ggcgcgctcg tcggctcctc aggagcgatc ctctcctata + 47881 tcatgtgcaa ggcgatgaac cgctcgttca tttcggtgat cctgggcggg tttggcacca + 47941 aggccggacc gctcctcgag atcactggtg aacaggtcgc catcgattcc gccggtgttg + 48001 ccgcggcgct gaacgacgcc cagagcatca ttatcgtgcc gggctatgga atggctgtgg + 48061 cccaggcgca acaagccgtg tcggacctga cccgcagact gcgcgccacc ggcaagacgg + 48121 tccggttcgc catccatcct gtcgccggac gcctgccagg ccatatgaac gtcctgctcg + 48181 ctgaagccaa ggtgccttat gacatcgtgc tcgagatgga tgagatcaac gaagacttcg + 48241 cctcgaccga cgtcgttatc gtcattggtt cgaatgacat cgtcaatcca gccgcgcagg + 48301 aagacgcgaa ctctcccata gcagggatgc cggtgctcga agtctggaag gccaagcagg + 48361 tcttcgtctc caagcgaggg caggggacag gctattccgg catcgaaaac ccgctgttct + 48421 tcagggagaa caccagaatg ttctatggcg atgcccgcag aagcctcgaa gaactgatgc + 48481 cgaaggtcgt ttcagtctga tcaggcggcc gggtgatttg cttcccacgc cggccgctcg + 48541 cctccacgta ccgaccgggg cgtgggtccg gatttagcgg caatgaccgg tcttcgttga + 48601 aagtcaataa ccgacctgac cgggcgtgtt tctcaatcaa aatggtgccg aatgtctgac + 48661 atcgtccgtt atccaccagc gaaatcaacc gttgcgcgcg ccctgatcga taaggagaag + 48721 tacctcaaat ggtacgagga aagcgtcgag aacccggaca aattctgggg caagcacggt + 48781 aggcggatcg actggttcaa gccctatacc aaggttaaga acacttcctt taccggcaag + 48841 gtttcgatca agtggttcga ggacggtcag acaaacgtct cctacaattg catcgaccgt + 48901 catctgaaaa cgaatggcga ccaggtggcg atcatctggg agggcgacaa cccctatatc + 48961 gacaagaagg tcacctataa cgaactctac gagcatgtct gccggatggc gaacgtgttg + 
49021 aagaagcacg gcgtcaagaa gggtgatcgc gtcaccatct acatgccgat gatcccggaa + 49081 gccgcctatg cgatgctcgc ctgcgcccgc atcggcgcgg tgcattcggt cgtcttcggc + 49141 ggtttctcgc ccgaggcgct ggcggggcgc attgtcgact gcgaatccac cttcgtcatc + 49201 acctgcgacg aaggcgtgcg cggcggcaag ccggtgccgc tgaaggacaa taccgatacg + 49261 gcgatcgata tcgccgctcg acagcatgtc agggtcagca aggtgctggt cgttcgccgc + 49321 accggcggca agaccggctg ggcgccgggc cgcgatctct ggcatcacca ggaaatcgcc + 49381 acggtgaagc cggaatgccc gccggtgaag atgaaggcgg aagacccgct cttcattctt + 49441 tatacgtcag gctcgaccgg caaaccgaag ggcgtgctgc acacgacggg cggttatctc + 49501 gtctatgcgg cgatgacgca tgaatatgtc ttcgactatc atgacggcga cgtctattgg + 49561 tgcaccgccg atgtcggctg ggtcaccggc cactcctata tcgtctacgg gccgctcgcc + 49621 aactgcgcaa cgacgctgat gttcgagggt gttccgaatt tccccgatca gggccgcttc + 49681 tgggaagtca tcgacaaaca taaggtcaac atcttctata cggcgccaac ggcgatccgc + 49741 tcgctgatgg gcgccggtga cgacttcgtg acgcgctcgt cggtgcgcct gctcggcacg + 49801 gtgggcgagc cgatcaatcc cgaggcctgg gaatggtatt acaatgtcgt cggcgacaag + 49861 cgctgccctg tcatcgatac gtggtggcag acggaaaccg gcggccatat gatcacgccg + 49921 ctgcccggcg ccatcgatct gaaacccggt tctgcgacag tgccgttctt tggcatcaag + 49981 ccgcagctgg tcgacaatga aggcaaggtg ctggaagggg ccgccgacgg caatctctgc + 50041 atcaccgaca gctggccggg ccagatgcgc acggtctatg gcgatcacga ccgtttcatc + 50101 cagacgtatt tctccaccta caagggcaag tatttcaccg gcgacggctg ccggcgcgat + 50161 gcggacggct attactggat caccggccgt gtcgacgacg tgctcaacgt ctccggccac + 50221 cggctcggca ccgcggaggt ggaatccgca ctcgtctcgc acaatctggt ctcggaagcc + 50281 gcggttgtcg gttatccgca tccgatcaag ggccagggca tctattgcta tgtgacgctg + 50341 atggccggtc acgaaggaac ggacacgctc cgccaggaac tggtgaaaca tgtgcgtgcc + 50401 gaaatcggcc cgatcgccgc acccgacaag atccagtttg cgcccggcct gccgaagacc + 50461 cgttccggca agatcatgcg ccggatcctg cgcaagatcg ccgaagacga tttcggcgcg + 50521 cttggcgata cctcgacgct cgccgatccg gccgtcgtcg acgatctgat cgccaaccgg + 50581 cagaacaggt gaccgtgacg actgcaggcg ttgccgagtg ctgctaatcc tgcatatctg + 
50641 agcgaagtcc gcgcccgctt acgggcgagg tgacgtaccc cgttacccga ggggattgtg + 50701 gcggaagcgg ttgtagcctt gccgtattgg cgatctggcc aataccctca atcgaaacca + 50761 cgaccttgtc gccatccttc aagaaccggg gtggctcaaa gccgatcccg acccgacacc + 50821 ggcaggtgtg ccagtggcga tgatgtctcc gggcaacagc gtgacgctgc gggagatgac + 50881 ctcgatcagg gtggggatgt cgaagatgag atcctttgtc gaagttttct ggcgctcttc + 50941 gccgttgacc gtacaggtga tttccacgtt cgctatatca agttcgtctt tcgtcacgat + 51001 ccaggggccg atcggtccga acccgtcgat tcctttgccg aggaaccact gcttgtgctt + 51061 cttctgcagg tcgcgtgctg taacgtcgtt gaacacggta tatccgaaga cgtgatccaa + 51121 cgcgcgctcg gccgtgatga agcgaccagc cttaccgatc acgacagcca actccgcttc + 51181 gtaatcgacg gcctcgtcaa ggcccggcca gagaggaata tctccgaagg gtcgcgcaag + 51241 ggaggaagaa ggttttgtga atatgatggg atgttccggg atggcttcgg ccgcggtcgc + 51301 tccggcgtcg aacccgcttt tcgtgaactc atgggcgtga gcgtgatagt tcttgccaac + 51361 gcagaacaca ttatgcggcg gattggtgat tggcggcagg agatcgtcga tcgcaatggc + 51421 cgtgctcccc tccaacggcg ccgatcgttt ctgccggcct atctcagcga tcagactgac + 51481 catgtcttca tgggtggacg gcgccaggtc cggaaaggct tccgctgccg gcgtgtagcg + 51541 ttcgccggaa ggatctatga cgaccacttt gcgctgaccg tcgacaatta tcgttgcgag + 51601 tttcatctta tgaaccaatc taaacaaatg tttcacgcta aaaagcggtg tcttgagcgt + 51661 gttacccgga gagcatcgtc agcgccccgc acgccgggtc cggtagtggc caagaaagat + 51721 gtccacggcc tcatcgatca cagcctgacg ttcagattgg ctgaacttct catcgcggcc + 51781 cagcacacgt atccaaagca gcggctcatc gatcagtccg aggaactgac gggcggcacg + 51841 ctcgccgtcc ttgatcgtta ggagcccccg ttcggcttgc atttcgatat aggcccgaac + 51901 agcgcccatc gctggcgtct tgccatgctt gaaaaagctc tccgtcaagt caggaaaacg + 51961 ggtgccctcg gcaatgatca tgcgcagaaa ggcaatcgct agcggcggtt cccaaaatgc + 52021 cgccacggct gctgcaatcc gacgcaagcc gacgtttgga tctgcctgcg tctcggcttc + 52081 ggcgacgata tccaagacgg ggaaggcgct ccagacgcgc tcgacgacgg cgcgaaacag + 52141 ttcttcctta ccgtcgggaa actggttgta gagagtgcgc cgggctgcac cagattcctg + 52201 cgccaccagg tccatgctgg tgccctcgta accttgccgc aagaaaacac gggttgcagc + 
52261 cgccagaatc gcatctcgct tggagagagc ggccgtgggt gtcgccacag cctcgaagat + 52321 cgaaaccacg cccttgggag acaaggcggc gggcggacgg gacggcattt cagttttcct + 52381 cttgactgaa ttgcactggt gagtgcagaa ttacactata tagtgcacca tgattgggcg + 52441 caacccccac ggcccaggga caacggatct tccatggcag gcgagagcac tcctcactca + 52501 cagttgacgc gcggcacgct gccaatgctg gcggccgcct gtgggatcac cgtcggcaat + 52561 gtttacctat gccaaccgct actcgaccag atggcggtga gcctgagggt gcccgagcaa + 52621 acagccggct tggtcgccgt cggcgcccag gtcggctatg cgctcggcat cctgttcgtg + 52681 ctgcctcttg ccgacgtgat tgcatcgcgt cgattggtgc gcacgctgct ggtgttgact + 52741 tcgctctttc tgttggccgc agcgttctcg tcaaggactt cgctgctggc cgcggccagc + 52801 gtcgcgctaa cggcatcgac cgttgtcccg caaatcttga tcccgatcgt ctcgggcatg + 52861 acggcgcctg agcatcgcgg acggaccatc ggtgcgctgc aaaccggact gatccttggc + 52921 atcctgttgt cgcgcacagc atcgggatcc ctcgcacagg tgaccggtac ctggcgttct + 52981 ccctacttgc tggctgccgt cttgaccggc ctgttggtgc taatcgtgcc gcgcctgatc + 53041 cctgagcgcg aaaccaagcc ccggcatacc ggctatctca gcctgctgcg ttcgctaccg + 53101 ccactgctgc agcaccgccc gctacgcctg tccatgacgc tgggcttcct ggtcttcggt + 53161 gccttctcgg ctctatgggc gacgctggcc ttctacctcg ccggcccaga ctttgggttc + 53221 gggccagcaa cggcaggtct gttcgggctc tacggcgcgc caggcgcgat cctggcgccg + 53281 atggccggtc gactgtccga tcgcgttggt tcgtccaaga tcaacctggt gtcgctggct + 53341 gccagcggca tagccttagc actggcaggc tggctcggcg gcggctcgct gctcatcttg + 53401 gtggtggccg tgaacctgct cgacttcggc ctgcaaagcg gtcagatcgc gaaccagacg + 53461 cgtattttgg ggcttggcga tgacatccgc gctcgtctca atacgcttta catggccgcc + 53521 accttcggcg gcggagcggc cggttctttt gcggggatgc tggcatggag cttcggcggc + 53581 tggaccagcg cctgcggcct atcgctagcg cttatagctg cggcggctag cacgctcgtc + 53641 ctgaactgga agaacgaata ccgatcctct cacatgaaag gagaatgaaa tgagcgatct + 53701 taccggacga gtagcacttg ttaccggcgc ctcgcgcggt atcggccgtg acatagccta + 53761 cgctctgtcc tctgccggcg ccagcgtcgc ggtgggctac cacagcgatc gaacgggggc + 53821 agaggcagtc gccgagacga tccgccagga aggcgggagg gcggtcgcgg tcggcggcga + 
53881 tgtgtccgac ccgcagatcg cagtggatct cgtccgcgag acagaggcgc aattggggcc + 53941 gttgggcatt gtcgtgaaca atgccggtat caatccatca cgcccactcg accagatcac + 54001 ggcggctgac tgggacgaga ccattcgagt caatctcact tccgccttcc atgtcactca + 54061 ggccgctgta cctggcctgc gggaacgcaa gtggggcagg atcatcacca tatcgtcggt + 54121 ggccgcacag ctcggcggcg tcatcggtcc gcattatgca gccagtaagg ctggtttgat + 54181 cgggcttgcc cattactacg cggctgcgct ggccaaggag ggcatcacct ccaatgccat + 54241 tgcccccgcg ctgatcgaga cggaaatgct caagagcaat tcggcgatcc agcctacgct + 54301 catcccggta gggcgattcg gccagaccca cgaggtgtcg tcagtggtcg tgctgctggc + 54361 cgggaatggc tatatcaccg gtcagacgat cagcgttaac ggtggctggt atatgagctg + 54421 agtagaggat aaagcgatgt taacacgcat catccccgcc acgaaggaag ccctgcccgt + 54481 gatagggctc ggcacctatc ggggcttcga cgttacgctc aatgcgccgg gcgaagaacg + 54541 gctctccaat gtgctcgaca cgctttttgc ggcgggcgga acgctccttg acagctcgcc + 54601 tatgtatggg cgtgccgaag aagtcgtcgg cgcgctgctc actcgtcagc cccgcgctga + 54661 ctcgcctttt cttgcgacca aggtttggac atcagggcga gaggctggcg tccggcagat + 54721 cgagcagtcg ttccgtctgc ttcgcagtga cgttatcgac ctcatacagg ttcacaatct + 54781 tcaggactgg caaacgcatc tgcagaccct gcgcggtctc aaagaagccg gacgtatccg + 54841 ctatattggc ataacccatt acactcgctc gggttatgcc gaggtggagc gggttctgaa + 54901 cacaacgccg gtcgacttcc ttcagatcaa ctattcagtg gaagaacgcg aagcggagaa + 54961 gagactactc ccactggctg aggacaaggg cgttgcggtc ttgtgtaacc ggccctttgg + 55021 aggcggtgat ctgttgcgcc gccttaaggc gaagccgctg cctgactggg cagaggaggt + 55081 aggtgcgact tcgtgggcgc agcttgcgct caaattcgtg cttggccacc gagcaatcac + 55141 ctgcgcgatt ccgggaaccg ggaatccagc ttccatgatc gataacacaa aggcagcgag + 55201 cgggtctgtg ctgaccccga aacagcgtgc tgaactgatc gagaccgtct aatgacctcg + 55261 gctaaacatc tgacacgagc gattgggggc ggcgccgata agcgcggcgc cgctcctgct + 55321 ccaatctgag gcacctgcgg tccacgtcca aaacgcaatc aagtccggcc tcagcgccct + 55381 gagacagagc ttgttcagtg agaagagacg atttatcggc tcgttgttgc tcactcgcag + 55441 gggagcgacg atgaggcgga tatccgcgcg tcgaaatctg cgacccggtt atcaaaaacc + 
55501 tccgcgcgct ctgtaaaagc accattaggc aggtgttaga tccatatcgt gctgggtgac + 55561 ttgcgcgtga cgacagcgtc tttcgttccg gcaactgagg gatcagcgag agcctttatt + 55621 tggctggcct aagaattcca gcgcccaact tacggcgaaa ccaacaattc agcgatcagc + 55681 gacgacgtga tcgccgcgcc aagcgagcct cgaggtatca cgtggtggtc gacctccagc + 55741 tggactctgc tgcggccagc acctcctgcc gcccgacgtc aagcaagcgg cgggccatcg + 55801 tggttccaag cgtcggacgt ggatccagta agttgcctta ccctcgacaa tcgctgcact + 55861 ggagcacacg ggcctagcga ctggcctcgt gttgcactct caacatgtgg gagatgaaaa + 55921 gctgacatca ctcccgctgg cagactcgac gagaacttcc agattgagga ccccgcccct + 55981 atcaggtgcc acctgcaata gcttcgtggc gcgaatgata tcgatcttgg catgagctgt + 56041 catttcctgc ggttggataa gatcctaggt tggtggccat cggcgaccag cgcgtgcaac + 56101 ctggcttcga tgctacccgt tcgccggtaa tcaaaagcga acgcgatccc tcaacgggcg + 56161 accggctcta gcttaaatct ttttgcatca tttttataaa ggccatatca tgaaaatcgt + 56221 ccaaaccgtc atcgccggcg ccgttgccgc cgctgtagcg aacgtggcgc tggccgctcg + 56281 tagccctggc gtcgaacgca atacggaggg cttcctggaa gcgatggcca aggctggcgc + 56341 tccgcccctg gagtctctct cccccgccga tgcacgggcg gccgtggcgg ctgcgcaagc + 56401 cggcgccaag ctggcgccgg ccgacatcag cgagaagacg ataagcgtcg acggcaaacc + 56461 tctgaagctc accatcgtac gtccggctgg caccgatggc gtgctgccgg gtttcatgtt + 56521 cttccacggt ggaggctgga tcctcggcga cttcccgacg aacgagcggt tcgtccgcga + 56581 cctggtcgca gactccggcg cagcggcaat attcgtcaac tacacgctgt cgccggaggt + 56641 gcgctatccg gtggcgatca acgaggctta tgcagccaca aagtgggtcg ccgagaatgg + 56701 cgaacagatc agcgtcgatg gcaagcggct agctgttgtc ggtaacagcg tcggcggcaa + 56761 tatggcggcg gtggtcgcgc tgatggctaa ggacaagggc ggtccggctc tgcgtgccca + 56821 ggtcctgttc tggccagcta caaacgcgaa cttcgaggac gcctcctacg acgcgtttgc + 56881 caccggtcac ttccttacca aggacatgat gaagtggttt tggagcgcct atgtcccgca + 56941 cgctgagcag cgccaggaaa tctacgcccc gccgctccag gcgacgcctg agcaatggaa + 57001 aggcttgccg ccgacactta tgcagacggc ggaaatggac gtctgcgcga cgaagccgaa + 57061 gcctatgccc gcaagctcga cgctgccggc gtcgacgtgg tcgttacccg ctacaacggc + 
57121 atgatccacg atttcggggc gatcaacccg ttgtccgatt tgcccgcctc gcgtgccgcg + 57181 atgcatcagg caagcgagga actcaaggct cgcctgcgct gaggtcggcc ggcaagtcct + 57241 ggaagtcagc ccttcccgaa gcctgacgcg ggcacgggga ggaatcccgg gcgacgagtc + 57301 cgtccatctc tcgtcgcccg gtgtctccgg tatagattgg cagttatgcg caccgttgga + 57361 gtttgtccgc gcctactagg cacctctcgt gcatcgacag cagcacgtgg ttggtgggca + 57421 agcagggcgt cagccctatt cgctggcgtt gatgtcacca gaccatcccg aagccaaggc + 57481 cgacggattt caccgcttaa acccgaaagg gcgaatccct gttctcattg ccgagaattt + 57541 tactctcact gaggcgccgg caatcttatt tcacctgggc ctgacgaatc cgggtgcagg + 57601 gctcctggga gcgggtgcag aaaacatcgt ccgttccatc gaatggttca actggctgtc + 57661 gagcgcggtt cacgcggtcg ccgtccgaat gatccggcgc tacgcatgat ctggcgcgcc + 57721 gactttttct taccggatca atcgatgtac gcgcccttag ttcataaagg aaaggagcat + 57781 ttggcttccg cgtttgcgct tatagaatcc aagttgacgg accgagattg ggagagtcct + 57841 attcaatcgt tgatccgtac tttctcgtag tccaccgctg gggaagtcgt atggccgttg + 57901 cgatgcgaaa tctctacccc gcttggacct ctcatgcacg tcggctggag gagcgggcgg + 57961 ctgtgcgacg agccctaacg caagagggaa tatcccagtg gggtagcggt cgcgttgccg + 58021 aatccaagaa taacaagatc acatacgcgg tcgtgctaaa aagtctggta gcgttcatta + 58081 gcgggaaacc aactctacgg ccgcattgac gctcatctcg aacgccaaag ccgatggagc + 58141 gagttcaatg tcgtccgtgc aggccaacgc cgaagtatga tggagttggc ccgccacaaa + 58201 cgaccggttg gttcgctgcg ggctatttcg tcgccatgat gtcgatcttt gaaatggtgg + 58261 gcgcatgatc cagcaattcg ggggccttgg ccatcaatgc ggctgcgacc ttgccagcca + 58321 ggtgcgccga ccgacctgcc tcgtcgggga acgtgtcaaa gattgcaaat tgggtcttac + 58381 ttaaacgaac cgcataccaa gcgatcgttg ccggctcgtc ctgcacgata gcgcggccgc + 58441 cgcgcaggaa ttgcgcgacg tcctcttcct ttccagcctt ggcatctagc acgacccaca + 58501 agccgacgtt cgcctgcgcg gttgaagttg gctcggcggc atgcgccggt gcaccgagaa + 58561 tgagtgcagc gctgatgttg agggcgacca ccatcgcggt tgcggagtac ttcagagaga + 58621 ccatcatttt gatccttgag gatgcgttga agagagcctc agtctacgcc agcgaactcg + 58681 cagcgcgagt ggcgataatg acatctttga tatcattatc gccaagtttc gcatctgcca + 
58741 acgggcgatc cttgacgcgt gttcgcagcc aaatcacgaa gatcagcgca atcaaaacgg + 58801 accccagcca tgctgcgata aagaggtcgc ccaggatagg cgaagctccg atcgtctgga + 58861 tcacttcgat ggagccaaga gtatcaaacc aggcaagtga tgtctcaaga agacgttcgc + 58921 ggaaaaaggc atccagttcg caactccgac ggttcacgtg tccggcggga cggtcgatgc + 58981 agccgcgtct gccgctgcgg cggagctgcg gagaccacca aaactcatcg caacacttca + 59041 ggaatagcag cgtcgatgcg cgacctatct tgtaaaaccg gctgaaaggc gatgtgccgg + 59101 acaataggtc taacctttgc cggctcatca ctatcggaca atcagaacag atctccaata + 59161 tatcgatcgg gccgacaggc cacatttcgt cttgctccgg tttgaccggg aaccagtagg + 59221 caaggtcacg ctcctgacgt cctcaacagc aagcccataa tcgtcccaag actcacgcgc + 59281 tcgtcattgg tctcatttta cgtcaatcaa tcagcactcg gtggcgtgac gcaattcgaa + 59341 tatccgggga acagggtgcg ctcggtggcc actaattcct ggcggcccga cctcaggccc + 59401 actaaaagtt caggagccaa gcctgctctt gacttaaggc tctctttccc actccaagat + 59461 accgcgcaac gtctttccgg cttgggtagc cggaacgctt tcgatgatca gggtgtcacc + 59521 gtccacgcga taggtccgtt gttgttcgga gcccacccaa ttctgatcac ccgcaatgtc + 59581 aattttcgtt atgaaacgat cgttctccag ccgatattta ccggagtacg tatacaagct + 59641 gcggaaggcg gcactctgcc catcggcacc ttcggccggc ttgcgatctt cagccgtgat + 59701 cagaaccaat atgcgacccg atgagaagaa gacacagccc gcggggcgct cgccaaacgg + 59761 agagggcttt tgttcctttg tgtctacgtc ttccaacacc caacgtttaa gcttccatgt + 59821 gccggtgagc gttttggtgt cggcgaaagc cgggtgaacg atagaatgtg cgaccattgt + 59881 caagctccct tggatctggc gctgtaattg cactccgtgg ccttgctcgg tcatgctaga + 59941 tcgcatttct tgcgttaccc aatatcaaat tcatctgccg ccatgccaaa cgaatatgcc + 60001 ccccgacgac gaccgatggt gcatctgctg atcctaccca gaacaattaa gctctgatta + 60061 atgcgacatc tgacacgatc accggcacaa aagtcgacgt atcagagctc agctgctcaa + 60121 tttggcaaag ctccttacgc cagcggaaaa gttgactgac atggatgccg gccgcgcggg + 60181 caatctcgga gatcaccgcg tccggctcaa aacatgcagc gacgaggcgc tctttatctt + 60241 cgcgtgacca gcgccgacgg cgctctacgg acgtgatcac ctcaattgaa tgcttcgtca + 60301 tatggctact cctagtgtta ccactaggac tcgcagtcag cagcccgtca ctgcaagacg + 
60361 gccttcaccg tgggcttacg gtgaagatgt cgtgcagacg ttggaacggg tgtgccggaa + 60421 tgtcggctat ccaaagacca tcagggtcga tcaagggacg gagttcgtat cccgcgatct + 60481 tgatctttgg gcttatgcca agggggcgac cttggacttc tcacgcccag gcaagccgac + 60541 cgataatgca ttcattgaag cattcaatgg ccgctttcgt gccgaatgtt tgaacctgca + 60601 ttggttccta acccttgcgg atgcccgcga aaagatggag gattggcgta gatactacaa + 60661 cgaagaacgg cctcatggtg caatcggcaa taagccgccg atctcgctga tgaattcggg + 60721 aggcgcaacc agctcgcctc cctgaatgaa gccggaaaac tctagcgtcc gctggtccaa + 60781 cgtttgggag cggttcaaaa ccgccgaggc tctaatcgca gccggatgaa acttcagtgg + 60841 caggtcaccg gctatgtaat tgcgggggga gcgcaattag ccaaggggcg gccgccatcc + 60901 ggcttcctcg gcgccgaatg tgctgccgcg cagttccata catatgctta cgccgcgaaa + 60961 gccaactttc gcaacaggtt agggcggcga tcgcgttaag aataccgagg tcgccgcacg + 61021 gttgctcgct attccttatt tggctttcta agtaggtcag caggttcaac acccagcacg + 61081 gcagcgaatc ggtctagcac ctcaatactt gggctgtaga cgcatcgttc tagagagctg + 61141 atgtaggtcc gatcaacgct ggcacgatgg gcgagctcct cctgcgacat ttttctcgcc + 61201 tgtcgaagag ctctcaaatt tcgagcgaag gccggatctg ccgacggatc gtattcgtcg + 61261 cgccgatccg tctattccgc ctgaacaggc gatcgccgtg atcgctagga gtgggtctaa + 61321 acgctgggtt tcggccgcag acgcggccgc ccgaaaagcc ggggtgcatg tcggcatgcc + 61381 ggcggccaaa gcacaggcgc ttttccgtgg cctgatgttg gtcgatgcgg atcctgtaaa + 61441 ggatgccgca gcactcgaac gcatcaccct ttgggcgctg acgctctact caccgatcgt + 61501 cgcagtcgat ggcatcgacg ggatcgtcat ggataccgag ggtgccgatc acctgcaggg + 61561 cggcgagctg ccaatggtga cgaagatcgc caatcagttc ctggcaaaga agctcactcc + 61621 gcgggtcgcg atcgccgaca cctggggtgc ggcccacgcc tgcgcccgtg ccatcagccg + 61681 cgagacggtc atcgtaccga tcggcgagac ggtgcgcgcc gtcgaaaagc tgccgatatc + 61741 gttgctgcgc cttcccggga aagtcgtcag cgatctgcgc acgctcggtt tccagacgat + 61801 cggcgaattg gccaacacgc cgcgcgcgcc gctgacgctt cgtttcggtc cggagattgg + 61861 ccggcggctc gaccagatgt tcggccgagt ctccgaaccg atcgatccga tccgcaccgc + 61921 cgagctgatt gaagtgagcc gcgcctttgc cgaaccgatc ggcgctgccg aaaccatcaa + 
61981 caaatatgtc ggtcggctgg tcgtgcaatt gatcgaggag cttcagaaac gcggccttgg + 62041 cgttcgccgg gcggatctga tcgtcgagaa ggtcgatggt gccaggcagg caatccgcgc + 62101 cggcgcggtg aagccggtgc gcgacgtcgc ctggctgacg aagctgtttc gcgatcgaac + 62161 ggagaagatc gagcccggct tcgggatcga gaagctcacc ctggttgcgg tcatagtcga + 62221 gccgctggag gagcgccaga ggtcgtcatc gctggtcgag gaggaagtga aggacgtgac + 62281 gccattgatc gatatctatg gcaaccgcgg gcagcgcgtg tatcgggtgg cgccggttgc + 62341 ctccgacgtg cccgagcgct ccgtccagcg catcagtcct gctgccgatc cggtcgaggt + 62401 cacctgggtc agccattggc gccggcctgt ccggctgctg gcccgtcccg agctgatcga + 62461 agcgattgcc ttgctaccgg accgtccgcc ggtatcgatc acctggcgcg gcaagcgccg + 62521 gaaggtcaag cgggccgatg ggcccgagag gatttttggc gagtggtggc ggcgcgacgc + 62581 cgagatggag gcggtgcggg attatttcgt catcgaagac gaggccggcg agcgtctctg + 62641 ggtgtttcgt tccggtgatg gcatcgatcc tgaaaccgga aaccatcgct ggttctgcca + 62701 tgggatcttc gcatgagtta tgccgagctg caggtcacga cccatttttc cttcctgcgc + 62761 ggcgcaagct cagcgcagga actgttcgag accgccaagg ctctcggcat cgaagctctc + 62821 ggcgttgtcg atcgcaattc ccttgccggg attgtccggg cgctcgaagc atcgcgcgcc + 62881 acgggattac gcctccttgt cggttgccgg ctcgatctgc aagacggcat gtcggtgctg + 62941 gtctatccga gcgaccgggc tgcctattcc cggctgactc gtctcatcac gcttggcaag + 63001 tcgcgcggcg gcaagaacaa ctgcatcctg cattgggatg atgtcatcgc atacaccgac + 63061 ggcatgatcg gcattctggt gccggacctg ccggatgacg tctgtgcgat ccagctgcgc + 63121 aagatggccg agctgtttgg tgatcgggcc tatgtgtcgc tctgcctccg gcgtcggcag + 63181 aacgaccagc tgcggctgca tgagatttcc aatcttgcgg cgcggttcaa ggtgaagacg + 63241 gtcgtcacca atgacatcct cttccacgaa ccaggtcgcc ggcagttgca ggacatcgtc + 63301 acctgcattc gcacccgcac cacaatcgac gatgtcggct tcgagcgcga gcgccacgcc + 63361 gaccgctatc tgaatccgcc ggaagaaatg gagcggttgt ttcggcggta ccggcacgcc + 63421 ctcgcgccaa ctatcggcgg cctttgtatt gacaccgttc cttgctgtcc aggatcaccg + 63481 gaatttggcc ggaatgcgac aaatgcagca tgcacgctat caaagtgccg ccttcgagcg + 63541 agtggcgtgt tacgcattgg ctcgggacgc tgctccgcgt aatacatctg agaccagcgc + 
63601 aagctcatcc tcattgctgt agtagtgggg cgaaaaacgg acggcgtcgt tcctgacgct + 63661 gcatttcacg tcggcagcat ctaagtctgc aaacaggacg ctggtggcca tctttggatt + 63721 cttcagcagg atgatgcccg actgggattc ttccggccaa tcactcatga tttcgtaccc + 63781 tgcggattta gcctgcgtcc gaattcgcgc tccgagttcc ataacataac tctcgatctt + 63841 ctccatgccg atggaatcga tctcttcaag cctcttggcc aatccgaaga tgcctgcgcc + 63901 gttctcggtg cctggttcga accgcttggc gtctggaaga aaatcgagaa cgcgatgaaa + 63961 ggagaaggcg tcgttcacac tcaaccaacc gacgatgcgc ggcttgatgc gttcgaaagc + 64021 cctttcggag aacgcggcga agccaattcc gactggtccg agcatccatt tatgcgcgct + 64081 cacaacgagg acatctacgc cttcgccctc catgtccacg ctcagcaccc cgaccgactg + 64141 cgtgccgtcg acgaccaaca atgcgtcgtg atcggcgcag atttgcgaga atccggcgag + 64201 atcaacccgg aaaccgctgt agaactgaac atggctcact gcgaccagtt tcgtcctgct + 64261 gtcgaccagg aaacgaagat catctggagc aagcgctccg ttgcgcgcct tcatcgtgcg + 64321 gatttccaca ccttgctggg aaagcgtctc ccagatgaga aagtttgagg gaaattcgag + 64381 ctcaggaacc accaggttgt caccaggctt ccagtcgatt cccaaggcga cgagcgacag + 64441 gccatgcgag gtgttctgga tgtaggcaat ccgttccgca gtcgagccaa caagtttggc + 64501 ggcaaggctg cgtcctttgt cgcatgtcgg cgcgcttgcc accatgatcc cggacgtgtc + 64561 cgtggcatgg agtgtcgcct gccgggcgat cgcatcccgc acgccgatcg atatagggga + 64621 aacagcggcg ttgttccaat agatgcattc tttgacgatc gggtattggc tgcggatgac + 64681 tgcgagttcg gccgcgttca gataggtctg gtgcctcgtt gaagcgtgtg cgttcataca + 64741 tgatctcggt gaaaaaatgg cgcagtttcc gtctgctccc gtcgctcgtg aatttggtca + 64801 cggcaaacaa gccaaagcga gggtgaaagc gagcggatgg cgctcatccg ctcgtaaggg + 64861 gcttattcgg caatctgatc gacagaggtg atgatacggg attttttgac cgccgcgagc + 64921 gacatgccat acttgtcgag gacggtctcg tacttgccgt tggcgatgag ttcctgaaaa + 64981 attgcctgca tcatttcagc tgaggccttg tcgcctttct ggacggcgac gcctaaaggt + 65041 aggacatcgt agactcccgg agcaacgacg agctttccat tgctggtctg ctcgtaatat + 65101 ccagctgcgg tggagtcgtc gaaacgagcg tcggcccgac ctgtcagcac cgcctggatc + 65161 gtgtctttct gctcgggata gatgaccttg tcgatcttgg gcaggttgga tttcgcgcag + 
65221 tcctcgctca gcttgtccgt ggtgaaatcg gccgccgatc cggtctggac agccacccgc + 65281 ttgccgcaca ggctcttcgc atccgagatg gagtcttttt tgtcagggag aaccatcgcc + 65341 acggttcctg actggaggaa cgtgacgaag tcgacctgct tcaagcgatc ttccttcacc + 65401 gtgaacgtcg cccacgctac cttgaccctt ccggccgaga gcgaaggaat ttgggaagcg + 65461 aaaggtatgc gctgaaggtc gagtttagcc gtgatgagct tcgccgcctc gtgcgccaaa + 65521 tccacgtcca ggccagccgg ctggccgttt gcatcgacga actcaaatgg tgcgtaatcg + 65581 agcccatttg cgattgtgag cgtaccggac gacttgacgg attccgtgac cggaacctcc + 65641 gctccgatgg cgttctgcgc aagcgtgagg gcgacgaaaa tacccagatg cgtggctttc + 65701 atgatgatcc ccttttttgg acgttggcgt ttagcacggg taaggaagca catcctggac + 65761 agtgaatcga attgaaaaga tgcattgaca atatgagctg cgttcatgat cttctgcgga + 65821 ccgccgatat cttttgacgg cgagcaaaag cacccgaggg tcccgatatg gccgagttct + 65881 ctttgaacca aatcgatctc aacctcctaa ggaccttcga cgtcctcatg cgggagcgga + 65941 gtgtgaccag ggctgcagac agacttgggc gcacccagtc agccataagc cattcactgg + 66001 gcaggcttcg cgacgtgttt aaagacgacc tcttcacccg agaagcgggg atcatggagc + 66061 cgaccgcacg ggcgaaagaa ttggcggagg tcatttccca ggccctgcac gaaattcggg + 66121 tggcggtgga ccgacacctc aatttcgatc cgacgaccac ctctagaaat tttcggattg + 66181 gcctttcgga ctatacggcc gtcacctact tgcccgaact gatcgagaat ttctcgatgc + 66241 tcgcgcccaa tgcttcgttg aatgtggtcc acgccaggga acccgacgca ctcggatcgt + 66301 tgaagaaccg cgaagtggaa tgcgcggtgc ttggcaaccc caagctcgat gccgagcact + 66361 tcgaggtcgt ggagctttcg agggatcgga tggtttgtgc tggatggact ggaaatccgg + 66421 cgatggccga catgtcgctg gatcgttacc tcgcgtcgcc gcaccttcag atatccgcgg + 66481 acggcatcgc agccggcgtc gcggacataa ccttgcagaa gctcggcctt catcgaaaag + 66541 tggttgcaac gataccgcac tatctcgttg ctccatgggt cattaaaggc acggagctga + 66601 tttcggcgtt cggtgacggc gtgttgctgg cactctctga agaaagcgag acagcgatcg + 66661 ttcctccgcc gctcgaactt cccgatgtca ctatttcgtt aatcttcgat agatcgaatg + 66721 aactagaccc cgggcatgtg tggttccgga acctgatcaa agacgtgtct gacaggcaga + 66781 ggacgctcaa acaaggcgtc tatgaacgac tggagctttg aacgcgcctg cggcctgaaa + 
66841 agccgcgttt gcgacaaacc gccgaagggg gcatgcaggc atttgtcagt ctagctcgaa + 66901 cagggcttcg acctcgactg cggcctgcaa tggcagggaa gaaaccccga cggtgctccg + 66961 cgcgtgccgc ccttgctcgc cgaaaatctc aacgatgacg tccgaggcac cgttcatgac + 67021 gcgagcgtgc tcggaaaacg cagacgtcgc agcgatgaac ccacccagtc gaacgaccct + 67081 cttgatcttg tccaggcttc caaggctggc tctagcttga gctatgacat tgatcgcagc + 67141 ggcccgagca gctcgtattc cgtcttcgac agagacgttt tccccgagcg atcccaacgc + 67201 cacgagtgtg ccgccgggag acacgcaaag ctgccctgaa atgaacagga ggttcccgac + 67261 ttgatgagtc gcgacgtagt tcgcgaccga tccgggcggt ttcgggagat cgatggccag + 67321 ttctatgatg cgctgctcga aagaagagtt cattggcaag tcctgtcctt cagcttctgt + 67381 ggctgtggat agatgaattt cggagggcgg tccgggagtt tttccgcgcg tcagaggacg + 67441 cgcgccaaaa actcacgtgt tcggggagac tgcgggttgg caatcatctc tcgcgggttg + 67501 ccggattcga cgacaagccc gccgtccatg aaaataagct gatcggcaac ctcgcgggcg + 67561 aacccgattt catgcgtcac gactaccatg gttattccct cgcgcgccag actcttcatc + 67621 acctcgagca cctcgcccac gagctctgga tcgagtgcgg atgtcggctc gtcaaaaagc + 67681 aagacctttg ggttcatcgc gagcgcccgt gcgatagcca cgcgctgttg ctggccgccc + 67741 gacaactgcc tcgggtaagc gtccctcttt tcggcaaggc ccacccgttt gagaagttca + 67801 atcgcctttg ccgtcgcctg cgcaaccggc tcccgcttca cgcgtaaagg agcctccata + 67861 agattttcga tcacggtcat gtgaggaaac aagttgaact gctggaaaac cattccgatc + 67921 tcggccctgc gctggcatat cgcgctcggc ggcagttcgt agagtttgtt gccttctagg + 67981 cgatagccaa cgaagtctcc gtcgacgagc atcaacccgc cattgatttc ctcgaggtga + 68041 ttgatgcagc gcaggaatgt gctctttccc gatcctgaag ggccgatgat gcaagcgacc + 68101 gagccggacg gcaccgtgag gtccacgcct ttgagcacct cgagatggcc gtaggatttg + 68161 cggatgcctt tggcatgtac catcgtcgta gatgcgatcc cgaagcgctc gtgatgtgtg + 68221 ccgggtgcag tcatgccagc gcctccttcc gaccgatctt ggccaggttc ccgagtattg + 68281 ccctcgtgat tccgggattg atgcgatgct cgtcgcggct gaaatgtttt tccaggtaga + 68341 actgaccgcc agacatgatc gagactacgg cgaggtacca gaaagacaca acaaggagca + 68401 gagggatggt ctcgaacgtc cgggcataga tggactgggc cgagtacagc agatcgccca + 
68461 cagcgatgat actgaccaga gaggtcgtct tgagaaggtt gatcgtctcg tttccggtcg + 68521 gcggaatgat tattctcatc gcttggggaa ggacaatcct cctcagcgcc aggctaggat + 68581 tcatcccaag acatgccgac gcctcgtact ggcccttgtt gaccgacttc aatccgctcc + 68641 tgacgatttc ggccatatat cccgcttcat gaagcgagag tgcgacgatg gccgagacca + 68701 gtggggtcat gatgtcattt gtagacaccg aatacactgt gccgaccccc ggcagccaaa + 68761 gcgtgatatc tttgacgacg atcgacagat tgtaccagat gatcaactgg acgagagctg + 68821 gagtgcctcg gaagaaccag acgaaggccg cagcgggcac agaaagcagc ttgctcggag + 68881 acagcatcat caatgcggcg atggttcccc cgagagtggc tagggccatc acgaccgctg + 68941 tgattagaac ggtttgccaa agaccatcga gtatgagtgg gttgaacaga tattgcgcga + 69001 cgatgggcca ctgaaactcg ggattggaaa ttgctgacct gatcatgccg gcgaccgaca + 69061 ctaagaccaa ggcgacggcc acccacctcc aagggtgtcg cagtggaacg atggtcagtt + 69121 gttccgggtc gatcttgtcg gccgatctcc ttgtggattg actaccagtc atactcgcct + 69181 ccgcttgact agatgggcac attccgttcc cgatatctgg gttgtccccc gattggcgcg + 69241 gtcacccacg tctcggaaat tttgatctcg cctcccttcg ttgtgaggcc cgacggtgca + 69301 ttcaatcaaa ttaataattc gcatcgtgaa aatgcacgcg gtgcatgagg actagaaacg + 69361 tggcgggacc acccgcaggc caatcctcaa aggcaagggg gctcgccctt aaaatgggcg + 69421 aaaagctgcg acgccgcgtc ttgcgggttc gttacgattc agggattccg tctgggcatc + 69481 atgtcctcgc tcaaaatacg aggacaaatt gccagatcgg tgctcacgaa aaatcgatct + 69541 aaatccgacg cgacttttgt ccgcgactcg gtgaaatcgc cgtcccgaac tttgtgcaac + 69601 ccgcaatcct tgcgaagcgg ctgatagcca gtgacgccct ggggcgaaga gcagaccctt + 69661 gcgcgattga catgccttac ccgttccgca atcaacactt ccccgcaaat gcgactccca + 69721 tcttcatcgt ctgggcccgg cgttttgacc attgcgcata cacgtggctc gccgcgcttg + 69781 aacgcaagca gtgctcggag caggcctcac catcaatctt atcgtcaagg cggctcgttc + 69841 acgtccatga ggcgctgtcg aaggagactt gtattttccg ctgcggcctt ctggtgcttc + 69901 ctcttatcga cttctcggag tcccgtcgtg atggtagacc gatctgcgac tggttgttgg + 69961 cgcggcctgc cggttccaca cgtcgatctc gcaggcggcc gcgtcgtata gaccaccgcc + 70021 cgcggcaaac acccttgctc ccgtcagccc tcaacccatc tctcacatat tgacgtagcc + 
70081 ggaaaaaagg tcctccgcga ttgccactgg aggcgtttcc ccacatcgtt gagtggaaac + 70141 gccgggtcag aaaaccggcg gggtcgcggc atccatacga tctgatgccg ggtacaatgc + 70201 gctttcacat gcgcctcggc cgttcggtag aagttcgact gcattggtcg gtgcacgcat + 70261 gtttcgcagt ttcttaacag ggctcataaa ggctcattgg aagctgccgg attttggccg + 70321 cggcgaacct catccctccg cacaacggag gaaagagcga atgccgtaag ggtgtcccgt + 70381 acgcccggga gctcggatat cagctgccag atctggcgaa ttcgatcggc tcgcggagat + 70441 tccactctga cgacgagatc gaaatcgccc gtcatcacat cgcaagcgac gacctcggga + 70501 attgtccgca gagcttggat gatctcgcca cccctcatac ggtcctggcg gtagacaaac + 70561 atcaatgcgg tggtgatcgg gctatcgttt ccgccatccc cagtgacgat cgtgtaaccc + 70621 ttgatgaagc cctcgcgttc tagccgttct atccgaacgc gaacagcatt acgggacaga + 70681 ttgaccttgc ccgatagctc agcgtgagag gcccgtgcgt tggtccgcag cgcgcttagg + 70741 atttgttcgt cgatgcggtc gagaagatat ctcatgcccg acaggccggt tgccgggcaa + 70801 cttggccaaa tcccgaagat tcaaatgcgc gtacagccgc ctcaagcata taaatctttt + 70861 cctttttgct agcatagtgt ctggcggcgg atgcgaggac gaacgcggga agtgctttct + 70921 tttccgcctc agtcagctgt cttacactct catagcctgc caagatggcc cgggctttct + 70981 gttcgtccag ctctccagcc ggctggctcg cccagctaat gagaacgtcc gcaatctcag + 71041 aaatcagcac gtcatcgtgc cggagccgga agttaatcac accgctgacg ttctctccga + 71101 ggaagaagac attggaaggt accagggctc catgcagggc acccgttggc agctcatcca + 71161 gagaatgttt tcggccccct tgcaggatca cgtggatccg cgccatgagg cggccgagat + 71221 tttcgctctt ttctggcgtt ggattgtttg tcgatgatcc cgctacaaag ctcacgatgg + 71281 caacaaggcg accggcagcc tggaaggtgg cgtcgccatc aaccgtccgc gtcggtttag + 71341 gacatggaat acccctgttg ttcagcttct ccatcgttgc gaacgcgcgt tccagatcca + 71401 gcggctcggc gccgttttcg aaaagcgtga cgatgaattc accgcccgct gtccggaaaa + 71461 gataggtcgt ttccctgtca ccgtcggcaa taccgatcac cgacgacaac gatgtcatac + 71521 cgtaggctgc ggctatcgag tttcggtctt cgtcggatat ttcagtgaaa acggccatct + 71581 ttcccctccg tccaaccact tcttgagaaa ttcgcaaact ccgcggtatc tccataaatc + 71641 tgatgaacct ggctttccgt cagacctctt tcttcttcga caacatctcg taagtttctc + 
71701 ttttcgacca gcgccttttt ggttagatct gccgctttgc tgtatccgat caacttcgcc + 71761 agcggcttag caagaacgat agattggtcg agcaacgcct ggcaacgatc gacgtcaact + 71821 tcgatgccat cgatgcagcg ttccattaag atcaccatgc ctctggttag catgcgcatc + 71881 gactgcagga tattcaaaac gatgaccggc tccatcgcgt tcagctgcag ttgaccggca + 71941 ctggcggcga gcgtcaccgt aagatcattg ccgataacct gaaagctaat ctggttcatc + 72001 atctccggga tgaagggatt gcccttgcca ggcatgatcg acgagccggc ttggaccggc + 72061 ggcaagcgga tctctcctag accgccacgg ggaccgctac tcaacagccg aaggtcgttg + 72121 caaatcttcg tcaggttgaa ggcaatgcct tcaagcacga aggagacgaa cccgcccgtg + 72181 tccgatgtgg cttcgatgaa gttgcgggcc ggaattaacc ggtagccgga tttctgggac + 72241 aggtgatcgc ataccgcctg cgcgaagcct gagggtacgt tgattgccgt gccgctcgcc + 72301 gaaccgcaga ggtttatctc cttaagaagc ttagaggcgt agtggacttg aacgatatcc + 72361 tcgtcaatga attcagcaaa cgcttcgaat tccttccccg cggtcacggg cgcagcgtct + 72421 tgcaactgtg tccggccgac tttcacagaa ctcgcggcat gactgagcgt cggactttcc + 72481 cccgtcgttc cgcacggaac aaggccgaag gacccttcgt cgatctgcca ttcgatgaga + 72541 tcgcgcagcg cgccctcatc cacatggcca tcagcaaatg gcgtgacagg cgcagtgata + 72601 gaacccttga acattgttta ccccgacatt gctgatggca ttgagcgtgg aaagagctgt + 72661 taaggcttca accggttgat gcacacgacc ttcggctggg tcatgtcctc ataggcaaac + 72721 cgcacaccct cgcggcccat gctgccatat ttgaagccgc caaacggcat tgcatcgaac + 72781 cggtagtcgg aggagtcgtt gaccatgacg ccaccggcct cgatcttgct tgcagtttcg + 72841 agcgccgctt cgaggtcatt ggtaaaaatg gcggcgtgaa ggctgtagtc agggccgttt + 72901 gcaagaccga ttgcatcagc caatccatcg aagggctgaa gaatgacgat cggcgcgaat + 72961 acttcttcgc cccacacgtc gcaggatatc ggaacatcct ccaggacggt cggcgggtag + 73021 agattgccgg ctggcctatg gccgcaaagc aatgtcgcgc cggcagcgag cgcccgttcg + 73081 accattgcga tcgctctgct gacagccttt tcggagatca acgggccgac atcggtatcg + 73141 gcatccaagg gattgccagt cttgagcttc ttggtatctg cgacgaattt cgccttgaat + 73201 tgctgataga tcggccgctg gatcagaatt cgctgcgtgc caatgcagtt ctgaccggcc + 73261 gcccagtagg ctccggatac gcaggcctcg acggccgcat cgaagttgca attttccata + 
73321 acgatgacag gcgcattgcc gccgagatcc atggcaagtt ttttgaggcc cgcggtcttg + 73381 gcaatcgctt caccggtggc aaagccgcca gtaaacgaga tcatgcgaac gtctctcgcc + 73441 gcaacaagag cctttccaag ctcagggcca ccgatcgcaa ccgttataat ctcttcaggt + 73501 agaccgcttt cgatcatgac ctcgacgagc ttgacggcag agagtggggt aaactcggac + 73561 ggcttcaaaa gcactgcgtt gccaccggca atcgcgggac cgagtttgtg ggcaacgagg + 73621 ttgagtgggt cgttgtaggg cgtgatggca gcaataatcc ccagcggttc tctcgtatac + 73681 caaccctggc gagactccga gcccgcgtaa gcatcgaacg ggatgacttc gcccgcgttg + 73741 cgtttggcct cgtcggcgga cagcttgagc gtattgacgc accggattgt ctctttgcgg + 73801 gcctgcgtaa tcgtcttgcc cgcctctttg gcgatcagga gcgcgaactc ctcccgcgag + 73861 gcctcgattg ctgctgcggc cgtttcaagg atcgccgacc gcttgtgacg gggcagggtt + 73921 ctggcaatct gcgcgccgcg cctggctcgc tcaaggagcg catccacgct tgcagcacag + 73981 gtttccacaa ctgttccgac cagcgtaccg tcgaaggggc tggtgacggc aatttcttcg + 74041 aggcctgtcg cttgggtgag tgcaaggttg tgcatcatgc agcttcctta gcggtggaag + 74101 aagcgcccgc ggaatgaatg gcgacgatat cggaaagaag agcatatgca gtttcgatgc + 74161 ggccagcgcc aggtccggtg atcgtcacgg cgcccagcag ttcggtgtcg agtgagactg + 74221 cattagtggc gccgttgacg cctgccagcg gatgatcgag gccaagccgt ttgggggaaa + 74281 cactgccggt cacactgccg tcgtcattgc ggaccgcagc cccgatgagc ttccagcggc + 74341 tgtttgcctt ggcggcttcc tcgatgtcgg agagtgaaag accggagacg cccttgcagc + 74401 ttatatcctc aggcttgaga ttcgcgccaa gcagttcgtt tgccaggatg acaaccttca + 74461 gacgaacgtc gaaaccttcc acatctgcgg tcgggtcggc ttcggcataa ccgagtttct + 74521 gtgcttcctt gacggcggag gcgaaatcca gaccgctttc catgcgaccg aggacgaagt + 74581 tggatgtgcc gttgaggatc ccctcgaagc ccttcaattc ggcgccggca agggttcttt + 74641 ccgccatgcg aataacaggt gtgccgctca tcacggcgcc ttcgtactcg aagcgcacgc + 74701 cgttggtttt tgcgaacgcc ttgagcgctg gggcagcgat ggctacaggg cctttgttgg + 74761 tggttaccac gtgtttgccg gtctcaagtg cccacctgca atgcgagacg gcgggttcgc + 74821 catccttggg attggtgtac gttgcctcga ctacgatgtc tgccggcgcc gtcttgatga + 74881 tggtctcgtt gtcagcttca gcacttccac ccgaaagttg tccgaagcca cccttttcga + 
74941 acttcgcgtc gaccagagtt tttgcatcaa gaccatttgg cgagattacc gaacccaaat + 75001 agaggtcgct cacagcgaca atattcaggc ggaagccaag gtcgcgttcc cagagctggt + 75061 tcttggcggc gatgagttca gtcagggcgc gattgacgcc gccaaatccg atcagagcga + 75121 tattgtaaat tgtcataacg atcctccgtg aagtttatgg cgtaatcata ggatgcgaaa + 75181 cgatccaccg gaacgtggca aacgagccaa agggcagcgc actttgctca atttgctgcc + 75241 gcgctggtca ggtccttgag agtttcatca cgggcatccc ggacacgatc aaagcgccgt + 75301 ctgccagacg cataggtgat gcttgttctc caaggatcgt cgttccgagc aggcacgatc + 75361 tcgccggctg taattggagc cggcaatgcc gtacacgatg caggcgacca aatggggccg + 75421 caaatttgcg ttgtatgttc ggttcgctaa acggactggg aacttgaatt gccggtagaa + 75481 agccatccgg acttgatgag tccgggcaag ccgcggtgcg cgtggccatc tcctcgaccc + 75541 agcggccacg ctttccaagc gtggccgtcg ccgccatcga tgaggcggtt ggccctgact + 75601 ttccgaccat cattagcgta agccagtgga agcagcagga ttccgcggca cggctatgcg + 75661 agaccagcgc tgatgcacat ttgttgctgc cgttggtgga cgctgggtag atctcctaca + 75721 tcacctgcag cggcgtttct gaaaccggac tttcctgcga tcgatatcga gaaaggcgta + 75781 aacctcggcg gacgcggccg agaactcacc cgtgctacaa cattggcggg gtcttattcg + 75841 gcttcgacgc agacttcctg aacgccttta caggtcaagg ggcggggcgg agacaagggc + 75901 tagaaagatt gatcggaacg agcgcgacga gtgcgattta atcgccgttg acagagctct + 75961 gactagcgat caggcgagtg tggaccctgc gacgctgcgg ccctttctga actcgtaaat + 76021 ttaaatatca ccggtggaag tgatgatatt cgcttgcctg ctatctttct cgcgcgatac + 76081 attacgacgc caccaccaac cctgaagtcg agaactgcca cttctccgac atgtaccccc + 76141 gaagccggtg tgggtggcgc cagaggcact cttttaaggc caggatgaac aaaacgacga + 76201 tgaagtctgc tcgccgccta gatgggacgt tgcccgtgtt cccgcgaaga agcctcacca + 76261 atagccgaac aaaatggcat tattggtatc cagttgatcc agtcgcgccg atcagttacg + 76321 cggcaatacc agcaattgcc tctaggaatg cctcaatttc ctcctccgaa ttgtaataat + 76381 gtggggaagc tcgcaccacc ggcgggagtt gtcgcgtata cgcgtccacg ggtgtactag + 76441 agggcgggga aaccgagacg ttgattccct tgccagccag ataggccatg acggctggcg + 76501 aatcccagcc gttaacggtg aatgagatga tggacgcgag cggcgcccca aggtcatgta + 
76561 cggatacggc acgcatcccc ctcagacctt cccgaagcct tgaagaaagg tgactgcagc + 76621 gcgcttcgat gttttcgagc cctatttcga gcgcgtagtc cactgctgcc cgcaagccga + 76681 gacgaaccga acagttcttt tcccacgttt caaagcgtct ggcatcaggt cgcaactcgt + 76741 atcgatcggg tgcggtccag ggcgcgccgt agagatcgat catcgctggc tcgatcttct + 76801 ccagaactga ttttcgcatg tacatgaaac ccgttccccg cggcgcacgg agaaacttcc + 76861 tgccggtagc ggtgaggatg tcgcagccga gagcattcac atcgatcggc gtttgacccg + 76921 cagcttggca ggcgtcgagc agatagagaa tgccgttgtc tcgcgctatt cgaccgatcg + 76981 ctgctgctgg attgataagc ccgccgttcg tcgggatcca ggtgaccgct atcaatcgaa + 77041 cgcgctcatc aatcattttc gtcagcgcgt ccggatcaag gacgccagag gcgtcattcg + 77101 gaatgacttc gatcgatacg ccggtgcgct tggcaacctg cagaaaggcc acgtagttgg + 77161 ccgcaaattc ggcacttgcc gtcagaatcc tgtctccagg tccaaaagaa agggaataga + 77221 aggcgcgctg ccaggcgatc gtcgcgtttt cagctatcgc tatctcgtcg cgagcgcagt + 77281 ttacgaaagt ggccaagctg tcataggctc cctccaacag cgagttggcc tccgctgctg + 77341 cctcgtagcc accgatttcg ccctcccggc tcaggtaccc aatcactgcg tcaatcaccg + 77401 gactcggcat cagcgccgcc ccagcattgt tgaggtggtt tctgtttctt gtgccgggag + 77461 tatcagcacg caggcgagga aggtccaaag ccttgcgggc gtcgcggttg atttgcgaca + 77521 tgaggagttc cttcctttaa tctagccacc catcttgttg acgcaatcaa aactgcggcg + 77581 gaagccgata cttttgcaac ctactgtgca gatttgaacg gcgtggcgtg acccggcgcc + 77641 agttaacagg ccagcctgct cggcgggcag caacgtgcgc ctttcaccgt gaccagggta + 77701 cgtttcccag aaggcgttgt tcgtcccgac gaaccagcgc tttcacgaac tgctttacgg + 77761 ctctcactct cggcagataa gctaggtcct gatgtgccga catccagatc tcacgttcgc + 77821 cgctcaactc ttcaagcact ggaatgagcc cgagtttcag actgcgtgcg aattcaggaa + 77881 gcgctacaat tcctgcgccc gcagacgccg cgaacatctg cgacatcatg ctgtttgagc + 77941 tgaacgccat cgttggggcg ggtacgagtt cctctagcca tagaacgctt tcgagcagaa + 78001 ctagctcttc gatatagccg acaaagcggt gctcgcgaag atccgccgct tgagagggaa + 78061 cgccattgcg ttccagatag gcctgcgaag cgaataggcc agtcttgaag cgtccgatca + 78121 gctgactgtc gagagcagtt ccatgtggct tgaagaagct taggaaaaga tctgcctccc + 
78181 ggcgagcgac ccgaacggtt tgcggcgatg tcacaagctc gatgtcaaga tcagggtacc + 78241 gactactgag ttcaacgagg cgctcggaga gataaagcgt tgcgatgccc tccatggtgg + 78301 ctaatctgac cgtaccgcgt acctcgtctc gattgctcac gtcgctgcga agcgcattca + 78361 ccccattctc gatttcctcc gctcgccgca tcgtaacaag cccggctggg gtcagcagaa + 78421 gtccatccct ggtcctttcg acgagtgcgc cgccaaccgt gtattccagg cgagcaagcc + 78481 gacgggaaac agtggagtgg ctgactttca gttcccgggc ggctccggtc acgctcttgc + 78541 accgtacgac gatgaggaaa agtttgagat cgtcccagtc aagatgatcg attgcagtcg + 78601 ccatgcgacc ctccgtctat ttttgcacag agtagtgcat ccaattcggg ttggcgcaca + 78661 aattcgtttt gatagtttcg agcgggctgc aaaggcgaac atgcagcata tcatgacgca + 78721 aaccacttgc ggatatgctg gtcgccagca ggtcgcgctg cttaacctgc cttgggagaa + 78781 accatgtcta gacgaactgt gaatgcttcg aacgctgcgg cagtaggacc atattcgcac + 78841 gcgacatggg ccggcaacct gttattttgc tcgggccaga cgccgctcga ttccagcacc + 78901 ggcaaacttg tcgacggaac cgtcgccgat cagacacgcc agtgttttga taatctgttt + 78961 gaagttcttg aagcagcagg cttggggtcg gatgatgtcg tatccgtcaa cgtttatctc + 79021 accgacatgg acgactttgg tcagatgaat gagatttacg ccacacgttt ttcatcgccc + 79081 tatcctgccc gcaccacaat cggctgcgcc agcttgccgc tgggtgcacg cattgaaatc + 79141 ggtctaacgg caaaacggca atcatgaccc ggtcattggc gagcttgtcg aaatggctga + 79201 tggcgagcct tgatcagtag gggaatatag atgggaaaca tgaaattctg ggccgcggct + 79261 gctcatattg cccccgtcta tctcgatcct ggggcgagcg cggaaaaagc ttgctcggtg + 79321 atagcagagg cggcccgaaa tggcgcatcg ctcgtggtat tttcggagag ctttcttccc + 79381 ggtttccctg tctgggcagc actttacccg cccattcaat cgcacgagca tttcaaacgc + 79441 ttcttgacag cttcggtgta cattgacggc ccggaaattg agcgtgtgcg aaaggccgcc + 79501 tccgataacg gtgttttcgt ttcaatcggt ttttccgagc gcaacccggc aagtgtcgga + 79561 ggtttgtgga atagcaatgt tctgatttcc gataccggcc aaatcctgat ccaccatcga + 79621 aaactggtcg caaccttctt tgaaaaacta gtttgggatc cgggcgatgg cgcaggattg + 79681 gtcgtcgcaa acacgagaat cggtcgcatt ggcggcctaa tttgcgggga aaatacaaat + 79741 ccgctcgcgc gctacagcct gatgacgcag ggcgagcaag ttcatataag tagctatccg + 
79801 ccgatctggc ccactcgcgt tcccacagaa agcgacaact acgacaaccg ggcagccaac + 79861 cgcatccgtg cctcagcgca ctgcttcgag gccaagtgct tcgggatcat cgtggcgggg + 79921 cacctggacg aagtcgcacg caaatccatt gctttggatg atcctgcgat cgaagcgatc + 79981 atagatgcca gtccgcgggc cactagcttt ttcctagggc cgactggcgc cgcaactggt + 80041 gacgaaatga tcgatgaagg tatcggctat gcccaaatcg atctcgacga ctgtgttgag + 80101 ccgaaacgat ttcacgacgt cgttgctggt tacaaccgct tcgatatttt cgacgtcacc + 80161 gtgaaccggg tgcgtcgaaa cccaatcaga tttttggaag gccgcgctga ggacgctcta + 80221 acgagccccg aggccgtggc cgtgccggag taaaggtgat ggcgcggacg ggtatgcaca + 80281 tgaccaagaa gggtttctcc ttccagggcc ctgccgacgg cgttcatgcc cttaaatcgt + 80341 ggatcacccc tgaagatgat ctttttctgg tgacgcatat ggggttcctg gagatcgacc + 80401 ccgagcattg gcatttggat gtggatggcc tcgtcggcaa cccgacgaga ttgcatctgt + 80461 ctgacctcca agcaatgcca cagcgcgagt acatgtcgtt tcacgaatgc gctggaagtc + 80521 cgcttgcgcc gacagtggct aaacgcagga tcggaaatgt ggtttggaaa ggcgtgccgc + 80581 tgtcgcttgt tctggaacgt gcaaagatca gcaccgatgc gtcatatgta tggacttccg + 80641 gcctcgagtg gggcgaatac gccgaaattg aagaagccta tcaaaaagac ctgcctatcg + 80701 agaaagcact cgcggaagag gttcttcttg cacttgagat caacggccgg ccgctcactc + 80761 cggagcgtgg aggtccggtt cgactggttg tgccgggttg gtatggcacg aattccgtaa + 80821 agtgggttgg ttccatcacg gccgcgaacc ggcgcgcaag cggcgcctat acgacgcggt + 80881 tctataatga ccccaccgcg tcgggaacga agccggtttg ggacgtgacc ccggagtccg + 80941 ttatcgtatc gccgtcccca aatgatctgc tgtcagcgga catgccaacg aaaatatggg + 81001 gctgggcttg gggagattgc cagatctcta gcgtcgaggt cagcgtggac ggtggaggct + 81061 catggcggac tgcctcagta gggccgcgcg agggaagaag ttggcagcgg ttcgagttga + 81121 cttggtcgcc ggagccaggc ccccacgtcc tgttatgtcg ctgcaaaaac gagcttgggg + 81181 aagaacagcc ggtatctgat gcgagaaacg cggtgcactc ggttcaagtt caagttgatt + 81241 tctagtgccc cgcttctaca gcaatcagga ggtgacacgg caccacgatt tttccgatgc + 81301 aaccgatgcc gatctcgcca tctatcgcgg agagatctct tcggttgtcc aactcctcat + 81361 cgcacccacg attgcaagct tggcgattgg aaatcggcac gccttctggg ttagcgctgc + 
81421 gacccgttct caagcgagcc caggcacgaa aaactaaggt ccacatgaac atttaggaga + 81481 agtgcgttcc ggctcgttga gtcctgcttc agaaagatcg cttttcatgc gggcggagtg + 81541 ccgaacccac ttcgactcct aagcgatcga agcacgcatc atcttgtcgt tcagtgtttg + 81601 ggccaagtcg gccgagaggc agccgaacct acaggccgcg gaccagatct cgcccggcgc + 81661 caggatgggg acattgtttt cctgcttttc ctttaagata accctcgatc cccgaggcgc + 81721 acggcagcgc caggcccggt gcgttttggt ccggggtgca ggacatcagc gcacggccct + 81781 cggcagctcg gccatccgga agcccatatg gtcggccacg ccgtcacggt ggttttggag + 81841 cgattcggcc cagccgtcgt tcttggcggc gaactggaaa aaacggcgac gaaaattatc + 81901 aatcaaccct cggacttctg agcaagctga gcatctgacg cgcagcgctt tcatcagtca + 81961 ccaaagtatt gcagccgatt cgcctgatag ctgccaagat tgcggcggct cgttgttctc + 82021 cccccgatgc gataaccacg tgccgtgcgc gtctcactgt atcgagatcg actgacatca + 82081 cccgatcgtt taccggatgg tccaccgtgc ggccgtctcg atcaagaaaa ttgcacagaa + 82141 catcgcacat tgctcccttg ccgatgagcg tctccaactc ctctggcgca agagatgcga + 82201 ccgacagcga tgttgaatgg gtgccgatat cgccgacgct caccagtgca atgtccagat + 82261 cggcggacag cttcatgatc cgattcaagc cgcacttttc gattaggcgt tctttggtgt + 82321 cgggagagtc aaccagcagc ggcgccatca aaagaaaaca ttgcgccccg agttgattgg + 82381 cgagctgcca ggtaaaatcg atcgggtttt catgctgggc ttccactgta cctccaagca + 82441 gagacacaat ttttacccct tcgcgccgca gcgggcgaaa actcgaaagc gcggcactga + 82501 gtgttcgtcc ccagccaacc ccgatcgtcg cgttgttggg aatggtgtcg gaaagaaact + 82561 gacccagcgc ttgcccaacg gcccgggccg ttccgtcgac atctttcgcc ggcggcgtga + 82621 cgatcacttc gtcgagattg agcgcggctt ccagctcaga tgccagctcc aactcgcttg + 82681 cggcctgctt cacccaaatc tggatttcgc cccgcttcag cgcctcatcg aggagcttaa + 82741 cgattgtgct gcggctgacc cccagttttg cggcgacttc gttttgggta agtttctgat + 82801 tgtaataaag aaaagctgcc ttcaacctga ggctttcagc gtccagcaag ctggtggggg + 82861 gcatccttgt cagttttgtc atccaatccg cccctccgaa agtgtcggca gcccataatt + 82921 ctaccaactt ggataatcgt tgcgcccacg atgcgaacgg cgcccaaatc cgttcgcaga + 82981 tcttcgagta aagcatctct agacgtggcg attcaccctt aactggcgcg gctaacctcc + 
83041 gcaatagctt gcggtgcttg gcgtcacatt tgactgctct cgtcgtatcc agccacgatg + 83101 atttctctgc gccacggcgg gtcacttccg cggtgctcgc aggcaagact tgcaacatgt + 83161 gcggcaaact cgagagcttt gcgagcccgg tcacccgtca aggcaccgat gctatcgcga + 83221 tgcaagtcgc cgctctgctg gagatgggtg agaaagcctg caatgagtgc gtctccggct + 83281 ccaacggtat caacaaccct tgcagggcgt gctggaactt caacgactct gccacccgca + 83341 agataaaccg ttgcgccgcc cgcaccccgg ctgacgagga cgatcttggc ccggtttgaa + 83401 agccaactct ccgctgctga atgcggccca acacccggct gcatgaatcc gagatccgtc + 83461 accgacagct ttacgatatc agccataccg cagaggcgat tgagacgctg gcgatatgcc + 83521 tctggatcct gcgccagacc gggccggcag tttggatcga ttgagagcac gcgcttaccc + 83581 tgttcgttct cgaacagcga ctcgcaagcg ctcgagacag gggggtgaat caacgtgacc + 83641 gaaccgatgt gcaggacgtc aacctcggtg cccaaagatg gtgccccttt cagcgtccag + 83701 tgccgtgccg cactgccgct gtcatagaac gcgtactcag gctcgtcgtc gcccagacgc + 83761 acaaaagcca acgtcgtgtc gaaaggcagg cgggttgcat agcgaagatc aatccccgct + 83821 tcgttgaatt gctgaaccag catggcgcca aaaaaatcct cggacagccc tcccatgaat + 83881 ccgactttcc cgcctaacct gccaatcgct gtcgcgatgt tgcagcagga tccaccacag + 83941 acagggatgt agccactgcc accttccggg agcgcgaccg gcaagaagtc aatcaaggca + 84001 tcaccgcaga caacgatcat gaactcatcc ccccaattcc aatacggcac tcttcagaaa + 84061 ttcacaaaga tgccggctta cggctctgct tgcctcaacg tgaggaagat gcccggtatc + 84121 cggcagagtc agcacttgag cgttgaacct cgacagtctc ggacggtcga tcggatttat + 84181 cgtgtcttcc tctccccaaa tgacgatccg cggaactctg ctgacggcta tggaggccac + 84241 atggggttca atcgaggtct cgacatggct taactccgaa gccagggctc taagagcggc + 84301 acgaatgccc ggtcgttcca ggtgatcgag gagccgttga gcgatcggtg gcgttatcag + 84361 gcgcgggcgc gagaccaaac gctgaagtag cgccaacgca gagcccaact cgttaagttc + 84421 aggcagttcg ctcagaaatt ctcgaccgac ctctttgccc agaccagccg gtgcgatcag + 84481 gccaaggccg gcgaccaaat ccggtcggcg ggctgccagt tcgatggcga ttgcgccgcc + 84541 cagtgaatgc cccaccacta caaagcgatc gttctgtcgc gcgagtgcgt caatcagttc + 84601 atcggccatc tccgagattt tcatgccatt cctgccaggc ggttgaccgc catgcgcagg + 
84661 caggtcgcag ctaaaaaccg caaacgactg catgagcgcg tcctgattgg ctatccagct + 84721 cattcgatca gctccgaacc cgtggagaag cagaaggttc ggccccgacc cacccaagcg + 84781 gaagactgga agcggtaagt ttgcggtcgc gtttgtcatg tctcgatgat cccaagcgtg + 84841 gagcccactg ccgcaagctc gtctgcggga atcaggattt tcttcagaat gcctgtcgct + 84901 ggggcttcga tttccatggt gaccttgggt gtcgtgatca ggacgagttg ttcaccttcg + 84961 gtgaccatgt cgccttccgt cttgaaccac tcgtcgatct gggcctcgtc gatttcgttt + 85021 ccgagattcg gcatgatgat cgggatatcc atatgtgttc ctcgctcaaa aatgtgttgg + 85081 ttcacagacc gagacgggca agttcccgat cgtcccagtc acgtatgctg cggcggttgc + 85141 tgcgctggcc tggaaacaac tcgctggtga tgacatcaat gatgtcatct ttctggggaa + 85201 agtaggttga ttccatctca gcgccgggca cgatccaatt cggagagccg attacacgag + 85261 gcgcggcgtg aagagtttca tagccgaatc gcgtaatgtt tgcggcgagc gtcatcagga + 85321 agctgccgcg ctccgaggcc tccgatacca gtacgatgcg tccggttttc cttatcgagg + 85381 caagcacggg ttcatagtta aatggaacaa gcgaacgcgc atcaatcact tcgaccgaga + 85441 tgccgaaggt gctttcgagt tcttcagcag cagcgagcgc ggagtaaagc gaaggtccca + 85501 ccgtcagaat tgtgacgtct tctccagcgc gcttacaatc gggttcacca atgggaagtt + 85561 ggtagtagcc ggtcggcacg ccttcattgc ggaactcttc cacggtgtcg tagaggcgct + 85621 ggctttcaaa gaaaaccacc ggatcgttgc ctgacaatgc ggaagccaac aggcctttcg + 85681 catcatacgg cgttgccgga tatacaactt tgaggcctgg aatatgggcg cataatgccg + 85741 tccagtcttg actatgctgg gcgccgtact tcgaaccgat cgagcaacgc aaaacgacgg + 85801 gaaccttcag ctcacctccg gacatcgatt gccattttgc catctgattg aagacttcgt + 85861 cgcccgcccg gcctagaaag tcgccataca tcagttccac cagcgcgcgg ccgccctcga + 85921 gcgcaaagcc gaccgccgtc gcaacgatcg cagcttcgga gatcggtgaa ttgaacaacc + 85981 gatcatgcgg taggatttcg gcaagccccc ggtaaacgcc gaatgcgcca ccccattccc + 86041 ggcattcttc cccataggcg acaagactgc cgtcgtgcgt catgtggtgc aagacggact + 86101 cgaacaacgc gtcacgcaat gtgatagccc gcatgggtga cagcttggct ccgtcctcgg + 86161 caaagccgta tcgacttttc ttcgcgtcct gacggatccg gctcacctcg tcgacaggct + 86221 tgaggagcga aacttcttgc gagggcagct ctatgttggt gttcgagaac atgagcttgc + 
86281 cgatcagcgt tggatctgca tgaatatcaa cgggaggagt aagtgccgga tcgaccgcga + 86341 tcgcagtaat cgagcgcatg cgatccgttg tctgttcgcg cagttcggcg acctggccag + 86401 cggtcacaat cccggcctcc tgaaggcgat tggagaacag aatgatagga tcgtgctgtt + 86461 cccacgcctg catctcgtcc ttcgtccgat atgagttgat atctgttgtt gaatgacccg + 86521 agctgcggta acattctacg tccagcaagg cgggcccgcg gccttgaacc aaaagctcgc + 86581 gcttgcgtgc gacggcgtca gcaacggcga gcggatttgt tccgtcgacg gtttccgcgt + 86641 gtatggcttg ctggttgacc gccagcccta tgcgagacag gcggtcccat cccattgttt + 86701 cgccgatggt ctgaccgccc atcgcataaa agttgttggt gaagaaaaac aggaccggca + 86761 aacctccttt gaaagcatct gcccacagcg tctcgaattg tgccatagcg gcgaaattca + 86821 tagcttccca gacgggcccg cagccagtag agccatcgcc cgcgttggcc acagtgatgc + 86881 cgctggcccc cgcaagcttc tttcgcaatg cggcccccgt ggctattccg gcggacgcgc + 86941 cgacaattgc gttgttcgga taagtcccaa aaggcgggaa gaaggcatgc atcgacccgc + 87001 ccatgcctcg gttgaaaccg gttgagcgca tgaagatctc tgaaagcagg ccgagcagta + 87061 gaaaattctc cgcggtctcg cttgttttcg gaccctgcag cgaggtctcc acggtcctga + 87121 gaagcgtacc acgctcatgg ctctccataa tcagacgcaa ggcatcatct gggagctttc + 87181 ggatcgcaga aaggcccttc gcgatgaact cgccatggct gcgatgactg ccgaaaatat + 87241 ggtcatccgg ctcgagtgag gctgccgctc cgacggcagc agcctcttgc ccgatcgaaa + 87301 ggtgagccgg ccctttgtag ttaaactcaa tgccgcaata tgctcccttt ccctttagcg + 87361 aagccaaaat cgtctcaaat tcgcgtatga taatcatatc gcgcagaatt tgcagcaatc + 87421 cctcgtcgcc ataccgggca cgttccgcct cgatattctc atcgtacgcg tggataggaa + 87481 ttgagggtgc tttcaacgta ctgcgcacct tgaaggtcct ggggtcgatg tggatttcct + 87541 tgggcattct gacctccgat gggatttgcc ggggtcacaa tccccttagc cactagacat + 87601 atatcgtcac aacaaatgta tttcaagaaa gatttttgtg ttaggttctt tttaagcgat + 87661 gatcgtacaa atgcgacgga acggcaattt gtggcgtggc atggtgttgg agcctgtaaa + 87721 caaatgtgtt tcacaagcta tatttgtgtt gacctttgca aatttttcga tgactattcc + 87781 cataatgatc agcggtgaag agatgtcgcg ctgacgctgt ctggtccagc gttctcgttg + 87841 gctctgcagc ggattacggg aggataaatg caatggcaaa atttggaatg cacttcagtc + 
87901 tgtgggcgcc cgagtggacg acagaggcgg ccaacgccgc tatccctgag gcggcgcgat + 87961 acgggctgga aataatcgag atccctttgt tcgagccggc caaaatcgat ctggatcacg + 88021 caaagtcgat aatccgagat catggtcttc aggcaaccgc gtcgctttgc ctgcccgaag + 88081 acaagatggc tcacctggct ccggaagcct gtacgcaata tctgtttcaa gttctcgacg + 88141 ccgctcacca cattgggtgc tccatgctga ctggcgtcac ctactcggcg ctcggctaca + 88201 agaccggtgt tcccccaaca agtagcgagt acgaagcggt ggttcgggca ctgaagccgg + 88261 tagcccggcg cgctgctgga ctcggcatga cgttcggtgt cgaaccttgc acacgcttcg + 88321 acacccacat tctcaatacg gccgcgcaag gaatatggct gcttgagcaa atcgatgaac + 88381 cgaatacctt cgtccacctc gacacctatc acatgaacgt agaggagagt ggtttcgacg + 88441 acggcatccg ccaggccgcg ggacgctcgc cctatatcca cctttccgaa agtcatcgcg + 88501 gtgtcccagg caccggtacc gtagattggg aactggtctt ccggacattg cgcgacaccg + 88561 gattcgatgg cgatcttgtc attgagagct ttgtctcggt accgccacag cttgctgcgg + 88621 cactctgcat gtggcggccg gccgcgccca atgctggcgc tgtacttgat caagggctgc + 88681 cgtatttgcg cggcttggcg acacgctacg gcctctgagg cgacatctgg cggcgacggg + 88741 ccgcagacgc aattccgaaa atctagagtc tacaacgctc acaggaggag aagagcatga + 88801 aacgtcgtga cattttgaag ttttctttgg ccgcaggcgt ggcatggctg attgccacgc + 88861 caaacctcgc gatggcggcc gacccagtca tggtgactgt tgtcaagatt gccggcatcc + 88921 cgtacttcgg tgcccttgag cgcggcctgc aggaggcagg gaagcaattc aacatcgacg + 88981 tttccatgac cggtccggca aacatcgacc cggctcagca ggtcaagctg ctggaagatc + 89041 tgatcgccaa aaaggtggac gtgatcggcc ttgtgcccct ggacgtcaag gcctgcgagc + 89101 ctgtgctcaa gcgggctcag gcagcaggca tcaaggttat tgtccacgag gggccggaac + 89161 aggagggccg cgactgggac gtcgaactca ttgactcgac caagttcggc gaagttcaaa + 89221 tgcagagcct cgccaaggaa atgggcgagg aaggggacta tgtcgtctat gtcggcaccc + 89281 tgactacccc gctgcacaac aagtgggccg acgcggccat cgcctatcaa aaggcacatt + 89341 atcccaagat gaacctcgtg gccgatcgct tccctggcgc cgacgaaatc gacagcgcct + 89401 accgcacgac catcgacgtg ctgaaggcct atccgaagct caagggtatt ctcgcgtttg + 89461 gctcgaacgg tccgatcgcc gccggcaacg ccgtgaagga aaagcacctg agcaagcggg + 
89521 ttgccgtcat aggcacggtg ctgccgtcgc aggccaagga cctgatcatg gacggcgtca + 89581 ttcgggaagg tttcatgtgg aacccgagag aagctggctc tgcgatggtt gccgtcgcca + 89641 gattggtcct cgacggaacg aaaatcgagg acgggatgga tgttcccggc ctcggcaagg + 89701 ctacggtgga tgttcccgga aagctcatca aggtcgacaa gatcacccac atcaacaaag + 89761 aaaccgtgga tggcctgatc gcccaaggtc tgtagccgtc cttgcaggaa ataccgcagg + 89821 cgggtccgcg ctcgcctgcg gcaggacaga tgccagcgac taagggtaaa gcccactatg + 89881 accacattcc ttgaattgac ccatgtctcc aagcatttcg gcggagttcg tgcgcttcga + 89941 gatgtcgatc tgtcgctgga ggccggcgaa gttcactgtc ttgttggcga aaatggttcg + 90001 ggcaaatcca ccctgatcaa gattattgct ggcgtgcagg cgcccgatcc aggtggtagc + 90061 atcgttctgg agggtcgcga acatgcgcgt ttggacccca ttctttccac caagagtggc + 90121 atccaggtta tctatcagga tctctccctc ttccccaata tgtccgtggc ggagaacatc + 90181 gccatcggca gccacatggg cttgcctcgg ctcgcgaatt ggaaccgcat aaatgacatt + 90241 gcggcgaaag ccatggcccg catcaacgtc aatctcgacc tggagacgat ggtgtcggac + 90301 ctctctatcg caaaccggca actggttgcg atttgccgcg ccatggccgc cgacgcaaag + 90361 ctggtcatca tggacgagcc caccgcctcg ttgacccggc atgaggtgga ttctctgctg + 90421 cgtgtcgtca acgaccttaa gagcagagac atttgcacgg tgttcgtatc ccaccgtctc + 90481 gatgaagtaa tggagatagc cgagcgcgta acggttctgc gtgacggcgg taaggtgggg + 90541 accttcgatg ccagcgagat cacttcacgg cggctcgaga ccttgatgac cggtcatgag + 90601 ttccactatg cgcctccgcg gcccggaggc gaggctgcgg aggttgtgct ggcggttcga + 90661 aacctgtctc gaccaggcca ctacgaggat atcagcttcg acattcgcaa aggagagatc + 90721 gtcggcctta ccgggctttt gggctccgga cggacagaac tcgcactcag catcttcggt + 90781 atgaatccac cgtcccgcgg cacgattgaa gtctccggaa agccgctgat agccagctcc + 90841 aatcgtgtcg caatagccag cggcgtggct tacgtcccgg aagacaggtt gatgctcggc + 90901 ctggcgctgg ggcagcccat ttctgcaaac atcctcgcga cggtcctcga cagcttggcc + 90961 ggcaaattcg gcctcatcaa ccccgccaaa cgtgtagccg cagccgatga ttggatcgtt + 91021 cgcctgaaca cgaaagtgtc cgacctcgaa aatcccgtcg gaactctgtc aggaggcaac + 91081 cagcaacggg tagtgttggg caagtggatg gccacaaaac cccgcgtgct gatactcgat + 
91141 agcccaaccg tcggcgtgga catcaaggcc aaggacggaa tctatgagat agtgcaccga + 91201 ttggcggcag aaggcgtcgg ggtactgctc atctccgacg aagcccagga ggtcttctac + 91261 cacacccatc gtgtcctggt catgcggcaa ggcaggctcg tcagcgaagt ggatccgctt + 91321 tcgtcgaccg agagaaatct gcaggaggaa atctatgcct aagacaattc aacgctggac + 91381 acgaagtcac gaattttggc tgcttgcggt cgtcatcgta ttgtccctgt tcctcacggc + 91441 agcgacggac agttttctga cgctgcaaaa ccttttcgac cttctcacct ccacttcgtt + 91501 tgcaggcatt cttgcagccg gtctgcttgt ggtcttggtg ttcggtggga tcgacatctc + 91561 ctttactgcc atagccagcg tcgctcaata tgtggctctc atgattgcca agacctatcc + 91621 aattgggtgg ttcggagtgt tccttgtcgc ctgctgcact ggtatactat gtgggctatt + 91681 caatgcggct attattcata aggttcgcat ctcttcggtt attgtgacta tatccactct + 91741 caatattttt tatgggttgc taatatatat cacccgcggc gactatatca cctcgcttcc + 91801 cagctatttc cgcgaaggga tctggtggtt cgaattcacc gacagtaatg gatttcctta + 91861 tgcgatcaat ttccaggcgc tgttgctagt ggtcgcattc tttatgacct gggtcttgct + 91921 caacaagaca aacatcggtc gtcaaatata tgccatgggg gggaatgaga tagccgccga + 91981 acgccttggt tttcatgtct tcggcctgag gtgccttgtc tacggttata tggggttcat + 92041 ggctgcgatc gcatcaatat cgcaagccca gctggcccaa tcggttacgc cgacgacgct + 92101 catcggcaag gaactcgaag tgcttgcggc tgtcgttctt ggcggcgcga gtctggccgg + 92161 tggcaacggc tccgtgtttg gcgccgtact gggcgtgatg ctcatcgcaa ttttgcagaa + 92221 cggactgata ttgcttggcg tgtcctcata ctggaatcag ttctttgtcg gctgcgttat + 92281 tctgcttgcg gtctcggcga ctgcgctgtc tcagcggcgt cggcatgctg gactcgcgtc + 92341 ctagggagga aattaccatg aagccgtccg ccaatcgatc tctcatatcc cgtttcgttc + 92401 gagagaacgc cacaacgatg acgcttgcta ccattttcat ggccgttcta gcggtgtttg + 92461 gcctcattct cggcgatcgg ctgctgagtg tgggaacatt tcagtcgata gcctttcaga + 92521 ccccggaact tggcattctt ggtcttgcga tgatgcttgc gctcctttcc ggcgggctga + 92581 atttgtcgat catttcgacg gccaatctct gcgctttgac gatagcgtcg gtgttgcagt + 92641 tcacaattcc gtggggcgat gctggaagtg tgctttggct cacctggcaa gtaggcgcgg + 92701 ttgccgcggg cctcgccgtc gcaattctga taggcctgtt gaacggcttc atcatcgcct + 
92761 atctcggagt atcccccatt ctggccacac tggggaccat gatcgcctgc aaaggtcttg + 92821 ccattggact gacgcgtgga aacgtcttgt caggcttttc agatccaatc gttgcgattg + 92881 gaaatggaac ctatctgggg gtaccgcttg cgttcctgct gtttgttgcc ctttgcgtgt + 92941 tcgtgtcggt ggttcttcgg cgatcatcct tcggtcagaa ggtctacctt gtcggcgcca + 93001 acgagaaagc tgcccagttc tcgggcatac atgtcaaacg agttttgctt ttgacgtatg + 93061 ccttatcggg cgcgcttgca ggatgcggcg ggcttgtaat gatggcccga ttcaattctg + 93121 ccaacgcgtc ctatggcgag agcttccttc tcatcagcat cctggcggcg gttctggggg + 93181 gcatcgatcc atatggcggc accggaaaag tatcagggtt gtttgctgcg ttgctgctct + 93241 tgcagctgat ctcctccgcg ttcaatttga tgaactttag ccagttcctg acaattgcca + 93301 tctggggggc acttctgatc ggcgtttccg cgctccgctc aggcacaggc atattcgatc + 93361 ggttgcacct cttcgaattc tggagagcaa agagcgccgt gtccgaaaac ccctgatccg + 93421 ttcgtcttgg aaagtggacg gaggatgcct taacgacagt ggcgctcaaa aatgtctgca + 93481 ccaaaccacg aagctgagcc ggcgtggagc accatcgatg tgtggaattg ccgagcgatt + 93541 gtcacttgcc gaatagatat gaggagaagt tcagttcatg gacaagcttg cgcaattgcg + 93601 tcaaatgacg acggtagtcg ccgacaccgc cgacgttgaa gccgtcaagc gcctcaagcc + 93661 aatggattgt accaccaatc ccaccattgt gctccaggcg ctaaaaactc ctatctacga + 93721 tgaagatttt gaggaagcat tcgagtgggg gagcaaggcc tccgccgaag gcgcgctata + 93781 ccttcgccgc gctgtggatg tgaagaatat ctggatcccc tacggagatt aagggctcca + 93841 actctggtca tgcgggcggg aggagaaccg caccggcagg cgagaacgac ggaatttgag + 93901 ggaatcgcag agttgtaacg gacttgcggt atgggctcgg ttttgcggca cgaggtgctg + 93961 cgaaatggtg caactaaggg aggattttca tgctgaaaaa tatcgatccg gctctgaatg + 94021 cggatgtgct gcacgcgctt cgatccatgg ggcatggcga cacggtcgtc gtttccgaca + 94081 ccaattttcc ttcggattcc attgcccggc aaacggtgct cggcaagctg ttgcggatag + 94141 acaatgtgtc ggctgcccgc gccatcaaag ctattctttc ggtaatgccg ctggatacgc + 94201 cgctacagcc gtccgccggg cgaatggaaa tcatgggcgc gccggacgag atccctcccg + 94261 tgcaacagga agtccaggcc gttgtcgacg gcgccgaggg caaaccggcg ctaatgtatg + 94321 gcatcgagcg tttcgccttc tatgaggagg caaaaaaggc ctattgcgtc atcaccacgg + 
94381 gcgaaaaccg cttctacggc tgcttcctct tcaccaaagg cgtcattccg ccggaaaccg + 94441 tttgagggga aacgataatg aaggtgggtg tttcgatcct cgggattttc gtcgtcggct + 94501 tctcggcagc gctgtcgcgg ggcgttactc cggtcgaggc cctgcgcttc ggctgcgcga + 94561 ccgccggtat cgcggtgacg cgacggggta ccgcgcctgc aatgccgaag aacgaggaaa + 94621 tcgaggcgct tctgcagaaa ggaggcgcag catgacccgc aacgatccgg tcaaacactt + 94681 cttcatctgg ccggcgctgc tgatcgtgct ggtgatctcg atcttcccgc tgatctattc + 94741 gctgaccacc agcttcatga gcctacggct cgtgccgccg atccccgcgc atttcgtcgg + 94801 cttcggcaac tatgcggaac tgcttcagaa ccctcgcttc tggagtgtca cctggacgac + 94861 gacgatcatc gcttttgttg cggtgtcgct gcagtatgtc atcggttttt ccgtggcgct + 94921 ggcgctcagc cgtcgggtgc ctggcgaagg tttgttccgg gtcagcttcc tcgtgccaat + 94981 gctggtggcg cccgtcgccg ttgcactcat tgcccgccag atcctcaacc cgacgatggg + 95041 ccctctcaac gagttgatga ccgccttcgg ctttcccaat ctgccgtttc tgacgcagac + 95101 cagatgggct atcggcgcca tcatttccgt cgaagtgtgg cagtggacac ccttcgtcat + 95161 cctcatgctt cttgccggcc tgcaaaccct gccggaggac gtctacgagg ctgccgcgct + 95221 tgaaaatgcc agcccctggc agcagttctg ggggatcact ttcccgatga tgctgccgat + 95281 ttcggtggct gtggtcttca tccgcctcat cgagagctac aagatcatcg atacggtgtt + 95341 cgtgatgacc ggcggaggac ctggcatttc gaccgaaacg ctcaccttgt tcgcctatca + 95401 ggagggcttc aagaagttca acctcggcta cacctccgcc ctgtccttcc tgttcctgat + 95461 cgtcattacc gtgatcgggc tcgtctatct cgccatcctg aagccctatc tggagaagca + 95521 caaatgagcg tgcgtgacct caaaggatcc ggccgctggt gggcgctcgc aggctgcctg + 95581 ctctggctgg ccttcacctt cttcccgctt tactgggtgg cgatcacctc gttcaaatcg + 95641 ccgctcggcg tcgtcggcgg gccgacctat gttccatttg tcgatttcga cccgacgttg + 95701 acagcctgga gcgagcttct gtcgggcgcg cgcggccagt tctataacac cttcatcgcc + 95761 tcgacgatcg tcgggctttc ggcatcggtg ctggccacct ttatcggatc aatggcagcc + 95821 tatgctctgg tgcgcttcac cttcgaggtc aggctgctgt ccggcgtgat tttcgtcgtc + 95881 gtcgccttcg gcggatatct gcttggccgc catgtgctgg gcttcggaca ggctatctcg + 95941 ctaatctacg cctttgtcgc agcgcttgca ctggccgtcg gctccagccg gatcaagctt + 
96001 ccagggccgg tcctcggcaa tgacgatatc gtcttctggt tcgtgagcca gcgcatgttt + 96061 ccgccgatcg tcgccgcctt cgccttgttc ctgatgtata cggaaatggg caagatgggc + 96121 atcaagctgg tggataccta tacgggtctt accttcgcct acgtcgcttt ctcgctgccg + 96181 atcgtgatct ggctgatgcg cgatttcttc gcggcgctgc cagtcgaggt ggaggaagcg + 96241 gcgatggtcg acaacgtccc cacatggaga attttcttcg gtatcgtcct gcccatgtcc + 96301 aaaccgggcc tgatcgctac gttcatgatc acgctcgcct ttgtctggaa cgagttcctg + 96361 tttgctctct tcctgaccaa ttccaaatgg cagacactcc ccattctcgt cgcaggccag + 96421 aacagccaac gcggcgatga atggtgggcg atatcggcag ccgccttggt cgcgatcata + 96481 cccatggtcg tcatggcggg cattttgagc aggctgatgc ggtccggcct gcttttagga + 96541 gcaataaagt gaccgctcaa caaccctagt cattgttcaa ttccggggag gaaaacatga + 96601 gaagattgct attgagttca acggctgcgg cactgcttgc tgcggcgggc accacatccg + 96661 cgcttgcctg cgaaccggat tataccggtg tcacgctcac cgccacgacg cagacggggc + 96721 cctatatcgc ctctgcgctg caactcgcgg gcaagggttg ggaagaaaag acttgcggca + 96781 aggtgaacgt cgttgaattt ccctggtcgg aactctatcc gaagattgta acctcgttga + 96841 cctcgggcga agacacgttc gacgtggtcg ccttcgcacc ggcctgggca ccggacttta + 96901 ccgactatct ctcggaaatg ccgaaagcga tgcaatcggg tgccgactgg gaggacattg + 96961 ccccggttta ccgcgagcaa ctgatggtct ggaacggcaa ggtcctgtcg cagaccatgg + 97021 acggtgacgc ccatacctat acctaccgca ttgatctgtt tgaaaacgcg gaaaaccaga + 97081 gcgccttcaa ggcgaagtat ggctacgatc tagccccgcc gaagacatgg aagcagtatc + 97141 tcgacatcgc tgaattcttc cagcagccgg ataagggcct ctggggcacg gcggaagcct + 97201 tccgccgtgg tggccagcag ttctggttcc tgttcagcca tgttgccggt tacacaagcc + 97261 atccggataa tccgggcggc atgttcttcg atccggatac gatggatgcg caggtcaaca + 97321 atccgggctg ggttcgcggc cttgaggaat atatccgcgc ttcgaagctg gcgccgccga + 97381 atgcgttgaa cttctccttc ggcgaagtca atgccgcctt tgcgggcgga caggtcgcgg + 97441 aatcgatcgg ctggggcgac accggcgtca tcgccgccga cccgaaacag tcgaaggttg + 97501 ctggcaatgt cggttcggca tcgctgccgg gctcggacga aatctggaac tacaagacca + 97561 agaagtggga caagcaggcc gaggtcgtcc agacctcctt catggccttc ggtggctggc + 
97621 aggcggctgt tccgtcgtcc tccaagaacc aggaggcggc ctggaactat atccacttcc + 97681 tgacgagccc ggcggtttcc ggtcaggcgg cgattaccgg cggcacaggc gtcaaccctt + 97741 accgcctttc gcatacgaca aataccaagt tgtggtcgaa gatcttctcc gagcgcgagg + 97801 ctaaggaata tctcggcgcg cagaaggacg ccgtcacagc gaaaaacacg gcgctcgata + 97861 tgcgcctgcc gggctacttc tcctacacgg aaattctcga gatcgagctt tccaaggcat + 97921 tggccggaga agtgacgccc cagcaggcgc tggataccgt ggctgacgga tggaacaagc + 97981 ttacggacga gttcggccgt gacaaacaac gggcagctta tcgctcgtcg atgggcctgc + 98041 ctgcgaagta ggccgtcttc aataaatccc atcccggctc tgcggggtgg gatcagcatc + 98101 gttggtagag gaagggacca tcgccgcttc gggtaggcga tggaattctc atgaaaaggt + 98161 ttcagcatgt cgcaggtgcg tttggatcag gtcaccaagt ccttcggtag cgttgcggtc + 98221 attcctccgc tcgatctggt gattgccgac aaggaattcg tggttctcgt cgggccttcc + 98281 ggttgtggga agacaaccac gctgcgaatg atcgcggggc tggaacaggc gacgtccgga + 98341 gaaatccgca tcggcgagcg agaggttact gcgttgcgtc cgggtctgcg caattgctcg + 98401 atggtgtttc agaattatgc cctctatccg catatgacgg tcgccgaaaa cattggctac + 98461 ggcatgaagg tgcgtggaac gccgaaagag gacatcgaca ctgctgttgc gaatgctgcg + 98521 cgcattctca atctcggcgc ctatctcaat cgcaaaccga gcgcgctttc gggcggtcaa + 98581 cgtcagcgcg tcgccatcgg gcgcgccatt gtacgccagc ccgatgtttt tctgttcgat + 98641 gaaccgctat ccaatctgga cgccaagttg cgcatcgaaa tgcgcaccga gatcaaactg + 98701 ctgcatcgcc gtctgcagac cacgatcgtc tatgtgacgc acgaccaggt ggaggcgatg + 98761 accatggccg accgggtcgt ggtgatgaac cagggccgga tcgaacaggc tgccgacccg + 98821 atcacgcttt atgaatcgcc gaagaacctc ttcgtcgccg ctttcatcgg cgcgcccagc + 98881 atgaatttcg ttcaaggacg gttggaggcc ggcgacggcg gcgttgtctt ccgggcggaa + 98941 ggcgatgtgg cgattgtcgt tcccgcacgc atggaggagc atctctcggc gggtattggc + 99001 caggctgtcg ttctcggtat tcggccagag cacacgatga cagcggacag cacctttccg + 99061 atgatccgcg tgcacgtcgc ggatatcgaa cctctcggcc cgcacacgct tgccatcggt + 99121 aaagctggtg cgagtgcgtt taccgctcag attcacgctt catccagggt cagaccggag + 99181 gacacgatcg atgttccgat cgatccggaa aagatgcatt tcttcttgaa aagtaccggt + 
99241 gaggctctaa ggcgctgaac ctgcggggtg tcaccgttca gccaggtctg gtgacattgc + 99301 catctgtttc tctcaaagct gcagttccat aaacctgctt atgcaacgtt ataacattac + 99361 gaggtacatt ctatttccaa gtgcgacatg ccagtaattt caatggcttc actttggaga + 99421 ttttggcatg tctcgatgct atatgcagct tgccctcgcc gaccggcgac gcctacatca + 99481 gctggtggcg gcaaaagttc ccgtcaacga gatggcacgt cagcttggcc gtcatcgctc + 99541 gacgatttac cgcgagatca agcgcaatac gtttcacgac cgtgaactgc cggactacaa + 99601 cggttattac agcacggttg cgaacgacat tgcgcaggac cgacgtcgcc ggctgagaaa + 99661 gctgcggcgc cacccgacct tgcgcacaga gatcatcaac cagttggagg ctcgctggtc + 99721 gccggaacag attgccggac gcctgctgtc ggacggcctc agtcgtattc gcgtttgcaa + 99781 ggagacaatc tatcgcttca tctacagcaa ggaagattat gggcttggtc tctatcagta + 99841 tctaccggag gcacgccgca aacgtcgtgc aatgcgctcc aggaagccgc gcgacggtgc + 99901 gtttccagcc acgcaccgca tatcccaacg gcctgatttt gttggagata gatcgcggtt + 99961 tggccattgg gagggtgacc ttctcatctt cgaacgccca ctcggccatg ccaatatcac + 100021 cactctcgtc gagcgcaaga gccgctatac cgttctcatc aagaatccga gccggcactc + 100081 gcgcccaatc atggacaaga tcatcagagc gttttctcct ttgccggcgt tcgcgcgtca + 100141 aagcttcacg ttgaatcgag ggaccgagtt tgccgggttc agggctttgg aagaagggat + 100201 cggtgcgtgc agttggtttt gcgacccaag tgcgccttgg caaaaaggga cagtcgagaa + 100261 caccaacaag cgcattcgac gcttcctgcc aggcaccaca gatctggcag ttgtgtcgca + 100321 acgcgacctt ctccacctca cgcgtcatgt caacgatcaa ccgcgcaaat gcctcggata + 100381 caggacgccg accgaggtgt ttatggcgca tttgcacgaa gataggtgat cccctactct + 100441 gcaaaacagg cgtgatgcac ttgggttagc ttttccacga attgattcgt aatgtcggaa + 100501 gctgctattc gtcgtcggga gtctgcggtt tgactgccgc cagaaggcat tcctcgcagt + 100561 cgttcctgaa cctcgtcgcc tgcaggtgct tgccgtattc ctctcgagtt ttttcacccg + 100621 caatgacctg ctccgatgtc accccatccg gatcgcccag tattgtcgca agagccatcc + 100681 ggaatattgc ggccgggaag gacaggatca acgtcccgtt tttgagtcgg aagtcctctg + 100741 gttcgacggc atttccatga aggaagttgt tcctcaggtc attcatcttc atatagagcg + 100801 cattcgcgag ccggaggtcg aaggtctttc cgccgccgaa gtcagccgaa 
tgtgtctcct + 100861 ccctcgccat ctcgctctgc catggaatgg acttcagcag gtcgagcacc ctctgacggg + 100921 tcgccttgcc ggtcgctccg ggatgcacga ggatctcgaa cgcgctgacc cacagggaca + 100981 ggacgcgacc gtagtcgtag aaggattcgg tggaaccggt aggagacgac aggcgcatcg + 101041 cggacgcagc cattttcaga gaccgtagaa tagcgcgcga cgtccaatcg ttgtgcccgt + 101101 cgatgaaccg ggacttccat agcgcgagga gctgcttgaa gaaccgggga tgcgccggct + 101161 cgtttccgac gtcctggacc ggaacggccg ccgacgccgc gccgcggaac ttgtcgagtt + 101221 cgtggattgc ccacagcgcg gcgttaccgg cgacgagtct cttgccctgc gggtcgatca + 101281 tccacgggta gatctcgaag gcatcggtat gaaacgggcc gacgttcctg cgccatgtca + 101341 ccgccctcac tctggcatca agcacgacac aggcgacaac tatatccatg aagctcgaaa + 101401 ttgcctccga gttcgggcga tccggatagg catccgatct caccagcagg ctcggtctta + 101461 gccgctgacc gtggttgcct tcgaatttcg agagaaaggt ctggaggttt ggaccgttgg + 101521 cgagggcatc taggaagcga tcatcctgga agcccaccag cgccatgtgc tcgcctgcaa + 101581 agacgtccga catcatgtta atgttcggca ggacaccaag cgcggtccac ttcagtgcca + 101641 tctccatctc tcctgtaacg tttagcgacc gggaaggtac aaggctggcc aagattcctg + 101701 caaacgcgcg atgcgggccg tccacagaaa ctcgggctag aaggtcttcg gcattgcaag + 101761 tcaatagcgg cggccaggtt acccgccctt gcggaaatcg gcgtttccct ctggcggccc + 101821 cctccccgat gcagtatcga ggcgcgttcg cgctaggcaa gcgtcaggcc agcgacgctc + 101881 gctatcacga gcgaagaagg agcatcgcat gaagacagtc tacaccatcg gctacgaggg + 101941 caccgacatc gagcgtttcg tgaagacgct gaccgccgtc ggaatcgaag cggtagcaga + 102001 tgtccgggca gttcccctgt cacgcaagaa gggattctcc aagaacgctt tgcgggagca + 102061 tcttgaaaaa gccggcatca agtatctcgc gatgcagcag cttggcgatc cgaaagaagg + 102121 ccgtgaagcg gcgaaggcgg gcgactacga tcggttccgc tcgatctatt ccgggcatgt + 102181 tgaccttccc gaagtagccg cggcaattga agagctcgcg accgcctccg aggagcaggc + 102241 ggtctgcctc ctctgcttcg aacgcgatcc gaaaacctgc caccgattca tagtcggcga + 102301 acgcatgggc gctttcggat acgagatgac gcatctcttc ggcgacgatc ctgcacggta + 102361 catccgaaac caggaccggc ttccgaagcg gtcgtagctc tttagctata gcagcagccc + 102421 ctgggcaaga tgtggcggat 
agatgaggct gatgatcaac cactggtcct gaaaccggtg + 102481 ctggttgccc atgagaaaca tcaggtcctt cgctggaatc ttattttcga gttggtcccg + 102541 gaagtatttt tcccagtcgg ctccatggga gcgatgacag ttcaaataga gagcacccac + 102601 ctcccagtcc acaagcttgt gcctgaacat tttcgcgcct ggaccttcgc cgcactggta + 102661 gcggtagtaa aagtcgaagg gaagcttctg cagggtcctg atcgatggct tgtcttcatc + 102721 atcgaagaga ccgccctgct tttgctcttg cactagcttc gccagttcct cttcggtcca + 102781 ttcgggattc gagactggct cgatatccag accaaggatc cgcgccggct tcaacatggc + 102841 gagcgacata ccgctggccc gctgctcggc atctatctta tcgaaatgat cgtagacatg + 102901 gatgggcgag atcaggtgtc ttcgttcggc ccaatccttg cccggctgaa cgacggcttc + 102961 gccttcgatc gtgtcgacct tgatggtcaa gctttcctgc cgatgatctt tgcgcgcttt + 103021 ttcaaccttc gcacggatcc actgccactt cttaaactgt tgacccgtgg caataagacg + 103081 aaaaggaacc gggaagaggc ggaccagctt gccactctcg tccatcccag caacacacgt + 103141 cgtctccgca tactttccgc tcggcgacgg gtaggtcttg cagagaataa gaattcgagc + 103201 atcgacgacg gccatggatc tctccggggc gaatcaaccc agcttcaagg agttgcgcaa + 103261 tagcgcctgt agccagtgca ttggtcaatc ctgcgttgct tcgatctgag ctagaacgca + 103321 caaccttgtg tctatgcgcc cgtggccacg cccaaccgcg taggcttttt tgtttgacga + 103381 gatcgattaa gaaggagacg gcctcggccc tcttgaagcg atggtcatcg actggccccg + 103441 atccgcccca acgatgcgtc aaggaggctc cgccaaaaag actgggagtc ttcgacggct + 103501 tgagggaata tgtctcggac gaccaggcag cagcctgaag cggaacgccc ccgccgggcg + 103561 cagtcggtcc cggcttccac tgaccggcgc tatgctaagc ctcctgcggc tgcgctactt + 103621 ccggcagtgc cgtgtcgaga attcccgaaa tcccggccga gcggccgtcg ctatgctgat + 103681 ttctccgatg ttctacattt tttgtcccgg atctcggcct aacgccggtt gacccgggcc + 103741 ggctatggct tcgagcgcag tggcgccgcg gatattccac ggaagcgagt atttctcgat + 103801 caatccgagt tcaaggcccg cacttccgtt gaccggcagg ccattccatt cgagatccgg + 103861 aggcatggct atggagatct caatggcctc agggcctgac agctctgcca tgatgatggc + 103921 ttcgaggcgc tggttggttc gctgcgagat ccccgggcgc ccatatgaat agagcctgcc + 103981 cgcgagccct tttgttgcga cgccgacata gagcgcgacg ccggctttta cgaaggcata + 
104041 gacacccttg gacttcggcg caggcgtttc gacagcaagg ctgtcccccg agatgatcca + 104101 gcgtccgacc cgtgtaaagc cgccgctgat caggatgtcg gtcgtgagcg cgggtttgct + 104161 ttgtgggcgc gtttccggtc gaccgatggg ggacggcttg gtcgccgctt ccaacgtccg + 104221 acggggaccg gcattgatga cattgtaggc atgctgatag cggatcccca atcgagcagc + 104281 gatgtccgcc gccttcaggc catccgccgc taatgagcgt atctgatctg cttttgtcat + 104341 gtcagttcct gttaaggccc atttcccggc tctgcagctt gtccttccag cgtcgttcga + 104401 gctccccgag gtccgcgatc gtcgcggtcg agccgaccgt ttaaggatcg atagctggta + 104461 gtcgctggcg tcgcgactct taagcccgac attaccgccg tggcccgttg caaagtattc + 104521 gcgccacctg ccgatgaagc cgtcgacacc cgacgccatg ccgacatatt gctcgcgcgt + 104581 tctcgggcag gtcagaagat agatgccgct cgtcgcagac agaggtgcct gccaggtcgc + 104641 gggaaggctc tcgatatcgg agagattggc gataaaggcc gcaaagccgg ggaatggagg + 104701 atcgcctaga gttcgccgca gttcgattag gcgcttcgcc ttaccgtcac cgcgctggat + 104761 ccaggaccta tagctatcgc cccagtcgat ccagagccgg ccggcgtatt cggaaagcga + 104821 aggcaggcga tcaacttcgt aaacgttgca gcttccggcg ggctctatgc cgccggtgat + 104881 cgggtgctgg cgatcggccg gcaatgggcc gagaagccgg gctgcgtata gcccaacgaa + 104941 caacgtttcc cgctcgggaa tgccgacaaa agaggcccag taggtcgcgc gcagtttcgg + 105001 ctcactggcg aaggattggg tcgcctggta tgcctcgaag cgttcgcgat ggtcacgcca + 105061 taaagcatag ggtgtcggga acccggcatg ccggttgtcc tgatgtcgaa gcagccggac + 105121 cgtgcttcgt tcaattcccg ccaggtccaa catgtttccg aatgccagca ttgttcgccc + 105181 tcacctcctc catgacatgg cttatcgtcg agcggtttgc aaacacatgc gtgtgcatac + 105241 acgcctctcc ctttcgtctg tgaaggacat ccgcaaatgt cccgaatgtc tggcttgtac + 105301 gcggctgagt gtccaactcc attcgaattc ggctgtggct gggcggcacc attcggaagc + 105361 acgccaaacc cgctctcttg cgatggcgtc ctggcccaga tccggatctt ccacgatcaa + 105421 cccgcaaggg gtccgctagc aagctgcggc cactcgggct gcaccctcgc ggtgatcgcg + 105481 cccgtcccag ggcctcttga ggagccatcg agcgggaatt ggcccgctcc caaatggagc + 105541 ctctcacatg tcgcacgatc tttctctcgc acagtcccac gccttccagc tctcccgcga + 105601 cctgatggtc cccgtcaccg tcttcgaagt 
cgacggggag tatggcgtcc tgccctctga + 105661 cgagatcgac gccgatgatg acctttcggt catccatgaa tttcatccgt ggccggctca + 105721 ctgagccggc aattgccgtt ccgagtgccg ctggcgacct atcggcggca agggaagctt + 105781 cgccgcgggc ggtgcaacct tgcaccttgc ttcccgcgcc cttgccccct ctgctcttcg + 105841 cggcggctgg aagatgtccc gccgatcatg cggggagtta aattgaggaa cgatcgcagg + 105901 cgatcggaaa aatcatgaaa agagaagaaa tcgagaaact acgcgaggcg gtgagttgtg + 105961 ctgccgtcct cgaacaggct gggttcgcgg tcgatgtgaa ggaaagcacg cgccgggcgg + 106021 tgaagtttcg tcgcggagcg gaaatcatca tcgtcaccca cgaaggtcgc ggctggtttg + 106081 atcccttgag cgacgacaaa ggcgacgttt tcgctcttac ttgcttgctt cagcatttag + 106141 gattttcaga ggctgtcgac cgtgttggcg atctcatcgg cttcacagct gcacccgtca + 106201 tctggaagaa gccgccgtcc aaggtcgagc ctgcggatat tctgacacga tggcaaggtc + 106261 gcggtcttcc agctactgga tccggcgtct ggaggtatct ctgctggtcg cgtgcgatcc + 106321 cgatctccat cctgcgctcg gccatcaatc aagggatcgt tcgagaaggt ccattcggca + 106381 gcatgtgggc cgcccatagc gatggcgccg gattggtggt tggatgggaa gagcgcggac + 106441 ctgactggcg cggcttttcg acgggtggca gcaaagttct gtttcgcctg ggtgcgcctg + 106501 acgctctccg tctctgcgtg acggaggctg ccatcgacgc gatgagcctc gccacgatcg + 106561 aagatctgca ggacggaagt ctttacctca gtaccggtgg cggatggtcg cccagaaccg + 106621 aagcggcgct ggtcgacctt ctcgcttgcc ctggcactca cctcgtctgt gccaccgatg + 106681 ccaatagcca aggcgatgcg ttcgcacgtc gtcttcaggc actggccgca caggtcgatc + 106741 gcccttcggt ccgtctccgg ccgccggcgg aggattggaa tgaagtcctg caggagagaa + 106801 ggagagagaa attgaagagc gaaggaagag agaggcgtgc cgcatcgccg ccgaccgcat + 106861 caagggaggc ttcgcccggc taaagccggc ccttgacgcg gccgatcggg atgccggcgg + 106921 cccggaaggt gtcatgaggg actaaaacag aggtgatgga gtggccatcg ccaggtcccg + 106981 aaacctgaag gagccagaga tgaacacccc tgcccccatc cgcaagattt tcgagggcgt + 107041 cgcaacgcgc ccgcaaatgt tccggctttt cgatcgccat agtcagcgcc ccgaccggtg + 107101 gcggagcgat gctgctccgc tttacagcgg agaatggttc gaacttgatg aggcactcta + 107161 cgattacatg ctcaatatcc tgccgccatt gtggatgtgc gggccgatct tcgcgttgcg + 107221 
cgaattccta acgggctcga ccaccagcat cttccttgcg ctgcggatcg acggaaagcc + 107281 tcggtacttc cacggctatt gcgacctgtc cgatcccact tcggtcgaga cgatgcgggc + 107341 gacgatcttc gagcgcgaaa cccagccggt tcacgccatg tcccgtgaag agctactaga + 107401 acatatctgg agcagcacta cgaatgccta tcgcggctat gccggcgatc ggtttccacc + 107461 ggtgatgcag gggcagcgca tggtcatgct gtggagcggc accaatggca cgcttctgaa + 107521 gctgctggac gaccttaccg atgatgagac cgccgccaaa ctgcccgttc atatgcggca + 107581 tctgcctgac atcgcggcct aggcgtcatc aatcccttca atccatcagg cgcttccagc + 107641 ccgctattgc ggcgggcgag cgatggcgcg catccgaaaa ggaaaactcc cgatgagcaa + 107701 cgatcctttc actctcgaca tgttcggtag ctctgcgctc tcttcaggcc ttgcgctagg + 107761 tgtcaccgca tttggcggct tcgacacagt ggcggccaat gacgacgatc ccgatcccac + 107821 tccgccggcg cccgccccgg ctttgccagt tgtgacttcg gcggcgtgcc cgaacagtca + 107881 gcgccagaat ttctatctgg acggcgaccg cggtctcggt gcctcatgga aggatcgtgc + 107941 tcgcgtcaac gtggcggcaa tcctggtcac ggagggcata gtgaagcagg agcggccggc + 108001 gaccgccaag gagcaggcac agatggtccg cttcactggc ttcggtgctg gcgagcttgc + 108061 caatggcatg ttccgccggc cgggcgaggt cgatttttgt gatggctggg acgcgctcgg + 108121 ttcttcgctc gaaaccgccg tgtcggaggc tgattatgcc tcgcttgccc gctgcaccca + 108181 atatgcccat ttcacgccgg aactgatcgt ccgggcgatc tgggctggga tccagcgtct + 108241 cggctggcgc gggggccggg tgcttgagcc cggtatcggg acagggctct tccctgccct + 108301 catcccacca gagtaccgcg acaccgccta tgtcaccggg atcgagctcg atccggtcac + 108361 ggcgcgcatc gtgcgccttc ttcagccgcg gtctcgcatc attgaaggcg acttcgcccg + 108421 cacggatctg gcgccgatct acgatctcgc catcggcaat ccgccctttt ccgatcgcac + 108481 cgtccgctcc gaccgggcct accggtcgct tggcttgcgg ctccacgatt atttcattgc + 108541 gcgctcgatc gacttgttga agccgggcgc actggccgcc ttcgtcacct cgcatggcac + 108601 gttggacaag gccgcgacga ccgcacgcga gcatatcgcc aaaaccgccg acctgatcgc + 108661 ggcgatccgc ctgccagaag gcagctttcg gcgcgacgcc ggcaccgatg tcgtcgttga + 108721 catcctcttc ttccgcaagc gcaaggcagg agaaccggaa ggcgatcaga tctggcttga + 108781 tgttgatgag gtgcggcctg ctgtcgacga tgagggtgcg 
atccgcgtga accgctggtt + 108841 tgcacgccat cccgacttcg tgctcggcac ccatgccctc acctctgggc cgtttggcga + 108901 gacgtacacc tgcgtcgccc gcgacggagc cgatctcgac accatcctcg acgccgccat + 108961 cgaacttctt ccggccgacg tctatgacgg cgagccgacc ccgattgata tcgatctgga + 109021 agatgagctc gccgagatcg tcgacctcag gccgaaggat agcccggtcc gcgaaggcag + 109081 cttcttcgtt gaccgggcga aggggctgat gcagatgctc gacggcacgg cggtagcggt + 109141 gaccgtccgg aagggccgcc ccggtgacgg gatctctgaa aagcatgttc ggatcatctc + 109201 gaaactggtc ccgatccgcg atgccgtccg tgagatactg aaggcccagg agacggaccg + 109261 gccgtggcgc gacctgcagg tgcggctccg cctcgcctgg tcggcattcg tccgcgactt + 109321 cggtccgatc aaccacacaa ccgtctcgat ccaggaggat ccagagaccg gcgaggtgaa + 109381 ggagacccat cgtcagccga acttgttgcc cttccgcgat gatcccgatt gctggctggt + 109441 tgcatcgatc gaggactatg atctggagac cgacacggcg aagccgggtc cgatcttcgc + 109501 cacccgtgtg atcgctccgc cgatgtcgcc tgtcatcacc aatgctgctg atgcgctggc + 109561 ggtcgtactc aacgagcgcg gccatgtcga tgtcgatcat atcgccgaac tcctccatcg + 109621 cgaaatctcc gcagtcatcg atgacctccg tgacacggtg tttcaagatc cggctgatgg + 109681 ttcctggaag actgccgatg cctatctctc cggatcggtc cgcaccaagc tggcggccgc + 109741 gcaggcagca gcggaacttg atccggtcta cgagcgcaat gttcgtgccc ttcaggcggt + 109801 ccaaccggcc gacctgcgtc cgtcggacat cacggcacgc ctcggcgcgc cttggatccc + 109861 ggccgcggat gtcgtcgcct tcgtcaagga aaggatggaa agcgatatcc gcatccacca + 109921 catgccggag cttagctcct ggacggtgga agcgcgccag ctcggctatt ctgcagccgg + 109981 cacatccgaa tggggcacgg gccgccggca tgccggcgag ctgctcgccg atgcgttaaa + 110041 cagccgcgtc ccgcagatct tcgatgtgtt caaggatgtc gatggcgagc gccgggtgct + 110101 gaacgtcgtc gataccgagg cggcgcgcga caagttgcag aagatcaagc aggcgtttca + 110161 agactgggtt tggaccgatc cggaccgcac cgaccggttg gcccgcgatt acaacgaccg + 110221 cttcaacaac attgcgccac gcaaattcga cggctcccat ctgaaacttc ccggcgcctc + 110281 tggcgccttc gttctttatg ggcaccagaa acgcggcatc tggcggatca tcgccgatgg + 110341 ctcgacctat ctcgcccacg ccgtcggcgc cggcaagacc atgacgatgg cggccgcgat + 110401 catggagcaa 
cgccggctgg ggctgatcgc caaggcgatg ctggttgtcc ccggccattg + 110461 cctggcgcag gcggcgcgcg aattcctggc tctttacccc aacgcgcgca ttctcgtggc + 110521 cgacgagacc aatttcacca aggacaagcg cgctcgcttc ctgtcgcggg cagcgaccgc + 110581 cacttgggac gcgatcatca tcacccattc ggcgtttcgc ttcatcgccg tgccgtcggc + 110641 cttcgagcag gagatgatcc aggacgagct gcagctatac gaggatctgc tgaccaaggt + 110701 cgacagcgag gaccgcgtct cgcgtaagcg gcttgagcgg ttgaaagaag gaatgaagga + 110761 gcggctcgaa ggtctcgcta cccgcaagga cgatcttctt acgatctcgg agatcggtgt + 110821 ggatcagatc gtcgtcgacg aggcgcagga attccgcaag ctctccttcg ccaccaatat + 110881 gtcgacgctg aagggcatcg atccgaacgg ctcgcagcgc gcctgggatc tctatgtgaa + 110941 gtcccgctac atcgagacga aaaatcccgg ccgggcgctg gtgctggcct ccggcacgcc + 111001 gatcaccaac acgctaggcg aaatgttctc gatccagaga ctgcttggcc acgcggctct + 111061 tttcgagcgc gggctgcatg aattcgatgc ctgggcgtcc tgcttcggcg atacgacgac + 111121 cgaactggaa atccaaccat ccggcaaata caagccggtc agtcgctttg cgtcgttcgt + 111181 caacgtcccc gagctgatcg cgatgttccg ctcgtttgcc gatgtggtca tgccggatga + 111241 ccttcggcag tatgtgaggg tgcccgacat ctcgaccggc cggcggcaga tcatgaccgc + 111301 gaagccgacg gcgctgttca agacctatca gcagacgctc ggcagcagga tcaagatgat + 111361 cgaacagcgc gagggtccgg ccaaacccgg tgacgacatt cttctatccg tcatcactga + 111421 cggccgccac gccgctatcg atctgcggtt cgtcatgccg gcggctggga acgaggacga + 111481 taacaagctg aatctcctgg tccgcaatgc gcaccggatc tggaaggaga ccggtgatgc + 111541 ggtctaccgg cgccccgatg gcaaggattt cgaactgccg ggggccgcgc agatgatctt + 111601 ctccgatctc ggcactatga atgtcgagaa gacccggggg ttttcggcct accggttcat + 111661 ccgcgacgag ctgatccggc tcggcgtacc cgcggcggag atcgccttca tgcaggacta + 111721 caagaagacc gaggccaagc aacggctgtt cggcgatgtc cgcgccggca aggtgcgttt + 111781 cctgattgga tcctccgaga ccatgggcac cggcgtcaat gcccagctcc ggctgaaagc + 111841 gctccaccat ctcgatgtcc cgtggctgcc gtcgcagatc gaacagcgcg aaggacgcat + 111901 cgttcgacag ggaaaccagc atgacgaggt cgacatcttc gcctatgcca ccgagggaag + 111961 tctcgacgct tccatgtggc agaacaacga acgcaaggcg cggtttatcg 
ccgctgccct + 112021 ttcaggcgac acgtcgatcc gcaggctcga agatgtaggc gaaggcgccg ccaaccagtt + 112081 cgccatggcc aaggccatcg cgtccggcga cgagcgattg atgcagaaag ccggcctgga + 112141 agccgacatt gcccggctcg aacggctccg cgccgcccac gaagacgatc agtatgccgt + 112201 ccgcgggcag atgcgcgatg ccgaacgcga gatcgagatc tcgacccgcc gcatcggtga + 112261 ggtcggccag gacctcgagc ggctccagcc cacgtcaggc gacgccttta cgatgacggt + 112321 gctgggagag agccataccg agcgcaagga ggccggccgc tcactgatga aagagatcct + 112381 cacgcttctg cagctccagc acgagggcga ggttcatctg gcgacgatcg gcgggttcga + 112441 tctcgtctat gagggcgagc ggttcggcag gggcgatggc tatcgctaca agacgctcat + 112501 ccagcgcagc ggcgccgact acgagatcga gcttgctatc actgtcaccc cacttggagc + 112561 gatctcccgg ctcgagcatg ggctcgacgg gttcgaggag gaacaacggc gctatcgcca + 112621 gcgcctggat gacgccgagc gacgcctgac gtcctatcgt tcccgcacgg gcggaacctt + 112681 ccagttcgcc gatgagctct ccgagaagcg ccgtcttctc cttgggatcg aagacgaact + 112741 ggcggccgcc gccgtggatg acggtgccca ggaggccgct tgaaactgcg gccggaaccg + 112801 gaggtcttca ggtctcgcct ggttccggtc gtgctcgcct gcgcgcgagg cggccgcgtt + 112861 gctcggcctc ctgcagggcg agtctgccga ggtagcgctc ccgcatctgc ttcatatcga + 112921 tctcgccgcg aacccattcg tcgagaagcg cgatgaaggc cggattttca tcgatgggca + 112981 cgccatggcc tttcgccctg tcgatggccc gctgcgcgat ggatttccgg gtgttggggt + 113041 ccgtgctcat cttgcctcga cttgggagtt atcggatggg cggcgacatc tatcaggggg + 113101 tcgctcacgc cgcgacttga aatctgtggc tatcaacaat atctgagcat cacttcccgt + 113161 ttcgatgatg tcctgccgcc gggcaaccgc cgcggctttc gatatcgtcc gggatcagag + 113221 aaaagggcgc tccccgcacg ggtcgagcgc ccttttcgct tcggccgcta agtcgatgtg + 113281 gatacgagat acggatttcg ttttcgcttt tggcgccttg accgatgcgc gcgcgatgag + 113341 acctgaagaa taacactgcg ggaacccatg cctttttatc ttgtaatcca gacctctctc + 113401 attgaagccg acgacgagga ggccgctgct cgaattgcgg tcgaccagat ccgatccggc + 113461 aacaaagtcg ccgtcactgt gaagtccgac gaaacgaccg tctcgcacat cgtcgttgct + 113521 gcgaaaccgg ctatctcact cgcagatccg gttgcggatc ctgaggatgg cgggccggtt + 113581 ccggccactc accctgcccc 
aacgtccgcg gtcgaggccg atcgaaaggc gatgctgaag + 113641 aggatagtgg ctgatgcgtt ctcgcttctg aaacggcgcc cctaagtctt tcagacgggc + 113701 cgcgtctata cgcagatctg gcgctgcatg atgaaccgga aatcgctgcg atgtaaatta + 113761 gaaaaggggc gaagtatagg aaagaaaatg gaaccggtcg cagatcggtc ctcctgccgc + 113821 cgcttccgct caaccgccgc ctgtgcgatc ccggccggtt gtccttcgca gggcagatgc + 113881 cccgctcgtc ctgcccagca gggatgctcg gccgcagttg cggcagttcg gccgaccccg + 113941 cggtcccgga ggtgtcttcg agaaaaatga agacaggaag ggctggcaag gcgccggctc + 114001 gtaaccctcc cgaaaggaca gtcccatgca gatcaagacg atcgatcccc gcgctctgaa + 114061 ggaaaacccc gaccgcatgc gtcagacgaa atcgtcgcca caggccgatg cgctgatgct + 114121 ggcgacgatc aaagccgtcg gtatcgtcca gccgcccatc gttgcgcccg aagcggatgg + 114181 tggcaatggc tacatcatcg atgccggcca tcgccgagtg cgcctcgcga tcgctgcggg + 114241 cctcggggaa atcgagatcc tcgtggtgga tgctgcgaac gacaacggcg ccatgcgctc + 114301 gatggtcgaa aacagcgtgc gcgaagcgct aaacccggtc gaccaatggc ggggtatcga + 114361 gcggctcgtc gcactcggct ggaccgagga agcgatcgcc gtcgcgctgg cgctccccgt + 114421 tcgccagatc cgcaagctac gcctgcttgc aaacgttctg ccggcgatgc tcgagcagat + 114481 ggcgctgggc gatatgccct ccgagcagca gttgcggatc attgctgccg ccggtgaggt + 114541 cgaacagaag gaagtctgga aggcgcacaa gccgaagaag ggcgacaccg ctgcctggtg + 114601 gtcgatcgcc aatgccctga ccaagaagcg catgtatgcc aaggatgcca gcttcggcga + 114661 tgatctgcgc gaggcctatg gcatcgaatg ggtcgaagat ctgttcgcgc cggccgacca + 114721 ggacagccgc tacaccacca acgtcgaagg ttttctcggc gcccagcacg aatggatgac + 114781 gtccaatctg ccgaagcgcg gcgcgatcgt cgaggtcaac agctggggtc agccggaact + 114841 gccgaagaag gcaagccagg tctacggcaa gccctcgaaa tccgacgaca caggccttta + 114901 tctcaatcgc gacggcaagg ttcagaccgt ctactaccgc atgcccgagg cggcgaagcc + 114961 gaagggcgac agtgcaaatg gtgccgagag cgccatcaat gacgataccg gctcaacacc + 115021 gaaggcacgc ccggatgtta cccagaaggg tcaggatatg attggcgatt tccgcaccga + 115081 cgccctgcac gatgccctcg gccgcgctct gatcgaagac gatatgctga tggcgctgct + 115141 cgtgcttgcc ttcgcgggac agaacgttcg cgtcgactcc ggcgcgggcg gcaaccttta + 
115201 cggcggtgcg cgcttcggcc gccatgccgc tcgcctcttc accgaggatg gcaagctgtc + 115261 tttcgacatg gatacagttc gcggcgctgc acgtgcagcc ctgatcgacg tgctctcatg + 115321 ccgtcgcggc atgtcgaaca gcggcgtcgt gtcccgtgtc gccggccagg cgatcggtgc + 115381 ggacagctac ctgccgaaca tgggcatgga ggatttcctc gcatccctat cgcgtcgggc + 115441 ccttgaaacc gtcgccaagg atgccggtgt cgagccgcgt gcgcgtgtcc gcgagacccg + 115501 ctcagctctc gtcacccact tcgaaggcga tacgaacctc gtccatccgt ctgctctgtt + 115561 cgcgccggag ccgacggagg ttctcgccct cctcaagcat cttgatcacg aagcgacccc + 115621 gcaggatgag ggcgagaccg acgatggcac tgacggcggt atcgtcgcta atgacgggca + 115681 tggagaggtt gatgtctccg acatggtcga agcggatgag ggtccgggca cgctcgacga + 115741 ccacgaaacg gcctacggga tcgcggccga gtaatcccac cagtcctttc caaatcttcc + 115801 gctccgccgc cggtcatccc ggcggcggtt tgcttttcag gaccttcaaa caaaggagcc + 115861 cgaccatgaa cggagcatca tttacctccg ccacagaccc agtatccatc tcctcagccg + 115921 ccgatggcgt cgagttcggg acaagcgccg acggatttcc tgttgcacgc atcggcgaaa + 115981 tcctcctcgg cctgatctcc aacggaagcg gcgacttctt tctggccagt gcttggcgca + 116041 tcaccaagcc gttagccgaa gtgcggcgtc accattttta tcgccacgat gggcgcgtga + 116101 aggacgaagc ggcgttccgt ctccgcgcaa tcgagaccgc ggagcacatg cgggaacttt + 116161 cagccttctc ccgcatccag acacgcatgt cggcaagcac gccatggggc ggctcgcagc + 116221 tggcgacgat ctatgccgag ggaattgtca gccactcgac ctcaggacat ggcggctttc + 116281 atctttctcc cgatcgaaat cttcaggtcg atgcctcggt tcgcagcgcg ggcggttggt + 116341 acgaggaaga tagcgaatgg gcgatcgtcg ccttgacgtt cccggatctc ttcaccggct + 116401 acgagcgcca gtgcgccaat gaggccgcgc gcaacacctt cccggattat tgggagaagc + 116461 ttcgcggccg tcagctcagc gccggggaat cctggttgaa ggatagcgct gagtttgatc + 116521 gcgtgcatgc tgacgactgg atcgtcatct ctgcaatcat ctcctctcac cactccggca + 116581 tgactgaggt gttcgccaag agaggcggca atcgcgaacc ccagcgagag gagcgtcgct + 116641 ttctcgttcc tcatgaagaa tatggtaggc gcggtcgctt tggctttgtc atcgatctcg + 116701 cgcgtcatgc cgcctatgat ggtccgtcga gcttcgtcgg ttggagcgcg agggcggcat + 116761 gatggcaccg gtaatgtcac ccgaaaccca 
gctctctcgc atggaggacg cccgccgtca + 116821 gacccagagg cagttggaac tcattgacag gcagatcacc cgccggatga cggcaatcct + 116881 gccgaagctt gcgaggcgcc agactggcta ccatcgcgga aaggcgcccg atggccgcac + 116941 cctgctcgag cgctatcgcg ccaacctagc cggcctcaca gcagagcgtc agccagaggc + 117001 cgaagcccta tcaagaaagc tggctcgaca ggacgctgcg atcgcagcgc tacgcgaccg + 117061 tttatcttct gccggcccac actcgagtcc ggagggctga tgacatggca acaataggtg + 117121 acctcgagcg gaacgccggg atcggatctt ccaatgctga gcgaactgcg ttctggctgc + 117181 ggtttcatca tctcgaaggg aaagcatgcc ttgacgcggg cgtggccgag ctcaagcgca + 117241 tgatcgcaga gcgcaacggg accgaattgc gtgctgccaa acacaggcgc cagcaatggc + 117301 cagccccgag cgacgatcag gaagctgcac tacaagcgta tgccgccagg catggacgac + 117361 gctggaagag catcttcagc gatgtctgga tggggggtgg accgccttat gacgatggcg + 117421 ggatcttgcg cggtctccgt aacacccatg gtcccacatg gcttcaatcc taccggttgc + 117481 cgaaggcagt attgcggagc cagtccgacg ggaacgcagc ggtctcgcat gtgggcatcg + 117541 gcaaaagcga ggagtagaac gggctggagc catatccttg tcgtgtccgg tgatggcacc + 117601 gtgacggttt gtccttcggt tttgctgcgg tctgagccag cggccgtcca cttcaaggcc + 117661 gctatcgcgc cgcggggcgg ccggaacccg ccgtcctgcg cggagcagtt ccgctcggcc + 117721 tcgcagggcg gttgccctgc cggctcgcag atcggctccg cttggcctgg cacgcccgga + 117781 tcttgaagcg cccgtcttcc gccggttcgg gtcgaccgca ccgaagggca agccggcacg + 117841 gggccggaac cacttcggct cgaaaggaag agacatggcc aagcccgtaa ccactcgcca + 117901 gaccgcccgg gtcgtccagc tccgcaaggg cgccaccgtc gaaatggtcc gcctcacctg + 117961 cccagacagc gcccaggcga tcaagatcgc cgaaagcttc gggaccgccg tcatcgacag + 118021 cgaggggatc cgcgacctcc acgaacggct catcaccgag accgcggatg ctctcagcga + 118081 aggcctcggt gaccgggcca tgcagatcca cctgcagcgc atcgtcggcg cctatgtcgg + 118141 ctccgcccac ggcgccggcc aattctacag caatgccgtc acccaggcac gcgatgccac + 118201 ggccaaggcg gcaaacgatg cccgcgacga agacctcgac ggtcccgtcg gctacgatag + 118261 cgccgcccac cgcaagcggg aattcgccgc cgacatgggc atccaggctc acgcgttacg + 118321 gatggcagcc gaaggtgccg tcgccgccta caagcatatc gtcggtgaga gctggaagcc + 118381 
gttcgatcgc ccagtcgaaa accccggaca ttctgtggat cgcaaggcgg ccgcggccca + 118441 gatgtccgct ttcgactgac accatgcggc ggggcccgcc cccccttatc gttagaggcc + 118501 gtctctttgg ggcggccttt tttcatgccg gctctcgatg gagggaaagc ggaaatagga + 118561 atttgatcga gccgttgcgg cccgtctcgt gcgtcggacc tgcaggagcc gcccacgtca + 118621 ggcaacggct tcgccgtcct ccacttcgtt tcggccgttc cggtgcaagc cgtcggctca + 118681 tggctcctga ccttggcctg cgcgacgggg ccgcgattgg cgcggctccg aatagtggag + 118741 atgagaaatg agcaagaagg tcgaaagcca gcgcaccgac atctattccc gcattaccga + 118801 tcggatcctc gaggacctcg caagcggggt ccgcccctgg atgaagccct ggaatgcggc + 118861 gaatacggat ggccggatca cccgcccgct ccgccacaac ggccagccct attcgggcat + 118921 gaacgttctc ctgctctggt cagagcagat gtcgcgtggc tttgcttcgt cgatgtggat + 118981 gacgttcaag caggcgttgg aactcgaggc cgccgttcga aagggcgaga ccggatcgac + 119041 gatcgtgttc gcgagccggt tcaccaaatc cgaagcggac ggaaaaggtg gtgaggtcga + 119101 tcgggaaatt cccttcctga aggcctattc cgtgttcaac gtcgaacaga tcgatggtct + 119161 gcccgaccac tattattacc ggccggcgcc agcccaggac catgttgagc gtatcgagca + 119221 ggcggaccgg ttcttccgga ataccggcgc ggtgatccgc catggtggca atcaggcatt + 119281 ctatgcgcca ggtcctgatc tcatccagat gccgcccttc gagacgttta aggatgcggc + 119341 gagcttttat gctacgctca gccacgaagc gacgcactgg actgccgccg agaaccgcgt + 119401 cggccgagac ctgtcacgtt atgccaaaga caggagcgaa cgtgcccgcg aagagctgat + 119461 tgccgagctt ggcagttgtt tcctttgcgc ggacctcggg atagcccctg aactcgagcc + 119521 gcggcctgat cacgcgtcgt accttcagtc ctggttgaag gtgctggccg acgacaagcg + 119581 ggcgatcttc caggcggctg cccatgcgca gcgcgcaacg gtcttccttc acgggctaca + 119641 gccggaggcg gccaactttc gggacgctgc ttgagtgctt gaacgcttca ggtaggctta + 119701 cggtcggtcg ccgctctcgg tggtggccgg ctctgtcaac gggtcagcgg gagccagatc + 119761 ttccgtagtg ctcaagtcta accttcccca gatcgacggc ttaggagcgg aaggcttcag + 119821 cttccgggac tgcttgattt cgacgataaa cggtttcgtc tgctttgcca taaatcctcc + 119881 aggcgatgat tcgcggggca actctatcgc ggcaagttcg tctcgcaagc ggaccttgat + 119941 cggacgtcat gcgaagaatg tgcatatcgt tgtggtagct 
atgtcagatt gctgaccagc + 120001 tgtggcagtt cacctcgatt ccccggtgcg ggcccgccat tgccatatcc ctttcccttg + 120061 acgacagggc tccattcgcg ggctcgatga ggcatgtctg gccggagagc ggctacgcct + 120121 cgctcgtatt cccgcaacgg cgctccgaaa ctttcatccc ctgccggaac ccaccccacg + 120181 cgacaggtcg agtggggacc ccaggtcgcc gtcattcctc gcgaaacaag aaagtgtctc + 120241 cgcgccgccc tccacgttgt tccggcccta cggatgcggt ccgatcgtcc ccggcctttg + 120301 cgactgccat cgaggccgca atagagcggc ccgaaacagc gaaaaggaat acgacaatgg + 120361 ctactatcgg caccttcacc tctaccgaaa acggcttcac cggctcgatc cgcaccctcg + 120421 cgctcaacgt caaggcccgc atcgcccgca tcgaaaatcc ctccgacaag ggcccgcagt + 120481 tccgcatctt cgcgggagcc gttgaactcg gcgccgcctg gcagaagcgc tcggaacaga + 120541 ccgaccgcga ctacctctcg gttaagctcg atgacccgag cttccccgcc cccatctacg + 120601 caacgctctc cgaggtcgaa ggcgaagatg gctaccagct gatctggtcc cggccgaacc + 120661 gggactgaga actccggccc cgcccgaaag ggcggggtct ttccctcggc aaatatctaa + 120721 agccggatca ggcttcgagc ctggtccggc ttttttggtg gcacacggga gggagggata + 120781 gctgctcaga aaccgggagt gtgccgctcg cgcatgccgg cctcaaggat gagcggttcc + 120841 cccaattttc tgcagccgcc tgtggcgcct tccgaaaatc ggctccccca ctccccggcc + 120901 ctgccggtcg cttgcgcgat ccctgacccc gccatcccct cccggccgct gtcgtcgtct + 120961 ttcacatcca aggagatcag acgatgacct acgacattga aatccagatc gaagaactgc + 121021 ggtccgagct cagggagtgc atcgatcccg acgaacgtcg cgagatcgag gtcgaactcg + 121081 agctggcgcg cgccgagctg gccgtcatca ttgccgagaa ggatggctgg ctcaaggcgg + 121141 agcccccctt ctgagggggc ttttgccgca tccgtgagct tggtcagccg gatcgctcgg + 121201 ccgaccaagc tgtcagggat ttcacccctc gacactcccc gccggctcac gccggaatcg + 121261 gtcgcgtccc cttcgggtcc gcgggtctct tctctctctg acgtcctatt ctgctccagc + 121321 aaatccggcg tcaaggacgg cgcggtctcc cccaattttt ccgccgctgc gttgcactgc + 121381 gctctggaaa aatcggttcc cccgcttgca agccgcttcg cggtccttga cccctcattg + 121441 caacagagcg gccgatctcc gtcatcaaga tgaaggagat ccactatgac atccgatcag + 121501 aacctcatgc tctacgccaa gctcgtcggt ttccggctcg tcgtcctcgc agatcgggtc + 121561 ggctgcgaca 
ccgatttctt gcaagagctt catgaccgcc tggttgaagg acttgaggcg + 121621 gcaatcgctc gtatccagac catcatggca ttggaacgaa gcgttctcac cggcgacgaa + 121681 gccgcctatc agctggacgg cgagaccgag attttcggac gctgcgccat cagcttgttg + 121741 gacgacctcg aaatcgattt cgacacgcat gaatatcgca ttaacggcag cgactggatc + 121801 aacgccttga cggcggatta cagcggcgtc gatatcgatt gtccggaatt ggttgcgttg + 121861 acggaggacg agctcggctc gctggcacaa atcgtaaagg atatcacacg ggaaaccgga + 121921 atccccgttc atgcggctcg tgccgtctag ggccgatggc atcgccagaa ccggatcaaa + 121981 tcttagacgc ggagaggtgg caaaggctgc ctctcaattg ggaatggttc aggccgatcg + 122041 cgtgagcgac aagttcgcag ggtctgcgct tgactgtaag gataacggtg cgagcgttca + 122101 taatgcctgc gacgccggtg agcgtcgcca gatccaggcc gagctggatg tcgctcaggc + 122161 ggagcttgcg gcggccgtcg acctgcaggg caacatcttc gattgggagc cccctcactg + 122221 agggcaaccg atagcgccgc gccattatcg accgtcgcgg ttctcctgtg atcccatatc + 122281 atcgacatca gggatggtgc gtctcgagcg tcgtgccgca acctgatgcc gccagaattt + 122341 tccccgcctt cggcttcctc gcgcagcaaa attcagtcgt tctcaggtcc tccgcttcgc + 122401 tgcggtcgtg ccgatgcagc actccttccc gaggcacatc tcatccccgc gatgaaacgc + 122461 aatcacagga gaaagatcat ccaacagctc gccaaattcc tcgccgctac cggccgccgc + 122521 ctgcgcactc tcggcaaggt cgtcggtcac ttcgtccgca agggtaaact cggcctgaag + 122581 gtcgcgatca agatcccctt cttcgtagac atcgaggtca acttcgagac cgactggaac + 122641 cggcgcccgt aaggcgcctc ggcccgcttt ggcgggccgg atcctcagca ccagggctcg + 122701 cgccggcccg accaggtccg cgccaggcct tccgacacca gccgatcgcc gaggcttcgt + 122761 ccttcccgga cgagaacccg gagcttgcgg ccgtaacgat ccgtatcgcg gcctggccac + 122821 gcctggagtt ggaatggtcc ctcgttgacg agctcgatca accgctctgt tgccttgttg + 122881 ccaagtgtca gttccgacgc gcatttcggg gtgctgatct ccggggcgtc gatatcggca + 122941 actctcacct tctggccgta gatccacagc gtatcaccat cgacgacgca gttgtaacgg + 123001 gcgccggacg cacatttggg gtacgttgcg ggtagaccag attgagccgc ttcaacatgc + 123061 gggacaatca agattgcgcc tagcagaaca gcggcggaaa tgacgcctgt cagaattgat + 123121 ttcacctgat tgccccaagc tcgatgacgg tagatggagc gcaactatca 
gattcgcggg + 123181 ttgcagctga ccgcgttcga tagcctcgag cgcgcttctg tcgggtcagg tcgaggaaga + 123241 gggcaaccgc atcctgctcc cgctcgaagt ggtgggtcaa tgtctggccc ttgctgccga + 123301 tccgtcccca tcgccgggtc aggcaaagtt caccgaacag gttggcctcg atcgacatgg + 123361 catagtagcg agccatgttc ttggcgcggt ccttgcgttc gacatagagc tggtagggct + 123421 gtgcgatcat gactgagaat cgcgcgtacg gagtccgcgg tccaacgaca attatgaatc + 123481 ggtttcgcgt gatcgattca atctggtgaa gcgttcgaaa cgttaggtct tgggcgatcc + 123541 gccttcggcg tacggctcta ttcgccacgg cgaacgaagc ccgcaagcgg tcttcgccca + 123601 tttcggtcac gatcttctcg cggtcaccgc ggcattgatc agtacccgag atagccgagg + 123661 atatccgcac cataatattc cccctcctcg tcggcatgag caatgacctc accagcagct + 123721 tcatcgacca atgccttgtc accttctcgg acggcggccg ccacttcgtg gatgcggcat + 123781 tcctcgatcg cgtcttcagc cgaaacctga gcctcacagg cttcccaata tcgcattgat + 123841 ggatcctccc gcgtttagtt cggttcggcc gaagctgaga cttctggacg atcaccgcgc + 123901 tgttgaaacc tgttttcgcc gacacatgac ttcatgtccg accggcggaa ccagatcgcg + 123961 cggccgcatc gcaatggcgc gttgcccgcg gtgaagacga tctgctcgtc agcacgcatg + 124021 cgaaggactt cgtgcggttg gatcaactgc cttgaagcga gctgtttcga gcgggttcgt + 124081 gacgacccgc gcgattggaa actgcggctg acctggtcga tctcgactgt cgtcgtgcca + 124141 catcgacggg agatatagtc tgcggtttcc ggatcgttga ttgccgcaaa cgagatccag + 124201 ctggcgcttt cgaaccattt gctggacgcg tcacgaccgc cgtaggtttc ccgcagctgg + 124261 ccgatcgact gatagatcat cgtgagcgta atgccatatt tccggccggc gtcgcgcgcc + 124321 gtctccagca cccgcatgta gccgagcctc gccacctcat cgaggagaaa cagtgcccgg + 124381 cctttgatcg ccccgtcgcg atggtagatc gcgttgagga aggagccgat gatgacgcgc + 124441 gccaggccgg catgggtttc caaagccttg aggtcgatat tgatgaagac gtcggtttca + 124501 ccatcggcca gcgcctcggt ggagaagctg gaaccggaaa cgagggcgcc gtagttcgca + 124561 tagcttagcc aatgcgtttc cttgatcgca ttggcgtaga cgccagagaa ggtctccggc + 124621 gtcatgttga cgaaagcggc gacattctcc ttcacgaaat cggaagcgga attgtcgtag + 124681 atctcctgca ggcgcgcgcg gagcttaggt tccggctctg agagattggc ccgcacctgt + 124741 cgcagcgtct ggtgcttctc 
atcggtatgg ccggacaggc agacgtcagc gatcatggcg + 124801 gtgagcagct gaagaccgga cgcgcggaag aagtcatcgc ggacgccggt cgcgcgaccg + 124861 ctgtcgctca tgatccacga cgcgacggct gcgatgtcct cttccttggt gccgccgtgg + 124921 cgaccgatcc agtcgagcgc gttgaagccg gattgggcgt tcttcggatc gagaacgata + 124981 acgcgccggc ctgccttacg ccgatgctcc atcaccatcg gcgccacctc gttggacgga + 125041 tcgagcacga cgagagagcc gccccatttg agcgccgtcg ggatcgtaac agaggtcgtc + 125101 ttgaaacctc ctgaaccggc gaaaacgatc ccatgcgatg agccgaacga gccgtcgaag + 125161 cagagcagcg gtggggaacc tcccctcccc caggaagtcg gctcgccagg ccgaaacgac + 125221 gtggcggccg gatcgtcgcg atcgacacga tatcgctcgc cgatgacgat gcctcctgtt + 125281 tccggaaaga gcttttccgc ctcttgcatc gtcatccaat cggcctcccc gtgcagggcg + 125341 cgtttgcccc ggatgcgtct cggctcgctt cgggtaaagg ccgcattgcc catcctggcg + 125401 acgcgcagcg cgaagaaggt gacgagcagc gcggtgcccg ctcctatgat ggtcgaaggg + 125461 tcggcgtagg acagaagcgt cctgcgcgcg ggcacctgac tggcaaaggt caacagacgc + 125521 gatccctcgc gaagagctgc gacggcgatg acgccgagac cgcccgcggc aacgctccaa + 125581 cctgatgtct tgacgttgat cgatcccttg gtgccgaaca ggaacatgac gccacatgcc + 125641 ccggcgagtg catagggcaa tgcaagcccg atccgcccca gcatcagctt cgcctgatcc + 125701 gtcgttccga atgctgagag ccaccgttcg atgcctggga aagtcaaagc gatggccacc + 125761 atggctacgg ccggaatacc ggcgacaagg atccgcttag gcgtcatcgg cgaatgcctt + 125821 cgcaccgagt tcactcagac gcgttcgctc cgcctcgtca gcctttatcc gtgcgctggc + 125881 atcgatcaga agcccgagca agagcgcgcg cttctcgtag cgcaggccgg ccttgacgat + 125941 caggccgcca agttcgatct tttcccgcgc gtcctttttc cgagcctctg ccgtcgacat + 126001 cttttgcacc cgctcaagcc tcagcagacc ggcccgcagc ctcgccagcg ccgttcggcg + 126061 gcgtggtcgc aggcttgcgc ccctgagcgg gtctcgcctg tgcccgaaac cgagccgcga + 126121 cttcttcgaa cgccgacaga agtttaccct cctctatctc gatctctccg agccctgccc + 126181 gtagcgcgat ccgaccgatg cgctcggcgt cacgtgtttc ggcctgcttg agttgctcct + 126241 gcaggcgagc aatctcttcc cggattttgg cggttggctt cttcataagc gttgcttcct + 126301 tgttcatgat ggtccgttga agacgaaccc tgccccgaat tttctcccgc tggaaggtgc + 
126361 aatgttgcac ctcgccaaag cgctagcttt gggcgaatga tcccgccgtt ccgaaggagc + 126421 ggatccaagg gcgcaattat acgtcgctga cgcgacgccc tgctgaaggg tcctggcggg + 126481 gccgcccgct cccaacgaac gcgtcgaaaa tgttcgatct taggagcctg tctcagccgt + 126541 ggccgtcccg catttctcag tcagcatcgt tgcccgcggc tcaggccgca gcgcggtgct + 126601 gtcggcggcc taccggcact gcgccaagat ggaattcgag cgggaagcga ggacgatcga + 126661 ttacagccgc aagcaaggac tcctccatga ggagttcgtc atccccgaaa ccgcacccga + 126721 ctggctacgg tcgatgatcg ccgatagatc ggtctcgggc gcctcggaag ctttctggaa + 126781 caaggttgag gcgttcgaaa agcgcgccga tgcacagctc gccaaggacg tgaccatcgc + 126841 tctgccggtc gagctttcga acgaccagaa cattgccctg gtccgcgact tcgtcgagcg + 126901 ccacatcacg gccaaaggca tggtcgccga ttgggtgtat catgatgctc tgggcaatcc + 126961 gcacatccat ctgatgacga ccctgcggcc gctgaccgaa gacgggtttg gcgccaagaa + 127021 agttgcggtt ctggcaccag ccggcaaacc tgtgcgtaac gacgccggca agatcgtcta + 127081 cgagctttgg gctggaagca ccgacgactt caacgcgttt cgcgatggct ggttcgcatg + 127141 ccagaaccgg catctggcgc tcgcaggtct cgacatccgc atcgacggcc ggtccttcga + 127201 gaaacagggg atcgagctga cgccgaccat tcatctcggc gtcgggacga aggcgatcga + 127261 acggaagggc gacaataaaa ccggatgggg ggaggagaag gtcgcgctcg agcggttgga + 127321 gctgcaggaa gaacggcgcg ccgagaatgc ccggcggatc cagcgcaatc cggagatcgt + 127381 gctcgacctc attacccgcg agaagagcgt cttcgacgaa cgcgatatcg ccaagatcct + 127441 ctaccgctac atcgacgatg cggcgctctt ccaaaacctg atggcgcgga tcctacaaag + 127501 cccgcaaaca ctgcggctcg accgtgagcg gatggatctc gtcactggcg tcagggcacc + 127561 gagcaaatac acgacgcgag agctgatccg gcttgaagcg cagatggcca atcaggcaat + 127621 ctggctgtcg cagcgatcct ctcatggcgt caggcacgcg gttctgagcg gtgtgttttc + 127681 gcgacatgac cgtctatcgg atgagcagaa aaccgcgatc gaacatgtcg caggtccgga + 127741 aaggatcgct gccgtgatcg gccgtgccgg tgccggcaag acgacgatga tgaaggcggc + 127801 gcgtcaagcc tgggaagcag ccggttatag ggtcgtcggt ggcgcgcttg cgggcaaagc + 127861 ggcagagggg ttggagaagg aagcgggcat tgcctcccgc acgctgtctg cctgggaact + 127921 aagatgggat caggagcgag accggcttga 
tgaaaagtct gtcttcgtcc tcgacgaggc + 127981 aggaatggtc tcttcccggc agatggcgcg cttcgtcgaa gcggtaaccg tgtccggtgc + 128041 caagctcgtc ctggtcggcg atcccgagca gctgcagccg atcgaagcgg gcgctgcctt + 128101 ccgcgccatt tcagggcgga tcggctatgc ggaactcgag acaatctacc gtcagcgcga + 128161 acagtggatg cgcgatgcct ccctcgatct ggcgcgtgga aacgtgtcgg ctgcgctcga + 128221 cgcttacgcg cagcgggata tggtgcggac cggttggacc agggacgagg cgatcacagc + 128281 tttgatcgcc gactgggacc acgaatacga tccggccaag tcgactctga tactcgctca + 128341 tcgccgtatt gatgtgcgtt tgctgaacga gatggcgcgc agcaaactcg tcgagcgtgg + 128401 gctgatcgag gccggccatg cgttcaagac ggaagacggg acgaggcagc ttgcggccgg + 128461 tgatcagatt gtctttctga agaatgaagg gtcgctcggc gtcaagaacg gtatgttggc + 128521 gcgtgtcgtc gacgcccagc ccgggcggat tgtcgctgag atcggcaatg gtgaagaccg + 128581 gcgtcgggtc gtggtcgaac agcggttcta cgccaatgtc gatcacggct acgccaccac + 128641 ggtgcacaag agccagggcg ccaccgtcga tcgcgtcaag gtgctggcgt cgagcacgct + 128701 ggaccggcat ctatcctatg tcgccatgac ccgccaccgc gagacggcgg aactctatgt + 128761 cgggctggaa gagttcgcgc aacggcgcgg cggcgtgctg atagcccatg gcgaagcgcc + 128821 ttacgagcat aagccgggca accgggacag ctactatgtg acgctcggct ttgccgatgg + 128881 ccaggaacga accgtctggg gtgtcgatct cgcgcgggcg atggatgcgt cgggcgccag + 128941 gatcggggat cgcatcggcc tcaaacacgt cggatcgcag cgcgtcacgc tccccgatgg + 129001 cactgaggtt gaccgcaatt cctggaaggt cgtgccggtc gaggagctgg cgatggcccg + 129061 gctacatgag cggctctcgc gtcccggatc aaaggaaacg acgctcgact atcaggacgc + 129121 ttcgcactat cgcgctgcgc tgcgtttcgc ggaggcgcgt agcctccacc tgatgaatgt + 129181 cgcccgcacc atcgcgcacg atcagctcca atggaccatc cgtcaatcgt cgaaactcgc + 129241 cgagctcggc gcgcgccttg tcgccgtcgc ggcaaagctc ggccttggcg gagcaaagtc + 129301 gactgtatcg acaacatcgg caatcaagga ggcaaaaccc atggtctctg gcacgacgac + 129361 cttccccaga tcaattggtc aggcggccga agacaagctc tcggccgacc ccggcctgaa + 129421 ggctagctgg caggaggttt ccgctcgctt ccatcacgtc tttgctgatc cacaagccgc + 129481 cttcaagaca gtcaatgtcg acgccatgct ggcgaatgga acggtagcgg caacgacgat + 129541 
tgtgcagatt gccgagcaac cggaaagctt cggtgcgctg aaggggaaga ccggtctttt + 129601 tgcagggagc gccgagaagc gggcccacga tactgcgctt gtcaacgcac ccgcacttgc + 129661 ccgcgatctg cagggcttca ttgcgaaacg cgcagaggcg gccagccgct atgaggacga + 129721 ggaacgcgcg gtccgcacca agctctcgct cgatattcct gctctctccg catccgcaaa + 129781 gcaggtgctt gagcgcgtgc gcgatgccat cgatcgaaac gatatccccg ccggtctcga + 129841 gttcgctctg gcggacaaaa tggtgaaggc cgagctcgaa ggttttgcca aggccgtgtc + 129901 ggagcgcttc ggggaaagaa ccttccttcc gcttgcggca aagactgcag acgggaaggc + 129961 gttcgaggta gcgtccgcag gtatgcagcc ggcccagaag aacgagctgc gatccgcttg + 130021 ggacacgatc cgcaccgtcc agcaattggc tgcgcatgag cgaacagctg tggcactcaa + 130081 acaggcagaa gccatacggc agacccagac gaaagggttg tcgctgaaat gatgtctcat + 130141 atctccccca cctctgccgt tgtgcgccaa aagcgcccgg ttctcgccct cctcgcagtg + 130201 tcatgtggca ttgcgctcgt ggtgattatc ggcgggttca tcggcggcct gcggatcaac + 130261 accacgccca gcgagccact cgggctctgg cgtgtcgcac ccctcgaaca cccgattcag + 130321 gtgggtgaaa tggtgttcgt ctgcccgccg gagactgacg ccgtttcgga gggatttgag + 130381 agaggctacc ttcgctcagg cctgtgcccc ggcggctttg gccccctcat caaaacggtg + 130441 gctgcggtcg gcgggaagcg catcgagatc gccggcaatg tcacgatcga tggacggccg + 130501 atcgcaaact cgagcctcgt gtctcaggac ggccaggggc ggccattgcg cccgtacgca + 130561 ggagggacga ttccggccgg cttcctgttc ctccattcgc catttcccgg atcctgggac + 130621 tctcgatatt tcgggccggt cccaggatcc ggtgttctcg gcctcgcaga acaggtgctt + 130681 acctatgcgc cctgattggc ttcaaccttt taccctcgct ctggctgctg ccacgactgg + 130741 atatatctcc tggagcgggc atgccctggc gcttcccgca gcgatcgcat ttccggcgct + 130801 ttggtcgctg gcgcacagtc gtcgggccgc aagtgtcgtt tcggccgcct attttctggc + 130861 ggcgtcgcgc ggtctgccgc aaggcgtggc tgccttctat caatccgata tctggccggg + 130921 actgatcttg tggctcgtcg cctcaagcgg cttcgttttc gttcacgtcg cactctggtc + 130981 gcgtcagtcc ggcggctgga aagccctgcg atatacgatc gcgatggtcc tcatggccct + 131041 tccacccttc ggcattgttg gctgggcaca tccgatcacg gcggcaggcg ttctctttcc + 131101 aggatggggc tgggccggac ttgcggcagt gacagccggt 
ctcgcgctca tgacgacgca + 131161 atatcgaccg gctgtagcca taacccttgc aggcttctgg ctctggtccg ccgcattctg + 131221 gacagcaccg gacatagggc ggcattggca gggggtcgac ctgcagctcg gaaatagact + 131281 cggtcgcgat aacagtcttg ctcggcatag tgatcttgtt gcaacgctgc gctctgagcg + 131341 tcgccccggc tccactttca tgctcttgcc tgaaagtgct ctcgggtttt ggacgccatc + 131401 ggtcgaacgc gtgtggcggc agcaactggc cgaggccgat ctgagcgtca tcgccggcgc + 131461 ggctgtcgtc gatcgggaag gatatgacaa tgtgctggtt cgcgtctcgg ccaccgacag + 131521 tgagatcctc tatcgggaac gcatgccagt gcctggctcg atgtggcagc cctggttggc + 131581 accgatcgga aagagcggcg gagccagagc ggacttcttt gctaatccgg tcgtgtccgt + 131641 tggcggccag cgcgtggccc cactcatttg ctacgagcaa ctcatcatct ggcccgtcct + 131701 gcagtcgatg cttcatgatc cagacctcgt cgtcgccgtc ggaaacgact ggtggaccaa + 131761 agggaccgca atcatcggca tacagcgcgc cagcgccgag gcttgggcgc ggctgttcaa + 131821 caaacctctc gtgatgtctt tcaacacctg atctggagca aagaccatgg acgctgccct + 131881 cattgcgaaa tgcgccgatc caagccttcc ccccgctatc gtggaacagt tcatttcggc + 131941 cgtcggttcc gacgatccac tcgccgtcac cgtgaatgct gatggtcggc tggttctcat + 132001 ccccaaacct cgctcgcccg acgaggccat gggtgtggtc aaagactatg tcggtcatgc + 132061 catcgtgcgg gtcggcatca cccagtttcc ggcggacgtt ggcgtcgacg acgcatcgca + 132121 gcttcaatca gatatgttcg aggcctgcgc gaatttgcgc acagggacag gcatattcgc + 132181 aaaggtcgcc cgcatcgtcg ccaaatggta cggccgcccc acgaataaag agcttcttcc + 132241 gcagttggtg gacgatacga tctacgcttg gaagacgggc agcttcgagg gcgacaacgt + 132301 gttccgtgcc tcagatcctg gtggcccgac tttctttggt acacgctcag aaaagcgtgc + 132361 cgaaggtacg gatcccgtgg cgcctcccat ggaaagtgaa gatgcctcac aatcggccga + 132421 gccggataag gctacggagg caggaatgag gatcgatctt tcgaggatcg gcggacaaaa + 132481 ataacgagcg ctgcggatgg gtcagcgcta gggcgaaacc ggagaggtga gcgctggcgt + 132541 ctcgcctcgc ggcaggaaga agcccttgat acatgggtca ttgcgaacgg cgatctgcgt + 132601 gtgggtacgc tcgtgaaagc gagcgccatc gtagatctca ttgccttcgg gaaacaggcc + 132661 tcgaactgtg tcgaatgccg gaagggcggc ttccttggcg atctcgtgga cgtaattgat + 132721 aacggcgcaa 
tccagcttcc gaaccagttt gtcggaattc ttgtcaccct tggcatcgga + 132781 attcaccggc attttgccgc cggtggcttc gatctgactt ttcagcattc gatagctagt + 132841 ctggatcagg gggacgtact tgcgggtgat gaggtctagg cagttgccga gatcgataat + 132901 tgcgccgagc acaaagggct ctttataagc gccagattca gctttcagcg tcgcccattc + 132961 gagcgcacgc tcgggatcgt tttcccaaaa gtagatccca gaaccgagcc agtcatacgc + 133021 cttttcgctc ggaagaaggg gcgatgcgcc agttagcgca gccataccca cggccttgtc + 133081 acagccgtgg tagccgagca cgtatgcagg ataattggaa ctcaagcggc agccgcttcc + 133141 tcagcgtcgt cgtcgcgcgc atattccggg gcgagcttgc catcggcaga cacaatgccg + 133201 gcgtccatca gatgacggcg tgcatcgtct ggagacttgg cggcgtcttc agcccgcttt + 133261 tccatgagcg cgatgagcgc ctccagctgg atgtctgtca tcgccgaggc cctcctgagc + 133321 caacttcttg cagtttatta catggattcc cgcaaaagcg caaacctggg gggcttggcg + 133381 ccaagcatcg taacgcggac acgctgcaaa cgttcatttt cgtctgcccg gtactcggtc + 133441 aactcaagac agaaaccgtt tcaagctcaa gccggcgtac aaaatgtatg gcaacttacc + 133501 gagggaatag tggccacagc ttagcttcat aatttgcggt cggccgccgg cgttcttcaa + 133561 actgcgtaac aggctctccg acagctccgg catcacaacc gtgtctctcc tggccagcac + 133621 catctggata tcgagatccg gccgcgccaa acgatgcgcg tagttttcga gattgagtgg + 133681 actccatgct cttctgagat cagccagctc gatctctggc tcgaggctgc tccggatagt + 133741 tcgtgtagcg cgacccgtcc aaaccatatc cgcaaggctt ccacctgtca gaaacagcga + 133801 agctttcgaa acgttcggat catgtgcggc aatcaatcct gcgacccagg aaccgaggct + 133861 catgccgagg acggagattt ctcgatagcc ctcacttttc agccagcgga tgagttttcc + 133921 tccatcacat acggcctgcc ttacggcttg gatcgttcga ccaaggttgg cgctcagcat + 133981 gtaatccgcg tgcaatgaac cgggacggtt acgctcgaag tgatatggca tcgcgatctc + 134041 gatgactgtg atgccgcgcc gtgacaagaa gccggcgatt tgagcgtttc gagctcgtgc + 134101 gttccaatgg tgaaaaatca ccagtgcctt gtcgagtgac ccactttttg tgattttcgc + 134161 ccagacgata ttgttctctg cgacatcggt cgaaatatcc gatggaaact taacccaatg + 134221 ctctcggact tcgaatcgtt gatcgccgcc gttctgctca tcgaaaaacg ttgggtcaat + 134281 cgaagcttca tctgcgcggg cacaaaattc gcggatactg tcgattctct 
cacctggaaa + 134341 ggcacggtct gcatcaagga tgaaactcgt cgtttccttc ccgtcctcac cgcgtcgcgc + 134401 ccgctgttca tcccaccgat caagccagct aagatacaat ttgtttgctt cctatttcac + 134461 tggatcgccg actgcctagg ccaggatcaa ccaggctgaa caacgcgacg aacacaccga + 134521 tgcccaaaac gattgagaat cccacgaacg cgtcaaccag caggtaggtg gcaaacagca + 134581 atgtcacgac cgccgccatc ttccagagag gcacatgaaa tcgcgtgccg atagctcgca + 134641 ggcagatcga accgtagaca aagacaacga tggtcgaaag tgcaacgagc aggccgctgt + 134701 aatgggtagg catccgatag tggaacccat tgctggcacc atcacgtcca gccagcgctg + 134761 aagacaaatc cccgatcaac cagtcgagga tgtttttggc gttcgatgtc agataggagc + 134821 tctgagcctt catgcacaac gcgaccagga ggcctagaat tgtgcaacga aaaatcccac + 134881 gcattaatct caggcaaagc tcatggactt cgccttgaga tccaacctgc tgaaggatct + 134941 ttgcttcaag gtcaattttc ctgagatcat ggatcatggt gtggagcaaa atgagcccgg + 135001 caaaaaaaag gtaaaagcag aggcacatat agaggtaagc gagggccgta aacgcgatcg + 135061 acaccgggac cgatataacc tccggtcgca caatcgccag cgtgccccaa ctagtcgcgt + 135121 agtctgcacc gccttccagc aacgggatca ggcaaacgcc aatccattga aaaacaccgg + 135181 cgaacaacac acagatcagg aataccgccc aataagaaga ggaggacgcc tccacattgc + 135241 gcgcccaatc atcctcgctt gtcgtcgtgt tgttccacgc ccatagcctc acacgccctt + 135301 ccttggtcca aaaagcgagt agctctgtga cgagagcgaa aaacagtggc aagagcacca + 135361 tgaaaaggaa ggtccaattc ggcgcccaga gaaatccaac ctctttcgca acgttatccg + 135421 cgcggaggta aacggcgctg tggatcccca cgatgtagga cagaaaaccg agagcggagg + 135481 cgcctgcaaa cacagacgcg ggcaagttca gcggcgagcc gccgctaaaa agggtctccg + 135541 attttcgcgc taaactgaag cgctgcttct tccgatcggc ttcgggttcg gcatcgggtg + 135601 taaacttaac cggtgaccac tgctcgtggt caaccgtggg gtctggcggc acatctcggg + 135661 ttcctggagc aaggctatcc gcctttttcg cctcccgtct cttggcttgc agccgcgcct + 135721 gagccgcgct tagctcgacc tgccatgcag ccgtggcctc cggatcatcg cacccaaaaa + 135781 ccctcgcgag ccaacgaatg ttcgtcgcgc taattcccct gtcattctcc tgaaaccaaa + 135841 gctgcaccgt cctcagatca accccagagc ggttcgaatc gatctgtgaa atcgccgcgg + 135901 cgagcagttc gggcgtccac 
gggcccgcgg gaaatccgtc ttcaccgact gctctgcctg + 135961 cgcccgccgc agccagtttc ttgaacaatt ccttccagtc actgccgtca tttggcggtg + 136021 gtagaaataa ttttccgttc tgaaacaatc gattaccagt tctgatttcg tttcgtttcg + 136081 ctgcacttca tattacatcg tgccaaatca ctcaatcgcc ggttatcgaa ttcaggttgg + 136141 aaaattttag ggcgaactca tagtgagtgg tcgagatgac caaattggta aatcaacctg + 136201 aggctgagtt aaccgccgcg cgctgcgctc gttagcgttg gcagaagggg ccggacagga + 136261 gcaggaccgc gttccgctac cgtcgctatg agccgctccg ggacagaaga gcagcaccga + 136321 atggaagggg attggatgaa ccaacttgtc gcggggcttc tagagatcag cgctgtcgcg + 136381 catgacgatg caacgctgaa ggccgctctg gcggatttgg cagaacggtt cgaattttcc + 136441 ggttacgact atgccaaact gctgccgggc gacttttacg tgatatcgaa ccttcatccc + 136501 gattggctga aacgaagccg aaagttggac ctggaccgac gaaatccggt tatgaagcgt + 136561 gcccaacaaa cacgccgcgc ctttatctgg tctggcaccc cacagacggg gacgccgctg + 136621 gaggaagacc agactttcta tgagactgca gcgcagttcg ggatccgttc aggcatcacc + 136681 attccaatcg ctatttccag tggcgccatc tcagtcctca gcttcgtctc tccaaagtcg + 136741 attctgaccg cgcaggacga aatcgatccg atcgctgcat cctctgcggt tgggcagtta + 136801 catgcccgca tcgggcagct aaaggtaacg ccttcaatcc aggaatcgtt ttacctctct + 136861 ccgaaggaag ggacctacac gcgatggctt tcactcggca aaacggtgga agacacggcc + 136921 gatatcgagc aggtcaaata taataccgtc cgcattgcgc tcgccgaggc acgacggcga + 136981 tatgacctct gcaacaacac gcagctcgtg gctctggcaa ttcggcgtgg tctgatttga + 137041 tcaaactttc ggtgttcgac ccaggatgtt gagaagtgtt gtgagcgcag acatctgcgc + 137101 atgcatttcg atcgtcgcct ccagataggc taggtgagcg tcactaggct ctttctcgtc + 137161 gcttcggtca ccggcatggg cgacctggaa cattctttcg gcgcggtcga caagcgtgcg + 137221 atgtttcttg atcgcgccga ccgtcaggat ttcgacaact tgctcgggct cgttggagag + 137281 caatccgatg atcggccgca ggtcgatgtc cgtggcagtg tcatttctct cgtaacccat + 137341 aagatctccc aacgcccagg ctcagcctgc ggtgtacgcc ttgtttgctc gtctgagaat + 137401 cgagcaatgc agcgcgtacg gccttgggag ttattttagc gattggctcg ggactcatcc + 137461 actgtagttt tgcagcaact ttttttggtt gacggagccg gtgtccgtcg cctgaacgtc + 
137521 acccgccatc gtgctcatac ctcagttgct gtaattaccc ggaaaaatga tgtcctggtc + 137581 gaccaggaca ttgaaccgga aaccgggtcg gatcttgatt gtcggttgga cgatgaggtt + 137641 cttcgaaatg gtctgttcgg cgacacgccc aaatgtctcc gcaaaattcc ggcgcgccgc + 137701 gtcggaggcc gtgtcttgcg tggcaagagt cgagctttga ggcatcgaca tgtcgatccc + 137761 tgtcccgata agcgcgacaa gcgcggccga accaaaggtc cggagatagt ggttgtcgac + 137821 cttgtcgctg aaaccgctat aaccctccga gtcggtcccg gccataccgc cgatctgcag + 137881 agtcgagcca ttcgggaaga tgatatcggt ccagacgacg agagcgcgat tctggccgaa + 137941 cgagaccttg ctgtcgtaac gaccaaagag ctttgtcccc tgcggcacca acatgaaatg + 138001 accggtcgcg ctgtcataga cgttctggct aacctgcgca gtgatgcggc ccggcaagtc + 138061 ggaattgatg ccggtgatca acgttgcggg gataactgac ccccgcttca gctcgtagcg + 138121 tgactgctga ggcacgacct gattgggcag gtagccgaga tccttgatgt ccgcgttaaa + 138181 gaaatcttcc ttggacgctt gtccattctg atcgacctga cccgccgcgc cggcccgcaa + 138241 agctgcggca taaagatcct gcgagccgtt cggcgcggtg tttgattgac gcggcgcggc + 138301 agacgcagcg cccggcgctt gtgcggagac atcggtgatg tcgactttca gcggcgagtc + 138361 tagggccgcc ccccttgcct gaaggcttgc catgcgctgg cgctgctgct cacgaaagat + 138421 ctgctcgcgc tgttcgcgaa gcatcctggc attccactcc tcgtccgact ccatgcgaga + 138481 cctgcgcggc tctcgtggct cgatctttgg ctcggtttgc actggtcggg tatcgccttg + 138541 ctgctgggga accggggtcg gctgaaatac cggctgctgt tcctcccggt caccgataat + 138601 tccatcttta acaccccgct tcagctgatc ggcgaaggtc gaagccggcg tgccggaacc + 138661 gctctcgttc gggtcgtgct gaccaaacct cagtccgcgt gatgaaaggc cgtaaacgag + 138721 aacggcaagg aacagaacca ggattccgat gccgacgaac aggggcacgc gattgaggcg + 138781 cttcatccct ttctcgtcag cagactgatt tgacgttccc agttgcagcg attggaccat + 138841 gatcgtctcc gcctgtcagt ttcgcttcat aagggaaagg gggcttgccg gcgtcgccag + 138901 gccattattg accgagtaag ctcggccgag ctccatcgtg tccgtcgaaa ggcgcgccag + 138961 cacttgtccg tcgacggtag cgatcgagta ggccagctcg accggcttgg gcgcatcctt + 139021 cacgcttgcg ctcttatcat cggtgacgat ggcaaagccc catcctttca aagcggcttc + 139081 gagcgcgacg gagaattcgg aggtgtcctt 
gtgcagtttg atgggagtgg tcgtcgatcc + 139141 ggcctgctcg gcgaaacggc ctgccatgtc gccagcgatg gcaccggcgg caggcccggt + 139201 gacctcagcg ggtgctccgc tggtggacag tccgtcggtt gcggtctggc atccggctag + 139261 gctgagggta aggacaatcg gaatgaggcg ataaagctgc atggatcacc ctccccgccg + 139321 gatcgtgatc ttctgctgtt tccagccaac cccagaaatc aggatcgcct tgtcgatatt + 139381 gtaatcgacg atcatcatgt ctttcttcat ccgataattg acgatgcggt tttgtccgcc + 139441 cgagacgacg aagagaaccg gcgcatcctg accggagatc gcacgcggaa actggatgta + 139501 ggtcttctgc ccatcggaat agacccgctt cggccgccaa ggtgcgcttc cactcaagga + 139561 ataggagaag gcaagctgtt cgggagcgac accgccgtct ggtccggtca tagcctggat + 139621 gcgggcattg atatcggaga gctttgtggc ggcgtcttcg ggatattcga agccgacccg + 139681 cgccatgtat tgagtcggat gcgatttcag ctggatatgg taggtgcggc gtgaggtcgt + 139741 caccaccatg gaagtgacga gccccggctc ggacggcttg acaatcagat ggatcgcctg + 139801 tccaccggcg gcgcccgacg ttgctggctc caccttccag cgcacggtat caccgacaag + 139861 gacgtcgcgg acaatttcgc cgccctgcag ctcgatatcg cagacctgca aaggggagca + 139921 gacgacggag ggctgcgtct caccaaaaag gaagatcacc ttgccgtccg gtccgcgtgt + 139981 gaccagtccg gtctgccctc gccatttgcc cgagaggttc gtccctttgg cctcattgcc + 140041 cgtcaggctt tgcgcgatgg ccatcggtga aactgcggtc gatacggcaa gcgccagcat + 140101 gcagcgaaga gcgacgcgtt tgtgtttgat cctcattgca gatccctgcc cttaaagttg + 140161 cgcggtccag tcgaaatcgg tcacgtacac gccgatcgga ttgagccgga ttgttgcttc + 140221 gtcttgtggt gatgtgatcg acacggtcgc gatgccccgg aaacgtcggg tcgcgacctc + 140281 cttgcctttc cgatcgcgct cgtactcggt ccagtcgatc tggaaggtct ggttcgaaag + 140341 cgcgacgatg ttgttcacct cgatcgcgac ggtcgcgttg accgcctttt cgaacggcga + 140401 gttgccgcgg aaccatgcat tgaccttttg cgtcgaggga tcgctcgtcc gcaggagggc + 140461 ataggtgcgg tcgatatatt gcttctggac caccgcatct ggtgtgatcg aacgaaagct + 140521 ggtcacgaag ttcccgagcg tcgcgcgcac gacccgcgtg tccgcatact cgatctgctg + 140581 cgggaagccc ccggaaaccg tgttcccgag cttatcaacc tcgacgatat acgggacgag + 140641 tttgacctgc gtactgaggt aaagcgcgta gccaaagcca atcacggcca ttcccaggct + 140701 
caagattcca acgaccttcc aggcggacgc cgccctgacg taggagccgt aacgttctga + 140761 ccattcctgc cttgcagcaa ggtacgggtt ctcaggtgcg ggatgccctg ccatttgtgt + 140821 gatcctgtcc gtgaattatg gtttctccgg tttttccgga gggggcgtcg gagctgagga + 140881 attgccccta gcctgatcga gctttgcgtt ggccaggccg agaagggaac cggcataagc + 140941 gccgggtgac ccgattgcct tctccttggc tgcggatccg gcagccatgg caccggagct + 141001 gaaacttgcg cccatgcccc gaagtaccga agctgcggcc gaggaaccgc cggcccgcgc + 141061 tgcctgcgcg gcggcaaaac cggcgccgac ggcgccagct ccaaggaatg cgccgcccgc + 141121 tgcgaacgag gccgcctgtc cgccgtggcg aattgtctcc atgccgccgg agaccgacgc + 141181 cccttgcacg acgccctgga tgatgttcgg gacgtacacc gcgatgatga agacgacgac + 141241 cgaaatgcca gcgatcgcga gtgtggtgat gaactggtcg ctttcggctg tcggggcctg + 141301 cgccagtcca agcaggacgt tcgagccgat cttggcgatc atgaccaagg ccatcagctt + 141361 catcccaacg ccgaaagcat agacgagata gcggatcgca aagtccttgg tgaaggagga + 141421 gccgccaagc ccgagcatga tcatgcctgc cagcaggccg acatacatct cgaccatgac + 141481 cgatacgaag attgccgcga caagcgagaa gcagatgacg acgatgccca tcgcaagcac + 141541 ggcggcgata gccagggcat tgtcttcgaa gacgccgaac ttcgcctgtt ccgacatctg + 141601 ggacgctacg cgaattccgg cgtcgaagac ttcggctgga gacgcagagc cgccaccggc + 141661 gccgatttgg aaaagactgt cgacgacagc cctggcgaag gtcgggccct gcgtcaaagc + 141721 gaaggcaaaa aagccgatga acatgatgcg acggacgagc tcggcaaacc aactgtccag + 141781 cgatgctgcc tggatcgcga gccacaccgc agcaatgccg acctcgatgc ctgcgaggat + 141841 ccagaacagc gatttcgccg catccatgac ggtggtttcc caccctttgg cggcagtcga + 141901 gacctggttt tccaactccg tcagcacttg gccttgttgg gcgaaggctg gtgccgccaa + 141961 agcaaggaag gcgattcccg cgacgaggat ggtgcttctc cgattgcgct tcaccatctc + 142021 gggcgcatct cctggccctt ttcgatcggc ggcagcgcct tgcccgtgcc aaagaacttc + 142081 tcgcgttttg ctcgttgctc ttcactcatc tgcggcgcgg cggagctttc gccacgaacc + 142141 agaacgactg aggtgacggc aatcactgtg atcacgacgc acacggaggc gatcagtacg + 142201 gggcgggtca ccagcggggc tccattttct gaccggaggg aatggattgg acgtccgcat + 142261 tgaagaattt ctctcgacga gcctgcgcca gatccttgtc 
cgtctgttcg ctctgcagcc + 142321 aggtccccat catcgtggtc tgctgggaga caattccgcg tagcttctgt atctgggcga + 142381 cctgctgagc ggcgatctgg tggccgacct gaagtgcctt catctggccg tccgctgatt + 142441 gcgacattga ttgcagttgg cccatcgtat cctcttcgct gtcgaactga tcggcggtga + 142501 ggcttgcggc cttgagagtg ctggaaatcg tgtcgcggtt cgtgttcgac cacgtctgat + 142561 aggtggacga aaagctctcg gcgtttggca ggcttgtctt caagtccgca tagcttttga + 142621 agcgttgctt cagaagatcg tcagcgttgc ccatggaaaa ggcgatgccc tgaccctgat + 142681 tgacgatatc gcgcagctgg ttcaggtcac cttcaacctg gccccagata tggttcggaa + 142741 gctgggccgt attctgcagc atgttttgat agatattgag ctggttttgg atctgctcgg + 142801 caagctgggt aatctgggtg atctgatttg cgatctgctc gctcgattgc ccgaccaaag + 142861 atatcagctc gccgttgttc aagagctgcg tcatttcggt cgcgccggta actgcgcctc + 142921 ctgcatagga aacagtgggc gtggcggtaa ggccggcggc gaccaatccc accaccaatg + 142981 atttgccgag cttcaagaac tggattgctt cactcattgc ggttgatccc cctttcgatg + 143041 agccactgac ccggccattc gaggccgtgg gtcgacgaga gtgccctgat gcgggctaga + 143101 tcggctttgc cagaggcgcc gacgaaggag agcgtgaccg gtccgagaga catgtcgaaa + 143161 agcctgcggc catcgggtga ggtcacgtaa tattcgcgtt tcgggatggc gttcgcgaca + 143221 atctcgatct gccgcgcatt gaagccgatc cgctcgtaga attcgcgggt gccggtttcc + 143281 cgggctgcgc cgttcggaag gcaaatcttg gtcgggcaag attccttcag gacgtcgata + 143341 atgcccgagc gctcagcgtc ggagatcgac tgcgtcgcga gaatgacggc gcagttcgcc + 143401 tttcgtagca ccttgagcca ttccctgatc ttgtcgcgga acacaggatg gccgagcatc + 143461 agccatgcct cgtccagaac gatgaggctt ggcgagccat cgagccgctt ctcaatccgt + 143521 cgaaagagat aggtcagaac agggacgaga ttacgctcgc ccatattcat cagctgctcg + 143581 atctcaaagc actggaagcg cccaagcgtc aggccatcct cttccgcatc aaggagcaga + 143641 cccatcgggc catcaaccgt gtaatggtgc agcgcatctt tgatctcccg catctggacg + 143701 ccgctgacaa aatccgacag cgaacggccg ggcgccgacg ccatcaaccc tatttgcctc + 143761 gagattgcat tgcgatggtc cggcgcaatc gtgactcctt gaagcgaaac gagggtctcg + 143821 atccactcgg aagcccatgc tcgatcggcg tcagtcgaaa gctccgacaa tgggcagaag + 143881 gcaagactgg 
ccccctctcc gctctcgccg ccgatctcgt agtggtcacc gccgacaccg + 143941 agcgtcagcg tcagcatcga attgcctttg tcgaaggcga agacttgcgc cttctcgtag + 144001 cggcagaatt gggcggcaat cagcgcgagg agcgtggatt tgccggaacc agtcggaccg + 144061 aaaatcaagg tatggccgac atcgtcgaca tgcaggttca gtcggaaggg ggtcgagcca + 144121 gatgcaacct gcatgagtgg gggcgcatcg ggaggataga acggacacgg ggcgaacggc + 144181 tgacccgacc agacggagtt gagcgggacg aggtcggcaa gattgcgcgt gttgatgagc + 144241 ggttcgcgta tgttgcaata ccagttgccc ggcaggctgc caagaaacgc gtctgtcgca + 144301 ttcagcgtct cgatgcgagc tccgaagcct tccgcctgga tgaggcgacg aaccgcctcc + 144361 gccttctcct gcagcgccga gcggctttcg tcgaagagaa cgatgaccgg ggtataatag + 144421 ccgtaggcca ccaattgcga tgaggcttcg gcgatcgcat cctcagtctc ggcgaccatt + 144481 gccatggcat cctgatcgac cgaccgggat tgcgtctgga acagctggtc gaaaaacggc + 144541 cgcactttct gctgccactt cttgcgggcc ctctcaaggc gctgctttgc ttcctccgca + 144601 tcgaggaaga tgaagcgcga cgaccaccga taagtcagcg gcatgaggtt caggctgttc + 144661 aaaatccctg gccagctctc ggccggaagc ccatcgatcg cgatcacacc aagaaaccga + 144721 ccctcgaccc gcggcgtcag gccatgttcc agctccgccg ttgcgagcca gtctaggtac + 144781 atggggattt ctggtaggcg aacgggatgg ttctcgccgg tgatggcaaa tcgaatgaac + 144841 tggaaaagct cgtcgtagcg tgcaacgcga gcaccatccc gctcggaaac ctctcgggtt + 144901 tgcatgcgct gcaccgagag aacattgccg agatactgct caatctcccg gatcgagcgg + 144961 cgaaacgcat caagagcggt atcggcatat ttcgctgagc ggctttcgtt gtccgagtag + 145021 atataacgcg tgagacccga tcgccgccgc tcgggcggcc gataggtcag aatgagcgca + 145081 tgacggcttt cgaagtgccc gcgctcctgt ccaaaatgtc tacggcgctc gtggtcgatc + 145141 agcagggtga cggcatcggg aaaatggctg cgttcagcag acgggtattc ggtcgtcggc + 145201 acgcgcactg cctcgacctg gatcatccag cccgtaccca gtcgcgagag gatcgagttg + 145261 atctggcggg aaagctcgtt tcgctcgaaa tccgtagcgc tttcggaatc cggaccggca + 145321 aaataccacc cggccatcag actaccgtcc ttcagcagga gaatgccgtt atcgacaagg + 145381 ccagcatacg gaacgaggtc cgcgaacgac ggaccggtcg atcgaaatgt gcgtaaagcg + 145441 accattgtag cctcctcagt accggcgcca gggcgcggag gtcgggaggt 
agtgcgcccg + 145501 gtaccgcata tggcgtgcat agactttgcg catcatgggg tccgccttcg ccatcatgcg + 145561 gagcgcgccg acgaccacga tccagatcgc gatgccgaaa agtgccgagt agatggtcag + 145621 gacgacgaag atgaggatga tcgcggcgag cccggtcagc agcacgagtt ctcgatccgc + 145681 tcccatcaag aggttcggtc gcgagagagc acggtgaata cggttgcgat ggagtccgac + 145741 gccggcgtca gccatgacgc ccctcccctc cccctaacgg agttgaagaa ggagttgggg + 145801 gttgctcggc gacaaacgag ctaccgatcg aggcaccggt cgaaccgaag aggccgacaa + 145861 ttgttgtggc accaagcagg atccccgcga cgagtaccac atacatcagt cgccgcgcga + 145921 aatcgttgag ctcgccgccg aagatcagca tgccgccggc aatcgcgacg gcggccagtg + 145981 caatatagcc ggcgaccggt cccgtgatgg actgctggat ctgctcgagc ggaccttccc + 146041 acggcaggct tccgcctgaa cctgcaagtg ctggcccggc aagcgaggcg caaagcaatg + 146101 cgccgataat tcccaaacgg agaaagcgat tatgctgcat gactgtcctc atcgatctgc + 146161 ggatagtgtt ctgtctggta acgcgagccg ttgaacccct cgacatggag gacctccctc + 146221 acccggcggc ctcgcccggc gcgttcgatc gagatgacga gatcgaccgc ttcgccaatc + 146281 acggcttgca ttggctgttg actggcttcc gatgtcagtt gctcgaggcg ccgaagcgcc + 146341 gacattgcgg aattggaatg gatcgtcgtg acgcctcctg ggtgtcctgt gttccaggct + 146401 ttcagcaggg tcagtgccgc cccgtcacga acctcgccaa cgatgatgcg gtcgggacgc + 146461 aggcgcatcg tgctcttgag aagcctggcc atgtcgacag cgtcgctcgt gtggaggcag + 146521 acggcatttt ccgctgcgca ttggatttcg gacgtgtcct ccaggatcac catcctgtcc + 146581 tcaggcgcgg aggaaacgat ttcagcgatg accgcgttcg cgagtgttgt cttacctgag + 146641 cccgtcccgc cagcgatgac gatgttcagc cggttggtga tggcactgcg gatgatcgaa + 146701 gcctgagctt cggtcatgat ctttgcggtc acatagtcat cgagtggaat gagccgtgaa + 146761 gcgcgcctgc ggatcgtaaa cgttggcgag ttgacgacgg gaggcaggag gccttcgaag + 146821 cggtggccgc caatcggtag ttcgccggag atgatcggtc gttctccgtc ggcttcggac + 146881 tggagagcgt gcgctaccga tccgataaca gtctcggcgg cggtcgcctg catttcgcca + 146941 gctggcgcta caccatgacc caggcgttcg atgaagagct tgccatctgg gttgagcatg + 147001 atctcaacga ctgttggatc ctcgagggca atgcaaaggt gctcgccgag agcgtcttgg + 147061 agtttacgga caaggcggga 
gtgcgactga agcatgggaa agcttgtctc ctcgtttgag + 147121 tgtaggagga gacagacccc ggtggccgcg cgaagtccat gtgcaaattt gcacctcgcg + 147181 agtcgcggct gattccgtca gggtcgcttc gattcggtgg tttaccagcg agttccgcta + 147241 tcggagcgga ttatgaatca caggaaagat gggcacggag attgcaccaa atgatcgata + 147301 aatcgctgga atctatgcat tagttgcgtt cgcttcggaa atggtcgttt tgcgttaagg + 147361 ctttgttaac tttaaaaatc ttgcaaaacg cgctggaatg agttgtctta tccacggcaa + 147421 aaagatgtat ttacgaaatt taccgtgtgt acgagaggac gtc +// diff --git a/seqmetrics/microBioRust/src/embl.rs b/seqmetrics/microBioRust/src/embl.rs new file mode 100644 index 0000000..07c1fbe --- /dev/null +++ b/seqmetrics/microBioRust/src/embl.rs @@ -0,0 +1,1473 @@ +//! # An EMBL format to GFF parser +//! +//! +//! You are able to parse EMBL files and save them in GFF (gff3) format, as well as extract DNA sequences, gene DNA sequences (ffn) and protein FASTA sequences (faa) +//! +//! You can also create new records and save them in EMBL format +//! +//! ## Detailed Explanation +//! +//! +//! The EMBL parser contains: +//! +//! Records - a top-level structure consisting of either one record (single EMBL) or multiple instances of Record (multi-EMBL). +//! +//! Each Record contains: +//! +//! 1. A source, ```SourceAttributes```, construct(enum) of counter (source name), start, stop [of source or contig], organism, mol_type, strain, type_material, db_xref +//! 2. Features, ```FeatureAttributes```, construct(enum) of counter (locus tag), gene (if present), product, codon start, strand, start, stop [of cds/gene] +//! 3. Sequence features, ```SequenceAttributes```, construct(enum) of counter (locus tag), sequence_ffn (DNA gene sequence), sequence_faa (protein translation), strand, codon start, start, stop [cds/gene] +//! 4. The DNA sequence of the whole record (or contig) +//! +//! Example to extract and print all the protein sequences in FASTA format, using getters (get_ functionality) and the simplified embl! macro +//! +//!```rust +//! use clap::Parser; +//!
use std::{ +//! fs::File, +//! io, +//! }; +//! use microBioRust::embl; +//! +//! #[derive(Parser, Debug)] +//! #[clap(author, version, about)] +//! struct Arguments { +//! #[clap(short, long)] +//! filename: String, +//! } +//! +//! pub fn embl_to_faa() -> Result<(), anyhow::Error> { +//! let args = Arguments::parse(); +//! let records = embl!(&args.filename); +//! for record in records { +//! for (k, v) in &record.cds.attributes { +//! if let Some(seq) = record.seq_features.get_sequence_faa(k) { +//! println!(">{}|{}\n{}", &record.id, &k, seq); +//! } +//! } +//! } +//! return Ok(()); +//! } +//!``` +//! +//! Example to extract protein sequences from an EMBL file, reporting any record errors (useful for debugging) +//!```rust +//! use clap::Parser; +//! use std::{ +//! fs::File, +//! io, +//! }; +//! use microBioRust::embl::Reader; +//! +//! #[derive(Parser, Debug)] +//! #[clap(author, version, about)] +//! struct Arguments { +//! #[clap(short, long)] +//! filename: String, +//! } +//! +//! pub fn embl_to_faa() -> Result<(), anyhow::Error> { +//! let args = Arguments::parse(); +//! let file_embl = File::open(args.filename)?; +//! let mut reader = Reader::new(file_embl); +//! let mut records = reader.records(); +//! loop { +//! //collect from each record advancing on a next record basis, count cds records +//! match records.next() { +//! Some(Ok(mut record)) => { +//! for (k, v) in &record.cds.attributes { +//! match record.seq_features.get_sequence_faa(&k) { +//! Some(value) => { let seq_faa = value.to_string(); +//! println!(">{}|{}\n{}", &record.id, &k, seq_faa); +//! }, +//! _ => (), +//! }; +//! } +//! }, +//! Some(Err(e)) => { println!("Error encountered - an err {:?}", e); }, +//! None => break, +//! } +//! } +//! return Ok(()); +//! } +//!``` +//! +//! +//! Example to save a provided multi- or single EMBL file as a GFF file (joining the records of a multi-EMBL file) +//! +//! +//! ```rust +//! use microBioRust::embl::{gff_write, Reader, Record}; +//!
use std::collections::BTreeMap; +//! use std::{ +//! fs::File, +//! io, +//! }; +//! use clap::Parser; +//! +//! #[derive(Parser, Debug)] +//! #[clap(author, version, about)] +//! struct Arguments { +//! #[clap(short, long)] +//! filename: String, +//! } +//! +//! pub fn embl_to_gff() -> io::Result<()> { +//! let args = Arguments::parse(); +//! let file_embl = File::open(&args.filename)?; +//! let prev_start: u32 = 0; +//! let mut prev_end: u32 = 0; +//! let mut reader = Reader::new(file_embl); +//! let mut records = reader.records(); +//! let mut read_counter: u32 = 0; +//! let mut seq_region: BTreeMap<String, (u32, u32)> = BTreeMap::new(); +//! let mut record_vec: Vec<Record> = Vec::new(); +//! loop { +//! match records.next() { +//! Some(Ok(mut record)) => { +//! //println!("next record"); +//! //println!("Record id: {:?}", record.id); +//! let source = record.source_map.source_name.clone().expect("issue collecting source name"); +//! let beginning = match record.source_map.get_start(&source) { +//! Some(value) => value.get_value(), +//! _ => 0, +//! }; +//! let ending = match record.source_map.get_stop(&source) { +//! Some(value) => value.get_value(), +//! _ => 0, +//! }; +//! if ending + prev_end < beginning + prev_end { +//! println!("debug: end value is smaller than the start {:?}", beginning); +//! } +//! seq_region.insert(source, (beginning + prev_end, ending + prev_end)); +//! record_vec.push(record); +//! // Add additional fields to print if needed +//! read_counter+=1; +//! prev_end+=ending; // create the joined record if there are multiple +//! }, +//! Some(Err(e)) => { println!("there's an error {:?}", e); }, +//! None => { +//! println!("finished iteration"); +//! break; }, +//! } +//! } +//! let output_file = format!("{}.gff", &args.filename); +//! gff_write(seq_region.clone(), record_vec, &output_file, true); +//! println!("Total records processed: {}", read_counter); +//! return Ok(()); +//! } +//!``` +//!
Example to create a completely new record, using setters (set_ functionality) +//! +//! Writing GFF format requires gff_write(seq_region, record_vec, filename, true or false) +//! +//! The seq_region is the region of interest to save, given as a name and DNA coordinates, such as ``` seq_region.insert("source_1".to_string(), (1,897))``` +//! This makes it possible to save the whole file or a subset of it +//! +//! record_vec is a list of the records. If there is only one record, wrap it in a vec using ``` vec![record] ``` +//! +//! The boolean true/false sets whether the DNA sequence is included in the GFF3 file +//! +//! Writing EMBL format requires embl_write(seq_region, record_vec, filename); there is no true/false flag, since the EMBL format always includes the DNA sequence +//! +//! +//! ```rust +//! use microBioRust::embl::{gff_write, RangeValue, Record}; +//! use std::collections::BTreeMap; +//! +//! pub fn create_new_record() -> Result<(), anyhow::Error> { +//! let filename = "new_record.gff".to_string(); +//! let mut record = Record::new(); +//! let mut seq_region: BTreeMap<String, (u32, u32)> = BTreeMap::new(); +//! //example from E. coli K-12 +//! seq_region.insert("source_1".to_string(), (1,897)); +//! //Add the source into SourceAttributes +//! record.source_map +//! .set_counter("source_1".to_string()) +//! .set_start(RangeValue::Exact(1)) +//! .set_stop(RangeValue::Exact(897)) +//! .set_organism("Escherichia coli".to_string()) +//! .set_mol_type("DNA".to_string()) +//! .set_strain("K-12 substr. MG1655".to_string()) +//! .set_type_material("type strain of Escherichia coli K12".to_string()) +//! .set_db_xref("PRJNA57779".to_string()); +//! //Add the features into FeatureAttributes, here we are setting two features, i.e. coding sequences or genes +//! record.cds +//! .set_counter("b3304".to_string()) +//! .set_start(RangeValue::Exact(1)) +//! .set_stop(RangeValue::Exact(354)) +//! .set_gene("rplR".to_string()) +//!
.set_product("50S ribosomal subunit protein L18".to_string()) +//! .set_codon_start(1) +//! .set_strand(-1); +//! record.cds +//! .set_counter("b3305".to_string()) +//! .set_start(RangeValue::Exact(364)) +//! .set_stop(RangeValue::Exact(897)) +//! .set_gene("rplF".to_string()) +//! .set_product("50S ribosomal subunit protein L6".to_string()) +//! .set_codon_start(1) +//! .set_strand(-1); +//! //Add the sequences for the coding sequence (CDS) into SequenceAttributes +//! record.seq_features +//! .set_counter("b3304".to_string()) +//! .set_start(RangeValue::Exact(1)) +//! .set_stop(RangeValue::Exact(354)) +//! .set_sequence_ffn("ATGGATAAGAAATCTGCTCGTATCCGTCGTGCGACCCGCGCACGCCGCAAGCTCCAGGAG +//!CTGGGCGCAACTCGCCTGGTGGTACATCGTACCCCGCGTCACATTTACGCACAGGTAATT +//!GCACCGAACGGTTCTGAAGTTCTGGTAGCTGCTTCTACTGTAGAAAAAGCTATCGCTGAA +//!CAACTGAAGTACACCGGTAACAAAGACGCGGCTGCAGCTGTGGGTAAAGCTGTCGCTGAA +//!CGCGCTCTGGAAAAAGGCATCAAAGATGTATCCTTTGACCGTTCCGGGTTCCAATATCAT +//!GGTCGTGTCCAGGCACTGGCAGATGCTGCCCGTGAAGCTGGCCTTCAGTTCTAA".to_string()) +//! .set_sequence_faa("MDKKSARIRRATRARRKLQELGATRLVVHRTPRHIYAQVIAPNGSEVLVAASTVEKAIAE +//!QLKYTGNKDAAAAVGKAVAERALEKGIKDVSFDRSGFQYHGRVQALADAAREAGLQF".to_string()) +//! .set_codon_start(1) +//! .set_strand(-1); +//! record.seq_features +//! .set_counter("b3305".to_string()) +//! .set_start(RangeValue::Exact(364)) +//! .set_stop(RangeValue::Exact(897)) +//! .set_sequence_ffn("ATGTCTCGTGTTGCTAAAGCACCGGTCGTTGTTCCTGCCGGCGTTGACGTAAAAATCAAC +//!GGTCAGGTTATTACGATCAAAGGTAAAAACGGCGAGCTGACTCGTACTCTCAACGATGCT +//!GTTGAAGTTAAACATGCAGATAATACCCTGACCTTCGGTCCGCGTGATGGTTACGCAGAC +//!GGTTGGGCACAGGCTGGTACCGCGCGTGCCCTGCTGAACTCAATGGTTATCGGTGTTACC +//!GAAGGCTTCACTAAGAAGCTGCAGCTGGTTGGTGTAGGTTACCGTGCAGCGGTTAAAGGC +//!AATGTGATTAACCTGTCTCTGGGTTTCTCTCATCCTGTTGACCATCAGCTGCCTGCGGGT +//!ATCACTGCTGAATGTCCGACTCAGACTGAAATCGTGCTGAAAGGCGCTGATAAGCAGGTG +//!ATCGGCCAGGTTGCAGCGGATCTGCGCGCCTACCGTCGTCCTGAGCCTTATAAAGGCAAG +//!GGTGTTCGTTACGCCGACGAAGTCGTGCGTACCAAAGAGGCTAAGAAGAAGTAA".to_string()) +//!
.set_sequence_faa("MSRVAKAPVVVPAGVDVKINGQVITIKGKNGELTRTLNDAVEVKHADNTLTFGPRDGYAD +//!GWAQAGTARALLNSMVIGVTEGFTKKLQLVGVGYRAAVKGNVINLSLGFSHPVDHQLPAG +//!ITAECPTQTEIVLKGADKQVIGQVAADLRAYRRPEPYKGKGVRYADEVVRTKEAKKK".to_string()) +//! .set_codon_start(1) +//! .set_strand(-1); +//! //Add the full sequence of the entire record into the record.sequence +//! record.sequence = "TTAGAACTGAAGGCCAGCTTCACGGGCAGCATCTGCCAGTGCCTGGACACGACCATGATA +//!TTGGAACCCGGAACGGTCAAAGGATACATCTTTGATGCCTTTTTCCAGAGCGCGTTCAGC +//!GACAGCTTTACCCACAGCTGCAGCCGCGTCTTTGTTACCGGTGTACTTCAGTTGTTCAGC +//!GATAGCTTTTTCTACAGTAGAAGCAGCTACCAGAACTTCAGAACCGTTCGGTGCAATTAC +//!CTGTGCGTAAATGTGACGCGGGGTACGATGTACCACCAGGCGAGTTGCGCCCAGCTCCTG +//!GAGCTTGCGGCGTGCGCGGGTCGCACGACGGATACGAGCAGATTTCTTATCCATAGTGTT +//!ACCTTACTTCTTCTTAGCCTCTTTGGTACGCACGACTTCGTCGGCGTAACGAACACCCTT +//!GCCTTTATAAGGCTCAGGACGACGGTAGGCGCGCAGATCCGCTGCAACCTGGCCGATCAC +//!CTGCTTATCAGCGCCTTTCAGCACGATTTCAGTCTGAGTCGGACATTCAGCAGTGATACC +//!CGCAGGCAGCTGATGGTCAACAGGATGAGAGAAACCCAGAGACAGGTTAATCACATTGCC +//!TTTAACCGCTGCACGGTAACCTACACCAACCAGCTGCAGCTTCTTAGTGAAGCCTTCGGT +//!AACACCGATAACCATTGAGTTCAGCAGGGCACGCGCGGTACCAGCCTGTGCCCAACCGTC +//!TGCGTAACCATCACGCGGACCGAAGGTCAGGGTATTATCTGCATGTTTAACTTCAACAGC +//!ATCGTTGAGAGTACGAGTCAGCTCGCCGTTTTTACCTTTGATCGTAATAACCTGACCGTT +//!GATTTTTACGTCAACGCCGGCAGGAACAACGACCGGTGCTTTAGCAACACGAGACAT".to_string(); +//! gff_write(seq_region, vec![record], &filename, true); +//! return Ok(()); +//! } +//!``` +//! 
+ +use std::{ + io::{self, Write}, + fs::{self, OpenOptions, File}, + vec::Vec, + str, + convert::{AsRef, TryInto}, + path::Path, + collections::{BTreeMap, HashSet}, +}; +use regex::Regex; +use protein_translate::translate; +use bio::alphabets::dna::revcomp; +use anyhow::{anyhow, Context}; +use paste::paste; +use chrono::prelude::*; + + +// import the macro that creates the get_ functions for the values and +// the macro that creates the set_ functions for the values in a builder format +use crate::{create_getters, create_builder}; + + +/// Read all records of an EMBL file into a vector, panicking if the file cannot be opened or a record cannot be read +#[macro_export] +macro_rules! embl { + ($filename:expr) => {{ + use std::fs::File; + let file = File::open($filename) + .unwrap_or_else(|e| panic!("Could not open file {}: {}", $filename, e)); + let reader = $crate::embl::Reader::new(file); + let mut vec = Vec::new(); + for rec in reader.records() { + match rec { + Ok(r) => vec.push(r), + Err(e) => panic!("Error reading record: {:?}", e), + } + } + vec + }}; +} + + +//const MAX_EMBL_BUFFER_SIZE: usize = 512; +/// An iterator over the records of an EMBL file.
+ +#[derive(Debug)] +pub struct Records<B> +where + B: io::BufRead, +{ + reader: Reader<B>, + error_has_occurred: bool, +} + +impl<B> Records<B> +where + B: io::BufRead, +{ + pub fn new(reader: Reader<B>) -> Self { + Records { + reader, + error_has_occurred: false, + } + } +} + +impl<B> Iterator for Records<B> +where + B: io::BufRead, +{ + type Item = Result<Record, anyhow::Error>; + + fn next(&mut self) -> Option<Result<Record, anyhow::Error>> { + if self.error_has_occurred { + println!("error was encountered in iteration"); + None + } else { + let mut record = Record::new(); + match self.reader.read(&mut record) { + Ok(_) => { if record.is_empty() { + None } + else { + Some(Ok(record)) + } + } + Err(err) => { + //println!("we encountered an error {:?}", &err); + self.error_has_occurred = true; + Some(Err(anyhow!("next record read error {:?}",err))) + } + } + } + } +} + +pub trait EmblRead { + fn read(&mut self, record: &mut Record) -> Result<Record, anyhow::Error>; +} + +/// Line-by-line reader for an EMBL file +#[derive(Debug, Default)] +pub struct Reader<B> { + reader: B, + line_buffer: String, +} + +impl Reader<io::BufReader<fs::File>> { + /// Read EMBL from the given file path.
+ pub fn from_file<P: AsRef<Path> + std::fmt::Debug>(path: P) -> anyhow::Result<Self> { + fs::File::open(&path) + .map(Reader::new) + .with_context(|| format!("Failed to read EMBL from {:#?}", path)) + } +} + +impl<R> Reader<io::BufReader<R>> +where + R: io::Read, +{ + /// Create a new EMBL reader given an instance of `io::Read` + pub fn new(reader: R) -> Self { + Reader { + reader: io::BufReader::new(reader), + line_buffer: String::new(), + } + } +} + +impl<B> Reader<B> +where + B: io::BufRead, +{ + /// Create a new EMBL reader from an existing buffered reader + pub fn from_bufread(bufreader: B) -> Self { + Reader { + reader: bufreader, + line_buffer: String::new(), + } + } + /// Return an iterator over the records of the EMBL file + pub fn records(self) -> Records<B> { + Records { + reader: self, + error_has_occurred: false, + } + } +} + +/// Main EMBL parser +impl<B> EmblRead for Reader<B> +where + B: io::BufRead, +{ + #[allow(unused_mut)] + #[allow(unused_variables)] + #[allow(unused_assignments)] + fn read(&mut self, record: &mut Record) -> Result<Record, anyhow::Error> { + record.rec_clear(); + //println!("reading new record"); + //initialise variables + let mut sequences = String::new(); + let mut source_map = SourceAttributeBuilder::new(); + let mut cds = FeatureAttributeBuilder::new(); + let mut seq_features = SequenceAttributeBuilder::new(); + let mut cds_counter: i32 = 0; + let mut source_counter: i32 = 0; + let mut prev_end: u32 = 0; + let mut organism = String::new(); + let mut mol_type = String::new(); + let mut strain = String::new(); + let mut source_name = String::new(); + let mut type_material = String::new(); + let mut theend: u32 = 0; + let mut thestart: u32 = 0; + let mut db_xref = String::new(); + //check if there are any more lines, if not return the record as is + if self.line_buffer.is_empty() { + self.reader.read_line(&mut self.line_buffer)?; + if self.line_buffer.is_empty() { + return Ok(record.to_owned()); + } + } + //main loop to populate the attributes and iterate through the file + 'outer: while !self.line_buffer.is_empty() { + //println!("is line
buffer {:?}", &self.line_buffer); + //collect the header fields + if self.line_buffer.starts_with("ID") { + record.rec_clear(); + let mut header_fields: Vec<&str> = self.line_buffer.split_whitespace().collect(); + let header_len = header_fields.len(); + //println!("these are the header fields {:?}", &header_fields); + let mut header_iter = header_fields.iter(); + header_iter.next(); + record.id = header_iter.next() + .ok_or_else(|| anyhow::anyhow!("missing record id"))? // Get &str or error + .to_string(); + if record.id.ends_with(";") { + record.id.pop(); + } + //println!("so record id is {:?}", &record.id); + header_iter.next(); + header_iter.next(); + header_iter.next(); + header_iter.next(); + header_iter.next(); + header_iter.next(); + header_iter.next(); + let lens = header_iter.next() + .ok_or_else(|| anyhow::anyhow!("missing record length"))? // Get &str or error + .to_string(); + //println!("just before length {:?}", &lens); + record.length = lens.trim().parse::<u32>()?; + self.line_buffer.clear(); + } + //collect the source fields and populate the source_map and source_attributes + if self.line_buffer.starts_with("FT source") { + let re = Regex::new(r"([0-9]+)[[:punct:]]+([0-9]+)")?; + let location = re.captures(&self.line_buffer).ok_or_else(|| anyhow::anyhow!("missing location"))?; + let start = &location[1]; + let end = &location[2]; + thestart = start.trim().parse::<u32>()?; + source_counter+=1; + source_name = format!("source_{}_{}",record.id,source_counter).to_string(); + thestart += prev_end; + theend = end.trim().parse::<u32>()?
+ prev_end; + //println!("so the start and end are {:?} {:?}", &thestart, &theend); + loop { + self.line_buffer.clear(); + self.reader.read_line(&mut self.line_buffer)?; + if self.line_buffer.starts_with("FT CDS") { + //println!("this source name {:?} start {:?} end {:?} organism {:?} mol_type {:?} strain {:?} type_material {:?} db_xref {:?}", &source_name,&thestart, &theend, &organism, &mol_type, &strain, &type_material, &db_xref); + record.source_map + .set_counter(source_name.to_string()) + .set_start(RangeValue::Exact(thestart)) + .set_stop(RangeValue::Exact(theend)) + .set_organism(organism.clone()) + .set_mol_type(mol_type.clone()) + .set_strain(strain.clone()) + // culture_collection.clone() + .set_type_material(type_material.clone()) + .set_db_xref(db_xref.clone()); + continue 'outer; + } + if self.line_buffer.contains("/organism") { + let org: Vec<&str> = self.line_buffer.split('\"').collect(); + organism = org[1].to_string(); + } + if self.line_buffer.contains("/mol_type") { + let mol: Vec<&str> = self.line_buffer.split('\"').collect(); + mol_type = mol[1].to_string(); + } + if self.line_buffer.contains("/strain") { + let stra: Vec<&str> = self.line_buffer.split('\"').collect(); + strain = stra[1].to_string(); + } + // if self.line_buffer.contains("/culture_collection") { + // let cc: Vec<&str> = self.line_buffer.split('\"').collect(); + // culture_collection = cc[1].to_string(); + // } + if self.line_buffer.contains("/type_material") { + let mat: Vec<&str> = self.line_buffer.split('\"').collect(); + type_material = mat[1].to_string(); + } + if self.line_buffer.contains("/db_xref") { + let db: Vec<&str> = self.line_buffer.split('\"').collect(); + db_xref = db[1].to_string(); + } + } + } + //populate the FeatureAttributes and the coding sequence annotation + if self.line_buffer.starts_with("FT CDS") { + let mut startiter: Vec<_> = Vec::new(); + let mut enditer: Vec<_> = Vec::new(); + let mut thestart: u32 = 0; + let mut thend: u32 = 0; + let mut joined: 
bool = false; + //gather the feature coordinates + let joined = self.line_buffer.contains("join"); + let re = Regex::new(r"([0-9]+)[[:punct:]]+([0-9]+)")?; + //let matches: Vec<&regex::Captures> = re.captures_iter(&self.line_buffer).collect(); + for cap in re.captures_iter(&self.line_buffer) { + cds_counter+=1; + thestart = cap[1].parse().expect("failed to match and parse numerical start"); + theend = cap[2].parse().expect("failed to match and parse numerical end"); + startiter.push(thestart); + enditer.push(theend); + } + let mut gene = String::new(); + let mut product = String::new(); + let strand: i8 = if self.line_buffer.contains("complement") {-1} else {1}; + let mut locus_tag = String::new(); + let mut codon_start: u8 = 1; + //loop to populate the feature attributes; when complete it continues the outer loop directly so that no new line is read into self.line_buffer + loop { + self.line_buffer.clear(); + self.reader.read_line(&mut self.line_buffer)?; + if self.line_buffer.contains("/locus_tag=") { + let loctag: Vec<&str> = self.line_buffer.split('\"').collect(); + locus_tag = loctag[1].to_string(); + //println!("designated locus tag {:?}", &locus_tag); + } + if self.line_buffer.contains("/codon_start") { + let codstart: Vec<&str> = self.line_buffer.split('=').collect(); + let valstart = codstart[1].trim().parse::<u8>()?; + codon_start = valstart; + //println!("designated codon start {:?} {:?}", &codon_start, &locus_tag); + } + if self.line_buffer.contains("/gene=") { + let gen: Vec<&str> = self.line_buffer.split('\"').collect(); + gene = gen[1].to_string(); + //println!("gene designated {:?} {:?}", &gene, &locus_tag); + } + if self.line_buffer.contains("/product") { + let prod: Vec<&str> = self.line_buffer.split('\"').collect(); + product = substitute_odd_punctuation(prod[1].to_string())?; + //println!("designated product {:?} {:?}", &product, &locus_tag); + } + if self.line_buffer.starts_with("FT CDS") || self.line_buffer.starts_with("SQ
Sequence") || self.line_buffer.starts_with("FT intron") || self.line_buffer.starts_with("FT exon") || self.line_buffer.starts_with(" misc_feature") { + if locus_tag.is_empty() { + locus_tag = format!("CDS_{}",cds_counter).to_string(); + } + if joined { + //println!("currently the start is {:?} and the stop is {:?}", &startiter, &enditer); + for (i, m) in startiter.iter().enumerate() { + let loc_tag = format!("{}_{}",locus_tag.clone(),i); + //check we may need to add or subtract one to m + record.cds + .set_counter(loc_tag) + .set_start(RangeValue::Exact(*m)) + .set_stop(RangeValue::Exact(enditer[i])) + .set_gene(gene.to_string()) + .set_product(product.to_string()) + .set_codon_start(codon_start) + .set_strand(strand); + } + continue 'outer; + } + else { + record.cds + .set_counter(locus_tag.clone()) + .set_start(RangeValue::Exact(thestart)) + .set_stop(RangeValue::Exact(theend)) + .set_gene(gene.to_string()) + .set_product(product.to_string()) + .set_codon_start(codon_start) + .set_strand(strand); + continue 'outer; + } + } + } } + //check if we have reached the DNA sequence section and populate the record sequences field if so. 
Returns the record on finding end of record mark + if self.line_buffer.starts_with("SQ Sequence") { + //println!("we have reached the sequence"); + let mut sequences = String::new(); + let result_seq = loop { + self.line_buffer.clear(); + self.reader.read_line(&mut self.line_buffer)?; + if self.line_buffer.starts_with("//") { + break sequences; + } else { + let s: Vec<&str> = self.line_buffer.split_whitespace().collect(); + let sequence = if s.len() > 1 { + s[0..s.len() - 1].join("") + } + else { + String::new() + }; + sequences.push_str(&sequence); + } + }; + record.sequence = result_seq.to_string(); + //println!("this is record sequence {:?}", &record.sequence); + let mut iterablecount: u32 = 0; + //Fields are completed and populated for the FeatureAttributes, collect and populate the SequenceAttributes fields + for (key,val) in record.cds.iter_sorted() { + let (mut a, mut b, mut c, mut d): (Option, Option, Option, Option) = (None, None, None, None); + for value in val { + //println!("this is key {:?} value {:?}", &key, &value); + match value { + FeatureAttributes::Start { value } => a = match value { + RangeValue::Exact(v) => Some(*v), + RangeValue::LessThan(v) => Some(*v), // Assign the value even if it's Some(*v), //Assign the value even it's > value + }, + FeatureAttributes::Stop { value } => b = match value { + RangeValue::Exact(v) => Some(*v), + RangeValue::LessThan(v) => Some(*v), // Assign the value even if it's Some(*v), //Assign the value even if it's > value + }, + FeatureAttributes::Strand { value } => c = match value { + value => Some(*value), + }, + FeatureAttributes::CodonStart { value } => d = match value { + value => Some(value.clone()), + }, + _ => (), + } + } + let sta = a.map(|o| o as usize) + .ok_or(anyhow!("No value for start"))?; + let sto = b.map(|t| t as usize) + .ok_or(anyhow!("No value for stop"))? 
- 1; + let stra = c.map(|u| u as i8) + .ok_or(anyhow!("No value for strand"))?; + let cod = d.map(|v| v as usize - 1) + .ok_or(anyhow!("No value for codon start"))?; + + let star = sta.try_into()?; + let stow = sto.try_into()?; + let codd = cod.try_into()?; + let mut sliced_sequence: &str = ""; + //collects the DNA sequence and translations on the correct strand + if stra == -1 { + if cod > 1 { + //println!("reverse strand coding start more than one {:?}", &iterablecount); + if sto + 1 <= record.sequence.len() { + sliced_sequence = &record.sequence[sta+cod..sto+1]; + } + else { + sliced_sequence = &record.sequence[sta+cod..sto]; + } + } + else { + //println!("record sta {:?} sto {:?} cod {:?} stra {:?} record.seq length {:?}", &sta, &sto, &cod, &stra, &record.sequence.len()); + //println!("sliced sta {:?} sliced sto {:?} record.id {:?}", sta, sto, &record.id); + //println!("iterable count is {:?} reverse strand codon start one", &iterablecount); + //println!("this is the sequence len {:?}", &record.sequence.len()); + if sto + 1 <= record.sequence.len() { + sliced_sequence = &record.sequence[sta..sto+1]; + } + else { + sliced_sequence = &record.sequence[sta..sto]; + } + //println!("iterable count after is {:?}", &iterablecount); + } + let cds_char = sliced_sequence; + let prot_seq = translate(&revcomp(cds_char.as_bytes())); + let parts: Vec<&str> = prot_seq.split('*').collect(); + //println!("this is the prot_seq {:?}", &prot_seq); + record.seq_features + .set_counter(key.to_string()) + .set_start(RangeValue::Exact(star)) + .set_stop(RangeValue::Exact(stow)) + .set_sequence_ffn(cds_char.to_string()) + .set_sequence_faa(parts[0].to_string()) + .set_codon_start(codd) + .set_strand(stra); + } else { + if cod > 1 { + //println!("forward strand codon value more than one cnt {:?}", &iterablecount); + sliced_sequence = &record.sequence[sta+cod-1..sto]; + } + else { + //println!("forward strand codon value one cnt {:?} start {:?} {:?} {:?}", &iterablecount, &sta, &sto,
&record.sequence.len()); + sliced_sequence = &record.sequence[sta-1..sto]; + } + let cds_char = sliced_sequence; + let prot_seq = translate(cds_char.as_bytes()); + let parts: Vec<&str> = prot_seq.split('*').collect(); + //println!("this is on parts {:?}", &parts); + record.seq_features + .set_counter(key.to_string()) + .set_start(RangeValue::Exact(star)) + .set_stop(RangeValue::Exact(stow)) + .set_sequence_ffn(cds_char.to_string()) + .set_sequence_faa(parts[0].to_string()) + .set_codon_start(codd) + .set_strand(stra); + } + } + //return the record when completed + //println!("record seq features {:?}", &record.seq_features); + return Ok(record.to_owned()); + } + //clear the line buffer and read the next line before continuing the outer loop + self.line_buffer.clear(); + self.reader.read_line(&mut self.line_buffer)?; + } + Ok(record.to_owned()) + } +} + +pub use crate::record::RangeValue; + +///stores the details of the source features (contigs) in EMBL +#[derive(Debug, Eq, PartialEq, Hash, Clone)] +pub enum SourceAttributes { + Start { value: RangeValue }, + Stop { value: RangeValue }, + Organism { value: String }, + MolType { value: String}, + Strain { value: String}, + CultureCollection { value: String}, + TypeMaterial { value: String}, + DbXref { value:String} +} + +//macro for creating the getters +create_getters!( + SourceAttributeBuilder, + source_attributes, + SourceAttributes, + Start { value: RangeValue }, + Stop { value: RangeValue }, + Organism { value: String }, + MolType { value: String}, + Strain { value: String}, + // CultureCollection { value: String}, + TypeMaterial { value: String}, + DbXref { value:String} +); + +///builder for the source information on a per record basis +#[derive(Debug, Default, Clone)] +pub struct SourceAttributeBuilder { + pub source_attributes: BTreeMap<String, HashSet<SourceAttributes>>, + pub source_name: Option<String>, +} + +impl SourceAttributeBuilder { + // Method to set source name + pub fn set_source_name(&mut self, name: String) { + self.source_name =
Some(name); + } + + // Method to get source name + pub fn get_source_name(&self) -> Option<&String> { + self.source_name.as_ref() + } + + // Method to add source attributes + pub fn add_source_attribute(&mut self, key: String, attribute: SourceAttributes) { + self.source_attributes + .entry(key) + .or_insert_with(HashSet::new) + .insert(attribute); + } + + // Method to retrieve source attributes for a given key + pub fn get_source_attributes(&self, key: &str) -> Option<&HashSet<SourceAttributes>> { + self.source_attributes.get(key) + } +} + + +create_builder!( + SourceAttributeBuilder, + source_attributes, + SourceAttributes, + source_name, + Start { value: RangeValue }, + Stop { value: RangeValue }, + Organism { value: String }, + MolType { value: String}, + Strain { value: String}, + // CultureCollection { value: String}, + TypeMaterial { value: String}, + DbXref { value:String} +); + +///attributes for each feature, cds or gene +#[derive(Debug, Eq, Hash, PartialEq, Clone)] +pub enum FeatureAttributes { + Start { value: RangeValue }, + Stop { value: RangeValue }, + Gene { value: String }, + Product { value: String }, + CodonStart { value: u8 }, + Strand { value: i8 }, + // ec_number { value: String } +} + + +create_getters!( + FeatureAttributeBuilder, + attributes, + FeatureAttributes, + Start { value: RangeValue }, + Stop { value: RangeValue }, + Gene { value: String }, + Product { value: String }, + CodonStart { value: u8 }, + Strand { value: i8 } +); + +///builder for the feature information on a per coding sequence (CDS) basis +#[derive(Debug, Default, Clone)] +pub struct FeatureAttributeBuilder { + pub attributes: BTreeMap<String, HashSet<FeatureAttributes>>, + locus_tag: Option<String>, +} + +create_builder!( + FeatureAttributeBuilder, + attributes, + FeatureAttributes, + locus_tag, + Start { value: RangeValue }, + Stop { value: RangeValue }, + Gene { value: String }, + Product { value: String }, + CodonStart { value: u8 }, + Strand { value: i8 } +); + +///stores the sequences of the coding sequences (genes) and
proteins. Also stores start, stop, codon_start and strand information +#[derive(Debug, Eq, PartialEq, Hash, Clone)] +pub enum SequenceAttributes { + Start { value: RangeValue }, + Stop { value: RangeValue }, + SequenceFfn { value: String }, + SequenceFaa { value: String }, + CodonStart { value: u8 }, + Strand { value: i8 }, +} + +create_getters!( + SequenceAttributeBuilder, + seq_attributes, + SequenceAttributes, + Start { value: RangeValue }, + Stop { value: RangeValue }, + SequenceFfn { value: String}, + SequenceFaa { value: String}, + CodonStart { value: u8}, + Strand { value: i8} +); + +///builder for the sequence information on a per coding sequence (CDS) basis +#[derive(Debug, Default, Clone)] +pub struct SequenceAttributeBuilder { + pub seq_attributes: BTreeMap<String, HashSet<SequenceAttributes>>, + locus_tag: Option<String>, +} + +create_builder!( + SequenceAttributeBuilder, + seq_attributes, + SequenceAttributes, + locus_tag, + Start { value: RangeValue }, + Stop { value: RangeValue }, + SequenceFfn { value: String}, + SequenceFaa { value: String}, + CodonStart { value: u8 }, + Strand { value: i8 } +); + +///product lines can contain difficult-to-parse punctuation, such as biochemical symbols like unclosed single quotes, superscripts, and single and double brackets.
+///here we replace these with an underscore +pub fn substitute_odd_punctuation(input: String) -> Result<String, anyhow::Error> { + let re = Regex::new(r"[/?()',`]|[α-ωΑ-Ω]")?; + + // Strip any trailing \r and \n characters + let cleaned = input.trim_end_matches(&['\r', '\n'][..]); + + Ok(re.replace_all(cleaned, "_").to_string()) +} + +///GFF3 field9 construct +#[derive(Debug)] +pub struct GFFInner { + id: String, + name: String, + locus_tag: String, + gene: String, + // Inference: String, + // Parent: String, + // db_xref: String, + product: String, + // is_circular: bool, +} + +impl GFFInner { + pub fn new( + id: String, + name: String, + locus_tag: String, + gene: String, + // Inference: String, + // Parent: String, + // db_xref: String, + product: String, + ) -> Self { + GFFInner { + id, name, locus_tag, gene, product, + } + } +} + +///The main GFF3 construct +#[derive(Debug)] +pub struct GFFOuter<'a> { + seqid: String, + source: String, + type_val: String, + start: u32, + end: u32, + score: f64, + strand: String, + phase: u8, + attributes: &'a GFFInner, +} + +impl<'a> GFFOuter<'a> { + pub fn new( + seqid: String, + source: String, + type_val: String, + start: u32, + end: u32, + score: f64, + strand: String, + phase: u8, + attributes: &'a GFFInner + ) -> Self { + GFFOuter { + seqid, source, type_val, start, end, score, strand, phase, attributes, + } + } + pub fn field9_attributes_build(&self) -> String { + let mut full_field9 = Vec::new(); + //GFF3 reserves the capitalised ID and Name attribute tags + if !self.attributes.id.is_empty() { + full_field9.push(format!("ID={}",self.attributes.id)); + } + if !self.attributes.name.is_empty() { + full_field9.push(format!("Name={}", self.attributes.name)); + } + if !self.attributes.gene.is_empty() { + full_field9.push(format!("gene={}",self.attributes.gene)); + } + // if !self.attributes.Inference.is_empty() { + // full_field9.push(format!("inference={}",self.attributes.Inference)); +// } + if !self.attributes.locus_tag.is_empty() { + full_field9.push(format!("locus_tag={}",self.attributes.locus_tag));
+ } + if !self.attributes.product.is_empty() { + full_field9.push(format!("product={}",self.attributes.product)); + } + // if !self.attributes.Parent.is_empty() { + // full_field9.push(format!("Parent={}",self.attributes.Parent)); +// } +// if !self.attributes.db_xref.is_empty() { +// full_field9.push(format!("db_xref={}",self.attributes.db_xref)); +// } + full_field9.join(";") + } +} + +///formats the translation string, which can span multiple lines, for GenBank output +pub fn format_translation(translation: &str) -> String { + //add the protein sequence into the translation qualifier with the correct line lengths + let cleaned_translation = translation.replace("\n", ""); + let line_length: usize = 60; + //the first line shares its width with the qualifier name; continuation lines hold line_length - 1 residues + let first_width = line_length - 15; + let cont_width = line_length - 1; + let mut formatted = String::from(" /translation=\""); + let split_at = first_width.min(cleaned_translation.len()); + formatted.push_str(&cleaned_translation[..split_at]); + //emit the remainder in contiguous chunks so that no residue is skipped + for chunk in cleaned_translation.as_bytes()[split_at..].chunks(cont_width) { + formatted.push('\n'); + formatted.push_str(" "); + formatted.push_str(&String::from_utf8_lossy(chunk)); + } + formatted.push('\"'); + formatted +} + +///writes the DNA sequence in gbk format with numbering +pub fn write_gbk_format_sequence(sequence: &str,file: &mut File) -> io::Result<()> { + //function to write gbk format sequence + writeln!(file, "ORIGIN")?; + let mut formatted = String::new(); + let cleaned_input = sequence.replace("\n", ""); + let mut index = 1; + for (_i, chunk) in cleaned_input.as_bytes().chunks(60).enumerate() { + formatted.push_str(&format!("{:>5} ", index)); + for (j, sub_chunk) in chunk.chunks(10).enumerate() { + if j > 0 { + formatted.push(' '); + } + formatted.push_str(&String::from_utf8_lossy(sub_chunk)); + } + formatted.push('\n');
index+=60; + } + writeln!(file, "{:>6}", &formatted)?; + writeln!(file, "//")?; + Ok(()) +} + +///saves the parsed data in genbank format +//writes a genbank or multi-genbank file +pub fn gbk_write(seq_region: BTreeMap<String, (u32, u32)>, record_vec: Vec<Record>, filename: &str) -> io::Result<()> { + let now = Local::now(); + let formatted_date = now.format("%d-%b-%Y").to_string().to_uppercase(); + let mut file = OpenOptions::new() + .write(true) // Allow writing to the file + .append(true) // Enable appending to the file + .create(true) // Create the file if it doesn't exist + .open(filename)?; + for (i, (key, _val)) in seq_region.iter().enumerate() { + let strain = match &record_vec[i].source_map.get_strain(key) { + Some(value) => value.to_string(), + None => "Unknown".to_string(), + }; + //write lines for the header + let organism = match &record_vec[i].source_map.get_organism(key) { + Some(value) => value.to_string(), + None => "Unknown".to_string(), + }; + let mol_type = match &record_vec[i].source_map.get_mol_type(key) { + Some(value) => value.to_string(), + None => "Unknown".to_string(), + }; + let type_material = match &record_vec[i].source_map.get_type_material(&key) { + Some(value) => value.to_string(), + None => "Unknown".to_string(), + }; + let db_xref = match &record_vec[i].source_map.get_db_xref(key) { + Some(value) => value.to_string(), + None => "Unknown".to_string(), + }; + let source_stop = match &record_vec[i].source_map.get_stop(key) { + Some(value) => value.get_value(), + None => { println!("stop value not found"); + None }.expect("stop value not received") + }; + writeln!(file, "LOCUS {} {} bp DNA linear CON {}", &key,&record_vec[i].sequence.len(),&formatted_date)?; + writeln!(file, "DEFINITION {} {}.", &organism, &strain)?; + writeln!(file, "ACCESSION {}", &key)?; + writeln!(file, "KEYWORDS .")?; + writeln!(file, "SOURCE {} {}", &organism,&strain)?; + writeln!(file, " ORGANISM {} {}", &organism,&strain)?; + //write lines for the source + writeln!(file, "FEATURES
Location/Qualifiers")?; + writeln!(file, " source 1..{}", &source_stop)?; + writeln!(file, " /organism=\"{}\"",&organism)?; + writeln!(file, " /mol_type=\"{}\"",&mol_type)?; + writeln!(file, " /strain=\"{}\"",&strain)?; + if type_material != "Unknown" { + writeln!(file, " /type_material=\"{}\"",&type_material)?; + } + writeln!(file, " /db_xref=\"{}\"",&db_xref)?; + //write lines for each CDS + for (locus_tag, _value) in &record_vec[i].cds.attributes { + let start = match &record_vec[i].cds.get_start(locus_tag) { + Some(value) => value.get_value(), + None => { println!("start value not found"); + None }.expect("start value not received") + }; + let stop = match &record_vec[i].cds.get_stop(locus_tag) { + Some(value) => value.get_value(), + None => { println!("stop value not found"); + None }.expect("stop value not received") + }; + let product = match &record_vec[i].cds.get_product(locus_tag) { + Some(value) => value.to_string(), + None => "unknown product".to_string(), + }; + let strand = match &record_vec[i].cds.get_strand(locus_tag) { + Some(value) => **value, + None => 0, + }; + let codon_start = match &record_vec[i].cds.get_codon_start(locus_tag) { + Some(value) => **value, + None => 0, + }; + let gene = match &record_vec[i].cds.get_gene(locus_tag) { + Some(value) => value.to_string(), + None => "unknown".to_string(), + }; + let translation = match &record_vec[i].seq_features.get_sequence_faa(locus_tag) { + Some(value) => value.to_string(), + None => "unknown".to_string(), + }; + if strand == 1 { + writeln!(file, " gene {}..{}",&start,&stop)?; + } else { + writeln!(file, " gene complement({}..{})",&start,&stop)?; + } + writeln!(file, " /locus_tag=\"{}\"",&locus_tag)?; + if strand == 1 { + writeln!(file, " CDS {}..{}",&start,&stop)?; + } + else { + writeln!(file, " CDS complement({}..{})",&start,&stop)?; + } + writeln!(file, " /locus_tag=\"{}\"",&locus_tag)?; + //codon_start is an integer qualifier and is written without quotes + writeln!(file, " /codon_start={}", &codon_start)?; + if gene != "unknown" {
writeln!(file, " /gene=\"{}\"", &gene)?; + } + if translation != "unknown" { + let formatted_translation = format_translation(&translation); + writeln!(file, "{}", &formatted_translation)?; + } + writeln!(file, " /product=\"{}\"",&product)?; + } + write_gbk_format_sequence(&record_vec[i].sequence, &mut file)?; + } + Ok(()) +} + +///saves the parsed data in gff3 format +//writes a gff3 file from the parsed records +#[allow(unused_assignments)] +#[allow(unused_variables)] +pub fn gff_write(seq_region: BTreeMap<String, (u32, u32)>, mut record_vec: Vec<Record>, filename: &str, dna: bool) -> io::Result<()> { + let mut file = OpenOptions::new() + //.write(true) // Allow writing to the file + .append(true) // Enable appending to the file + .create(true) // Create the file if it doesn't exist + .open(filename)?; + if file.metadata()?.len() == 0 { + writeln!(file, "##gff-version 3")?; + } + let mut full_seq = String::new(); + let mut prev_end: u32 = 0; + //println!("this is the full seq_region {:?}", &seq_region); + for (k, v) in seq_region.iter() { + writeln!(file, "##sequence-region\t{}\t{}\t{}", &k, v.0, v.1)?; + } + for ((source_name, (seq_start, seq_end)), record) in seq_region.iter().zip(record_vec.drain(..)) { + if dna { + full_seq.push_str(&record.sequence); + } + for (locus_tag, _valu) in &record.cds.attributes { + let start = match record.cds.get_start(&locus_tag) { + Some(value) => value.get_value(), + None => { println!("start value not found"); + None }.expect("start value not received") + }; + let stop = match record.cds.get_stop(&locus_tag) { + Some(value) => value.get_value(), + None => { println!("stop value not found"); + None }.expect("stop value not received") + }; + let gene = match record.cds.get_gene(&locus_tag) { + Some(value) => value.to_string(), + None => "unknown".to_string(), + }; + let product = match record.cds.get_product(&locus_tag) { + Some(value) => value.to_string(), + None => "unknown product".to_string(), + }; + let strand = match record.cds.get_strand(&locus_tag)
{ + Some(valu) => { + match valu { + 1 => "+".to_string(), + -1 => "-".to_string(), + _ => { println!("unexpected strand value {} for locus_tag {}", valu, &locus_tag); + "?".to_string() } + } + }, + //GFF3 uses ? for a strand that is relevant but unknown + None => "?".to_string(), + }; + let phase = match record.cds.get_codon_start(&locus_tag) { + Some(valuer) => { + match valuer { + 1 => 0, + 2 => 1, + 3 => 2, + _ => { println!("unexpected codon start value {} for locus_tag {}", valuer, &locus_tag); + 1 } + } + }, + None => 1, + }; + let gff_inner = GFFInner::new( + locus_tag.to_string(), + source_name.clone(), + locus_tag.to_string(), + gene, + // &record.cds.get_Inference(&locus_tag), + // &record.cds.get_Parent(&locus_tag), + // db_xref, + product, + ); + let gff_outer = GFFOuter::new( + source_name.clone(), + ".".to_string(), + "CDS".to_string(), + start + prev_end, + stop + prev_end, + 0.0, + strand, + phase, + &gff_inner, + ); + let field9_attributes = gff_outer.field9_attributes_build(); + //println!("{}\t{}\t{}\t{:?}\t{:?}\t{}\t{}\t{}\t{}", gff_outer.seqid, gff_outer.source, gff_outer.type_val, gff_outer.start, gff_outer.end, gff_outer.score, gff_outer.strand, gff_outer.phase, field9_attributes); + writeln!(file, "{}\t{}\t{}\t{:?}\t{:?}\t{}\t{}\t{}\t{}", gff_outer.seqid, gff_outer.source, gff_outer.type_val, gff_outer.start, gff_outer.end, gff_outer.score, gff_outer.strand, gff_outer.phase, field9_attributes)?; + + } + prev_end = *seq_end; + } + if dna { + writeln!(file, "##FASTA")?; + //writeln!(file, ">{}\n",&filename.to_string())?; + writeln!(file, "{}", full_seq)?; + } + Ok(()) +} + + +///internal record containing data from a single source or contig. Has multiple features.
+//sets up a record +#[derive(Debug, Clone)] +pub struct Record { + pub id: String, + pub length: u32, + pub sequence: String, + pub start: usize, + pub end: usize, + pub strand: i32, + pub cds: FeatureAttributeBuilder, + pub source_map: SourceAttributeBuilder, + pub seq_features: SequenceAttributeBuilder, +} + +impl Record { + /// Create a new instance. + pub fn new() -> Self { + Record { + id: "".to_owned(), + length: 0, + sequence: "".to_owned(), + start: 0, + end: 0, + strand: 0, + source_map: SourceAttributeBuilder::new(), + cds: FeatureAttributeBuilder::new(), + seq_features: SequenceAttributeBuilder::new(), + } + } + pub fn is_empty(&mut self) -> bool { + self.id.is_empty() && self.length == 0 + } + pub fn check(&mut self) -> Result<(), &str> { + if self.id().is_empty() { + return Err("Expecting id for Embl record."); + } + Ok(()) + } + pub fn id(&mut self) -> &str { + &self.id + } + pub fn length(&mut self) -> u32 { + self.length + } + pub fn sequence(&mut self) -> &str { + &self.sequence + } + pub fn start(&mut self) -> u32 { + self.start.try_into().unwrap() + } + pub fn end(&mut self) -> u32 { + self.end.try_into().unwrap() + } + pub fn strand(&mut self) -> i32 { + self.strand + } + pub fn cds(&mut self) -> FeatureAttributeBuilder { + self.cds.clone() + } + pub fn source_map(&mut self) -> SourceAttributeBuilder { + self.source_map.clone() + } + pub fn seq_features(&mut self) -> SequenceAttributeBuilder { + self.seq_features.clone() + } + fn rec_clear(&mut self) { + self.id.clear(); + self.length = 0; + self.sequence.clear(); + self.start = 0; + self.end = 0; + self.strand = 0; + self.source_map = SourceAttributeBuilder::new(); + self.cds = FeatureAttributeBuilder::new(); + self.seq_features = SequenceAttributeBuilder::new(); + } +} + +impl Default for Record { + fn default() -> Self { + Self::new() + } +} + +// Provide a type alias and conversion to a generic record to aid interoperability +pub type GenericRecordEmbl = crate::record::GenericRecord; + 
+impl From<&Record> for GenericRecordEmbl { + fn from(r: &Record) -> Self { + Self { + id: r.id.clone(), + seq: r.sequence.clone(), + seqid: r.id.clone(), + start: r.start as u32, + end: r.end as u32, + strand: r.strand, + source: r.source_map.clone(), + cds: r.cds.clone(), + seq_features: r.seq_features.clone(), + } + } +} + +#[allow(dead_code)] +pub struct Config { + filename: String, +} + +impl Config { + pub fn new(args: &[String]) -> Result { + if args.len() < 2 { + panic!("not enough arguments, please provide filename"); + } + let filename = args[1].clone(); + + Ok(Config { filename }) + } +} + +#[cfg(test)] +mod tests { + use super::*; + #[test] + #[allow(unused_mut)] + #[allow(unused_variables)] + #[allow(dead_code)] + #[allow(unused_assignments)] + #[allow(unused_imports)] + fn test_read_file() { + let content = std::fs::read_to_string("example.embl").expect("error reading file"); + assert!(content.contains("ID")); + assert!(content.len() > 0); + } + #[test] + #[allow(unused_mut)] + #[allow(unused_variables)] + #[allow(dead_code)] + #[allow(unused_assignments)] + #[allow(unused_imports)] + fn test_parse_embl() { + let file_embl = "example.embl"; + let records = embl!(&file_embl); + assert!(records.len() > 0); + } + #[test] + #[allow(unused_mut)] + #[allow(unused_variables)] + #[allow(dead_code)] + #[allow(unused_assignments)] + #[allow(unused_imports)] + fn test_parse_source_attributes() { + let file_embl = "example.embl"; + let records = embl!(&file_embl); + if let Some(record) = records.first() { + if let Some((key, val)) = record.source_map.source_attributes.first_key_value() { + assert_eq!(key, &"source_AM236082_1".to_string()); + } + } + } + #[test] + #[allow(unused_mut)] + #[allow(unused_variables)] + #[allow(dead_code)] + #[allow(unused_assignments)] + #[allow(unused_imports)] + fn test_parse_cds_attributes() { + let file_embl = "example.embl"; + let records = embl!(&file_embl); + if let Some(record) = records.first() { + if let Some((locus_tag, 
vals)) = record.cds.attributes.first_key_value() { + assert_eq!(locus_tag, &"pRL80001".to_string()); + assert_eq!(record.cds.get_gene(&locus_tag).as_deref(), Some(&"repAp8".to_string())); + } + } + } + #[test] + #[allow(unused_mut)] + #[allow(unused_variables)] + #[allow(dead_code)] + #[allow(unused_assignments)] + #[allow(unused_imports)] + fn test_parse_sequence_attributes() { + let file_embl = "example.embl"; + let records = embl!(&file_embl); + if let Some(record) = records.first() { + if let Some((key, vals)) = record.cds.attributes.first_key_value() { + assert_eq!(key, &"pRL80001".to_string()); + assert_eq!(record.seq_features.get_sequence_faa(&key), Some(&"VENPAQLQKAIHKLIAAHARDLSGALHEHRVKLYPPEARKTLRSFSSIEAAKLIGVNDGYLRHLSLEGKGPQPEIGNNNRRSYSVETIQALREYLDENGKGDRRYSPRRSGREHLQVITAVNFKGGSGKTTTAAHLAQYLALNGYRVLAIDLDPQASMSALHGFQPEFDVGDNETLYGAVRYDEERRPLKDIIKKTYFANLDLVPGNLELMEFEHDTAKVLGSNDRKNIFFTRMDDAIASVADDYDVVVVDCPPQLGFLTISALCAATAVLVTVHPQMLDVMSMCQFLLMTSELLSVVADAGGSMNYDWMRYLVTRYEPGDGPQNQMVSFMRTMFGDHVLNHPMLKSTAISDAGITKQTLYEVSRDQFTRATYDRAMESLDNVNSEIEQLIQSSWGRK".to_string())); + } + } + } +} + diff --git a/seqmetrics/microBioRust/src/gbk.rs b/seqmetrics/microBioRust/src/gbk.rs new file mode 100644 index 0000000..5bae988 --- /dev/null +++ b/seqmetrics/microBioRust/src/gbk.rs @@ -0,0 +1,1643 @@ +//! # A GenBank to GFF parser +//! +//! +//! You can parse GenBank files and save them in GFF (GFF3) format, as well as extract DNA sequences, gene DNA sequences (ffn) and protein FASTA sequences (faa) +//! +//! You can also create new records and save them in GenBank (gbk) format +//! +//! ## Detailed Explanation +//! +//! +//! The GenBank parser contains: +//! +//! Records - a top-level structure consisting of either one record (single GenBank) or multiple records (multi-GenBank). +//! +//! Each Record contains: +//! +//! 1.
A source, ```SourceAttributes```, construct(enum) of counter (source name), start, stop [of source or contig], organism, mol_type, strain, type_material, db_xref +//! 2. Features, ```FeatureAttributes```, construct(enum) of counter (locus tag), gene (if present), product, codon start, strand, start, stop [of cds/gene] +//! 3. Sequence features, ```SequenceAttributes```, construct(enum) of counter (locus tag), sequence_ffn (DNA gene sequence), sequence_faa (protein translation), strand, codon start, start, stop [cds/gene] +//! 4. The DNA sequence of the whole record (or contig) +//! +//! Example to extract and print all protein sequences in FASTA format, using the getter (get_) functions +//! +//! +//!```rust +//! use clap::Parser; +//! use std::{ +//! fs::File, +//! io, +//! }; +//! use microBioRust::gbk::Reader; +//! +//! #[derive(Parser, Debug)] +//! #[clap(author, version, about)] +//! struct Arguments { +//! #[clap(short, long)] +//! filename: String, +//! } +//! +//! pub fn genbank_to_faa() -> Result<(), anyhow::Error> { +//! let args = Arguments::parse(); +//! let file_gbk = File::open(args.filename)?; +//! let mut reader = Reader::new(file_gbk); +//! let mut records = reader.records(); +//! loop { +//! // advance one record at a time and print the protein sequence of every CDS +//! match records.next() { +//! Some(Ok(mut record)) => { +//! for (k, v) in &record.cds.attributes { +//! match record.seq_features.get_sequence_faa(&k) { +//! Some(value) => { let seq_faa = value.to_string(); +//! println!(">{}|{}\n{}", &record.id, &k, seq_faa); +//! }, +//! _ => (), +//! }; +//! } +//! }, +//! Some(Err(e)) => { println!("Error reading record: {:?}", e); }, +//! None => break, +//! } +//! } +//! return Ok(()); +//! } +//!``` +//! +//! Example to extract the protein sequences using the simplified genbank! macro +//! +//!```rust +//! use clap::Parser; +//! use std::{ +//! fs::File, +//! io, +//! }; +//! use microBioRust::{ +//! gbk::Reader, +//! genbank, +//!
}; +//! +//! #[derive(Parser, Debug)] +//! #[clap(author, version, about)] +//! struct Arguments { +//! #[clap(short, long)] +//! filename: String, +//! } +//! +//! pub fn genbank_to_faa() -> Result<(), anyhow::Error> { +//! let args = Arguments::parse(); +//! let records = genbank!(&args.filename); +//! for record in records { +//! for (k, v) in &record.cds.attributes { +//! if let Some(seq) = record.seq_features.get_sequence_faa(k) { +//! println!(">{}|{}\n{}", &record.id, &k, seq); +//! } +//! } +//! } +//! return Ok(()); +//! } +//! +//!``` +//! Example to save a single- or multi-GenBank file as a GFF file (multi-GenBank records are joined end to end) +//! +//! ```rust +//! use microBioRust::gbk::{gff_write, Reader, Record}; +//! use std::collections::BTreeMap; +//! use std::{ +//! fs::File, +//! io, +//! }; +//! use clap::Parser; +//! +//! #[derive(Parser, Debug)] +//! #[clap(author, version, about)] +//! struct Arguments { +//! #[clap(short, long)] +//! filename: String, +//! } +//! +//! pub fn genbank_to_gff() -> io::Result<()> { +//! let args = Arguments::parse(); +//! let file_gbk = File::open(&args.filename)?; +//! let mut prev_end: u32 = 0; +//! let mut reader = Reader::new(file_gbk); +//! let mut records = reader.records(); +//! let mut read_counter: u32 = 0; +//! let mut seq_region: BTreeMap<String, (u32, u32)> = BTreeMap::new(); +//! let mut record_vec: Vec<Record> = Vec::new(); +//! loop { +//! match records.next() { +//! Some(Ok(mut record)) => { +//! println!("next record"); +//! println!("Record id: {:?}", record.id); +//! let source = record.source_map.source_name.clone().expect("issue collecting source name"); +//! let beginning = match record.source_map.get_start(&source) { +//! Some(value) => value.get_value(), +//! _ => 0, +//! }; +//! let ending = match record.source_map.get_stop(&source) { +//! Some(value) => value.get_value(), +//! _ => 0, +//! }; +//! if ending + prev_end < beginning + prev_end { +//!
println!("debug: end value is smaller than the start {:?}", beginning); +//! } +//! seq_region.insert(source, (beginning + prev_end, ending + prev_end)); +//! record_vec.push(record); +//! // Add additional fields to print if needed +//! read_counter+=1; +//! prev_end+=ending; // offset the next record so multi-GenBank records are joined end to end +//! }, +//! Some(Err(e)) => { println!("Error reading record: {:?}", e); }, +//! None => { +//! println!("finished iteration"); +//! break; }, +//! } +//! } +//! let output_file = format!("{}.gff", &args.filename); +//! if std::path::Path::new(&output_file).exists() { +//! println!("Deleting existing file: {}", &output_file); +//! std::fs::remove_file(&output_file).expect("failed to delete existing output file"); +//! } +//! gff_write(seq_region.clone(), record_vec, &output_file, true); +//! println!("Total records processed: {}", read_counter); +//! return Ok(()); +//! } +//!``` +//! Example to create a completely new record using the setter (set_) functions +//! +//! Writing in GFF format requires gff_write(seq_region, record_vec, filename, true or false) +//! +//! seq_region is the region of interest to save, keyed by name with its DNA coordinates, for example `seq_region.insert("source_1".to_string(), (1,897))` +//! This makes it possible to save the whole file or only a subset of it +//! +//! record_vec is a list of the records. If there is only one record, wrap it in a vec using `vec![record]` +//! +//! The boolean (true or false) controls whether the DNA sequence is included in the GFF3 file +//! +//! Writing in GenBank format requires gbk_write(seq_region, record_vec, filename); there is no boolean because GenBank format always includes the DNA sequence +//! +//! +//! ```rust +//! use microBioRust::gbk::{gff_write, RangeValue, Record}; +//! use std::fs::File; +//! use std::collections::BTreeMap; +//! +//! pub fn create_new_record() -> Result<(), anyhow::Error> { +//! let filename = "new_record.gff".to_string(); +//! if std::path::Path::new(&filename).exists() { +//!
std::fs::remove_file(&filename)?; +//! } +//! let mut record = Record::new(); +//! let mut seq_region: BTreeMap = BTreeMap::new(); +//! //example from E.coli K12 +//! seq_region.insert("source_1".to_string(), (1,897)); +//! //Add the source into SourceAttributes +//! record.source_map +//! .set_counter("source_1".to_string()) +//! .set_start(RangeValue::Exact(1)) +//! .set_stop(RangeValue::Exact(897)) +//! .set_organism("Escherichia coli".to_string()) +//! .set_mol_type("DNA".to_string()) +//! .set_strain("K-12 substr. MG1655".to_string()) +//! .set_type_material("type strain of Escherichia coli K12".to_string()) +//! .set_db_xref("PRJNA57779".to_string()); +//! //Add the features into FeatureAttributes, here we are setting two features, i.e. coding sequences or genes +//! record.cds +//! .set_counter("b3304".to_string()) +//! .set_start(RangeValue::Exact(1)) +//! .set_stop(RangeValue::Exact(354)) +//! .set_gene("rplR".to_string()) +//! .set_product("50S ribosomal subunit protein L18".to_string()) +//! .set_codon_start(1) +//! .set_strand(-1); +//! record.cds +//! .set_counter("b3305".to_string()) +//! .set_start(RangeValue::Exact(364)) +//! .set_stop(RangeValue::Exact(897)) +//! .set_gene("rplF".to_string()) +//! .set_product("50S ribosomal subunit protein L6".to_string()) +//! .set_codon_start(1) +//! .set_strand(-1); +//! //Add the sequences for the coding sequence (CDS) into SequenceAttributes +//! record.seq_features +//! .set_counter("b3304".to_string()) +//! .set_start(RangeValue::Exact(1)) +//! .set_stop(RangeValue::Exact(354)) +//! .set_sequence_ffn("ATGGATAAGAAATCTGCTCGTATCCGTCGTGCGACCCGCGCACGCCGCAAGCTCCAGGAG +//!CTGGGCGCAACTCGCCTGGTGGTACATCGTACCCCGCGTCACATTTACGCACAGGTAATT +//!GCACCGAACGGTTCTGAAGTTCTGGTAGCTGCTTCTACTGTAGAAAAAGCTATCGCTGAA +//!CAACTGAAGTACACCGGTAACAAAGACGCGGCTGCAGCTGTGGGTAAAGCTGTCGCTGAA +//!CGCGCTCTGGAAAAAGGCATCAAAGATGTATCCTTTGACCGTTCCGGGTTCCAATATCAT +//!GGTCGTGTCCAGGCACTGGCAGATGCTGCCCGTGAAGCTGGCCTTCAGTTCTAA".to_string()) +//! 
.set_sequence_faa("MDKKSARIRRATRARRKLQELGATRLVVHRTPRHIYAQVIAPNGSEVLVAASTVEKAIAE +//!QLKYTGNKDAAAAVGKAVAERALEKGIKDVSFDRSGFQYHGRVQALADAAREAGLQF".to_string()) +//! .set_codon_start(1) +//! .set_strand(-1); +//! record.seq_features +//! .set_counter("b3305".to_string()) +//! .set_start(RangeValue::Exact(364)) +//! .set_stop(RangeValue::Exact(897)) +//! .set_sequence_ffn("ATGTCTCGTGTTGCTAAAGCACCGGTCGTTGTTCCTGCCGGCGTTGACGTAAAAATCAAC +//!GGTCAGGTTATTACGATCAAAGGTAAAAACGGCGAGCTGACTCGTACTCTCAACGATGCT +//!GTTGAAGTTAAACATGCAGATAATACCCTGACCTTCGGTCCGCGTGATGGTTACGCAGAC +//!GGTTGGGCACAGGCTGGTACCGCGCGTGCCCTGCTGAACTCAATGGTTATCGGTGTTACC +//!GAAGGCTTCACTAAGAAGCTGCAGCTGGTTGGTGTAGGTTACCGTGCAGCGGTTAAAGGC +//!AATGTGATTAACCTGTCTCTGGGTTTCTCTCATCCTGTTGACCATCAGCTGCCTGCGGGT +//!ATCACTGCTGAATGTCCGACTCAGACTGAAATCGTGCTGAAAGGCGCTGATAAGCAGGTG +//!ATCGGCCAGGTTGCAGCGGATCTGCGCGCCTACCGTCGTCCTGAGCCTTATAAAGGCAAG +//!GGTGTTCGTTACGCCGACGAAGTCGTGCGTACCAAAGAGGCTAAGAAGAAGTAA".to_string()) +//! .set_sequence_faa("MSRVAKAPVVVPAGVDVKINGQVITIKGKNGELTRTLNDAVEVKHADNTLTFGPRDGYAD +//!GWAQAGTARALLNSMVIGVTEGFTKKLQLVGVGYRAAVKGNVINLSLGFSHPVDHQLPAG +//!ITAECPTQTEIVLKGADKQVIGQVAADLRAYRRPEPYKGKGVRYADEVVRTKEAKKK".to_string()) +//! .set_codon_start(1) +//! .set_strand(-1); +//! //Add the full sequence of the entire record into the record.sequence +//!
record.sequence = "TTAGAACTGAAGGCCAGCTTCACGGGCAGCATCTGCCAGTGCCTGGACACGACCATGATA +//!TTGGAACCCGGAACGGTCAAAGGATACATCTTTGATGCCTTTTTCCAGAGCGCGTTCAGC +//!GACAGCTTTACCCACAGCTGCAGCCGCGTCTTTGTTACCGGTGTACTTCAGTTGTTCAGC +//!GATAGCTTTTTCTACAGTAGAAGCAGCTACCAGAACTTCAGAACCGTTCGGTGCAATTAC +//!CTGTGCGTAAATGTGACGCGGGGTACGATGTACCACCAGGCGAGTTGCGCCCAGCTCCTG +//!GAGCTTGCGGCGTGCGCGGGTCGCACGACGGATACGAGCAGATTTCTTATCCATAGTGTT +//!ACCTTACTTCTTCTTAGCCTCTTTGGTACGCACGACTTCGTCGGCGTAACGAACACCCTT +//!GCCTTTATAAGGCTCAGGACGACGGTAGGCGCGCAGATCCGCTGCAACCTGGCCGATCAC +//!CTGCTTATCAGCGCCTTTCAGCACGATTTCAGTCTGAGTCGGACATTCAGCAGTGATACC +//!CGCAGGCAGCTGATGGTCAACAGGATGAGAGAAACCCAGAGACAGGTTAATCACATTGCC +//!TTTAACCGCTGCACGGTAACCTACACCAACCAGCTGCAGCTTCTTAGTGAAGCCTTCGGT +//!AACACCGATAACCATTGAGTTCAGCAGGGCACGCGCGGTACCAGCCTGTGCCCAACCGTC +//!TGCGTAACCATCACGCGGACCGAAGGTCAGGGTATTATCTGCATGTTTAACTTCAACAGC +//!ATCGTTGAGAGTACGAGTCAGCTCGCCGTTTTTACCTTTGATCGTAATAACCTGACCGTT +//!GATTTTTACGTCAACGCCGGCAGGAACAACGACCGGTGCTTTAGCAACACGAGACAT".to_string(); +//! gff_write(seq_region, vec![record], &filename, true); +//! return Ok(()); +//! } +//!``` +//! + +use std::{ + io::{self, Write}, + fs::{self, OpenOptions, File}, + vec::Vec, + str, + convert::{AsRef, TryInto}, + path::Path, + collections::{BTreeMap, HashSet}, +}; +use regex::Regex; +use itertools::Itertools; +use protein_translate::translate; +use bio::alphabets::dna::revcomp; +use anyhow::{anyhow, Context}; +use paste::paste; +use chrono::prelude::*; + + +/// macro to create get_ functions for the values +#[macro_export] +macro_rules! create_getters { + // macro for creating get methods + ($struct_name:ident, $attributes:ident, $enum_name:ident, $( $field:ident { value: $type:ty } ),* ) => { + impl $struct_name { + $( + // creates a get method for each of the fields in the SourceAttributes, FeatureAttributes and SequenceAttributes + paste! 
{ + pub fn [<get_ $field:snake>](&self, key: &str) -> Option<&$type> { + // Get the HashSet for the key (e.g., "source_1") + self.$attributes.get(key).and_then(|set| { + // Iterate over the HashSet to find the correct SourceAttributes value + set.iter().find_map(|attr| { + if let $enum_name::$field { value } = attr { + Some(value) + } else { + None + } + }) + }) + } + } + )* + } + }; +} + +/// macro to create the set_ functions for the values in a Builder format +#[macro_export] +macro_rules! create_builder { + // Macro for creating attribute builders for SourceAttributes, FeatureAttributes and SequenceAttributes + ($builder_name:ident, $attributes:ident, $enum_name:ident, $counter_name:ident, $( $field:ident { value: $type:ty } ),* ) => { + impl $builder_name { + pub fn new() -> Self { + $builder_name { + $attributes: BTreeMap::new(), + $counter_name: None, + } + } + //sets the key for the BTreeMap + pub fn set_counter(&mut self, counter: String) -> &mut Self { + self.$counter_name = Some(counter); + self + } + //function to insert the fields from the enum into the attributes + pub fn insert_to(&mut self, value: $enum_name) { + if let Some(counter) = &self.$counter_name { + self.$attributes + .entry(counter.to_string()) + .or_insert_with(HashSet::new) + .insert(value); + } + else { + panic!("Counter key not set"); // Needs better error handling + } + } + // function to set each of the alternative fields in the builder + $( + paste!
{ + pub fn [<set_ $field:snake>](&mut self, value: $type) -> &mut Self { + self.insert_to($enum_name::$field { value }); + self + } + } + )* + // build function to the attributes + pub fn build(self) -> BTreeMap<String, HashSet<$enum_name>> { + self.$attributes + } + // function to iterate immutably through the BTreeMap as required + pub fn iter_sorted(&'_ self) -> std::collections::btree_map::Iter<'_, String, HashSet<$enum_name>> { + self.$attributes.iter() + } + //default function + pub fn default() -> Self { + $builder_name { + $attributes: BTreeMap::new(), + $counter_name: None, + } + } + } + }; +} + +#[macro_export] +macro_rules! genbank { + ($filename:expr) => {{ + use std::fs::File; + let file = File::open($filename) + .unwrap_or_else(|e| panic!("Could not open file {}: {}", $filename, e)); + let reader = $crate::gbk::Reader::new(file); + let mut vec = Vec::new(); + for rec in reader.records() { + match rec { + Ok(r) => { //println!("this is r {:?}", &r); + vec.push(r); + } + Err(e) => panic!("Error reading record: {:?}", e), + } + } + vec + }}; +} + + +//const MAX_GBK_BUFFER_SIZE: usize = 512; +/// An iterator over the records of a GenBank file, created by `Reader::records`.
+ +#[derive(Debug)] +#[allow(unused_mut)] +pub struct Records<B> +where + B: io::BufRead, +{ + reader: Reader<B>, + error_has_occurred: bool, +} + +impl<B> Records<B> +where + B: io::BufRead, +{ + #[allow(unused_mut)] + pub fn new(mut reader: Reader<B>) -> Self { + Records { + reader: reader, + error_has_occurred: false, + } + } +} + +impl<B> Iterator for Records<B> +where + B: io::BufRead, +{ + type Item = Result<Record, anyhow::Error>; + + fn next(&mut self) -> Option<Self::Item> { + if self.error_has_occurred { + // a previous record failed to parse, so stop iterating + None + } else { + let mut record = Record::new(); + match self.reader.read(&mut record) { + Ok(_) => { if record.is_empty() { + None } + else { + Some(Ok(record)) + } + } + Err(err) => { + //println!("we encountered an error {:?}", &err); + self.error_has_occurred = true; + Some(Err(anyhow!("next record read error {:?}",err))) + } + } + } + } +} + +pub trait GbkRead { + fn read(&mut self, record: &mut Record) -> Result<Record, anyhow::Error>; +} + +///per line reader for the file +#[derive(Debug, Default)] +pub struct Reader<B> { + reader: B, + line_buffer: String, +} + +impl Reader<io::BufReader<fs::File>> { + /// Opens the GenBank file at the given path and wraps it in a buffered reader.
+ pub fn from_file<P: AsRef<Path> + std::fmt::Debug>(path: P) -> anyhow::Result<Self> { + fs::File::open(&path) + .map(Reader::new) + .with_context(|| format!("Failed to read Gbk from {:#?}", path)) + } +} + +impl<R> Reader<io::BufReader<R>> +where + R: io::Read, +{ + /// Create a new Gbk reader from any instance of `io::Read` + pub fn new(reader: R) -> Self { + Reader { + reader: io::BufReader::new(reader), + line_buffer: String::new(), + } + } +} + +impl<B> Reader<B> +where + B: io::BufRead, +{ + pub fn from_bufread(bufreader: B) -> Self { + Reader { + reader: bufreader, + line_buffer: String::new(), + } + } + //return an iterator over the records of the genbank file + pub fn records(self) -> Records<B> { + Records { + reader: self, + error_has_occurred: false, + } + } +} + +///main gbk parser +impl<B> GbkRead for Reader<B> +where + B: io::BufRead, +{ + #[allow(unused_mut)] + #[allow(unused_variables)] + #[allow(unused_assignments)] + fn read(&mut self, record: &mut Record) -> Result<Record, anyhow::Error> { + record.rec_clear(); + //println!("reading new record"); + //initialise variables + let mut sequences = String::new(); + let mut source_map = SourceAttributeBuilder::new(); + let mut cds = FeatureAttributeBuilder::new(); + let mut seq_features = SequenceAttributeBuilder::new(); + let mut cds_counter: i32 = 0; + let mut source_counter: i32 = 0; + let mut prev_end: u32 = 0; + let mut organism = String::new(); + let mut mol_type = String::new(); + let mut strain = String::new(); + let mut source_name = String::new(); + let mut type_material = String::new(); + let mut theend: u32 = 0; + let mut thestart: u32 = 0; + let mut db_xref = String::new(); + //check if there are any more lines, if not return the record as is + if self.line_buffer.is_empty() { + self.reader.read_line(&mut self.line_buffer)?; + if self.line_buffer.is_empty() { + return Ok(record.to_owned()); + } + } + //main loop to populate the attributes and iterate through the file + 'outer: while !self.line_buffer.is_empty() { + //println!("is line buffer 
{:?}", &self.line_buffer); + //collect the header fields + if self.line_buffer.starts_with("LOCUS") { + record.rec_clear(); + let mut header_fields: Vec<&str> = self.line_buffer.split_whitespace().collect(); + let mut header_iter = header_fields.iter(); + header_iter.next(); + record.id = header_iter.next() + .ok_or_else(|| anyhow::anyhow!("missing record id"))? // Get &str or error + .to_string(); + let lens = header_iter.next() + .ok_or_else(|| anyhow::anyhow!("missing record length"))? // Get &str or error + .to_string(); + record.length = lens.trim().parse()?; + self.line_buffer.clear(); + } + //collect the source fields and populate the source_map and source_attributes + if self.line_buffer.starts_with(" source") { + let re = Regex::new(r"([0-9]+)[[:punct:]]+([0-9]+)")?; + let location = re.captures(&self.line_buffer).ok_or_else(|| anyhow::anyhow!("missing location"))?; + let start = &location[1]; + let end = &location[2]; + thestart = start.trim().parse::<u32>()?; + source_counter+=1; + source_name = format!("source_{}_{}",record.id,source_counter).to_string(); + thestart += prev_end; + theend = end.trim().parse::<u32>()?
+ prev_end; + //println!("so the start and end are {:?} {:?}", &thestart, &theend); + loop { + self.line_buffer.clear(); + self.reader.read_line(&mut self.line_buffer)?; + // stop at the first CDS, at ORIGIN (a record without CDS features) or at end of file, so this loop cannot spin forever + if self.line_buffer.starts_with(" CDS") || self.line_buffer.starts_with("ORIGIN") || self.line_buffer.is_empty() { + //println!("this source name {:?} start {:?} end {:?} organism {:?} mol_type {:?} strain {:?} type_material {:?} db_xref {:?}", &source_name,&thestart, &theend, &organism, &mol_type, &strain, &type_material, &db_xref); + record.source_map + .set_counter(source_name.to_string()) + .set_start(RangeValue::Exact(thestart)) + .set_stop(RangeValue::Exact(theend)) + .set_organism(organism.clone()) + .set_mol_type(mol_type.clone()) + .set_strain(strain.clone()) + // culture_collection.clone() + .set_type_material(type_material.clone()) + .set_db_xref(db_xref.clone()); + continue 'outer; + } + if self.line_buffer.contains("/organism") { + let org: Vec<&str> = self.line_buffer.split('\"').collect(); + organism = org[1].to_string(); + } + if self.line_buffer.contains("/mol_type") { + let mol: Vec<&str> = self.line_buffer.split('\"').collect(); + mol_type = mol[1].to_string(); + } + if self.line_buffer.contains("/strain") { + let stra: Vec<&str> = self.line_buffer.split('\"').collect(); + strain = stra[1].to_string(); + } + // if self.line_buffer.contains("/culture_collection") { + // let cc: Vec<&str> = self.line_buffer.split('\"').collect(); + // culture_collection = cc[1].to_string(); + // } + if self.line_buffer.contains("/type_material") { + let mat: Vec<&str> = self.line_buffer.split('\"').collect(); + type_material = mat[1].to_string(); + } + if self.line_buffer.contains("/db_xref") { + let db: Vec<&str> = self.line_buffer.split('\"').collect(); + db_xref = db[1].to_string(); + } + } + } + //populate the FeatureAttributes and the coding sequence annotation + if self.line_buffer.starts_with(" CDS") { + let mut startiter: Vec<_> = Vec::new(); + let mut enditer: Vec<_> = Vec::new(); + let mut thestart: u32 = 0; + let mut joined: bool =
false; + //gather the feature coordinates + let joined = self.line_buffer.contains("join"); + let re = Regex::new(r"([0-9]+)[[:punct:]]+([0-9]+)")?; + //let matches: Vec<&regex::Captures> = re.captures_iter(&self.line_buffer).collect(); + for cap in re.captures_iter(&self.line_buffer) { + cds_counter+=1; + thestart = cap[1].parse().expect("failed to match and parse numerical start"); + theend = cap[2].parse().expect("failed to match and parse numerical end"); + startiter.push(thestart); + enditer.push(theend); + } + let mut gene = String::new(); + let mut product = String::new(); + let strand: i8 = if self.line_buffer.contains("complement") {-1} else {1}; + let mut locus_tag = String::new(); + let mut codon_start: u8 = 1; + //loop to populate the feature attributes, when complete it calls to the outer loop directly to prevent reading a new line into self.line_buffer + loop { + self.line_buffer.clear(); + self.reader.read_line(&mut self.line_buffer)?; + if self.line_buffer.contains("/locus_tag=") { + let loctag: Vec<&str> = self.line_buffer.split('\"').collect(); + locus_tag = loctag[1].to_string(); + //println!("designated locus tag {:?}", &locus_tag); + } + if self.line_buffer.contains("/codon_start") { + let codstart: Vec<&str> = self.line_buffer.split('=').collect(); + let valstart = codstart[1].trim().parse::<u8>()?; + codon_start = valstart; + //println!("designated codon start {:?} {:?}", &codon_start, &locus_tag); + } + if self.line_buffer.contains("/gene=") { + let gene_parts: Vec<&str> = self.line_buffer.split('\"').collect(); + gene = gene_parts[1].to_string(); + //println!("gene designated {:?} {:?}", &gene, &locus_tag); + } + if self.line_buffer.contains("/product") { + let prod: Vec<&str> = self.line_buffer.split('\"').collect(); + product = substitute_odd_punctuation(prod[1].to_string())?; + //println!("designated product {:?} {:?}", &product, &locus_tag); + } + if self.line_buffer.starts_with(" CDS") || self.line_buffer.starts_with("ORIGIN") || 
self.line_buffer.is_empty() || self.line_buffer.starts_with(" gene") || self.line_buffer.starts_with(" misc_feature") { + if locus_tag.is_empty() { + locus_tag = format!("CDS_{}",cds_counter); + } + if joined { + //println!("currently the start is {:?} and the stop is {:?}", &startiter, &enditer); + for (i, m) in startiter.iter().enumerate() { + let loc_tag = format!("{}_{}",locus_tag.clone(),i); + //check we may need to add or subtract one to m + record.cds + .set_counter(loc_tag) + .set_start(RangeValue::Exact(*m)) + .set_stop(RangeValue::Exact(enditer[i])) + .set_gene(gene.to_string()) + .set_product(product.to_string()) + .set_codon_start(codon_start) + .set_strand(strand); + } + continue 'outer; + } + else { + record.cds + .set_counter(locus_tag.clone()) + .set_start(RangeValue::Exact(thestart)) + .set_stop(RangeValue::Exact(theend)) + .set_gene(gene.to_string()) + .set_product(product.to_string()) + .set_codon_start(codon_start) + .set_strand(strand); + continue 'outer; + } + } + } } + //check if we have reached the DNA sequence section and populate the record sequences field if so.
Returns the record on finding end of record mark + if self.line_buffer.starts_with("ORIGIN") { + let mut sequences = String::new(); + let result_seq = loop { + self.line_buffer.clear(); + self.reader.read_line(&mut self.line_buffer)?; + // stop at the end-of-record mark, or at end of file if the mark is missing + if self.line_buffer.starts_with("//") || self.line_buffer.is_empty() { + break sequences; + } else { + // drop the leading position number; skip(1) also tolerates blank lines + let sequence = self.line_buffer.split_whitespace().skip(1).join(""); + sequences.push_str(&sequence); + } + }; + record.sequence = result_seq.to_string(); + let mut iterablecount: u32 = 0; + //Fields are completed and populated for the FeatureAttributes, collect and populate the SequenceAttributes fields + for (key,val) in record.cds.iter_sorted() { + let (mut a, mut b, mut c, mut d): (Option<u32>, Option<u32>, Option<i8>, Option<u8>) = (None, None, None, None); + for value in val { + //println!("this is key {:?} value {:?}", &key, &value); + match value { + FeatureAttributes::Start { value } => a = match value { + RangeValue::Exact(v) => Some(*v), + RangeValue::LessThan(v) => Some(*v), // Assign the value even if it's < value + RangeValue::GreaterThan(v) => Some(*v), // Assign the value even if it's > value + }, + FeatureAttributes::Stop { value } => b = match value { + RangeValue::Exact(v) => Some(*v), + RangeValue::LessThan(v) => Some(*v), // Assign the value even if it's < value + RangeValue::GreaterThan(v) => Some(*v), // Assign the value even if it's > value + }, + FeatureAttributes::Strand { value } => c = Some(*value), + FeatureAttributes::CodonStart { value } => d = Some(*value), + _ => (), + } + } + let sta = a.map(|o| o as usize) + .ok_or(anyhow!("No value for start"))?; + let sto = b.map(|t| t as usize) + .ok_or(anyhow!("No value for stop"))?
- 1; + let stra = c.map(|u| u as i8) + .ok_or(anyhow!("No value for strand"))?; + let cod = d.map(|v| v as usize - 1) + .ok_or(anyhow!("No value for codon start"))?; + + let star = sta.try_into()?; + let stow = sto.try_into()?; + let codd = cod.try_into()?; + let mut sliced_sequence: &str = ""; + //collects the DNA sequence and translations on the correct strand + if stra == -1 { + if cod > 1 { + //println!("reverse strand coding start more than one {:?}", &iterablecount); + if sto + 1 <= record.sequence.len() { + sliced_sequence = &record.sequence[sta+cod..sto+1]; + } + else { + sliced_sequence = &record.sequence[sta+cod..sto]; + } + } + else { + //println!("record sta {:?} sto {:?} cod {:?} stra {:?} record.seq length {:?}", &sta, &sto, &cod, &stra, &record.sequence.len()); + //println!("sliced sta {:?} sliced sto {:?} record.id {:?}", sta, sto, &record.id); + //println!("iterable count is {:?} reverse strand codon start one", &iterablecount); + if sto + 1 <= record.sequence.len() { + sliced_sequence = &record.sequence[sta..sto+1]; + } + else { + sliced_sequence = &record.sequence[sta..sto]; + } + //println!("iterable count after is {:?}", &iterablecount); + } + let cds_char = sliced_sequence; + let prot_seq = translate(&revcomp(cds_char.as_bytes())); + let parts: Vec<&str> = prot_seq.split('*').collect(); + record.seq_features + .set_counter(key.to_string()) + .set_start(RangeValue::Exact(star)) + .set_stop(RangeValue::Exact(stow)) + .set_sequence_ffn(cds_char.to_string()) + .set_sequence_faa(parts[0].to_string()) + .set_codon_start(codd) + .set_strand(stra); + } else { + // cod is the 0-based frame offset (codon_start - 1), so any non-zero offset must shift the start + if cod > 0 { + //println!("forward strand codon value more than one cnt {:?}", &iterablecount); + sliced_sequence = &record.sequence[sta+cod-1..sto]; + } + else { + //println!("forward strand codon value one cnt {:?}", &iterablecount); + sliced_sequence = &record.sequence[sta-1..sto]; + } + let cds_char = sliced_sequence; + let prot_seq = translate(cds_char.as_bytes()); + let parts: Vec<&str> =
prot_seq.split('*').collect(); + record.seq_features + .set_counter(key.to_string()) + .set_start(RangeValue::Exact(star)) + .set_stop(RangeValue::Exact(stow)) + .set_sequence_ffn(cds_char.to_string()) + .set_sequence_faa(parts[0].to_string()) + .set_codon_start(codd) + .set_strand(stra); + } + } + //return the record when completed + return Ok(record.to_owned()); + } + //clear the line buffer and read the next to continue back to the outer loop + self.line_buffer.clear(); + self.reader.read_line(&mut self.line_buffer)?; + } + Ok(record.to_owned()) + } +} + +pub use crate::record::RangeValue; + +///stores the details of the source features in genbank (contigs) +#[derive(Debug, Eq, PartialEq, Hash, Clone)] +pub enum SourceAttributes { + Start { value: RangeValue }, + Stop { value: RangeValue }, + Organism { value: String }, + MolType { value: String}, + Strain { value: String}, + CultureCollection { value: String}, + TypeMaterial { value: String}, + DbXref { value:String} +} + +//macro for creating the getters +create_getters!( + SourceAttributeBuilder, + source_attributes, + SourceAttributes, + Start { value: RangeValue }, + Stop { value: RangeValue }, + Organism { value: String }, + MolType { value: String}, + Strain { value: String}, + // CultureCollection { value: String}, + TypeMaterial { value: String}, + DbXref { value:String} +); + +///builder for the source information on a per record basis +#[derive(Debug, Default, Clone)] +pub struct SourceAttributeBuilder { + pub source_attributes: BTreeMap<String, HashSet<SourceAttributes>>, + pub source_name: Option<String>, +} + +impl SourceAttributeBuilder { + // Method to set source name + pub fn set_source_name(&mut self, name: String) { + self.source_name = Some(name); + } + + // Method to get source name + pub fn get_source_name(&self) -> Option<&String> { + self.source_name.as_ref() + } + + // Method to add source attributes + pub fn add_source_attribute(&mut self, key: String, attribute: SourceAttributes) { + self.source_attributes + .entry(key) +
.or_insert_with(HashSet::new) + .insert(attribute); + } + + // Method to retrieve source attributes for a given key + pub fn get_source_attributes(&self, key: &str) -> Option<&HashSet<SourceAttributes>> { + self.source_attributes.get(key) + } +} + + +create_builder!( + SourceAttributeBuilder, + source_attributes, + SourceAttributes, + source_name, + Start { value: RangeValue }, + Stop { value: RangeValue }, + Organism { value: String }, + MolType { value: String}, + Strain { value: String}, + // CultureCollection { value: String}, + TypeMaterial { value: String}, + DbXref { value:String} +); + +///attributes for each feature, cds or gene +#[derive(Debug, Eq, Hash, PartialEq, Clone)] +pub enum FeatureAttributes { + Start { value: RangeValue }, + Stop { value: RangeValue }, + Gene { value: String }, + Product { value: String }, + CodonStart { value: u8 }, + Strand { value: i8 }, + // ec_number { value: String } +} + + +create_getters!( + FeatureAttributeBuilder, + attributes, + FeatureAttributes, + Start { value: RangeValue }, + Stop { value: RangeValue }, + Gene { value: String }, + Product { value: String }, + CodonStart { value: u8 }, + Strand { value: i8 } +); + +///builder for the feature information on a per coding sequence (CDS) basis +#[derive(Debug, Default, Clone)] +pub struct FeatureAttributeBuilder { + pub attributes: BTreeMap<String, HashSet<FeatureAttributes>>, + locus_tag: Option<String>, +} + +create_builder!( + FeatureAttributeBuilder, + attributes, + FeatureAttributes, + locus_tag, + Start { value: RangeValue }, + Stop { value: RangeValue }, + Gene { value: String }, + Product { value: String }, + CodonStart { value: u8 }, + Strand { value: i8 } +); + +///stores the sequences of the coding sequences (genes) and proteins.
Also stores start, stop, codon_start and strand information +#[derive(Debug, Eq, PartialEq, Hash, Clone)] +pub enum SequenceAttributes { + Start { value: RangeValue }, + Stop { value: RangeValue }, + SequenceFfn { value: String }, + SequenceFaa { value: String }, + CodonStart { value: u8 }, + Strand { value: i8 }, +} + +create_getters!( + SequenceAttributeBuilder, + seq_attributes, + SequenceAttributes, + Start { value: RangeValue }, + Stop { value: RangeValue }, + SequenceFfn { value: String}, + SequenceFaa { value: String}, + CodonStart { value: u8}, + Strand { value: i8} +); + +///builder for the sequence information on a per coding sequence (CDS) basis +#[derive(Debug, Default, Clone)] +pub struct SequenceAttributeBuilder { + pub seq_attributes: BTreeMap<String, HashSet<SequenceAttributes>>, + pub locus_tag: Option<String>, +} + +create_builder!( + SequenceAttributeBuilder, + seq_attributes, + SequenceAttributes, + locus_tag, + Start { value: RangeValue }, + Stop { value: RangeValue }, + SequenceFfn { value: String}, + SequenceFaa { value: String}, + CodonStart { value: u8 }, + Strand { value: i8 } +); + +///product lines can contain punctuation that is difficult to parse, such as biochemical symbols, unclosed single quotes, superscripts, and single and double brackets.
+///here we substitute these with an underscore +pub fn substitute_odd_punctuation(input: String) -> Result { + let re = Regex::new(r"[/?()',`]|[α-ωΑ-Ω]")?; + + // Strip a trailing \r\n or \n + let cleaned = input.trim_end_matches(&['\r', '\n'][..]); + + Ok(re.replace_all(cleaned, "_").to_string()) +} + +///GFF3 field9 construct +#[derive(Debug)] +pub struct GFFInner { + pub id: String, + pub name: String, + pub locus_tag: String, + pub gene: String, + // Inference: String, + // Parent: String, + // db_xref: String, + pub product: String, + // is_circular: bool, +} + +impl GFFInner { + pub fn new( + id: String, + name: String, + locus_tag: String, + gene: String, + // Inference: String, + // Parent: String, + // db_xref: String, + product: String, + ) -> Self { + GFFInner { + id, name, locus_tag, gene, product, + } + } +} + +///The main GFF3 construct +#[derive(Debug)] +pub struct GFFOuter<'a> { + pub seqid: String, + pub source: String, + pub type_val: String, + pub start: u32, + pub end: u32, + pub score: f64, + pub strand: String, + pub phase: u8, + pub attributes: &'a GFFInner, +} + +impl<'a> GFFOuter<'a> { + pub fn new( + seqid: String, + source: String, + type_val: String, + start: u32, + end: u32, + score: f64, + strand: String, + phase: u8, + attributes: &'a GFFInner + ) -> Self { + GFFOuter { + seqid, source, type_val, start, end, score, strand, phase, attributes, + } + } + pub fn field9_attributes_build(&self) -> String { + let mut full_field9 = Vec::new(); + // GFF3 reserved attribute tags are case-sensitive, so ID and Name are capitalised + if !self.attributes.id.is_empty() { + full_field9.push(format!("ID={}",self.attributes.id)); + } + if !self.attributes.name.is_empty() { + full_field9.push(format!("Name={}", self.attributes.name)); + } + if !self.attributes.gene.is_empty() { + full_field9.push(format!("gene={}",self.attributes.gene)); + } + // if !self.attributes.Inference.is_empty() { + // full_field9.push(format!("inference={}",self.attributes.Inference)); +// } + if !self.attributes.locus_tag.is_empty() {
full_field9.push(format!("locus_tag={}",self.attributes.locus_tag)); + } + if !self.attributes.product.is_empty() { + full_field9.push(format!("product={}",self.attributes.product)); + } + // if !self.attributes.Parent.is_empty() { + // full_field9.push(format!("Parent={}",self.attributes.Parent)); +// } +// if !self.attributes.db_xref.is_empty() { +// full_field9.push(format!("db_xref={}",self.attributes.db_xref)); +// } + full_field9.join(";") + } +} + +///formats the translation string which can be multiple lines, for gbk +///the first line holds up to 45 residues, each following line up to 59 +pub fn format_translation(translation: &str) -> String { + //wrap the protein sequence into the /translation qualifier with correct line lengths + let mut formatted = String::new(); + let cleaned_translation = translation.replace('\n', ""); + formatted.push_str(" /translation=\""); + //the first line is shorter because it also carries the /translation=" prefix + let first_line_length: usize = 45; + let line_length: usize = 59; + let first_len = cleaned_translation.len().min(first_line_length); + let (first_line, remainder) = cleaned_translation.split_at(first_len); + formatted.push_str(first_line); + //every remaining residue is kept, only the final chunk may be shorter than line_length + for chunk in remainder.as_bytes().chunks(line_length) { + //start each continuation line on a new, indented line + formatted.push('\n'); + formatted.push_str(&format!(" {}", String::from_utf8_lossy(chunk))); + } + //close the qualifier after the last residue, whatever the length of the final line + formatted.push('"'); + formatted +} + +///writes the DNA sequence in gbk format with numbering +pub fn write_gbk_format_sequence(sequence: &str,file: &mut File) -> io::Result<()> { + //function to write gbk format sequence + writeln!(file, "ORIGIN")?; + let mut formatted = String::new(); + let cleaned_input = sequence.replace("\n", ""); + let mut index = 1; + for chunk in cleaned_input.as_bytes().chunks(60) { + formatted.push_str(&format!("{:>9} ", index)); // GenBank right-aligns the row's first base position in 9 columns + for (j, sub_chunk) in chunk.chunks(10).enumerate() { + if j > 0 { + formatted.push(' '); + } +
formatted.push_str(&String::from_utf8_lossy(sub_chunk)); + } + formatted.push('\n'); + index+=60; + } + write!(file, "{}", formatted)?; // each row already ends with a newline + writeln!(file, "//")?; + Ok(()) +} + +///saves the parsed data in genbank format +//writes a genbank or multi-genbank file +pub fn gbk_write(seq_region: BTreeMap, record_vec: Vec, filename: &str) -> io::Result<()> { + let now = Local::now(); + let formatted_date = now.format("%d-%b-%Y").to_string().to_uppercase(); + let mut file = OpenOptions::new() + .write(true) // Allow writing to the file + .append(true) // Enable appending to the file + .create(true) // Create the file if it doesn't exist + .open(filename)?; + for (i, (key, _val)) in seq_region.iter().enumerate() { + let strain = match &record_vec[i].source_map.get_strain(key) { + Some(value) => value.to_string(), + None => "Unknown".to_string(), + }; + //write lines for the header + let organism = match &record_vec[i].source_map.get_organism(key) { + Some(value) => value.to_string(), + None => "Unknown".to_string(), + }; + let mol_type = match &record_vec[i].source_map.get_mol_type(key) { + Some(value) => value.to_string(), + None => "Unknown".to_string(), + }; + let type_material = match &record_vec[i].source_map.get_type_material(key) { + Some(value) => value.to_string(), + None => "Unknown".to_string(), + }; + let db_xref = match &record_vec[i].source_map.get_db_xref(key) { + Some(value) => value.to_string(), + None => "Unknown".to_string(), + }; + let source_stop = match &record_vec[i].source_map.get_stop(key) { + Some(value) => value.get_value(), + None => { println!("stop value not found"); + None }.expect("stop value not received") + }; + writeln!(file, "LOCUS {} {} bp DNA linear CON {}", &key, record_vec[i].sequence.replace("\n", "").len(), &formatted_date)?; + writeln!(file, "DEFINITION {} {}.", &organism, &strain)?; + writeln!(file, "ACCESSION {}", &key)?; + writeln!(file, "KEYWORDS .")?; + writeln!(file, "SOURCE {} {}", &organism,&strain)?; + writeln!(file, " ORGANISM {}
{}", &organism,&strain)?; + //write lines for the source + writeln!(file, "FEATURES Location/Qualifiers")?; + writeln!(file, " source 1..{}", &source_stop)?; + writeln!(file, " /organism=\"{}\"",&organism)?; + writeln!(file, " /mol_type=\"{}\"",&mol_type)?; + writeln!(file, " /strain=\"{}\"",&strain)?; + if type_material != "Unknown" { + writeln!(file, " /type_material=\"{}\"",&type_material)?; + } + writeln!(file, " /db_xref=\"{}\"",&db_xref)?; + //write lines for each CDS + for (locus_tag, _value) in &record_vec[i].cds.attributes { + let start = match &record_vec[i].cds.get_start(locus_tag) { + Some(value) => value.get_value(), + None => { println!("start value not found"); + None }.expect("start value not received") + }; + let stop = match &record_vec[i].cds.get_stop(locus_tag) { + Some(value) => value.get_value(), + None => { println!("stop value not found"); + None }.expect("stop value not received") + }; + let product = match &record_vec[i].cds.get_product(locus_tag) { + Some(value) => value.to_string(), + None => "unknown product".to_string(), + }; + let strand = match &record_vec[i].cds.get_strand(locus_tag) { + Some(value) => **value, + None => 0, + }; + let codon_start = match &record_vec[i].cds.get_codon_start(locus_tag) { + Some(value) => **value, + None => 0, + }; + let gene = match &record_vec[i].cds.get_gene(locus_tag) { + Some(value) => value.to_string(), + None => "unknown".to_string(), + }; + let translation = match &record_vec[i].seq_features.get_sequence_faa(locus_tag) { + Some(value) => value.to_string(), + None => "unknown".to_string(), + }; + if strand == 1 { + writeln!(file, " gene {}..{}",&start,&stop)?; + } else { + writeln!(file, " gene complement({}..{})",&start,&stop)?; + } + writeln!(file, " /locus_tag=\"{}\"",&locus_tag)?; + if strand == 1 { + writeln!(file, " CDS {}..{}",&start,&stop)?; + } + else { + writeln!(file, " CDS complement({}..{})",&start,&stop)?; + } + writeln!(file, " /locus_tag=\"{}\"",&locus_tag)?; +
writeln!(file, " /codon_start={}", &codon_start)?; + if gene != "unknown" { + writeln!(file, " /gene=\"{}\"", &gene)?; + } + if translation != "unknown" { + let formatted_translation = format_translation(&translation); + writeln!(file, "{}", &formatted_translation)?; + } + writeln!(file, " /product=\"{}\"",&product)?; + } + write_gbk_format_sequence(&record_vec[i].sequence, &mut file)?; + } + Ok(()) +} + +///saves the parsed data in gff3 format +//writes a gff3 file from a genbank +#[allow(unused_assignments)] +#[allow(unused_variables)] +pub fn gff_write(seq_region: BTreeMap, mut record_vec: Vec, filename: &str, dna: bool) -> io::Result<()> { + let mut file = OpenOptions::new() + //.write(true) // Allow writing to the file + .append(true) // Enable appending to the file + .create(true) // Create the file if it doesn't exist + .open(filename)?; + if file.metadata()?.len() == 0 { + writeln!(file, "##gff-version 3")?; + } + let mut full_seq = String::new(); + let mut prev_end: u32 = 0; + //println!("this is the full seq_region {:?}", &seq_region); + for (k, v) in seq_region.iter() { + writeln!(file, "##sequence-region\t{}\t{}\t{}", &k, v.0, v.1)?; + } + for ((source_name, (seq_start, seq_end)), record) in seq_region.iter().zip(record_vec.drain(..)) { + if dna { + full_seq.push_str(&record.sequence); + } + for (locus_tag, _valu) in &record.cds.attributes { + let start = match record.cds.get_start(locus_tag) { + Some(value) => value.get_value(), + None => { println!("start value not found"); + None }.expect("start value not received") + }; + let stop = match record.cds.get_stop(locus_tag) { + Some(value) => value.get_value(), + None => { println!("stop value not found"); + None }.expect("stop value not received") + }; + let gene = match record.cds.get_gene(locus_tag) { + Some(value) => value.to_string(), + None => "unknown".to_string(), + }; + let product = match record.cds.get_product(locus_tag) { + Some(value) => value.to_string(), + None => "unknown
product".to_string(), + }; + let strand = match record.cds.get_strand(locus_tag) { + Some(valu) => { + match valu { + 1 => "+".to_string(), + -1 => "-".to_string(), + _ => { println!("unexpected strand value {} for locus_tag {}", valu, locus_tag); + "unknownstrand".to_string() } + } + }, + None => "unknownvalue".to_string(), + }; + let phase = match record.cds.get_codon_start(locus_tag) { + Some(valuer) => { + match valuer { + 1 => 0, + 2 => 1, + 3 => 2, + _ => { println!("unexpected codon_start value {} for locus_tag {}, using phase 1", valuer, locus_tag); + 1 } + } + }, + None => 1, + }; + let gff_inner = GFFInner::new( + locus_tag.to_string(), + source_name.clone(), + locus_tag.to_string(), + gene, + // &record.cds.get_Inference(&locus_tag), + // &record.cds.get_Parent(&locus_tag), + // db_xref, + product, + ); + let gff_outer = GFFOuter::new( + source_name.clone(), + ".".to_string(), + "CDS".to_string(), + start + prev_end, + stop + prev_end, + 0.0, + strand, + phase, + &gff_inner, + ); + let field9_attributes = gff_outer.field9_attributes_build(); + //println!("{}\t{}\t{}\t{:?}\t{:?}\t{}\t{}\t{}\t{}", gff_outer.seqid, gff_outer.source, gff_outer.type_val, gff_outer.start, gff_outer.end, gff_outer.score, gff_outer.strand, gff_outer.phase, field9_attributes); + writeln!(file, "{}\t{}\t{}\t{:?}\t{:?}\t{}\t{}\t{}\t{}", gff_outer.seqid, gff_outer.source, gff_outer.type_val, gff_outer.start, gff_outer.end, gff_outer.score, gff_outer.strand, gff_outer.phase, field9_attributes)?; + + } + prev_end = *seq_end; + } + if dna { + writeln!(file, "##FASTA")?; + //writeln!(file, ">{}\n",&filename.to_string())?; + writeln!(file, "{}", full_seq)?; + } + Ok(()) +} + +///saves the parsed data in gff3 format +//writes a gff3 file from a genbank +#[allow(unused_assignments)] +pub fn orig_gff_write(seq_region: BTreeMap, record_vec: Vec, filename: &str, dna: bool) -> io::Result<()> { + let mut file = OpenOptions::new() + //.write(true) // Allow writing to the file + .append(true) // 
Enable appending to the file + .create(true) // Create the file if it doesn't exist + .open(filename)?; + if file.metadata()?.len() == 0 { + writeln!(file, "##gff-version 3")?; + } + let mut source_name = String::new(); + let mut full_seq = String::new(); + let mut prev_end: u32 = 0; + //println!("this is the full seq_region {:?}", &seq_region); + for (k, v) in seq_region.iter() { + writeln!(file, "##sequence-region\t{}\t{}\t{}", &k, v.0, v.1)?; + } + for (i, (key, val)) in seq_region.iter().enumerate() { + source_name = key.to_string(); + if dna { + full_seq.push_str(&record_vec[i].sequence); + } + for (locus_tag, _valu) in &record_vec[i].cds.attributes { + let start = match record_vec[i].cds.get_start(locus_tag) { + Some(value) => value.get_value(), + None => { println!("start value not found"); + None }.expect("start value not received") + }; + let stop = match record_vec[i].cds.get_stop(locus_tag) { + Some(value) => value.get_value(), + None => { println!("stop value not found"); + None }.expect("stop value not received") + }; + let gene = match record_vec[i].cds.get_gene(locus_tag) { + Some(value) => value.to_string(), + None => "unknown".to_string(), + }; + let product = match record_vec[i].cds.get_product(locus_tag) { + Some(value) => value.to_string(), + None => "unknown product".to_string(), + }; + let strand = match record_vec[i].cds.get_strand(locus_tag) { + Some(valu) => { + match valu { + 1 => "+".to_string(), + -1 => "-".to_string(), + _ => { println!("unexpected strand value {} for locus_tag {}", valu, locus_tag); + "unknownstrand".to_string() } + } + }, + None => "unknownvalue".to_string(), + }; + let phase = match record_vec[i].cds.get_codon_start(locus_tag) { + Some(valuer) => { + match valuer { + 1 => 0, + 2 => 1, + 3 => 2, + _ => { println!("unexpected codon_start value {} for locus_tag {}, using phase 1", valuer, locus_tag); + 1 } + } + }, + None => 1, + }; + let gff_inner = GFFInner::new( + locus_tag.to_string(), + source_name.clone(),
+ locus_tag.to_string(), + gene, + // &record.cds.get_Inference(&locus_tag), + // &record.cds.get_Parent(&locus_tag), + // db_xref, + product, + ); + let gff_outer = GFFOuter::new( + source_name.clone(), + ".".to_string(), + "CDS".to_string(), + start + prev_end, + stop + prev_end, + 0.0, + strand, + phase, + &gff_inner, + ); + let field9_attributes = gff_outer.field9_attributes_build(); + //println!("{}\t{}\t{}\t{:?}\t{:?}\t{}\t{}\t{}\t{}", gff_outer.seqid, gff_outer.source, gff_outer.type_val, gff_outer.start, gff_outer.end, gff_outer.score, gff_outer.strand, gff_outer.phase, field9_attributes); + writeln!(file, "{}\t{}\t{}\t{:?}\t{:?}\t{}\t{}\t{}\t{}", gff_outer.seqid, gff_outer.source, gff_outer.type_val, gff_outer.start, gff_outer.end, gff_outer.score, gff_outer.strand, gff_outer.phase, field9_attributes)?; + + } + prev_end = val.1; + } + if dna { + writeln!(file, "##FASTA")?; + //writeln!(file, ">{}\n",&filename.to_string())?; + writeln!(file, "{}", full_seq)?; + } + Ok(()) +} + +///internal record containing data from a single source or contig. Has multiple features. +//sets up a record +#[derive(Debug, Clone)] +pub struct Record { + pub id: String, + pub length: u32, + pub sequence: String, + pub start: usize, + pub end: usize, + pub strand: i32, + pub cds: FeatureAttributeBuilder, + pub source_map: SourceAttributeBuilder, + pub seq_features: SequenceAttributeBuilder, +} + +impl Record { + /// Create a new instance. 
+ pub fn new() -> Self { + Record { + id: "".to_owned(), + length: 0, + sequence: "".to_owned(), + start: 0, + end: 0, + strand: 0, + source_map: SourceAttributeBuilder::new(), + cds: FeatureAttributeBuilder::new(), + seq_features: SequenceAttributeBuilder::new(), + } + } + pub fn is_empty(&self) -> bool { + self.id.is_empty() && self.length == 0 + } + pub fn check(&self) -> Result<(), &str> { + if self.id().is_empty() { + return Err("Expecting id for Gbk record."); + } + Ok(()) + } + pub fn id(&self) -> &str { + &self.id + } + pub fn length(&self) -> u32 { + self.length + } + pub fn sequence(&self) -> &str { + &self.sequence + } + pub fn start(&self) -> u32 { + self.start.try_into().unwrap() + } + pub fn end(&self) -> u32 { + self.end.try_into().unwrap() + } + pub fn strand(&self) -> i32 { + self.strand + } + pub fn cds(&self) -> FeatureAttributeBuilder { + self.cds.clone() + } + pub fn source_map(&self) -> SourceAttributeBuilder { + self.source_map.clone() + } + pub fn seq_features(&self) -> SequenceAttributeBuilder { + self.seq_features.clone() + } + fn rec_clear(&mut self) { + self.id.clear(); + self.length = 0; + self.sequence.clear(); + self.start = 0; + self.end = 0; + self.strand = 0; + self.source_map = SourceAttributeBuilder::new(); + self.cds = FeatureAttributeBuilder::new(); + self.seq_features = SequenceAttributeBuilder::new(); + } +} + +impl Default for Record { + fn default() -> Self { + Self::new() + } +} + +// Provide a type alias and conversion to a generic record to aid interoperability +pub type GenericRecordGbk = crate::record::GenericRecord; + +impl From<&Record> for GenericRecordGbk { + fn from(r: &Record) -> Self { + Self { + id: r.id.clone(), + seq: r.sequence.clone(), + seqid: r.id.clone(), + start: r.start as u32, + end: r.end as u32, + strand: r.strand, + source: r.source_map.clone(), + cds: r.cds.clone(), + seq_features: r.seq_features.clone(), + } + } +} + +#[allow(dead_code)] +pub struct
Config { + filename: String, +} + +impl Config { + pub fn new(args: &[String]) -> Result { + if args.len() < 2 { + panic!("not enough arguments, please provide filename"); + } + let filename = args[1].clone(); + + Ok(Config { filename }) + } +} + +#[cfg(test)] +mod tests { + use super::*; + #[test] + #[allow(unused_mut)] + #[allow(unused_variables)] + #[allow(dead_code)] + #[allow(unused_assignments)] + #[allow(unused_imports)] + fn test_read_file() { + let content = std::fs::read_to_string("K12_ribo.gbk").expect("error reading file"); + assert!(content.contains("LOCUS")); + assert!(content.len() > 0); + } + #[test] + #[allow(unused_mut)] + #[allow(unused_variables)] + #[allow(dead_code)] + #[allow(unused_assignments)] + #[allow(unused_imports)] + fn test_parse_gbk() { + let file_gbk = "K12_ribo.gbk"; + let records = genbank!(&file_gbk); + assert!(records.len() > 0); + } + #[test] + #[allow(unused_mut)] + #[allow(unused_variables)] + #[allow(dead_code)] + #[allow(unused_assignments)] + #[allow(unused_imports)] + fn test_parse_source_attributes() { + let file_gbk = "K12_ribo.gbk"; + let records = genbank!(&file_gbk); + if let Some(record) = records.first() { + if let Some((key, val)) = record.source_map.source_attributes.first_key_value() { + assert_eq!(key, &"source_NC_000913_1".to_string()); + } + } + } + #[test] + #[allow(unused_mut)] + #[allow(unused_variables)] + #[allow(dead_code)] + #[allow(unused_assignments)] + #[allow(unused_imports)] + fn test_parse_cds_attributes() { + let file_gbk = "K12_ribo.gbk"; + let records = genbank!(&file_gbk); + if let Some(record) = records.first() { + if let Some((locus_tag, vals)) = record.cds.attributes.first_key_value() { + assert_eq!(locus_tag, &"b3304".to_string()); + assert_eq!(record.cds.get_gene(&locus_tag).as_deref(), Some(&"rplR".to_string())); + } + } + } + #[test] + #[allow(unused_mut)] + #[allow(unused_variables)] + #[allow(dead_code)] + #[allow(unused_assignments)] + #[allow(unused_imports)] + fn 
test_parse_sequence_attributes() { + let file_gbk = "K12_ribo.gbk"; + let records = genbank!(&file_gbk); + if let Some(record) = records.first() { + if let Some((key, vals)) = record.cds.attributes.first_key_value() { + assert_eq!(key, &"b3304".to_string()); + assert_eq!(record.seq_features.get_sequence_faa(&key), Some(&"MDKKSARIRRATRARRKLQELGATRLVVHRTPRHIYAQVIAPNGSEVLVAASTVEKAIAEQLKYTGNKDAAAAVGKAVAERALEKGIKDVSFDRSGFQYHGRVQALADAAREAGLQF".to_string())); + } + } + } +} + diff --git a/seqmetrics/microBioRust/src/lib.rs b/seqmetrics/microBioRust/src/lib.rs new file mode 100644 index 0000000..3c60d8e --- /dev/null +++ b/seqmetrics/microBioRust/src/lib.rs @@ -0,0 +1,14 @@ +//! The aim of this crate is to provide Microbiology friendly Rust functions for bioinformatics. +//! +//! +//! With the genbank parser, you are able to parse a genbank format file, then write into gff3 format +//! +//! It is also possible to print the DNA sequences extracted from the coding sequences (genes, ffn format), +//! plus the protein fasta sequences (faa format). +//! +//! Additionally, you can create new features and records and save them either in genbank or gff3 format +//! 
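The crate overview above mentions writing parsed GenBank records out as GFF3. As a rough, self-contained sketch in plain Rust (not microBioRust's API), the per-CDS conversion that `gff_write` performs can be illustrated like this: the 1-based GenBank `codon_start` becomes the 0-based GFF3 phase, strand `1`/`-1` becomes `+`/`-`, coordinates are shifted by an offset in the way `gff_write` adds the end of the previous sequence region, and column 9 is built from `key=value` pairs joined by `;`. `CdsSketch`, `gff3_cds_line`, `codon_start_to_phase` and `escape_attr` are hypothetical names. Whereas the crate's `substitute_odd_punctuation` replaces awkward punctuation in product lines with underscores, this sketch assumes percent-encoding of the characters the GFF3 specification reserves, and writes `.` for an unknown score, strand or phase.

```rust
// Hypothetical, self-contained sketch: none of these names are part of microBioRust.

/// Minimal description of one CDS, mirroring the fields gff_write reads.
struct CdsSketch<'a> {
    seqid: &'a str,
    locus_tag: &'a str,
    gene: Option<&'a str>,
    product: Option<&'a str>,
    start: u32,
    stop: u32,
    strand: i8,      // 1 or -1, as passed to set_strand
    codon_start: u8, // 1, 2 or 3, as passed to set_codon_start
}

/// GenBank codon_start is 1-based; the GFF3 phase column is 0-based.
fn codon_start_to_phase(codon_start: u8) -> Option<u8> {
    match codon_start {
        1..=3 => Some(codon_start - 1),
        _ => None,
    }
}

/// Percent-encode the characters GFF3 reserves in column 9.
fn escape_attr(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '%' => out.push_str("%25"),
            ';' => out.push_str("%3B"),
            '=' => out.push_str("%3D"),
            '&' => out.push_str("%26"),
            ',' => out.push_str("%2C"),
            '\t' => out.push_str("%09"),
            '\n' => out.push_str("%0A"),
            '\r' => out.push_str("%0D"),
            _ => out.push(c),
        }
    }
    out
}

/// Build one tab-separated GFF3 CDS line; `offset` shifts both coordinates.
fn gff3_cds_line(cds: &CdsSketch, offset: u32) -> String {
    let strand = match cds.strand {
        1 => "+",
        -1 => "-",
        _ => ".", // unknown strand
    };
    let phase = codon_start_to_phase(cds.codon_start)
        .map_or_else(|| ".".to_string(), |p| p.to_string());
    let mut attributes = vec![
        format!("ID={}", escape_attr(cds.locus_tag)),
        format!("locus_tag={}", escape_attr(cds.locus_tag)),
    ];
    if let Some(gene) = cds.gene {
        attributes.push(format!("gene={}", escape_attr(gene)));
    }
    if let Some(product) = cds.product {
        attributes.push(format!("product={}", escape_attr(product)));
    }
    // columns: seqid, source, type, start, end, score, strand, phase, attributes
    format!(
        "{}\t.\tCDS\t{}\t{}\t.\t{}\t{}\t{}",
        cds.seqid,
        cds.start + offset,
        cds.stop + offset,
        strand,
        phase,
        attributes.join(";")
    )
}

fn main() {
    let cds = CdsSketch {
        seqid: "source_1",
        locus_tag: "b3304",
        gene: Some("rplR"),
        product: Some("50S ribosomal subunit protein L18"),
        start: 1,
        stop: 354,
        strand: -1,
        codon_start: 1,
    };
    println!("{}", gff3_cds_line(&cds, 0));
}
```

Running `main` prints the line for `b3304` from the test record; passing the end of the previous region as `offset` reproduces the coordinate shift that `gff_write` applies to later records.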
+#![allow(non_snake_case)] +pub mod record; +pub mod embl; +pub mod gbk; diff --git a/seqmetrics/microBioRust/src/main.rs b/seqmetrics/microBioRust/src/main.rs new file mode 100644 index 0000000..6b40f17 --- /dev/null +++ b/seqmetrics/microBioRust/src/main.rs @@ -0,0 +1,22 @@ +use clap::Parser; +use microBioRust::genbank; + +#[derive(Parser, Debug)] +#[clap(author, version, about)] +struct Arguments { + #[clap(short, long)] + filename: String, +} + +fn main() -> Result<(), anyhow::Error> { + let args = Arguments::parse(); + let records = genbank!(&args.filename); + for record in records.iter() { + for (k, _v) in &record.cds.attributes { + if let Some(seq) = record.seq_features.get_sequence_faa(k) { + println!(">{}|{}\n{}", &record.id, &k, seq); + } + } + } + Ok(()) +} diff --git a/seqmetrics/microBioRust/src/orig_main.rs b/seqmetrics/microBioRust/src/orig_main.rs new file mode 100644 index 0000000..6369603 --- /dev/null +++ b/seqmetrics/microBioRust/src/orig_main.rs @@ -0,0 +1,44 @@ +use microBioRust::embl::Reader; +use std::fs::File; +use clap::Parser; + +#[derive(Parser,Default,Debug)] +#[clap(author="LCrossman",version,about="extracting protein fasta from an EMBL file")] +pub struct Arguments { + #[clap(short,long)] + filename: String, +} + +///An example to print protein sequence fasta from either a single or multi-record EMBL file +fn main() -> Result<(), anyhow::Error> { + //collect filename from --filename input + let args = Arguments::parse(); + let file_embl = File::open(&args.filename).expect("could not open file"); + //create reader + let reader = Reader::new(file_embl); + //create records structure + let mut records = reader.records(); + let mut read_counter: u32 = 0; + loop { + match records.next() { + Some(Ok(record)) => { + //println!("next"); + //println!("Record id: {:?}", record.id); + for (k,_v) in record.cds.attributes { + match record.seq_features.get_sequence_faa(&k) { + Some(value) => { let seq_faa = value.to_string(); +
println!(">{}|{}\n{}", &record.id, &k, seq_faa); + }, + _ => (), + }; + + } + read_counter+=1; + }, + Some(Err(e)) => { println!("there's an error {:?}", e); }, + None => break, + } + } + println!("Total records processed: {}", read_counter); + Ok(()) +} diff --git a/seqmetrics/microBioRust/src/record.rs b/seqmetrics/microBioRust/src/record.rs new file mode 100644 index 0000000..27d34be --- /dev/null +++ b/seqmetrics/microBioRust/src/record.rs @@ -0,0 +1,74 @@ +//!Shared generic record types to reduce duplication between gbk and embl +//!A minimal first step: defines generic containers and builders that mirror the existing API where possible + +use std::collections::{HashMap, HashSet}; + +#[derive(Clone, Debug, PartialEq, Eq, Hash)] +pub enum RangeValue { + Exact(u32), + LessThan(u32), + GreaterThan(u32), +} + +impl RangeValue { + pub fn get_value(&self) -> u32 { + match self { + RangeValue::Exact(v) => *v, + RangeValue::LessThan(v) => *v, + RangeValue::GreaterThan(v) => *v, + } + } +} + +///Traits to unify attribute enums across formats.
Existing enums can implement these traits to expose a common view if needed +pub trait HasStartStopStrand { + fn start(&self) -> Option { None } + fn stop(&self) -> Option { None } + fn strand(&self) -> Option { None } +} + +///Generic attribute builders +#[derive(Clone, Debug, Default)] +pub struct AttributeBuilder { + pub name: Option, + pub attributes: HashMap>, +} + +impl AttributeBuilder +where + K: Eq + std::hash::Hash, + V: Eq + std::hash::Hash, +{ + pub fn set_name(&mut self, name: String) { self.name = Some(name); } + pub fn get_name(&self) -> Option<&String> { self.name.as_ref() } + pub fn add(&mut self, key: K, value: V) { + self.attributes.entry(key).or_insert_with(HashSet::new).insert(value); + } + pub fn get(&self, key: &K) -> Option<&HashSet> { self.attributes.get(key) } +} + +///Generic record and records container +#[derive(Clone, Debug, Default)] +pub struct GenericRecord { + pub id: String, + pub seq: String, + pub seqid: String, + pub start: u32, + pub end: u32, + pub strand: i32, + pub source: S, + pub cds: F, + pub seq_features: Q, +} + +impl GenericRecord { + pub fn is_empty(&self) -> bool { self.id.is_empty() && self.seq.is_empty() } +} + +pub struct GenericRecords { + inner: R, +} + +impl GenericRecords { + pub fn new(reader: R) -> Self { Self { inner: reader } } +} diff --git a/seqmetrics/microBioRust/src/testmain.rs b/seqmetrics/microBioRust/src/testmain.rs new file mode 100644 index 0000000..099b090 --- /dev/null +++ b/seqmetrics/microBioRust/src/testmain.rs @@ -0,0 +1,94 @@ +use microBioRust::gbk::{gbk_write, RangeValue, Record}; +use std::{ + fs, + collections::BTreeMap, +}; + +pub fn main() -> Result<(), anyhow::Error> { + let filename = "new_record.gbk".to_string(); + if std::path::Path::new(&filename).exists() { + fs::remove_file(&filename)?; + } + let mut record = Record::new(); + let mut seq_region: BTreeMap = BTreeMap::new(); + //example from E.coli K12 + seq_region.insert("source_1".to_string(), (1,897)); + //Add the source
into SourceAttributes + record.source_map + .set_counter("source_1".to_string()) + .set_start(RangeValue::Exact(1)) + .set_stop(RangeValue::Exact(897)) + .set_organism("Escherichia coli".to_string()) + .set_mol_type("DNA".to_string()) + .set_strain("K-12 substr. MG1655".to_string()) + .set_type_material("type strain of Escherichia coli K12".to_string()) + .set_db_xref("PRJNA57779".to_string()); + //Add the features into FeatureAttributes, here we are setting two features, i.e. coding sequences or genes + record.cds + .set_counter("b3304".to_string()) + .set_start(RangeValue::Exact(1)) + .set_stop(RangeValue::Exact(354)) + .set_gene("rplR".to_string()) + .set_product("50S ribosomal subunit protein L18".to_string()) + .set_codon_start(1) + .set_strand(-1); + record.cds + .set_counter("b3305".to_string()) + .set_start(RangeValue::Exact(364)) + .set_stop(RangeValue::Exact(897)) + .set_gene("rplF".to_string()) + .set_product("50S ribosomal subunit protein L6".to_string()) + .set_codon_start(1) + .set_strand(-1); + //Add the sequences for the coding sequence (CDS) into SequenceAttributes + record.seq_features + .set_counter("b3304".to_string()) + .set_start(RangeValue::Exact(1)) + .set_stop(RangeValue::Exact(354)) + .set_sequence_ffn("ATGGATAAGAAATCTGCTCGTATCCGTCGTGCGACCCGCGCACGCCGCAAGCTCCAGGAG +CTGGGCGCAACTCGCCTGGTGGTACATCGTACCCCGCGTCACATTTACGCACAGGTAATT +GCACCGAACGGTTCTGAAGTTCTGGTAGCTGCTTCTACTGTAGAAAAAGCTATCGCTGAA +CAACTGAAGTACACCGGTAACAAAGACGCGGCTGCAGCTGTGGGTAAAGCTGTCGCTGAA +CGCGCTCTGGAAAAAGGCATCAAAGATGTATCCTTTGACCGTTCCGGGTTCCAATATCAT +GGTCGTGTCCAGGCACTGGCAGATGCTGCCCGTGAAGCTGGCCTTCAGTTCTAA".to_string()) + .set_sequence_faa("MDKKSARIRRATRARRKLQELGATRLVVHRTPRHIYAQVIAPNGSEVLVAASTVEKAIAE +QLKYTGNKDAAAAVGKAVAERALEKGIKDVSFDRSGFQYHGRVQALADAAREAGLQF".to_string()) + .set_codon_start(1) + .set_strand(-1); + record.seq_features + .set_counter("b3305".to_string()) + .set_start(RangeValue::Exact(364)) + .set_stop(RangeValue::Exact(897)) +
.set_sequence_ffn("ATGTCTCGTGTTGCTAAAGCACCGGTCGTTGTTCCTGCCGGCGTTGACGTAAAAATCAAC +GGTCAGGTTATTACGATCAAAGGTAAAAACGGCGAGCTGACTCGTACTCTCAACGATGCT +GTTGAAGTTAAACATGCAGATAATACCCTGACCTTCGGTCCGCGTGATGGTTACGCAGAC +GGTTGGGCACAGGCTGGTACCGCGCGTGCCCTGCTGAACTCAATGGTTATCGGTGTTACC +GAAGGCTTCACTAAGAAGCTGCAGCTGGTTGGTGTAGGTTACCGTGCAGCGGTTAAAGGC +AATGTGATTAACCTGTCTCTGGGTTTCTCTCATCCTGTTGACCATCAGCTGCCTGCGGGT +ATCACTGCTGAATGTCCGACTCAGACTGAAATCGTGCTGAAAGGCGCTGATAAGCAGGTG +ATCGGCCAGGTTGCAGCGGATCTGCGCGCCTACCGTCGTCCTGAGCCTTATAAAGGCAAG +GGTGTTCGTTACGCCGACGAAGTCGTGCGTACCAAAGAGGCTAAGAAGAAGTAA".to_string()) + .set_sequence_faa("MSRVAKAPVVVPAGVDVKINGQVITIKGKNGELTRTLNDAVEVKHADNTLTFGPRDGYAD +GWAQAGTARALLNSMVIGVTEGFTKKLQLVGVGYRAAVKGNVINLSLGFSHPVDHQLPAG +ITAECPTQTEIVLKGADKQVIGQVAADLRAYRRPEPYKGKGVRYADEVVRTKEAKKK".to_string()) + .set_codon_start(1) + .set_strand(-1); + //Add the full sequence of the entire record into the record.sequence + record.sequence = "TTAGAACTGAAGGCCAGCTTCACGGGCAGCATCTGCCAGTGCCTGGACACGACCATGATA +TTGGAACCCGGAACGGTCAAAGGATACATCTTTGATGCCTTTTTCCAGAGCGCGTTCAGC +GACAGCTTTACCCACAGCTGCAGCCGCGTCTTTGTTACCGGTGTACTTCAGTTGTTCAGC +GATAGCTTTTTCTACAGTAGAAGCAGCTACCAGAACTTCAGAACCGTTCGGTGCAATTAC +CTGTGCGTAAATGTGACGCGGGGTACGATGTACCACCAGGCGAGTTGCGCCCAGCTCCTG +GAGCTTGCGGCGTGCGCGGGTCGCACGACGGATACGAGCAGATTTCTTATCCATAGTGTT +ACCTTACTTCTTCTTAGCCTCTTTGGTACGCACGACTTCGTCGGCGTAACGAACACCCTT +GCCTTTATAAGGCTCAGGACGACGGTAGGCGCGCAGATCCGCTGCAACCTGGCCGATCAC +CTGCTTATCAGCGCCTTTCAGCACGATTTCAGTCTGAGTCGGACATTCAGCAGTGATACC +CGCAGGCAGCTGATGGTCAACAGGATGAGAGAAACCCAGAGACAGGTTAATCACATTGCC +TTTAACCGCTGCACGGTAACCTACACCAACCAGCTGCAGCTTCTTAGTGAAGCCTTCGGT +AACACCGATAACCATTGAGTTCAGCAGGGCACGCGCGGTACCAGCCTGTGCCCAACCGTC +TGCGTAACCATCACGCGGACCGAAGGTCAGGGTATTATCTGCATGTTTAACTTCAACAGC +ATCGTTGAGAGTACGAGTCAGCTCGCCGTTTTTACCTTTGATCGTAATAACCTGACCGTT +GATTTTTACGTCAACGCCGGCAGGAACAACGACCGGTGCTTTAGCAACACGAGACAT".to_string(); + gbk_write(seq_region, vec![record], &filename)?; + Ok(()) +} diff --git a/seqmetrics/microBioRust/test_output.gbk
b/seqmetrics/microBioRust/test_output.gbk new file mode 100644 index 0000000..d1e8825 --- /dev/null +++ b/seqmetrics/microBioRust/test_output.gbk @@ -0,0 +1,51 @@ +LOCUS source_1 910 bp DNA linear CON 01-NOV-2024 +DEFINITION Escherichia coli K-12 substr. MG1655. +ACCESSION source_1 +KEYWORDS . +SOURCE Escherichia coli K-12 substr. MG1655 + ORGANISM Escherichia coli K-12 substr. MG1655 +FEATURES Location/Qualifiers + source 1..910 + /organism="K-12 substr. MG1655" + /mol_type="DNA" + /strain="K-12 substr. MG1655" + /type_material="type strain of Escherichia coli K12" + /db_xref="PRJNA57779" + gene complement(1..354) + /locus_tag="b3304" + CDS complement(1..354) + /locus_tag="b3304" + /codon_start=1 + /gene="rplR" + /translation="MDKKSARIRRATRARRKLQELGATRLVVHRTPRHIYAQVIAPNGS + LVAASTVEKAIAEQLKYTGNKDAAAAVGKAVAERALEKGIKDVSFDRSGFQYHGRVQAL + DAAREAGLQ" + /product="50S ribosomal subunit protein L18" + gene complement(364..897) + /locus_tag="b3305" + CDS complement(364..897) + /locus_tag="b3305" + /codon_start=1 + /gene="rplF" + /translation="MSRVAKAPVVVPAGVDVKINGQVITIKGKNGELTRTLNDAVEVKH + NTLTFGPRDGYADGWAQAGTARALLNSMVIGVTEGFTKKLQLVGVGYRAAVKGNVINLS + GFSHPVDHQLPAGITAECPTQTEIVLKGADKQVIGQVAADLRAYRRPEPYKGKGVRYAD + VVRTKEAKK" + /product="50S ribosomal subunit protein L6" +ORIGIN + 1 TTAGAACTGA AGGCCAGCTT CACGGGCAGC ATCTGCCAGT GCCTGGACAC GACCATGATA + 61 TTGGAACCCG GAACGGTCAA AGGATACATC TTTGATGCCT TTTTCCAGAG CGCGTTCAGC + 121 GACAGCTTTA CCCACAGCTG CAGCCGCGTC TTTGTTACCG GTGTACTTCA GTTGTTCAGC + 181 GATAGCTTTT TCTACAGTAG AAGCAGCTAC CAGAACTTCA GAACCGTTCG GTGCAATTAC + 241 CTGTGCGTAA ATGTGACGCG GGGTACGATG TACCACCAGG CGAGTTGCGC CCAGCTCCTG + 301 GAGCTTGCGG CGTGCGCGGG TCGCACGACG GATACGAGCA GATTTCTTAT CCATAGTGTT + 361 ACCTTACTTC TTCTTAGCCT CTTTGGTACG CACGACTTCG TCGGCGTAAC GAACACCCTT + 421 GCCTTTATAA GGCTCAGGAC GACGGTAGGC GCGCAGATCC GCTGCAACCT GGCCGATCAC + 481 CTGCTTATCA GCGCCTTTCA GCACGATTTC AGTCTGAGTC GGACATTCAG CAGTGATACC + 541 CGCAGGCAGC TGATGGTCAA CAGGATGAGA GAAACCCAGA GACAGGTTAA 
TCACATTGCC + 601 TTTAACCGCT GCACGGTAAC CTACACCAAC CAGCTGCAGC TTCTTAGTGA AGCCTTCGGT + 661 AACACCGATA ACCATTGAGT TCAGCAGGGC ACGCGCGGTA CCAGCCTGTG CCCAACCGTC + 721 TGCGTAACCA TCACGCGGAC CGAAGGTCAG GGTATTATCT GCATGTTTAA CTTCAACAGC + 781 ATCGTTGAGA GTACGAGTCA GCTCGCCGTT TTTACCTTTG ATCGTAATAA CCTGACCGTT + 841 GATTTTTACG TCAACGCCGG CAGGAACAAC GACCGGTGCT TTAGCAACAC GAGACA +// diff --git a/seqmetrics/microBioRust/tests/create_new_record.rs b/seqmetrics/microBioRust/tests/create_new_record.rs new file mode 100644 index 0000000..f17ab28 --- /dev/null +++ b/seqmetrics/microBioRust/tests/create_new_record.rs @@ -0,0 +1,125 @@ +use microBioRust::embl::{gbk_write, gff_write, RangeValue, Record}; +use std::collections::BTreeMap; + + + /// Test to create a new record + /// We require a source, features, sequence features and a sequence + /// The source is top level, a single genbank file has one source, multi-genbank has one per contig + /// The SourceAttributes construct has a name (counter), start, stop, organism, mol_type, strain, type material and db_xref + /// The FeatureAttributes construct has a locus tag (counter), gene, product, start, stop, codon start, strand + /// SourceAttribute start and stop are the coordinates of the source feature or per contig, FeatureAttributes start and stop are per coding sequence (CDS) + /// The SequenceAttributes construct has a locus tag (counter), start, stop, sequence_ffn, sequence_faa, codon start, and strand + /// SequenceAttribute start and stop, codon start and strand are duplicates of those in the FeatureAttributes + /// To add an entry, use the set_ methods such as set_start, set_stop, set_counter, set_strand + /// To write in GFF3 format, call gff_write(seq_region, record_vec, filename, dna), where dna is true or false + /// The seq_region is the region of interest with name and DNA coordinates such as ``` "source_1".to_string(), (1,897) ``` + /// record_vec is a list of the records.
If there is only one record ``` vec![record] ``` will suffice + /// filename is the required filename string, true/false is whether the DNA sequence should be included in the GFF3 file + /// Some GFF3 files have the DNA sequence, whilst others do not. Some tools require the DNA sequence included. + +#[test] +fn create_new_record() -> Result<(), anyhow::Error> { + //let filename = format!("new_record.gff"); + let mut record = Record::new(); + let mut seq_region: BTreeMap = BTreeMap::new(); + seq_region.insert("source_1".to_string(), (1, 910)); + record + .source_map + .set_counter("source_1".to_string()) + .set_start(RangeValue::Exact(1)) + .set_stop(RangeValue::Exact(910)) + .set_organism("Escherichia coli".to_string()) + .set_mol_type("DNA".to_string()) + .set_strain("K-12 substr. MG1655".to_string()) + // culture_collection.clone() + //.set_type_material("type strain of Escherichia coli K12".to_string()) + .set_db_xref("PRJNA57779".to_string()); + record + .cds + .set_counter("b3304".to_string()) + .set_start(RangeValue::Exact(1)) + .set_stop(RangeValue::Exact(354)) + .set_gene("rplR".to_string()) + .set_product("50S ribosomal subunit protein L18".to_string()) + .set_codon_start(1) + .set_strand(-1); + record + .cds + .set_counter("b3305".to_string()) + .set_start(RangeValue::Exact(364)) + .set_stop(RangeValue::Exact(897)) + .set_gene("rplF".to_string()) + .set_product("50S ribosomal subunit protein L6".to_string()) + .set_codon_start(1) + .set_strand(-1); + record + .seq_features + .set_counter("b3304".to_string()) + .set_start(RangeValue::Exact(1)) + .set_stop(RangeValue::Exact(354)) + .set_sequence_ffn( + "ATGGATAAGAAATCTGCTCGTATCCGTCGTGCGACCCGCGCACGCCGCAAGCTCCAGGAG +CTGGGCGCAACTCGCCTGGTGGTACATCGTACCCCGCGTCACATTTACGCACAGGTAATT +GCACCGAACGGTTCTGAAGTTCTGGTAGCTGCTTCTACTGTAGAAAAAGCTATCGCTGAA +CAACTGAAGTACACCGGTAACAAAGACGCGGCTGCAGCTGTGGGTAAAGCTGTCGCTGAA +CGCGCTCTGGAAAAAGGCATCAAAGATGTATCCTTTGACCGTTCCGGGTTCCAATATCAT 
+GGTCGTGTCCAGGCACTGGCAGATGCTGCCCGTGAAGCTGGCCTTCAGTTCTAA" + .to_string(), + ) + .set_sequence_faa( + "MDKKSARIRRATRARRKLQELGATRLVVHRTPRHIYAQVIAPNGSEVLVAASTVEKAIAE +QLKYTGNKDAAAAVGKAVAERALEKGIKDVSFDRSGFQYHGRVQALADAAREAGLQF" + .to_string(), + ) + .set_codon_start(1) + .set_strand(-1); + record + .seq_features + .set_counter("b3305".to_string()) + .set_start(RangeValue::Exact(364)) + .set_stop(RangeValue::Exact(897)) + .set_sequence_ffn( + "ATGTCTCGTGTTGCTAAAGCACCGGTCGTTGTTCCTGCCGGCGTTGACGTAAAAATCAAC +GGTCAGGTTATTACGATCAAAGGTAAAAACGGCGAGCTGACTCGTACTCTCAACGATGCT +GTTGAAGTTAAACATGCAGATAATACCCTGACCTTCGGTCCGCGTGATGGTTACGCAGAC +GGTTGGGCACAGGCTGGTACCGCGCGTGCCCTGCTGAACTCAATGGTTATCGGTGTTACC +GAAGGCTTCACTAAGAAGCTGCAGCTGGTTGGTGTAGGTTACCGTGCAGCGGTTAAAGGC +AATGTGATTAACCTGTCTCTGGGTTTCTCTCATCCTGTTGACCATCAGCTGCCTGCGGGT +ATCACTGCTGAATGTCCGACTCAGACTGAAATCGTGCTGAAAGGCGCTGATAAGCAGGTG +ATCGGCCAGGTTGCAGCGGATCTGCGCGCCTACCGTCGTCCTGAGCCTTATAAAGGCAAG +GGTGTTCGTTACGCCGACGAAGTCGTGCGTACCAAAGAGGCTAAGAAGAAGTAA" + .to_string(), + ) + .set_sequence_faa( + "MSRVAKAPVVVPAGVDVKINGQVITIKGKNGELTRTLNDAVEVKHADNTLTFGPRDGYAD +GWAQAGTARALLNSMVIGVTEGFTKKLQLVGVGYRAAVKGNVINLSLGFSHPVDHQLPAG +ITAECPTQTEIVLKGADKQVIGQVAADLRAYRRPEPYKGKGVRYADEVVRTKEAKKK" + .to_string(), + ) + .set_codon_start(1) + .set_strand(-1); + record.sequence = "acctctaccttagaactgaaggccagcttcacgggcagcatctgccagtgcctggacacg +accatgatattggaacccggaacggtcaaaggatacatctttgatgcctttttccagagc +gcgttcagcgacagctttacccacagctgcagccgcgtctttgttaccggtgtacttcag +ttgttcagcgatagctttttctacagtagaagcagctaccagaacttcagaaccgttcgg +tgcaattacctgtgcgtaaatgtgacgcggggtacgatgtaccaccaggcgagttgcgcc +cagctcctggagcttgcggcgtgcgcgggtcgcacgacggatacgagcagatttcttatc +catagtgttaccttacttcttcttagcctctttggtacgcacgacttcgtcggcgtaacg +aacacccttgcctttataaggctcaggacgacggtaggcgcgcagatccgctgcaacctg +gccgatcacctgcttatcagcgcctttcagcacgatttcagtctgagtcggacattcagc +agtgatacccgcaggcagctgatggtcaacaggatgagagaaacccagagacaggttaat +cacattgcctttaaccgctgcacggtaacctacaccaaccagctgcagcttcttagtgaa 
+gccttcggtaacaccgataaccattgagttcagcagggcacgcgcggtaccagcctgtgc +ccaaccgtctgcgtaaccatcacgcggaccgaaggtcagggtattatctgcatgtttaac +ttcaacagcatcgttgagagtacgagtcagctcgccgtttttacctttgatcgtaataac +ctgaccgttgatttttacgtcaacgccggcaggaacaacgaccggtgctttagcaacacg +agacattttttcc".to_string(); + gff_write( + seq_region.clone(), + vec![record.clone()], + "new_output_embl.gff", + true, + )?; + gbk_write(seq_region, vec![record], "new_output_embl.gbk")?; + Ok(()) +} diff --git a/seqmetrics/microBioRust/tests/embl_to_faa.rs b/seqmetrics/microBioRust/tests/embl_to_faa.rs new file mode 100644 index 0000000..86933c7 --- /dev/null +++ b/seqmetrics/microBioRust/tests/embl_to_faa.rs @@ -0,0 +1,36 @@ +use microBioRust::embl::Reader; +use std::fs; +#[test] +fn embl_to_faa() -> Result<(), anyhow::Error> { + let file_embl = fs::File::open("example.embl")?; + let reader = Reader::new(file_embl); + let mut records = reader.records(); + let mut read_counter: u32 = 0; + loop { + match records.next() { + Some(Ok(record)) => { + //println!("next record"); + //println!("Record id: {:?}", record.id); + for (k, _v) in &record.cds.attributes { + match record.seq_features.get_sequence_faa(&k) { + Some(value) => { + let seq_faa = value.to_string(); + println!(">{}|{}\n{}", &record.id, &k, seq_faa); + } + _ => (), + }; + } + read_counter += 1; + } + Some(Err(e)) => { + println!("there's an error: {:?}", e); + } + None => { + println!("finished iteration"); + break; + } + } + } + println!("Total records processed: {}", read_counter); + Ok(()) +} diff --git a/seqmetrics/microBioRust/tests/embl_to_ffn.rs b/seqmetrics/microBioRust/tests/embl_to_ffn.rs new file mode 100644 index 0000000..70a6f7a --- /dev/null +++ b/seqmetrics/microBioRust/tests/embl_to_ffn.rs @@ -0,0 +1,36 @@ +use microBioRust::embl::Reader; +use std::fs; +#[test] +pub fn embl_to_ffn() -> Result<(), anyhow::Error> { + let file_embl = fs::File::open("example.embl")?; + let reader = Reader::new(file_embl); + let mut records =
reader.records(); + let mut read_counter: u32 = 0; + loop { + match records.next() { + Some(Ok(record)) => { + //println!("next record"); + //println!("Record id: {:?}", record.id); + for (k, _v) in &record.cds.attributes { + match record.seq_features.get_sequence_ffn(&k) { + Some(value) => { + let seq_ffn = value.to_string(); + println!(">{}|{}\n{}", &record.id, &k, seq_ffn); + } + _ => (), + }; + } + read_counter += 1; + } + Some(Err(e)) => { + println!("there's an error: {:?}", e); + } + None => { + println!("finished iteration"); + break; + } + } + } + println!("Total records processed: {}", read_counter); + Ok(()) +} diff --git a/seqmetrics/microBioRust/tests/embl_to_gff.rs b/seqmetrics/microBioRust/tests/embl_to_gff.rs new file mode 100644 index 0000000..099cf6a --- /dev/null +++ b/seqmetrics/microBioRust/tests/embl_to_gff.rs @@ -0,0 +1,51 @@ +use microBioRust::embl::{gff_write, Reader, Record}; +use std::collections::BTreeMap; +use std::fs; + +#[test] +fn test_embl_to_gff() -> std::io::Result<()> { + let file_embl = fs::File::open("example.embl")?; + let reader = Reader::new(file_embl); + let mut records = reader.records(); + let mut read_counter: u32 = 0; + let mut prev_end: u32 = 0; + let mut seq_region: BTreeMap<String, (u32, u32)> = BTreeMap::new(); + let mut record_vec: Vec<Record> = Vec::new(); + + while let Some(record_result) = records.next() { + match record_result { + Ok(record) => { + let sour = record + .source_map + .source_name + .clone() + .expect("Missing source name"); + let beginning = record + .source_map + .get_start(&sour) + .map(|v| v.get_value()) + .unwrap_or(0); + let ending = record + .source_map + .get_stop(&sour) + .map(|v| v.get_value()) + .unwrap_or(0); + + if ending < beginning { + println!("start > end: {:?}", beginning); + } + + seq_region.insert(sour, (beginning + prev_end, ending + prev_end)); + record_vec.push(record); + read_counter += 1; + prev_end += ending; + } + Err(e) => eprintln!("Error: {:?}", e), + } + } + + let
output_file = "test_output_embl.gff"; + gff_write(seq_region, record_vec, output_file, true)?; + println!("Total records processed: {}", read_counter); + Ok(()) +} diff --git a/seqmetrics/microBioRust/tests/genbank_to_faa.rs b/seqmetrics/microBioRust/tests/genbank_to_faa.rs new file mode 100644 index 0000000..2dbee2e --- /dev/null +++ b/seqmetrics/microBioRust/tests/genbank_to_faa.rs @@ -0,0 +1,36 @@ +use microBioRust::gbk::Reader; +use std::fs; +#[test] +pub fn genbank_to_faa() -> Result<(), anyhow::Error> { + let file_gbk = fs::File::open("K12_ribo.gbk")?; + let reader = Reader::new(file_gbk); + let mut records = reader.records(); + let mut read_counter: u32 = 0; + loop { + match records.next() { + Some(Ok(record)) => { + //println!("next record"); + //println!("Record id: {:?}", record.id); + for (k, _v) in &record.cds.attributes { + match record.seq_features.get_sequence_faa(&k) { + Some(value) => { + let seq_faa = value.to_string(); + println!(">{}|{}\n{}", &record.id, &k, seq_faa); + } + _ => (), + }; + } + read_counter += 1; + } + Some(Err(e)) => { + println!("there's an error: {:?}", e); + } + None => { + println!("finished iteration"); + break; + } + } + } + println!("Total records processed: {}", read_counter); + Ok(()) +} diff --git a/seqmetrics/microBioRust/tests/genbank_to_ffn.rs b/seqmetrics/microBioRust/tests/genbank_to_ffn.rs new file mode 100644 index 0000000..7ba93aa --- /dev/null +++ b/seqmetrics/microBioRust/tests/genbank_to_ffn.rs @@ -0,0 +1,36 @@ +use microBioRust::gbk::Reader; +use std::fs; +#[test] +pub fn genbank_to_ffn() -> Result<(), anyhow::Error> { + let file_gbk = fs::File::open("K12_ribo.gbk")?; + let reader = Reader::new(file_gbk); + let mut records = reader.records(); + let mut read_counter: u32 = 0; + loop { + match records.next() { + Some(Ok(record)) => { + //println!("next record"); + //println!("Record id: {:?}", record.id); + for (k, _v) in &record.cds.attributes { + match record.seq_features.get_sequence_ffn(&k) {
+ Some(value) => { + let seq_ffn = value.to_string(); + println!(">{}|{}\n{}", &record.id, &k, seq_ffn); + } + _ => (), + }; + } + read_counter += 1; + } + Some(Err(e)) => { + println!("there's an error: {:?}", e); + } + None => { + println!("finished iteration"); + break; + } + } + } + println!("Total records processed: {}", read_counter); + Ok(()) +} diff --git a/seqmetrics/microBioRust/tests/genbank_to_gff.rs b/seqmetrics/microBioRust/tests/genbank_to_gff.rs new file mode 100644 index 0000000..9274ff9 --- /dev/null +++ b/seqmetrics/microBioRust/tests/genbank_to_gff.rs @@ -0,0 +1,58 @@ +use microBioRust::gbk::{gff_write, Reader, Record}; +use std::collections::BTreeMap; +use std::fs; +use std::io; +#[test] +pub fn genbank_to_gff() -> io::Result<()> { + let file_gbk = fs::File::open("K12_ribo.gbk")?; + let _prev_start: u32 = 0; + let mut prev_end: u32 = 0; + let reader = Reader::new(file_gbk); + let mut records = reader.records(); + let mut read_counter: u32 = 0; + let mut seq_region: BTreeMap<String, (u32, u32)> = BTreeMap::new(); + let mut record_vec: Vec<Record> = Vec::new(); + loop { + match records.next() { + Some(Ok(record)) => { + //println!("next record"); + //println!("Record id: {:?}", record.id); + let sour = record + .source_map + .source_name + .clone() + .expect("issue collecting source name"); + let beginning = match record.source_map.get_start(&sour) { + Some(value) => value.get_value(), + _ => 0, + }; + let ending = match record.source_map.get_stop(&sour) { + Some(value) => value.get_value(), + _ => 0, + }; + if ending < beginning { + println!( + "debug: the end value is smaller than the start {:?}", + beginning + ); + } + seq_region.insert(sour, (beginning + prev_end, ending + prev_end)); + record_vec.push(record); + // Add additional fields to print if needed + read_counter += 1; + prev_end += ending; + } + Some(Err(e)) => { + println!("there's an error: {:?}", e); + } + None => { + println!("finished iteration"); + break; + } + } + } + let
output_file = "test_output.gff"; + gff_write(seq_region, record_vec, output_file, true)?; + println!("Total records processed: {}", read_counter); + Ok(()) +}
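The EMBL and GenBank to GFF tests build `seq_region` by shifting each record's source coordinates by a running total, `prev_end`. This lays the contigs of a multi-record file end to end. Below is a minimal, library-independent sketch of that accumulation.

It makes a few assumptions:
- `SourceSpan` and `build_seq_region` are hypothetical names used only for illustration.
- The value type is assumed to be `(u32, u32)`, matching the tests.
- The real tests read the coordinates through microBioRust's `source_map` rather than a plain struct.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one record's source feature
/// (name, 1-based start, 1-based stop); not a microBioRust type.
struct SourceSpan {
    name: String,
    start: u32,
    stop: u32,
}

/// Offset each record's coordinates by the summed stop values of the
/// records before it, mirroring the `prev_end` logic in the GFF tests.
fn build_seq_region(spans: &[SourceSpan]) -> BTreeMap<String, (u32, u32)> {
    let mut seq_region = BTreeMap::new();
    let mut prev_end: u32 = 0;
    for span in spans {
        // The offset cancels out, so the sanity check only needs the raw values.
        if span.stop < span.start {
            eprintln!("warning: {} stop {} is before start {}", span.name, span.stop, span.start);
        }
        seq_region.insert(span.name.clone(), (span.start + prev_end, span.stop + prev_end));
        prev_end += span.stop;
    }
    seq_region
}

fn main() {
    let spans = vec![
        SourceSpan { name: "source_1".to_string(), start: 1, stop: 910 },
        SourceSpan { name: "source_2".to_string(), start: 1, stop: 500 },
    ];
    // Prints source_1 1..910, then source_2 911..1410.
    for (name, (start, stop)) in &build_seq_region(&spans) {
        println!("{}\t{}\t{}", name, start, stop);
    }
}
```

`BTreeMap` iterates in key order, not insertion order. A contig named `source_10` would therefore be listed before `source_2`, whatever order the records were read in.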
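The doc comment on `create_new_record` notes that the final `true/false` argument of `gff_write` decides whether the DNA sequence is written into the GFF3 file. In GFF3, embedded sequence goes at the end of the file after a `##FASTA` directive. The sketch below renders a minimal GFF3 document to show that layout.

It is not microBioRust's `gff_write`, and its assumptions are:
- `Cds` and `render_gff3` are hypothetical names.
- The sequence is ASCII.
- The GFF3 phase of a single-segment CDS is taken as `codon_start - 1`.

```rust
use std::fmt::Write;

/// Hypothetical, simplified CDS entry (not the microBioRust type).
struct Cds {
    locus_tag: String,
    start: u32,      // 1-based, inclusive
    stop: u32,       // 1-based, inclusive
    strand: i8,      // 1 or -1, as passed to set_strand
    codon_start: u8, // 1, 2 or 3, as passed to set_codon_start
}

/// Render a minimal GFF3 document for one sequence region. When
/// `include_sequence` is true the DNA follows a `##FASTA` directive;
/// otherwise only the directives and feature lines are written.
fn render_gff3(seqid: &str, len: u32, features: &[Cds], sequence: &str, include_sequence: bool) -> String {
    let mut out = String::new();
    // Writing into a String cannot fail, so unwrap() is safe here.
    writeln!(out, "##gff-version 3").unwrap();
    writeln!(out, "##sequence-region {} 1 {}", seqid, len).unwrap();
    for f in features {
        let strand = if f.strand < 0 { '-' } else { '+' };
        // Phase = bases to skip before the first complete codon.
        let phase = f.codon_start.saturating_sub(1);
        // Nine tab-separated columns: seqid, source, type, start, end, score, strand, phase, attributes.
        writeln!(out, "{}\t.\tCDS\t{}\t{}\t.\t{}\t{}\tID={}", seqid, f.start, f.stop, strand, phase, f.locus_tag).unwrap();
    }
    if include_sequence {
        writeln!(out, "##FASTA").unwrap();
        writeln!(out, ">{}", seqid).unwrap();
        // Wrap at 60 characters per line; from_utf8 is safe for ASCII input.
        for chunk in sequence.as_bytes().chunks(60) {
            writeln!(out, "{}", std::str::from_utf8(chunk).unwrap()).unwrap();
        }
    }
    out
}

fn main() {
    let features = vec![
        Cds { locus_tag: "b3304".to_string(), start: 1, stop: 354, strand: -1, codon_start: 1 },
        Cds { locus_tag: "b3305".to_string(), start: 364, stop: 897, strand: -1, codon_start: 1 },
    ];
    // First 60 bases of the test record's sequence, truncated for brevity.
    let sequence = "acctctaccttagaactgaaggccagcttcacgggcagcatctgccagtgcctggacacg";
    print!("{}", render_gff3("source_1", 910, &features, sequence, true));
}
```

Passing `false` drops everything from `##FASTA` onward. Tools that require the sequence will then need it supplied separately, for example as a FASTA file.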