Overview
We want to add two new metrics to describe alignment quality:
- average sequence length
- standard deviation of sequence lengths
Detail
The cath-align-summary script provides some metrics to describe a (FunFam) sequence alignment: number of sequences, alignment length, dops score, gap positions, total positions.
This summary information for each alignment is currently stored in the cathpy.core.util.AlignmentSummary class:
Lines 527 to 536 in 9e24388
    class AlignmentSummary(object):
        """Stores summary information about an alignment."""
        def __init__(self, *, path, dops, aln_length, seq_count, gap_count, total_positions):
            self.path = path
            self.dops = float(dops) if dops is not None else None
            self.aln_length = int(aln_length)
            self.seq_count = int(seq_count)
            self.gap_count = int(gap_count)
            self.total_positions = int(total_positions)
This could be changed to include attributes that store average_domain_length and stddev_domain_length.
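A minimal sketch of how the extended class might look. The attribute names follow the wording above. Making the two new parameters optional (defaulting to None) is an assumption, chosen so that existing callers would keep working. The values in the usage example are purely illustrative.

```python
class AlignmentSummary(object):
    """Stores summary information about an alignment (sketch with two extra fields)."""

    def __init__(self, *, path, dops, aln_length, seq_count, gap_count,
                 total_positions, average_domain_length=None,
                 stddev_domain_length=None):
        self.path = path
        self.dops = float(dops) if dops is not None else None
        self.aln_length = int(aln_length)
        self.seq_count = int(seq_count)
        self.gap_count = int(gap_count)
        self.total_positions = int(total_positions)
        # Assumed new attributes: optional, so existing callers are unaffected.
        self.average_domain_length = (
            float(average_domain_length)
            if average_domain_length is not None else None)
        self.stddev_domain_length = (
            float(stddev_domain_length)
            if stddev_domain_length is not None else None)


# Illustrative values only (a toy alignment of 3 sequences, 10 columns).
summary = AlignmentSummary(path="example.sto", dops=None, aln_length=10,
                           seq_count=3, gap_count=6, total_positions=30,
                           average_domain_length=8, stddev_domain_length=1.5)
```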
These AlignmentSummary objects are created by AlignmentSummaryRunner (ie a process that generates an alignment summary for each STOCKHOLM alignment).
We would need to calculate these values and add them to the summary object within that runner:
Line 612 in 9e24388
    def run(self):
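One way the two values could be calculated, as a rough sketch rather than the actual runner code. It assumes the aligned sequences are available as plain strings, that "sequence length" means the number of non-gap characters, and that both "-" and "." count as gaps. The helper name sequence_length_stats is hypothetical, and how the runner exposes the aligned sequences is not shown here.

```python
import statistics

GAP_CHARS = "-."  # assumption: both characters mark gaps in the alignment


def sequence_length_stats(aligned_seqs):
    """Return (mean, stddev) of ungapped lengths for a list of aligned strings."""
    lengths = [sum(1 for c in seq if c not in GAP_CHARS) for seq in aligned_seqs]
    if not lengths:
        raise ValueError("alignment contains no sequences")
    mean_len = statistics.mean(lengths)
    # Population standard deviation; use statistics.stdev for the sample version.
    stddev_len = statistics.pstdev(lengths)
    return mean_len, stddev_len


# Toy alignment: ungapped lengths are 4, 3 and 5.
aligned = ["AC-GT", "A--GT", "ACCGT"]
mean_len, stddev_len = sequence_length_stats(aligned)
```

Whether the population or sample standard deviation is wanted is a design choice that would be worth stating in the class docstring, since the two differ noticeably for small alignments.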
Making changes
General approach to making changes:
- clone this repo
- create a new branch (eg called feature/new_alignment_metrics)
- add a test to check that your feature is working

  Lines 36 to 48 in 9e24388

      def test_alignment_summary_file(self):
          runner = AlignmentSummaryRunner(
              aln_file=self.merge_sto_file)
          entries = runner.run()
          self.assertEqual(len(entries), 1)
          summary = entries[0]
          self.assertEqual(summary.aln_length, 92)
          self.assertEqual(summary.dops, 88.422)
          self.assertEqual(summary.gap_count, 25228)
          self.assertEqual(summary.total_positions, 64492)
          self.assertEqual(summary.seq_count, 701)
          self.assertEqual(round(summary.gap_per, 2), round(39.12, 2))

- add the code to make your new test pass
- make sure your changes have not broken anything else (ie run pytest)
- commit your changes, push back to origin (GitHub)
- make sure your changes have not broken anything else
- create a pull request (PR)
- someone else (me) reviews the code
- code is merged into master branch
- we add your name as an official contributor to cathpy :)
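As an illustration of the "add a test" step, a new test for the metrics might take roughly the following shape. This is a self-contained sketch: length_stats is a hypothetical stand-in for whatever the runner ends up computing, and the toy sequences are made up so that the expected values can be checked by hand. A real test would go through AlignmentSummaryRunner and a STOCKHOLM file instead, as in the existing test.

```python
import statistics
import unittest


def length_stats(aligned_seqs, gap_chars="-."):
    """Hypothetical stand-in for the values the runner would compute."""
    lengths = [len(s) - sum(s.count(g) for g in gap_chars) for s in aligned_seqs]
    return statistics.mean(lengths), statistics.pstdev(lengths)


class TestNewAlignmentMetrics(unittest.TestCase):

    def test_length_metrics(self):
        # Ungapped lengths are 4, 2 and 6: mean 4, population stddev sqrt(8/3).
        mean_len, stddev_len = length_stats(["ACGT--", "A----T", "ACGTAC"])
        self.assertEqual(mean_len, 4)
        self.assertAlmostEqual(stddev_len, 1.633, places=3)
```

Saved in a file whose name starts with test_, a class like this should be picked up by pytest along with the existing tests.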