Commit ff58c4e

modules/clusterblast: update for antiSMASH 8 visuals
1 parent ffc3482 commit ff58c4e

2 files changed: +13 -18 lines changed

docs/img/knownclusterblast.png

-7.36 KB

docs/modules/clusterblast.md

Lines changed: 13 additions & 18 deletions
@@ -16,6 +16,15 @@ with a minimum percentage identity between genes of 30%.
 
 It is normal to have multiple genes hitting for some types of genes (e.g. modular systems such as NRPS or Type I PKS clusters).
 
+As gene hits are not required to be 100% identity and query genes may hit multiple reference genes,
+all genes having a match is not a guarantee that the region is exactly the same.
+In the case of KnownClusterBlast, this also means that there is no guarantee that the compound(s) recorded for that MIBiG entry will be produced by the region.
+
+Even if 100% of genes have a hit for a reference, it may be less relevant than a lower similarity.
+Some cluster types, e.g. NRPS clusters, may only need a few aminos changed in gene translations to have a completely different product.
+
+In all cases, manual verification is required before assuming that the region produces the same compound as the reference.
+
 ### Ranking system
 
 Reference areas are sorted based on an empirical similarity score and then,
@@ -30,23 +39,9 @@ The emprical similarity score is calculated as `h + H + s + S + 
 - `B` is a core gene bonus
 
 If the similarity scores are equal for multiple references, they are then ranked based on
-the cumulative BlastP bit scores between the gene clusters.
-
-### Similarity percentage
-
-Similarity in the description, e.g. `87% of genes show similarity`,
-is the percentage of genes within the reference that have a hit to any genes in the query.
-
-As gene hits are not required to be 100% identity and query genes may hit multiple reference genes,
-this total similarity percentage is no guarantee that the region is exactly the same.
-In the case of KnownClusterBlast, this also means that there is no guarantee that the compound(s) recorded for that MIBiG entry will be produce by the region.
-
-Even if 100% of genes have a hit for a reference, it may be less relevant than a lower similarity.
-Some cluster types, e.g. NRPS clusters, may only need a few aminos changed in gene translations to have a completely different product.
-
-In all cases, manual verification is required before assuming that the region produces the same compound as the reference.
+the cumulative BlastP bit scores of those references.
 
-#### Example 1: low similarity, good match
+#### Example 1: not all genes, but good match
 Reference area `R` has 70% of genes showing similarity to the query region `Q`.
 All genes with hits are very high identity in their hits, at 95% or higher.
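
Note on the ranking text in this hunk: the two-level ordering it describes (similarity score first, cumulative BlastP bit score only as a tie-breaker) can be sketched roughly as below. This is a minimal illustration, not antiSMASH's implementation. The `ReferenceHit` class, its field names, and the example values are hypothetical, and it assumes that higher values rank first for both keys.

```python
from dataclasses import dataclass


@dataclass
class ReferenceHit:
    # Hypothetical container; these field names are illustrative only.
    name: str
    similarity_score: int        # the empirical similarity score
    cumulative_bitscore: float   # summed BlastP bit scores for this reference


def rank_references(hits: list[ReferenceHit]) -> list[ReferenceHit]:
    # Assumption: higher is better for both values, so sort descending.
    # The bit score only affects the order when similarity scores tie.
    return sorted(
        hits,
        key=lambda hit: (hit.similarity_score, hit.cumulative_bitscore),
        reverse=True,
    )


example = [
    ReferenceHit("ref_a", 10, 500.0),
    ReferenceHit("ref_b", 12, 100.0),
    ReferenceHit("ref_c", 10, 900.0),
]
print([hit.name for hit in rank_references(example)])
```

Here `ref_b` ranks first despite its low bit score, and `ref_c` precedes `ref_a` only because their similarity scores are equal.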
@@ -57,14 +52,14 @@ but are outside `Q` due to the size of `R` being exceptionally large.
 After manually checking these extra genes and seeing that they're similar to the missing genes,
 it's much, much more likely that the genome matches the reference.
 
-#### Example 2: perfect similarity, poor match
+#### Example 2: all genes, but poor match
 Reference area `R` has 100% of genes showing similarity to the query region `Q`.
 None of the genes have a percentage identity in individual hits greater than 60%.
 
 While it is still possible that `Q` produces the same compound as `R`,
 it will depend a great deal on the type of cluster and exactly which parts of the genes are similar.
 
-#### Example 3: high similarity, poor match
+#### Example 3: most genes, but poor match
 Reference area `R` has very high (but not 100%) similarity, with all but one gene in `R` having similarity to genes in the query region `Q`.
 All of the matching genes have very high identity in their hits.
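
Note on the renamed example headings: they separate two measurements, namely how many reference genes have a hit and how strong those hits are. A minimal sketch of that distinction follows. The function name, the gene identifiers, and the identity values are all hypothetical and do not come from antiSMASH output.

```python
def summarise_reference(reference_genes, best_identity):
    """Summarise hits for one reference area.

    reference_genes: list of gene identifiers in the reference (hypothetical).
    best_identity: dict mapping a reference gene to the percentage identity
                   of its best hit in the query; genes without hits are absent.
    Returns (percent of reference genes with a hit, lowest identity, highest identity).
    """
    matched = [gene for gene in reference_genes if gene in best_identity]
    if not matched:
        return 0.0, None, None
    percent_with_hits = 100.0 * len(matched) / len(reference_genes)
    identities = [best_identity[gene] for gene in matched]
    return percent_with_hits, min(identities), max(identities)


genes = [f"gene_{i}" for i in range(10)]

# Like Example 1: only 7 of 10 genes hit, but every hit is at least 95% identity.
example_1 = summarise_reference(genes, {gene: 95.0 for gene in genes[:7]})

# Like Example 2: every gene hits, but no hit exceeds 60% identity.
example_2 = summarise_reference(genes, {gene: 60.0 for gene in genes})

print(example_1, example_2)
```

Neither summary replaces the manual verification the text calls for. The sketch only shows why the share of genes with hits should be read alongside the identity of those hits.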