Conversation
|
Hi @talioto, thanks for sending the EAR of Gesiella jameensis. |
|
Blobtools compute failed. This genome is something of a stress test. We need to work out some tmp space issues with the nextflow pipeline. Instead we ran FCS-GX, which does not give any graphical output by default. I made a plot myself based just on the FCS output. I might extend it at some point to make it more blobplot-like. Anyway, ignore the automatic figure legend -- it does not apply. |
|
ok |
|
|
Hi @n-equals-one, do you agree to review this assembly? |
|
@n-equals-one Time is out! I will look for the next reviewer on the list :) |
|
|
Hi @auryjm, do you agree to review this assembly? |
|
Yes |
|
Thanks for agreeing! |
|
Ping @tbrown91, |
|
@auryjm have you had a chance to look at this assembly? |
|
This is a very large genome, and both the scaffolder and the curator did a great job! I noticed a few points, but I also have doubts about what we can reliably see in Pretext for such a large genome. The resolution might not be sufficient. For example, when curating dual haplotypes, once we switch back to a single haplotype we often see many features that were previously invisible. So I suggest generating a map for a single chromosome to better assess how well the curation went (or not). The number of duplicated genes is still high and might be due to duplications that could not be detected at this resolution. I also see a lot of white squares that may indicate phasing errors between haplotypes. Of course, we are not going to perform nine curations, but perhaps we can add some notes in the curation note and in the assembly description during submission. What do you think? Otherwise, here are the few points I noticed:
Here is my savestate: Thank you, |
|
Thanks @auryjm. I just quickly went through your savestate. I agree with the unlocs in scaffold_1, and I agree with Jean-Marc about the resolution. For example, there may be a haplotig at scaffold_6-424,400 kb, but I'm not sure given the current map. Overall, it's looking really good. Nice job @talioto |
|
Thanks for the comments. I haven't had time to go over them yet. I did this all during the Understanding Life meeting at Sanger. I agree that making one pretext per chromosome would be good. I say this after discussing bird microchromosomes in the curation workshop. There were many potential issues. I felt like each chromosome was a genome, so yes, it could benefit from higher resolution. This was done on a hires map, but still not high enough. |
|
Ping @tbrown91, |
|
OK. I've come back to this one. I've generated pretextmaps for each SUPER. If anyone wants to curate a few SUPERs, let me know. This took me about 1.5 hours I'd say. 8 more to go. |
|
I will keep curating each SUPER one by one. I've finished SUPER_2. |
|
SUPER1:
|
|
SUPER2:
|
|
Hmm. My pretextview is crashing. I think all my savestates (except for SUPER_1) are corrupted. I'll have to start over. About a full day's work gone. |
|
They look good to me, I was able to open your savestates for SUPER1, SUPER2, and SUPER3! |
|
I managed. I'm now on SUPER_7. I hope to finish by the end of the week. After one final check of the savestates, I will process and merge the outputs back into one assembly with all supers, followed by a final mapping to the combined assembly as a sanity check. |
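A rough sketch of the "merge outputs back into one assembly" step, assuming the per-SUPER outputs are AGP files with the standard nine tab-separated columns (possibly followed by extra tag columns). The function name `merge_agps`, the `SUPER_n` labels, and the `__` prefix convention are hypothetical and only for illustration; this is not the pipeline actually used here.

```python
def merge_agps(agp_by_super):
    """Concatenate per-SUPER AGP texts into a single AGP text.

    agp_by_super maps a label such as "SUPER_1" to the AGP text exported
    for that chromosome. Object names (column 1) are prefixed with the label
    so that names reused across separate curation sessions cannot collide.
    Comment/header lines are dropped; add a header back if your tools need one.
    """
    merged = []
    for label, text in agp_by_super.items():
        for line in text.splitlines():
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and AGP comment/header lines
            cols = line.split("\t")
            if len(cols) < 9:
                raise ValueError(f"{label}: expected at least 9 columns, got {len(cols)}")
            cols[0] = f"{label}__{cols[0]}"  # hypothetical naming convention
            merged.append("\t".join(cols))
    return "\n".join(merged) + "\n"
```

Prefixing the object name is one simple way to keep each super's scaffolds distinct. It does nothing about sequence that should move between supers, which still needs a look on the combined map.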
|
SUPER3:
|
|
Hi @talioto |
Done. I am still uneasy about removing all of these duplications that occur at gaps but that don't drop in coverage. If coverage is higher, I leave them. If the signal is continuous across the gap, I'm inclined to leave them, but most are sharp breaks, so I am taking the majority of them out as haplotigs. Internally contigged ones I leave, trusting hifiasm. The number of these is not low, which makes me think perhaps we are over-purging the ones at gaps. Tandem segmental duplications are exactly where hifiasm would break the assembly graph, right, when it can't determine the proper number of cycles? Anyway, I will keep going as I have been and we can evaluate the final result... |
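For reference, the rule of thumb in the comment above can be written down as a small decision function. This is only a sketch of the heuristic as described: the field names, the `coverage_ratio` measure, and the 1.5 threshold are assumptions for illustration, and the real decisions were made by eye in PretextView.

```python
from dataclasses import dataclass


@dataclass
class Duplication:
    at_gap: bool                 # duplicated block is bounded by a scaffold gap
    coverage_ratio: float        # coverage relative to the chromosome median (hypothetical measure)
    continuous_across_gap: bool  # Hi-C signal continues across the gap instead of breaking sharply


def classify(dup, elevated=1.5):
    """Return "keep" or "remove_as_haplotig" for one duplicated block."""
    if not dup.at_gap:
        return "keep"  # internally contigged: trust the assembler
    if dup.coverage_ratio >= elevated:
        return "keep"  # higher coverage: leave it in place
    if dup.continuous_across_gap:
        return "keep"  # signal is continuous across the gap
    return "remove_as_haplotig"  # sharp break at a gap with no extra coverage
```

Writing it this way makes the concern explicit: everything hinges on the last branch, so a tandem duplication that the assembler broke at a gap would be purged unless its coverage or contact signal happens to rescue it.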
|
I can't get all the way through SUPER_8 without crashing. I've done it twice, each time getting about 2/3 of the way through the chromosome. I need to submit an issue on GitHub for PretextView. If it's platform specific, maybe I'll try a different platform: an old Mac, Windows or Linux. But it's likely just a pointer arithmetic bug that perhaps only happens occasionally on hi-res maps. |
|
Hi @talioto, |
|
If I manage to produce a new assembly (I have exported AGPs), I will try again and do only normal resolution on each super and see if that works. |
|
Attention @auryjm, the EAR PDF was updated. |
|
@auryjm, I was able to finish curation of all the supers (with a strategy similar to the one you observed on the first few supers), but was unable to save state. I was able to output AGPs though, so I've processed the changes and generated a new assembly and EAR. Looking at it now, perhaps the sequence now called unplaced_4 (coming from the previous SUPER_4) could be placed at the beginning of SUPER_1, perhaps breaking off the part with no contacts and orienting it with the telomeric sequence at the beginning. This is the downside of going super by super: inter-super translocations are missed during curation. If you see anything else, let me know. I am so close to being done with this one.
Assembly review request