wpGesJame1 EAR#274

Open
talioto wants to merge 2 commits into ERGA-consortium:main from talioto:wpGesJame1
Conversation

@talioto
Collaborator

@talioto talioto commented Oct 30, 2025

Assembly review request

  • ToLID: wpGesJame1
  • Species: Gesiella jameensis
  • Project: ERGA-BGE
  • Affiliation: CNAG

@erga-ear-bot
Contributor

erga-ear-bot bot commented Oct 30, 2025

Hi @talioto, thanks for sending the EAR of Gesiella jameensis.
I added the corresponding tag to the PR and will contact a supervisor and a reviewer ASAP.

@erga-ear-bot
Contributor

erga-ear-bot bot commented Oct 30, 2025

Hi @tbrown91, do you agree to supervise this assembly?
Please reply to this message only with OK to acknowledge.

@talioto
Collaborator Author

talioto commented Oct 30, 2025

Blobtools compute failed. This is kind of a stress-test genome. We need to work out some tmp space issues with the Nextflow pipeline. Instead we ran FCS-GX, which does not give any graphical output by default, so I made a plot myself based just on the FCS output. I might extend it at some point to make it more blobplot-like. Anyway, ignore the automatic figure legend; it does not apply.

@tbrown91
Collaborator

ok

@erga-ear-bot
Contributor

erga-ear-bot bot commented Oct 30, 2025

*****
EAR Reviewer Selection Process
Date: 2025-10-30 09:12

All Eligible Candidates:

Github ID     | Full Name       | Institution | Total Reviews | Last Review | Active | Working PRs | Calling Score | Adjusted Score
-----------------------------------------------------------------------------------------------------------------------------------
n-equals-one  | Tomas Larsson   | SciLifeLab  | 0             | NA          | Y      | 1           | 1017          | 1097          
EmilieTeo     | Emilie Teodori  | Genoscope   | 10            | 2025-10-21  | Y      | 1           | 1038          | 1068          
CaroB-M       | Caroline Menguy | Genoscope   | 11            | 2025-10-20  | Y      | 1           | 1038          | 1068          
ldemirdj      | Lola Demirdjian | Genoscope   | 11            | 2025-10-14  | Y      | 1           | 1037          | 1067          
gbdias        | Guilherme Dias  | SciLifeLab  | 3             | 2025-10-23  | Y      | 0           | 1014          | 1064          
andar27       | Adama Ndar      | Genoscope   | 6             | 2025-10-17  | Y      | 2           | 1042          | 1052          
bistace       | Benjamin Istace | Genoscope   | 7             | 2025-10-17  | Y      | 2           | 1042          | 1052          
SarahPelan    | Sarah Pelan     | Sanger      | 6             | 2025-09-16  | Y      | 1           | 1019          | 1049          
joannacollins | Jo Collins      | Sanger      | 6             | 2025-10-10  | Y      | 1           | 1019          | 1049          
tommathers    | Tom Mathers     | Sanger      | 7             | 2025-10-16  | Y      | 1           | 1018          | 1048          
DomAbsolon    | Dom Absolon     | Sanger      | 7             | 2025-10-20  | Y      | 1           | 1018          | 1048          
auryjm        | Jean-Marc Aury  | Genoscope   | 10            | 2025-10-20  | Y      | 2           | 1038          | 1048          
MartinPippel  | Martin Pippel   | SciLifeLab  | 2             | 2025-10-06  | Y      | 1           | 1015          | 1045          
diegomics     | Diego De Panis  | IZW         | 11            | 2025-09-22  | Y      | 0           | 992           | 1037          
tbrown91      | Tom Brown       | IZW         | 11            | 2025-07-29  | Y      | 1           | 995           | 1020          

Selected reviewer: Tomas Larsson (n-equals-one)
The decision was based on:
- different institution ('SciLifeLab')
- active ('Y')
- working on 1 PR(s) currently
- highest adjusted calling score in this particular selection (1097)
  (Note: Adjusted score already considering -20 points due to 1 ongoing PR(s))

@erga-ear-bot
Contributor

erga-ear-bot bot commented Oct 30, 2025

Hi @n-equals-one, do you agree to review this assembly?
Please reply to this message only with Yes or No by 05-Nov-2025 at 14:12 CET

@erga-ear-bot
Contributor

erga-ear-bot bot commented Nov 5, 2025

@n-equals-one Time is out! I will look for the next reviewer on the list :)

@erga-ear-bot
Contributor

erga-ear-bot bot commented Nov 5, 2025

*****
EAR Reviewer Selection Process
Date: 2025-11-05 21:18

All Eligible Candidates:

Github ID     | Full Name       | Institution | Total Reviews | Last Review | Active | Working PRs | Calling Score | Adjusted Score
-----------------------------------------------------------------------------------------------------------------------------------
auryjm        | Jean-Marc Aury  | Genoscope   | 11            | 2025-11-04  | Y      | 1           | 1038          | 1068          
EmilieTeo     | Emilie Teodori  | Genoscope   | 11            | 2025-11-05  | Y      | 1           | 1038          | 1068          
DomAbsolon    | Dom Absolon     | Sanger      | 8             | 2025-11-04  | Y      | 0           | 1017          | 1067          
n-equals-one  | Tomas Larsson   | SciLifeLab  | 0             | NA          | Y      | 3           | 1019          | 1059          
bistace       | Benjamin Istace | Genoscope   | 7             | 2025-10-17  | Y      | 2           | 1043          | 1053          
andar27       | Adama Ndar      | Genoscope   | 7             | 2025-10-31  | Y      | 2           | 1042          | 1052          
SarahPelan    | Sarah Pelan     | Sanger      | 6             | 2025-09-16  | Y      | 1           | 1019          | 1049          
joannacollins | Jo Collins      | Sanger      | 6             | 2025-10-10  | Y      | 1           | 1019          | 1049          
CaroB-M       | Caroline Menguy | Genoscope   | 11            | 2025-10-20  | Y      | 2           | 1039          | 1049          
tommathers    | Tom Mathers     | Sanger      | 7             | 2025-10-16  | Y      | 1           | 1018          | 1048          
ldemirdj      | Lola Demirdjian | Genoscope   | 11            | 2025-10-14  | Y      | 2           | 1038          | 1048          
MartinPippel  | Martin Pippel   | SciLifeLab  | 2             | 2025-10-06  | Y      | 1           | 1017          | 1047          
gbdias        | Guilherme Dias  | SciLifeLab  | 3             | 2025-10-23  | Y      | 1           | 1016          | 1046          
diegomics     | Diego De Panis  | IZW         | 11            | 2025-09-22  | Y      | 0           | 992           | 1037          
tbrown91      | Tom Brown       | IZW         | 11            | 2025-07-29  | Y      | 1           | 995           | 1020          

Selected reviewer: Jean-Marc Aury (auryjm)
The decision was based on:
- different institution ('Genoscope')
- active ('Y')
- working on 1 PR(s) currently
- oldest review and fewest reviews among the finalists (1068)
  (Note: Adjusted score already considering -20 points due to 1 ongoing PR(s))
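
The criteria printed in the two selection logs can be sketched as a small ranking routine. This is only an illustration under assumptions, not the bot's actual code: the `Candidate` class, the `select_reviewer` function, the order in which the criteria are applied, and the handling of "NA" as the oldest possible review date are all hypothetical. The rows are copied from the second selection table above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Candidate:
    github_id: str
    institution: str
    total_reviews: int
    last_review: Optional[date]  # None stands for "NA" (no review yet)
    adjusted_score: int


def select_reviewer(candidates, submitter_institution):
    """Pick one candidate using the criteria named in the log (assumed order)."""
    # Criterion 1: reviewer must come from a different institution.
    eligible = [c for c in candidates if c.institution != submitter_institution]
    if not eligible:
        return None
    # Criterion 2: highest adjusted calling score.
    best = max(c.adjusted_score for c in eligible)
    finalists = [c for c in eligible if c.adjusted_score == best]
    # Tie-break: oldest last review first, then fewest total reviews.
    # Treating "NA" as older than any real date is an assumption.
    return min(
        finalists,
        key=lambda c: (c.last_review or date.min, c.total_reviews),
    )


round_two = [
    Candidate("auryjm", "Genoscope", 11, date(2025, 11, 4), 1068),
    Candidate("EmilieTeo", "Genoscope", 11, date(2025, 11, 5), 1068),
    Candidate("DomAbsolon", "Sanger", 8, date(2025, 11, 4), 1067),
]
print(select_reviewer(round_two, "CNAG").github_id)  # auryjm
```

With these three rows, the two Genoscope candidates tie on 1068 and the earlier last-review date decides, which matches the reviewer reported in the log.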

@erga-ear-bot
Contributor

erga-ear-bot bot commented Nov 5, 2025

Hi @auryjm, do you agree to review this assembly?
Please reply to this message only with Yes or No by 12-Nov-2025 at 02:18 CET

@auryjm
Collaborator

auryjm commented Nov 6, 2025

Yes

@erga-ear-bot erga-ear-bot bot requested a review from auryjm November 6, 2025 07:33
@erga-ear-bot
Contributor

erga-ear-bot bot commented Nov 6, 2025

Thanks for agreeing!
I appointed you as the EAR reviewer.
I will track this as one of your Working PRs until you finish this review.
Please check the Wiki if you need to refresh something. (and remember that you must download the EAR PDF to be able to click on the link to the contact map file!)
Contact the PR assignee for any issues.

@erga-ear-bot
Contributor

erga-ear-bot bot commented Nov 13, 2025

Ping @tbrown91,
One week without any movements on this PR!

@tbrown91
Collaborator

@auryjm have you had a chance to look at this assembly?

@auryjm
Collaborator

auryjm commented Nov 14, 2025

Hi @talioto and @tbrown91,

This is a very large genome, and both the scaffolder and the curator did a great job!

I noticed a few points, but I also have doubts about what we can reliably see in Pretext for such a large genome. The resolution might not be sufficient. For example, when curating dual haplotypes, once we switch back to a single haplotype we often see many features that were previously invisible.

So I suggest generating a map for a single chromosome to better assess how well the curation went (or not). The number of duplicated genes is still high and might be due to duplications that could not be detected at this resolution. I also see a lot of white squares that may indicate phasing errors between haplotypes. Of course, we are not going to perform nine curations, but perhaps we can add some notes in the curation note and in the assembly description during submission. What do you think?

Otherwise, here are the few points I noticed:

  • several scaffolds might be tagged as Unloc (I left them in the middle of chromosomes so they should be easy to spot),
  • several haplotigs (again, I left them inside chromosomes to help you identify them),
  • I flipped a region of scaffold_6 (around 900 Mb–908 Mb),
  • I moved scaffold_54 two scaffolds earlier.

Here is my savestate:
wpGesJame1.hfsm.pri.yhs_ec_mq10.hr.ext.pretext.savestate_JMA.txt

Thank you,
Jean-Marc

@tbrown91
Collaborator

Thanks @auryjm

I just quickly went through your savestate:

Agree with the unlocs in scaffold_1
Not sure about the haplotig in scaffold_3
Not sure I agree with the unlocs in scaffold_8

I agree with Jean-Marc about the resolution. For example, there may be a haplotig on scaffold_6 at around 424,400 kb, but I'm not sure given the current map.

Overall, it's looking really good. Nice job @talioto

@talioto
Collaborator Author

talioto commented Nov 14, 2025

Thanks for the comments. I haven't had time to go over them yet. I did this all during the Understanding Life meeting at Sanger. I agree that making one Pretext map per chromosome would be good. This, after discussing bird microchromosomes in the curation workshop. There were many potential issues. I felt like each chromosome was a genome. So, yes, it could benefit from higher resolution. This was done on a hi-res map, but the resolution was still not high enough.

@erga-ear-bot
Contributor

erga-ear-bot bot commented Nov 21, 2025

Ping @tbrown91,
One week without any movements on this PR!

@erga-ear-bot
Contributor

erga-ear-bot bot commented Nov 29, 2025

Ping @tbrown91,
One week without any movements on this PR!

@erga-ear-bot
Contributor

erga-ear-bot bot commented Dec 6, 2025

Ping @tbrown91,
One week without any movements on this PR!

@talioto
Collaborator Author

talioto commented Jan 16, 2026

OK. I've come back to this one. I've generated pretextmaps for each SUPER.
Please take a look at SUPER_1
pretext
savestate
The higher resolution has enabled me to remove many more haplotigs. Any occurring at gaps I am now removing even when coverage does not drop. If they have higher coverage than normal, I leave them, since that indicates a collapsed repeat. I can often see fewer Hi-C contacts but no change in HiFi depth. I remove these now, as there is usually some continuity break, too.

If anyone wants to curate a few SUPERs, let me know. This took me about 1.5 hours I'd say. 8 more to go.

@erga-ear-bot erga-ear-bot bot removed the STALLED label Jan 16, 2026
@talioto
Collaborator Author

talioto commented Jan 16, 2026

I will keep curating each SUPER one by one. I've finished SUPER_2.
All SUPERs that are done will have a savestate in this OneDrive folder.

@auryjm
Collaborator

auryjm commented Jan 19, 2026

SUPER1:

  • Invert the region from 612.74 Mbp to 616.36 Mbp.
  • The last two contigs are not marked as haplotigs.

@auryjm
Collaborator

auryjm commented Jan 19, 2026

SUPER2:

  • 375.78 Mbp - 375.97 Mbp => haplotig
  • 929.27 Mbp - 929.38 Mbp => haplotig

@talioto
Collaborator Author

talioto commented Jan 19, 2026

Hmm. My PretextView is crashing. I think all my savestates (except for SUPER_1) are corrupted. I'll have to start over. About a full day's work gone.

@auryjm
Collaborator

auryjm commented Jan 19, 2026

They look good to me, I was able to open your savestates for SUPER1, SUPER2, and SUPER3!

@talioto
Collaborator Author

talioto commented Jan 21, 2026

I managed. I'm now on SUPER_7 and hope to finish by the end of the week. After one final check of the savestates, I will process and merge the outputs back into one assembly with all SUPERs, followed by a final mapping to the combined assembly as a sanity check.
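
One way to merge per-SUPER AGP outputs and sanity-check the result can be sketched as follows. This is a minimal illustration, not necessarily the processing used for this assembly: the function names `merge_agps` and `check_objects`, the contig names and all coordinates in the example are hypothetical. It relies only on the standard AGP layout (tab-separated, `#` comment lines, object name in column 1, object start and end in columns 2 and 3).

```python
def merge_agps(agp_texts):
    """Concatenate AGP records from several files, dropping comments and blanks."""
    records = []
    for text in agp_texts:
        for line in text.splitlines():
            if line.strip() and not line.startswith("#"):
                records.append(line.split("\t"))
    return records


def check_objects(records):
    """Return {object: length}; raise ValueError if parts are not contiguous."""
    lengths = {}
    for rec in records:
        obj, beg, end = rec[0], int(rec[1]), int(rec[2])
        expected = lengths.get(obj, 0) + 1
        if beg != expected or end < beg:
            raise ValueError(f"{obj}: part starts at {beg}, expected {expected}")
        lengths[obj] = end
    return lengths


# Hypothetical per-SUPER AGP contents (contig names and sizes are made up).
agp_1 = (
    "##agp-version 2.1\n"
    "SUPER_1\t1\t1000\t1\tW\tctg1\t1\t1000\t+\n"
    "SUPER_1\t1001\t1100\t2\tU\t100\tscaffold\tyes\tproximity_ligation\n"
    "SUPER_1\t1101\t1600\t3\tW\tctg2\t1\t500\t-\n"
)
agp_2 = "SUPER_2\t1\t700\t1\tW\tctg3\t1\t700\t+\n"

print(check_objects(merge_agps([agp_1, agp_2])))
# {'SUPER_1': 1600, 'SUPER_2': 700}
```

Because the check expects each object to start at position 1 and grow without holes, it also flags the case where two per-SUPER files accidentally reuse the same object name.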

@auryjm
Collaborator

auryjm commented Jan 21, 2026

SUPER3:

  • Move the 373.67–373.79 region into the gap at 375.06

@auryjm
Collaborator

auryjm commented Jan 21, 2026

Hi @talioto
Could you please add the savestates for SUPER 4, 5 and 6 to the OneDrive directory?

@talioto
Collaborator Author

talioto commented Jan 21, 2026

> Could you please add the savestates for SUPER 4, 5 and 6 to the OneDrive directory?

done.
I'll work on 7, 8 and 9 today.

I am still uneasy about removing all of these duplications that occur at gaps but that don't drop in coverage. If coverage is higher, I leave them. If the signal is continuous across the gap, I'm inclined to leave them, but most are sharp breaks, so I am taking the majority of them out as haplotigs. Internally contigged ones I leave, trusting hifiasm. The number of these is not low, which makes me think we are perhaps over-purging the ones at gaps. Tandem segmental duplications are exactly where hifiasm would break the assembly graph, right? When it can't determine the proper number of cycles? Anyway, I will keep going as I have been and we can evaluate the final result...
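
The rules of thumb described above can be written down as a small decision function, which makes the order of the checks explicit. This is only a sketch of the reasoning in this comment: the `Duplication` fields, the `decide` function and the `HIGH_COVERAGE` cutoff are hypothetical, and in practice each call was made by eye in PretextView rather than from numeric thresholds.

```python
from dataclasses import dataclass


@dataclass
class Duplication:
    at_gap: bool              # duplicated block sits next to a scaffold gap
    relative_coverage: float  # HiFi depth divided by the typical depth
    hic_continuous: bool      # Hi-C signal continues across the gap


# Hypothetical cutoff for what counts as "higher coverage than normal".
HIGH_COVERAGE = 1.5


def decide(dup: Duplication) -> str:
    """Return "keep" or "haplotig" following the rules of thumb in the comment."""
    if not dup.at_gap:
        return "keep"  # internal to a contig: trust the assembler
    if dup.relative_coverage >= HIGH_COVERAGE:
        return "keep"  # elevated depth suggests a collapsed repeat
    if dup.hic_continuous:
        return "keep"  # signal spans the gap, so inclined to leave it
    return "haplotig"  # sharp break at a gap without elevated coverage


print(decide(Duplication(at_gap=True, relative_coverage=1.0, hic_continuous=False)))
# haplotig
```

Written this way, the concern about over-purging is visible in the last branch: a true tandem duplication broken at a gap, with normal depth and a sharp Hi-C break, would also end up there.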

@talioto
Collaborator Author

talioto commented Jan 22, 2026

I can't get all the way through SUPER_8 without crashing. I've done it twice, each time getting about 2/3 of the way through the chromosome. I need to submit an issue on GitHub for PretextView. If it's platform-specific, maybe I'll try a different architecture: an old Mac, Windows or Linux. But it's likely just a pointer arithmetic bug that maybe only happens occasionally on hi-res maps.

@auryjm
Collaborator

auryjm commented Jan 27, 2026

Hi @talioto,
I managed to open SUPER_4, 5, 6, and 7, but I couldn’t load the savestate. I tried several times for each SUPER, but every attempt failed!

@talioto
Collaborator Author

talioto commented Jan 28, 2026

If I manage to produce a new assembly (I have exported AGPs), I will try again and do only normal resolution on each SUPER and see if that works.

@erga-ear-bot
Contributor

erga-ear-bot bot commented Feb 13, 2026

Attention @auryjm, the EAR PDF was updated.

@talioto
Collaborator Author

talioto commented Feb 13, 2026

@auryjm, I was able to finish curation of all the SUPERs (using a strategy similar to the one you observed on the first few SUPERs), but I was unable to save state. I was able to output AGPs though, so I've processed the changes and generated a new assembly and EAR.

Looking at it now, perhaps the sequence now called unplaced_4 (coming from the previous SUPER_4) could be placed at the beginning of SUPER_1, perhaps breaking off the part with no contacts and orienting it with the telomeric sequence at the beginning. This is the downside of going SUPER by SUPER: inter-SUPER translocations are missed during curation.

If you see anything else, let me know. I am so close to being done with this one.
