Hey,
I was wondering whether someone could give more precise definitions for some of the fields in the output files.
For example,
the following comes from 0_linearized_synteny_graph_edges.tsv:
"gene1 gene2 weight contiguous_region nb_internal_nodes_from_ancestor_with_updated_weight supporting_children predicted_edge_age_relative_to_root predicted_edge_lca
rootHOG_1 rootHOG_92 7 0 0 Phaeocystis_antarctica/Emiliania_huxleyi/Chrysochromulina_parva/Diacronema_lutheri/Chroomonas_placoidea/Teleaulax_amphioxeia/Guillardia_theta/Cryptomonas_paramecium/Cryptomonas_curvata/Boldia_erythrosiphon/Porphyra_purpurea/Membranoptera_tenuis/Gracilaria_firma/Cyanidium_caldarium/Nitzschia_traheaformis/Cyclotella_cryptica/Stephanodiscus_niagarae/Colpomenia_sinuosa/Cyanophora_sudae/Cyanophora_biloba/Glaucocystis_incrassata/Gloeochaete_wittrockiana/Matteuccia_struthiopteris/Ginkgo_biloba/Triticum_monococcum/Lactuca_sativa/Helianthus_annuus/Lepidodinium_chlorophorum/Colacium_mucronatum/Monomorphina_parapyrum/Cryptoglena_skujai/Trachelomonas_volvocina/Euglena_gracilis;Phaeocystis_globosa 1.00 Phaeocystis_globosa/Phaeocystis_antarctica/Emiliania_huxleyi/Chrysochromulina_parva/Diacronema_lutheri/Chroomonas_placoidea/Teleaulax_amphioxeia/Guillardia_theta/Cryptomonas_paramecium/Cryptomonas_curvata/Boldia_erythrosiphon/Porphyra_purpurea/Membranoptera_tenuis/Gracilaria_firma/Cyanidium_caldarium/Nitzschia_traheaformis/Cyclotella_cryptica/Stephanodiscus_niagarae/Colpomenia_sinuosa/Cyanophora_sudae/Cyanophora_biloba/Glaucocystis_incrassata/Gloeochaete_wittrockiana/Matteuccia_struthiopteris/Ginkgo_biloba/Triticum_monococcum/Lactuca_sativa/Helianthus_annuus/Lepidodinium_chlorophorum/Colacium_mucronatum/Monomorphina_parapyrum/Cryptoglena_skujai/Trachelomonas_volvocina/Euglena_gracilis"
The first two columns (gene1 and gene2) refer to the adjacent genes, and I assume that rootHOG_1 and rootHOG_92 are the ancestral versions of the genes thought to exist in the LCA of all species (in this example). However, in other outputs, such as 11_linearized_synteny_graph_edges.tsv, I get:
"gene1 gene2 weight contiguous_region nb_internal_nodes_from_ancestor_with_updated_weight supporting_children predicted_edge_age_relative_to_root predicted_edge_lca
HOG_1 HOG_92_c3 1.0 0.0 6.0 Chroomonas_placoidea/Teleaulax_amphioxeia/Guillardia_theta 1.00 Phaeocystis_globosa/Phaeocystis_antarctica/Emiliania_huxleyi/Chrysochromulina_parva/Diacronema_lutheri/Chroomonas_placoidea/Teleaulax_amphioxeia/Guillardia_theta/Cryptomonas_paramecium/Cryptomonas_curvata/Boldia_erythrosiphon/Porphyra_purpurea/Membranoptera_tenuis/Gracilaria_firma/Cyanidium_caldarium/Nitzschia_traheaformis/Cyclotella_cryptica/Stephanodiscus_niagarae/Colpomenia_sinuosa/Cyanophora_sudae/Cyanophora_biloba/Glaucocystis_incrassata/Gloeochaete_wittrockiana/Matteuccia_struthiopteris/Ginkgo_biloba/Triticum_monococcum/Lactuca_sativa/Helianthus_annuus/Lepidodinium_chlorophorum/Colacium_mucronatum/Monomorphina_parapyrum/Cryptoglena_skujai/Trachelomonas_volvocina/Euglena_gracilis"
where gene1 and gene2 are referred to as HOG followed by some number (and in some cases also "c#"). What is the difference between rootHOG_##, HOG_##, and HOG_##_c##? I am also unsure how to interpret the following columns: contiguous_region, nb_internal_nodes_from_ancestor_with_updated_weight, and predicted_edge_lca.
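For context, this is roughly how the rows can be read into named fields. It is only a minimal sketch based on the two examples above, and it rests on assumptions that are not confirmed anywhere: that the file is tab-separated with a header line, that ";" in supporting_children separates groups of children, and that "/" separates species names within a group and within predicted_edge_lca. The function name `parse_edges` is made up for illustration, and the sample row is shortened from the one above.

```python
import csv
import io

# Column names copied from the header line of the edges file.
COLUMNS = [
    "gene1", "gene2", "weight", "contiguous_region",
    "nb_internal_nodes_from_ancestor_with_updated_weight",
    "supporting_children", "predicted_edge_age_relative_to_root",
    "predicted_edge_lca",
]


def parse_edges(handle):
    """Yield one dict per edge row (assumes a tab-separated file with a header)."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Assumption: ';' separates child groups, '/' separates species in a group.
        row["supporting_children"] = [
            group.split("/") for group in row["supporting_children"].split(";")
        ]
        # Assumption: the LCA is written as a '/'-joined list of its species.
        row["predicted_edge_lca"] = row["predicted_edge_lca"].split("/")
        yield row


# Shortened version of the second example row, joined with tabs.
sample = "\t".join(COLUMNS) + "\n" + "\t".join([
    "HOG_1", "HOG_92_c3", "1.0", "0.0", "6.0",
    "Chroomonas_placoidea/Teleaulax_amphioxeia/Guillardia_theta",
    "1.00",
    "Phaeocystis_globosa/Phaeocystis_antarctica",
]) + "\n"

edges = list(parse_edges(io.StringIO(sample)))
for edge in edges:
    print(edge["gene1"], edge["gene2"], edge["weight"], edge["supporting_children"])
```

With a real file, `parse_edges(open("11_linearized_synteny_graph_edges.tsv", newline=""))` would be the equivalent call. Numeric fields are left as strings here, because the two files format them differently (7 versus 1.0).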
Thanks for your help!