Interpreting output #29

@DannyKlee

Hey,
I was wondering whether anyone has clearer definitions for some of the fields in the output files.
For example, the following comes from 0_linearized_synteny_graph_edges.tsv:
"gene1 gene2 weight contiguous_region nb_internal_nodes_from_ancestor_with_updated_weight supporting_children predicted_edge_age_relative_to_root predicted_edge_lca
rootHOG_1 rootHOG_92 7 0 0 Phaeocystis_antarctica/Emiliania_huxleyi/Chrysochromulina_parva/Diacronema_lutheri/Chroomonas_placoidea/Teleaulax_amphioxeia/Guillardia_theta/Cryptomonas_paramecium/Cryptomonas_curvata/Boldia_erythrosiphon/Porphyra_purpurea/Membranoptera_tenuis/Gracilaria_firma/Cyanidium_caldarium/Nitzschia_traheaformis/Cyclotella_cryptica/Stephanodiscus_niagarae/Colpomenia_sinuosa/Cyanophora_sudae/Cyanophora_biloba/Glaucocystis_incrassata/Gloeochaete_wittrockiana/Matteuccia_struthiopteris/Ginkgo_biloba/Triticum_monococcum/Lactuca_sativa/Helianthus_annuus/Lepidodinium_chlorophorum/Colacium_mucronatum/Monomorphina_parapyrum/Cryptoglena_skujai/Trachelomonas_volvocina/Euglena_gracilis;Phaeocystis_globosa 1.00 Phaeocystis_globosa/Phaeocystis_antarctica/Emiliania_huxleyi/Chrysochromulina_parva/Diacronema_lutheri/Chroomonas_placoidea/Teleaulax_amphioxeia/Guillardia_theta/Cryptomonas_paramecium/Cryptomonas_curvata/Boldia_erythrosiphon/Porphyra_purpurea/Membranoptera_tenuis/Gracilaria_firma/Cyanidium_caldarium/Nitzschia_traheaformis/Cyclotella_cryptica/Stephanodiscus_niagarae/Colpomenia_sinuosa/Cyanophora_sudae/Cyanophora_biloba/Glaucocystis_incrassata/Gloeochaete_wittrockiana/Matteuccia_struthiopteris/Ginkgo_biloba/Triticum_monococcum/Lactuca_sativa/Helianthus_annuus/Lepidodinium_chlorophorum/Colacium_mucronatum/Monomorphina_parapyrum/Cryptoglena_skujai/Trachelomonas_volvocina/Euglena_gracilis"

The first two columns (gene1 and gene2) refer to the adjacent genes, and I assume rootHOG_1 and rootHOG_92 are the ancestral versions of the genes thought to exist in the LCA of all species in this example. However, in other outputs, such as 11_linearized_synteny_graph_edges.tsv, I get:
"gene1 gene2 weight contiguous_region nb_internal_nodes_from_ancestor_with_updated_weight supporting_children predicted_edge_age_relative_to_root predicted_edge_lca
HOG_1 HOG_92_c3 1.0 0.0 6.0 Chroomonas_placoidea/Teleaulax_amphioxeia/Guillardia_theta 1.00 Phaeocystis_globosa/Phaeocystis_antarctica/Emiliania_huxleyi/Chrysochromulina_parva/Diacronema_lutheri/Chroomonas_placoidea/Teleaulax_amphioxeia/Guillardia_theta/Cryptomonas_paramecium/Cryptomonas_curvata/Boldia_erythrosiphon/Porphyra_purpurea/Membranoptera_tenuis/Gracilaria_firma/Cyanidium_caldarium/Nitzschia_traheaformis/Cyclotella_cryptica/Stephanodiscus_niagarae/Colpomenia_sinuosa/Cyanophora_sudae/Cyanophora_biloba/Glaucocystis_incrassata/Gloeochaete_wittrockiana/Matteuccia_struthiopteris/Ginkgo_biloba/Triticum_monococcum/Lactuca_sativa/Helianthus_annuus/Lepidodinium_chlorophorum/Colacium_mucronatum/Monomorphina_parapyrum/Cryptoglena_skujai/Trachelomonas_volvocina/Euglena_gracilis"
Here gene1 and gene2 are referred to as HOG followed by some number (and in some cases also "c#"). What is the difference between rootHOG_##, HOG_##, and HOG_##_c##? I am also lost on how to interpret the following columns: contiguous_region, nb_internal_nodes_from_ancestor_with_updated_weight, and predicted_edge_lca.
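
To make the question concrete, here is a minimal Python sketch for loading one of these edge files so the columns can be inspected individually. It is not taken from the tool's documentation. It assumes the file is tab-separated (based on the .tsv extension), that ";" in supporting_children separates groups, and that "/" separates species names within a group. The function name read_edges and the shortened sample row are made up for illustration.

```python
import csv
import io

# Shortened sample in the same 8-column layout as the rows quoted above.
# Assumption: columns are tab-separated, as the .tsv extension suggests.
SAMPLE = (
    "gene1\tgene2\tweight\tcontiguous_region\t"
    "nb_internal_nodes_from_ancestor_with_updated_weight\t"
    "supporting_children\tpredicted_edge_age_relative_to_root\t"
    "predicted_edge_lca\n"
    "rootHOG_1\trootHOG_92\t7\t0\t0\t"
    "Phaeocystis_antarctica/Emiliania_huxleyi;Phaeocystis_globosa\t1.00\t"
    "Phaeocystis_globosa/Phaeocystis_antarctica/Emiliania_huxleyi\n"
)


def read_edges(handle):
    """Read edge rows from an open text handle into a list of dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # Weights appear both as "7" and as "1.0" in the outputs, so use float.
        row["weight"] = float(row["weight"])
        # Assumption: ';' separates groups, '/' separates species in a group.
        row["supporting_children"] = [
            group.split("/") for group in row["supporting_children"].split(";")
        ]
        # Assumption: the LCA is written as a '/'-joined list of species.
        row["predicted_edge_lca"] = row["predicted_edge_lca"].split("/")
        rows.append(row)
    return rows


edges = read_edges(io.StringIO(SAMPLE))
for edge in edges:
    print(edge["gene1"], edge["gene2"], edge["weight"],
          len(edge["supporting_children"]), len(edge["predicted_edge_lca"]))
```

For a real file, the same function can be called on open("0_linearized_synteny_graph_edges.tsv", newline=""). This only splits the fields apart and says nothing about what they mean.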

Thanks for your help!
