14 changes: 14 additions & 0 deletions phylogenetic/all-clades/Snakefile
@@ -0,0 +1,14 @@
+configfile: os.path.join(workflow.basedir, "../defaults/mpxv/config.yaml")
+
+
+if os.path.exists("config.yaml"):
+
+    configfile: "config.yaml"
+
+
+include: "../rules/main.smk"
+
+
+rule _all:
+    input:
+        rules.all.input,
14 changes: 14 additions & 0 deletions phylogenetic/clade-i/Snakefile
@@ -0,0 +1,14 @@
+configfile: os.path.join(workflow.basedir, "../defaults/clade-i/config.yaml")
+
+
+if os.path.exists("config.yaml"):
+
+    configfile: "config.yaml"
+
+
+include: "../rules/main.smk"
+
+
+rule _all:
+    input:
+        rules.all.input,
14 changes: 14 additions & 0 deletions phylogenetic/clade-iib/Snakefile
Member

If we're going to rename hmpxv1 to clade-iib (which seems fine 🤷 I don't have much context here) then we should go the whole way and rename everything - defaults/hmpxv1 etc. (Perhaps you plan to do this -- it's a draft PR after all -- if so this can just serve as a reminder)

Contributor Author

Both users in office hours and I have often been confused by the mapping between the config files and the named builds, so I thought the workflow should at least match the build name.

If there's no pushback from others here, I'll rename everything.

@@ -0,0 +1,14 @@
+configfile: os.path.join(workflow.basedir, "../defaults/hmpxv1/config.yaml")
Member

I think it's cleaner and easier to understand if we co-locate the config.yaml with the Snakefile, i.e.

$ tree phylogenetic/clade-iib
clade-iib
├── Snakefile
└── config.yaml

This raises the question of where to store clade-iib specific config files: phylogenetic/defaults/clade-iib or phylogenetic/clade-iib/defaults (or some variant thereof). If we look at some values in the config YAML (swapping hmpxv1 for clade-iib for clarity) we have things like:

color_scheme: "color_schemes.tsv"
auspice_config: "clade-iib/auspice_config.json"

which isn't very ergonomic from an external user's point of view. I think something like the following is cleaner:

color_scheme: "color_schemes.tsv"
auspice_config: "auspice_config.json"

$ tree
phylogenetic
├── defaults
│   └── color_schemes.tsv
├── clade-iib
│   ├── Snakefile
│   ├── config.yaml
│   ├── defaults            # maybe use a defaults subdir?
│   │   └── auspice_config.json
│   └── auspice_config.json # or maybe just do this?

This gives a path resolution order of (1) the analysis directory, (2) clade-iib, (3) phylogenetic.
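
A minimal sketch of the ordered lookup described above, assuming the three base directories are the analysis directory, the build directory, and the shared defaults directory. The name `resolve_in_order` and the example layout are hypothetical and not part of the workflow; the point is only that a bare filename in the config is matched against each base directory in turn.

```python
from pathlib import Path
from typing import Iterable


def resolve_in_order(relative_path: str, base_dirs: Iterable[Path]) -> Path:
    """Return the first existing file, searching base_dirs in the given order.

    Hypothetical helper: earlier directories take precedence, so a file in the
    analysis directory overrides one shipped with the workflow.
    """
    candidates = [Path(base) / relative_path for base in base_dirs]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"{relative_path!r} not found; searched: {searched}")


# Example search order (assumed layout, mirroring the tree above):
#   search_order = [
#       Path.cwd(),                      # (1) analysis directory
#       Path("phylogenetic/clade-iib"),  # (2) build-specific files
#       Path("phylogenetic/defaults"),   # (3) files shared by all builds
#   ]
#   resolve_in_order("auspice_config.json", search_order)
```

With this ordering a user only has to drop a same-named file into the analysis directory to override a default, at the cost the reply below raises: the config alone no longer says which copy is in use.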

Contributor Author

Yeah, I totally see your point about the config not being ergonomic for external users here. On the other hand, as a maintainer, it's no longer clear from

color_scheme: "color_schemes.tsv"
auspice_config: "auspice_config.json"

which files need to be edited to change the config. I am also wary of the splintering of config files, which is making me reconsider the separate Snakefile approach. See the proposal in nextstrain/cli#454.

Member

+1 for exploring ways to get different configfile "entrypoints" into nextstrain run beyond creating multiple Snakefiles. I originally did just that for avian-flu, but my solution wasn't as nice as your proposed use of nextstrain-pathogen.yaml which might have been why it didn't get any traction.

Even with the --configfile approach we'll still have subtleties for analysis directory usage such as "color_schemes.tsv" vs "clade-iib/auspice_config.json". I guess the solution here is better per-pathogen docs.



+if os.path.exists("config.yaml"):
+
+    configfile: "config.yaml"
+
+
+include: "../rules/main.smk"
+
+
+rule _all:
+    input:
+        rules.all.input,
26 changes: 13 additions & 13 deletions phylogenetic/defaults/clade-i/config.yaml
@@ -1,15 +1,15 @@
-reference: "defaults/clade-i/reference.fasta"
-genome_annotation: "defaults/clade-i/genome_annotation.gff3"
-genbank_reference: "defaults/clade-i/reference.gb"
-include: "defaults/clade-i/include.txt"
-exclude: "defaults/exclude.txt"
-clades: "defaults/clades.tsv"
-lat_longs: "defaults/lat_longs.tsv"
-color_ordering: "defaults/color_ordering.tsv"
-color_scheme: "defaults/color_schemes.tsv"
-auspice_config: "defaults/clade-i/auspice_config.json"
-description: "defaults/description.md"
-tree_mask: "defaults/clade-i/tree_mask.tsv"
+reference: "clade-i/reference.fasta"
+genome_annotation: "clade-i/genome_annotation.gff3"
+genbank_reference: "clade-i/reference.gb"
+include: "clade-i/include.txt"
+exclude: "exclude.txt"
+clades: "clades.tsv"
+lat_longs: "lat_longs.tsv"
+color_ordering: "color_ordering.tsv"
+color_scheme: "color_schemes.tsv"
+auspice_config: "clade-i/auspice_config.json"
+description: "description.md"
+tree_mask: "clade-i/tree_mask.tsv"

 # Use `accession` as the ID column since `strain` currently contains duplicates¹.
 # ¹ https://github.com/nextstrain/mpox/issues/33
@@ -59,7 +59,7 @@ recency: true
 mask:
   from_beginning: 800
   from_end: 6422
-  maskfile: "defaults/clade-i/mask.bed"
+  maskfile: "clade-i/mask.bed"

 colors:
   ignore_categories:
26 changes: 13 additions & 13 deletions phylogenetic/defaults/hmpxv1/config.yaml
@@ -1,15 +1,15 @@
-reference: "defaults/reference.fasta"
-genome_annotation: "defaults/genome_annotation.gff3"
-genbank_reference: "defaults/reference.gb"
-include: "defaults/hmpxv1/include.txt"
-exclude: "defaults/exclude.txt"
-clades: "defaults/clades.tsv"
-lat_longs: "defaults/lat_longs.tsv"
-color_ordering: "defaults/color_ordering.tsv"
-color_scheme: "defaults/color_schemes.tsv"
-auspice_config: "defaults/hmpxv1/auspice_config.json"
-description: "defaults/description.md"
-tree_mask: "defaults/tree_mask.tsv"
+reference: "reference.fasta"
+genome_annotation: "genome_annotation.gff3"
+genbank_reference: "reference.gb"
+include: "hmpxv1/include.txt"
+exclude: "exclude.txt"
+clades: "clades.tsv"
+lat_longs: "lat_longs.tsv"
+color_ordering: "color_ordering.tsv"
+color_scheme: "color_schemes.tsv"
+auspice_config: "hmpxv1/auspice_config.json"
+description: "description.md"
+tree_mask: "tree_mask.tsv"

 # Use `accession` as the ID column since `strain` currently contains duplicates¹.
 # ¹ https://github.com/nextstrain/mpox/issues/33
@@ -101,4 +101,4 @@ recency: true
 mask:
   from_beginning: 800
   from_end: 6422
-  maskfile: "defaults/mask.bed"
+  maskfile: "mask.bed"
26 changes: 13 additions & 13 deletions phylogenetic/defaults/hmpxv1_big/config.yaml
@@ -1,15 +1,15 @@
-reference: "defaults/reference.fasta"
-genome_annotation: "defaults/genome_annotation.gff3"
-genbank_reference: "defaults/reference.gb"
-include: "defaults/hmpxv1_big/include.txt"
-exclude: "defaults/exclude.txt"
-clades: "defaults/clades.tsv"
-lat_longs: "defaults/lat_longs.tsv"
-color_ordering: "defaults/color_ordering.tsv"
-color_scheme: "defaults/color_schemes.tsv"
-auspice_config: "defaults/hmpxv1_big/auspice_config.json"
-description: "defaults/description.md"
-tree_mask: "defaults/tree_mask.tsv"
+reference: "reference.fasta"
+genome_annotation: "genome_annotation.gff3"
+genbank_reference: "reference.gb"
+include: "hmpxv1_big/include.txt"
+exclude: "exclude.txt"
+clades: "clades.tsv"
+lat_longs: "lat_longs.tsv"
+color_ordering: "color_ordering.tsv"
+color_scheme: "color_schemes.tsv"
+auspice_config: "hmpxv1_big/auspice_config.json"
+description: "description.md"
+tree_mask: "tree_mask.tsv"

 # Use `accession` as the ID column since `strain` currently contains duplicates¹.
 # ¹ https://github.com/nextstrain/mpox/issues/33
@@ -64,4 +64,4 @@ recency: true
 mask:
   from_beginning: 800
   from_end: 6422
-  maskfile: "defaults/mask.bed"
+  maskfile: "mask.bed"
26 changes: 13 additions & 13 deletions phylogenetic/defaults/mpxv/config.yaml
@@ -1,15 +1,15 @@
-auspice_config: "defaults/mpxv/auspice_config.json"
-include: "defaults/mpxv/include.txt"
-exclude: "defaults/exclude.txt"
-reference: "defaults/reference.fasta"
-genome_annotation: "defaults/genome_annotation.gff3"
-genbank_reference: "defaults/reference.gb"
-lat_longs: "defaults/lat_longs.tsv"
-color_ordering: "defaults/color_ordering.tsv"
-color_scheme: "defaults/color_schemes.tsv"
-description: "defaults/description.md"
-clades: "defaults/clades.tsv"
-tree_mask: "defaults/tree_mask.tsv"
+auspice_config: "mpxv/auspice_config.json"
+include: "mpxv/include.txt"
+exclude: "exclude.txt"
+reference: "reference.fasta"
+genome_annotation: "genome_annotation.gff3"
+genbank_reference: "reference.gb"
+lat_longs: "lat_longs.tsv"
+color_ordering: "color_ordering.tsv"
+color_scheme: "color_schemes.tsv"
+description: "description.md"
+clades: "clades.tsv"
+tree_mask: "tree_mask.tsv"

 # Use `accession` as the ID column since `strain` currently contains duplicates¹.
 # ¹ https://github.com/nextstrain/mpox/issues/33
@@ -94,4 +94,4 @@ recency: true
 mask:
   from_beginning: 1350
   from_end: 6422
-  maskfile: "defaults/mask_overview.bed"
+  maskfile: "mask_overview.bed"
14 changes: 14 additions & 0 deletions phylogenetic/lineage-b.1/Snakefile
@@ -0,0 +1,14 @@
+configfile: os.path.join(workflow.basedir, "../defaults/hmpxv1_big/config.yaml")
+
+
+if os.path.exists("config.yaml"):
+
+    configfile: "config.yaml"
+
+
+include: "../rules/main.smk"
+
+
+rule _all:
+    input:
+        rules.all.input,
12 changes: 6 additions & 6 deletions phylogenetic/rules/annotate_phylogeny.smk
@@ -60,7 +60,7 @@ rule translate:
     input:
         tree=build_dir + "/{build_name}/tree.nwk",
         node_data=build_dir + "/{build_name}/nt_muts.json",
-        genome_annotation=config["genome_annotation"],
+        genome_annotation=phylo_resolve_config_path(config["genome_annotation"]),
     output:
         node_data=build_dir + "/{build_name}/aa_muts.json",
     log:
@@ -120,7 +120,7 @@ rule clades:
         tree=build_dir + "/{build_name}/tree.nwk",
         aa_muts=build_dir + "/{build_name}/aa_muts.json",
         nuc_muts=build_dir + "/{build_name}/nt_muts.json",
-        clades=config["clades"],
+        clades=phylo_resolve_config_path(config["clades"]),
     output:
         node_data=build_dir + "/{build_name}/clades_raw.json",
     log:
@@ -154,7 +154,7 @@ rule rename_clades:
         r"""
         exec &> >(tee {log:q})

-        python scripts/clades_renaming.py \
+        python {workflow.basedir}/../scripts/clades_renaming.py \
             --input-node-data {input:q} \
             --output-node-data {output.node_data:q}
         """
@@ -180,7 +180,7 @@ rule assign_clades_via_metadata:
         r"""
         exec &> >(tee {log:q})

-        python scripts/assign-clades-via-metadata.py \
+        python {workflow.basedir}/../scripts/assign-clades-via-metadata.py \
             --metadata {input.metadata:q} \
             --tree {input.tree:q} \
             --output-node-data {output.node_data:q}
@@ -201,7 +201,7 @@ rule mutation_context:
         r"""
         exec &> >(tee {log:q})

-        python3 scripts/mutation_context.py \
+        python3 {workflow.basedir}/../scripts/mutation_context.py \
             --tree {input.tree:q} \
             --mutations {input.node_data:q} \
             --output {output.node_data:q}
@@ -226,7 +226,7 @@ rule recency:
         r"""
         exec &> >(tee {log:q})

-        python3 scripts/construct-recency-from-submission-date.py \
+        python3 {workflow.basedir}/../scripts/construct-recency-from-submission-date.py \
             --metadata {input.metadata:q} \
             --metadata-id-columns {params.strain_id:q} \
             --output {output:q} 2>&1
17 changes: 17 additions & 0 deletions phylogenetic/rules/config.smk
@@ -7,6 +7,23 @@ from textwrap import dedent, indent
 from typing import Union


+include: "../../shared/vendored/snakemake/config.smk"
+
+
+def phylo_resolve_config_path(path: str) -> Callable[[Wildcards], str]:
+    """
+    Wrapper around the shared `resolve_config_path` to force the default directory
+    to be `phylogenetic/defaults`. This is necessary because the entry point for
+    each build is nested within phylogenetic (e.g. phylogenetic/clade-i/Snakefile).
+    """
+    PHYLO_DEFAULTS_DIR = os.path.normpath(os.path.join(workflow.current_basedir, "../defaults"))
+    # Strip the `defaults/` prefix to be backwards compatible with older configs
+    # This is necessary in this wrapper because we are providing a custom defaults dir
+    # which skips the handling of the defaults/ prefix within resolve_config_path.
+    path = path.removeprefix("defaults/")
+    return resolve_config_path(path, PHYLO_DEFAULTS_DIR)
+
+
Comment on lines +10 to +26
Member

I still wish we got rid of all this magic around defaults/ and the path in the config.yaml was the path used, relative to an ordered set of "base directories" which are easily documented. I understand people don't want external users to have to make a "defaults" directory, but if we used "config" instead I think there'd be less objection.

Second, I still think phylo_resolve_config_path is overly verbose and leaves the door open to unintentionally using resolve_config_path (which is still in the namespace). I continue to think the following is preferable:

include: "../../shared/vendored/snakemake/config.smk"

_shared_resolve_config_path = resolve_config_path

def resolve_config_path(path: str) -> Callable[[Wildcards], str]:
    return _shared_resolve_config_path(...)

Contributor Author

Yeah, this is only necessary because of the nested Snakefiles. If we keep the main phylogenetic/Snakefile then we could just directly use the shared resolve_config_path.

 def as_list(config_param: Union[list,str]) -> list:
     if isinstance(config_param, list):
         return config_param
4 changes: 2 additions & 2 deletions phylogenetic/rules/construct_phylogeny.smk
@@ -21,7 +21,7 @@ rule tree:
     """
     input:
         alignment=build_dir + "/{build_name}/masked.fasta",
-        tree_mask=config["tree_mask"],
+        tree_mask=phylo_resolve_config_path(config["tree_mask"]),
     output:
         tree=build_dir + "/{build_name}/tree_raw.nwk",
     threads: workflow.cores
@@ -64,7 +64,7 @@ rule fix_tree:
         r"""
         exec &> >(tee {log:q})

-        python3 scripts/fix_tree.py \
+        python3 {workflow.basedir}/../scripts/fix_tree.py \
             --alignment {input.alignment:q} \
             --input-tree {input.tree:q} \
             {params.root} \
14 changes: 7 additions & 7 deletions phylogenetic/rules/export.smk
@@ -43,16 +43,16 @@ rule remove_time:
         r"""
         exec &> >(tee {log:q})

-        python3 scripts/remove_timeinfo.py \
+        python3 {workflow.basedir}/../scripts/remove_timeinfo.py \
             --input-node-data {input:q} \
             --output-node-data {output:q}
         """


 rule colors:
     input:
-        ordering=config["color_ordering"],
-        color_schemes=config["color_scheme"],
+        ordering=phylo_resolve_config_path(config["color_ordering"]),
+        color_schemes=phylo_resolve_config_path(config["color_scheme"]),
         metadata=build_dir + "/{build_name}/metadata.tsv",
     output:
         colors=build_dir + "/{build_name}/colors.tsv",
@@ -66,7 +66,7 @@ rule colors:
         r"""
         exec &> >(tee {log:q})

-        python3 scripts/assign-colors.py \
+        python3 {workflow.basedir}/../scripts/assign-colors.py \
             --ordering {input.ordering:q} \
             --color-schemes {input.color_schemes:q} \
             --output {output.colors:q} \
@@ -102,9 +102,9 @@ rule export:
             else []
         ),
         colors=build_dir + "/{build_name}/colors.tsv",
-        lat_longs=config["lat_longs"],
-        description=config["description"],
-        auspice_config=config["auspice_config"],
+        lat_longs=phylo_resolve_config_path(config["lat_longs"]),
+        description=phylo_resolve_config_path(config["description"]),
+        auspice_config=phylo_resolve_config_path(config["auspice_config"]),
     output:
         auspice_json=build_dir + "/{build_name}/tree.json",
         root_sequence=build_dir + "/{build_name}/tree_root-sequence.json",
22 changes: 12 additions & 10 deletions phylogenetic/Snakefile → phylogenetic/rules/main.smk
@@ -10,10 +10,6 @@ if version.parse(augur_version) < version.parse(min_augur_version):
     )
     sys.exit(1)

-if not config:
-
-    configfile: "defaults/hmpxv1/config.yaml"
-

 build_dir = "results"
 auspice_dir = "auspice"
@@ -38,18 +34,24 @@ rule all:
     """


-include: "rules/config.smk"
-include: "rules/prepare_sequences.smk"
-include: "rules/construct_phylogeny.smk"
-include: "rules/annotate_phylogeny.smk"
-include: "rules/export.smk"
+include: "config.smk"
+include: "prepare_sequences.smk"
+include: "construct_phylogeny.smk"
+include: "annotate_phylogeny.smk"
+include: "export.smk"


 # Include custom rules defined in the config.
 if "custom_rules" in config:
     for rule_file in config["custom_rules"]:

-        include: rule_file
+        # Relative custom rule paths in the config are relative to the analysis
+        # directory (i.e. the current working directory, or workdir, usually
+        # given by --directory), but the "include" directive treats relative
+        # paths as relative to the workflow (e.g. workflow.current_basedir).
+        # Convert to an absolute path based on the analysis/current directory
+        # to avoid this mismatch of expectations.
+        include: os.path.join(os.getcwd(), rule_file)


 rule clean:
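
The path handling added to the `custom_rules` loop above can be sketched in plain Python. This is an illustration only: `custom_rule_path` and its `analysis_dir` parameter are hypothetical names, and the workflow itself calls `os.path.join(os.getcwd(), rule_file)` inline. It shows that a relative entry gets anchored at the analysis directory, while an absolute entry passes through unchanged because `os.path.join` discards earlier components once it meets an absolute one.

```python
import os
from typing import Optional


def custom_rule_path(rule_file: str, analysis_dir: Optional[str] = None) -> str:
    """Anchor a config-supplied rule file at the analysis directory.

    Hypothetical helper mirroring `os.path.join(os.getcwd(), rule_file)`;
    `analysis_dir` defaults to the current working directory.
    """
    base = os.getcwd() if analysis_dir is None else analysis_dir
    return os.path.join(base, rule_file)


# Usage (example paths are made up):
#   custom_rule_path("my_rules/extra.smk", "/work/analysis")
#   custom_rule_path("/opt/rules/extra.smk", "/work/analysis")
```

Because absolute paths survive the join untouched, configs that already list absolute custom rule files should keep working under this change.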