Skip to content

Feature Request: Support Configurable Folder‑Name Parsing via Naming Strategy + Regex #5

@Alienor134

Description

@Alienor134

Feature
Many datasets encode acquisition parameters directly in the folder name. For example:

2025-04-09_Dronpa2_zoom3_filter_405_exp3

It would be nice to add a configurable naming strategy so users can define how parameter values should be parsed from folder names.

Proposed Solution
Allow users to specify a naming template that maps semantic configuration keys to their location in the folder name. For example:

$DATE$_$PROTEIN$_zoom$ZOOM$_filter_$FILTER$_exp$EXP_NUMBER$

Using regex it would be possible to extract a dictionary for the config file.

Example generated with ChatGPT:

import re
import json

template = "$DATE$_$PROTEIN$_zoom$ZOOM$_filter_$FILTER$_exp$EXP_NUMBER$"
folder_name = "2025-04-09_Dronpa2_zoom3_filter_405_exp3"


def template_to_regex(template: str) -> str:
    # Split into literal text and $VAR$ tokens
    parts = re.split(r"(\$[A-Z_]+\$)", template)
    regex_parts = []

    for part in parts:
        if part.startswith("$") and part.endswith("$"):
            name = part[1:-1]  # strip $...$
            # Capture anything (lazy) – no type assumptions
            regex_parts.append(rf"(?P<{name}>.+?)")
        else:
            # Escape literal text
            regex_parts.append(re.escape(part))

    return "^" + "".join(regex_parts) + "$"


# ---------------------------------------------------------
# Parse folder name
# ---------------------------------------------------------
regex_pattern = template_to_regex(template)
print("Generated regex:", regex_pattern)

match = re.match(regex_pattern, folder_name)
if not match:
    raise ValueError("Folder name does not match template")

config = match.groupdict()

print("Extracted config:", config)

# ---------------------------------------------------------
# Write JSON config file
# ---------------------------------------------------------
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)

print("Config written to config.json")

Output:

Generated regex: 
^(?P<DATE>.+?)_(?P<PROTEIN>.+?)_zoom(?P<ZOOM>.+?)_filter_(?P<FILTER>.+?)_exp(?P<EXP_NUMBER>.+?)$

Extracted config: 
{'DATE': '2025-04-09', 
'PROTEIN': 'Dronpa2', 
'ZOOM': '3', 
'FILTER': '405', 
'EXP_NUMBER': '3'}
Config written to config.json

Of course it would require rigor from the user.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions