Owner

@evhub evhub commented Feb 1, 2026

Summary

This PR implements support for PEP 822 dedented strings (d-strings) in Coconut, adding four new string prefix variants: d, db, df, and dt.

Key Changes

  • Grammar updates (grammar.py):

    • Added dedent_d literal and four new string forward declarations: d_string, db_string, df_string, dt_string
    • Defined token patterns for all d-string variants using any_len_perm to handle prefix combinations with optional r (raw)
    • Updated string atom rules to include d-string variants in appropriate positions
    • Updated f-string regex pattern to recognize d-prefixed f/t strings
  • Compiler implementation (compiler.py):

    • Added _d_string_dedent() static method implementing PEP 822 dedentation logic:
      • Removes leading newline after opening quotes
      • Calculates common indentation across non-blank lines
      • Strips common indentation while preserving relative indentation
      • Validates consistent indentation
    • Implemented four handler methods:
      • d_string_handle(): Basic dedented strings with optional raw prefix
      • db_string_handle(): Dedented byte strings with optional raw prefix
      • df_string_handle() / dt_string_handle(): Dedented f-strings/t-strings
      • _d_f_string_handle(): Shared logic for f/t variants that handles dedentation with expression placeholders
    • Bound all handlers in the bind() method
  • Tests (primary_2.coco):

    • Added comprehensive test cases covering:
      • Basic d-string dedentation
      • Trailing newline handling
      • Blank lines preservation
      • Relative indentation preservation
      • Raw d-strings (dr)
      • Byte d-strings (db)
      • Formatted d-strings (df)

Implementation Details

The dedentation algorithm treats each f-string expression as a non-whitespace placeholder when calculating indentation, so the contents of an expression cannot affect the detected indent. As PEP 822 requires, every d-string variant must be triple-quoted, and its contents must begin with a newline immediately after the opening quotes.
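
The dedentation steps listed under Key Changes can be sketched roughly as follows. This is a minimal standalone illustration, not the PR's `_d_string_dedent()`: the function name `dedent_d_string`, the use of `ValueError`, and the choice of the shortest leading-whitespace run as the common indent are all assumptions made for the example.

```python
def dedent_d_string(text):
    """Dedent the contents of a triple-quoted d-string (illustrative sketch)."""
    # The contents must begin with a newline right after the opening quotes.
    if not text.startswith("\n"):
        raise ValueError("d-string contents must start with a newline")
    lines = text[1:].split("\n")
    last = len(lines) - 1

    # Collect the leading whitespace of every non-blank line. The last line
    # (the one holding the closing quotes) is counted even when it is blank.
    indents = []
    for i, line in enumerate(lines):
        if line.strip() == "" and i != last:
            continue
        stripped = line.lstrip(" \t")
        indents.append(line[:len(line) - len(stripped)])

    # Assumption: the common indent is the shortest leading-whitespace run.
    indent = min(indents, key=len)

    result = []
    for line in lines:
        if line.startswith(indent):
            result.append(line[len(indent):])  # keeps relative indentation
        elif line.strip() == "":
            result.append("")  # blank line shorter than the indent
        else:
            raise ValueError("inconsistent indentation in d-string")
    return "\n".join(result)


print(repr(dedent_d_string("\n    hello\n      world\n\n    bye\n    ")))
# 'hello\n  world\n\nbye\n'
```

Because the indent is measured with `lstrip(" \t")`, any non-whitespace character standing in for an f-string expression ends the indent just as ordinary text would, so such a placeholder does not change the result.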

https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj

Implements the `d` string prefix that automatically removes common
indentation from triple-quoted strings at compile time. Supports all
prefix combinations and orderings: d, dr/rd, db/bd, df/fd, dt/td,
and three-prefix variants like dfr, rdb, etc.

Closes #892

https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj

Add tests for all orderings of d/r, d/b, d/b/r, d/f, d/f/r, d/t,
and d/t/r prefix combinations. Add dt-string tests to specific.coco
alongside existing t-string tests (requires py310+).

https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
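
To make "all orderings" concrete, the prefix spellings that such tests need to cover can be enumerated with a short script. This is only a sketch: `prefix_orderings` is a hypothetical helper, not part of Coconut, and it ignores upper-case spellings such as `D` or `R`.

```python
from itertools import combinations, permutations


def prefix_orderings(required, optional=("r",)):
    """Return every ordering of the required prefix letters, with each
    subset of the optional letters mixed in."""
    orderings = set()
    for n in range(len(optional) + 1):
        for extras in combinations(optional, n):
            for perm in permutations(tuple(required) + extras):
                orderings.add("".join(perm))
    return sorted(orderings)


print(prefix_orderings(("d",)))
# ['d', 'dr', 'rd']
print(prefix_orderings(("d", "b")))
# ['bd', 'bdr', 'brd', 'db', 'dbr', 'drb', 'rbd', 'rdb']
```

The same call with `("d", "f")` or `("d", "t")` yields the eight spellings for the formatted and template variants.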
Owner Author

@evhub evhub left a comment

@claude try to fix these review comments

d_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d,)) + string_item)
db_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d, bit_b)) + string_item)
df_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d, format_f)) + string_item)
dt_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d, template_t)) + string_item)
Owner Author

Call these _ref instead of _tokens.

fixed_len_string_tokens = OneOrMore(nonbf_string) | OneOrMore(b_string | db_string)
f_string_atom = Forward()
- f_string_atom_ref = ZeroOrMore(nonbf_string) + f_string + ZeroOrMore(nonb_string)
+ f_string_atom_ref = ZeroOrMore(nonbf_string) + (f_string | df_string | dt_string) + ZeroOrMore(nonb_string)
Owner Author

Since t_string isn't there, dt_string shouldn't be either.

text, strchar = self.get_ref("str", string[1:-1])

# must be triple-quoted
if len(strchar) == 1:
Owner Author

Do len(strchar) != 3 to be more precise, here and in the other locations.


return ("r" if raw else "") + self.wrap_str(text, strchar[0], multiline=True)

def db_string_handle(self, original, loc, tokens):
Owner Author

d_string_handle and db_string_handle are similar enough that they should just be one function.

string = string[1:]
if string.startswith("r"):
    raw = True
    string = string[1:]
Owner Author

This doesn't look like it handles all orderings.


def df_string_handle(self, original, loc, tokens):
    """Process d-string with f prefix."""
    return self._d_f_string_handle(original, loc, tokens, is_t=False)
Owner Author

Instead of wrapping these functions like this, just use one d_f_string_handle function that accepts an is_t parameter that is passed to cls.method.
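
The shape being asked for, one handler parameterized by `is_t` rather than two thin wrappers, might look something like the sketch below. Everything here is illustrative: `ToyCompiler` is a stand-in class, the handler body is a dummy, and `functools.partial` is used in place of Coconut's `cls.method`, whose signature is not shown in this conversation.

```python
from functools import partial


class ToyCompiler:
    """Stand-in class; not Coconut's Compiler."""

    def d_f_string_handle(self, original, loc, tokens, is_t=False):
        # Dummy body: only reports which variant was requested.
        kind = "t" if is_t else "f"
        return kind + "-string:" + tokens[0]

    def bind(self):
        # One shared handler, with is_t fixed at bind time.
        return {
            "df_string": partial(self.d_f_string_handle, is_t=False),
            "dt_string": partial(self.d_f_string_handle, is_t=True),
        }


handlers = ToyCompiler().bind()
print(handlers["dt_string"]("", 0, ["x"]))
# t-string:x
```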

elif indent.startswith(line_no_ph) and line_no_ph.strip() == "":
    result_lines.append("")
else:
    raise CoconutDeferredSyntaxError("inconsistent indentation in d-string", loc)
Owner Author

Find a way to use _d_string_dedent here to avoid repeating yourself.

- Rename d-string grammar rules from _tokens to _ref
- Remove dt_string from f_string_atom_ref (matching t_string behavior)
- Use len(strchar) != 3 instead of len(strchar) == 1 for precision
- Merge d_string_handle and db_string_handle into one function with is_b param
- Use _strip_raw_and_b helper to handle all prefix orderings robustly
- Use cls.method with is_t param for dt_string instead of wrapper functions
- Reuse _d_string_dedent with placeholder param to avoid code duplication
  in d_f_string_handle

https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj

"""Process PEP 822 d-strings (dedented strings), with optional b prefix."""
string, = tokens

raw, has_b, string = self._strip_raw_and_b(string)
Owner Author

If we can just use has_b, why do we need is_b, or for that matter to have a separate db_string grammar element at all? What do we do for normal b strings?

raise CoconutDeferredSyntaxError("d-string prefix requires triple-quoted string", loc)

# apply dedentation to the f-string parts using placeholder for expressions
placeholder = "\x00"
Owner Author

Are we sure null bytes can't appear in the string normally? At least we need to assert that they don't.

Owner Author

Also, if there's a placeholder we know can't appear in the string and that naturally doesn't count as whitespace, that would make the implementation easier because we wouldn't need the placeholder logic in _d_string_dedent.
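
A rough sketch of what this suggestion implies: join the literal parts of the f-string with a non-whitespace placeholder, assert that the placeholder does not already occur, dedent the joined text with a placeholder-agnostic function, and split again. The names `PLACEHOLDER` and `dedent_f_string_parts`, and the assumed layout of `string_parts` (the literal text between expressions), are hypothetical. `textwrap.dedent` is used only as a stand-in for the PR's `_d_string_dedent`; its rules are not those of PEP 822.

```python
import textwrap

PLACEHOLDER = "\x00"  # not whitespace, so it never counts as indentation


def dedent_f_string_parts(string_parts, dedent=textwrap.dedent):
    """Dedent the literal parts of an f-string around its expressions.

    An f-string with n expressions is assumed to arrive as n + 1 literal
    parts; each expression is represented by PLACEHOLDER while dedenting.
    """
    for part in string_parts:
        # Safety check: the placeholder must not already be in the text.
        assert PLACEHOLDER not in part, "null byte in d-string contents"
    joined = PLACEHOLDER.join(string_parts)
    # The dedent function needs no placeholder logic of its own.
    return dedent(joined).split(PLACEHOLDER)


print(dedent_f_string_parts(["    x = ", "\n      y = ", "\n"]))
# ['x = ', '\n  y = ', '\n']
```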

# blank lines are ignored except the last line (closing quotes line)
indent = None
for i, line in enumerate(lines):
    is_last = (i == len(lines) - 1)
Owner Author

Unnecessary parens (and below).

"""Strip r and b prefixes from a string token, returning (raw, has_b, string)."""
raw = False
has_b = False
while string and string[0] in "rRbB":
Owner Author

Make this just while string: and then check for bB in an elif, and break in the else.
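
The requested while/elif/break structure could be sketched as below. This is a standalone function rather than the PR's `_strip_raw_and_b` method, and it assumes the `d` prefix has already been removed before the call, which the excerpt above does not show.

```python
def strip_raw_and_b(string):
    """Strip leading r/b prefix letters, returning (raw, has_b, string)."""
    raw = False
    has_b = False
    while string:
        if string[0] in "rR":
            raw = True
        elif string[0] in "bB":
            has_b = True
        else:
            break  # first character that is not a prefix letter
        string = string[1:]
    return raw, has_b, string


print(strip_raw_and_b('Br"""x"""'))
# (True, True, '"""x"""')
```

Since each iteration consumes exactly one prefix letter, the loop accepts `rb` and `br` alike and stops at the opening quote.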

- Remove is_b parameter; rely on has_b from _strip_raw_and_b since
  bit_b is not suppressed in grammar
- Use strwrapper as placeholder instead of null byte (can't appear in
  string contents since str_proc uses it as delimiter), with assertion
- Refactor _strip_raw_and_b to use while/elif/break pattern
- Remove unnecessary parentheses in prefix construction

https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj

strwrapper can appear in string contents; null bytes cannot appear
in Python source code, making them a safe placeholder. Keep the
assertion as a safety check.

https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj

- Remove parens around `i == len(lines) - 1` assignments
- Revert prefix construction back to inline ternary style

https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
@evhub evhub added the feature label Feb 1, 2026
@evhub evhub added this to the v3.2.1 milestone Feb 1, 2026
Owner Author

evhub commented Feb 1, 2026

Resolves #892.

@evhub evhub merged commit 86f6019 into develop Feb 1, 2026
14 checks passed
@evhub evhub added the resolved label Feb 1, 2026