Add support for PEP 822 dedented strings (d-strings) #896

evhub · 2026-02-01T05:16:49Z

Summary

This PR implements support for PEP 822 dedented strings (d-strings) in Coconut, adding four new string prefix variants: d, db, df, and dt.

Key Changes

Grammar updates (grammar.py):
- Added dedent_d literal and four new string forward declarations: d_string, db_string, df_string, dt_string
- Defined token patterns for all d-string variants using any_len_perm to handle prefix combinations with optional r (raw)
- Updated string atom rules to include d-string variants in appropriate positions
- Updated f-string regex pattern to recognize d-prefixed f/t strings

Compiler implementation (compiler.py):
- Added _d_string_dedent() static method implementing PEP 822 dedentation logic:
  - Removes leading newline after opening quotes
  - Calculates common indentation across non-blank lines
  - Strips common indentation while preserving relative indentation
  - Validates consistent indentation
- Implemented four handler methods:
  - d_string_handle(): Basic dedented strings with optional raw prefix
  - db_string_handle(): Dedented byte strings with optional raw prefix
  - df_string_handle() / dt_string_handle(): Dedented f-strings/t-strings
  - _d_f_string_handle(): Shared logic for f/t variants that handles dedentation with expression placeholders
- Bound all handlers in the bind() method

Tests (primary_2.coco):
- Added comprehensive test cases covering:
  - Basic d-string dedentation
  - Trailing newline handling
  - Blank lines preservation
  - Relative indentation preservation
  - Raw d-strings (dr)
  - Byte d-strings (db)
  - Formatted d-strings (df)

Implementation Details

The dedentation algorithm treats f-string expressions as non-whitespace placeholders during indentation calculation, ensuring expressions don't affect indent detection. All d-string variants require triple-quoted strings and must have content starting with a newline after the opening quotes, as per PEP 822.
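
The dedentation steps listed under Key Changes can be sketched roughly as follows. This is a minimal standalone illustration, not the code from compiler.py: the function name `dedent_d_string`, the use of `SyntaxError`, and the choice to treat only spaces and tabs as indentation are all assumptions made for the example.

```python
import os


def dedent_d_string(contents):
    """Dedent the text found between the triple quotes of a d-string.

    Hypothetical sketch of the steps described in this PR, not the
    actual _d_string_dedent implementation.
    """
    # Step 1: the contents must start with a newline, which is removed.
    if not contents.startswith("\n"):
        raise SyntaxError("d-string contents must start with a newline")
    lines = contents[1:].split("\n")

    # Step 2: collect the leading whitespace of every non-blank line,
    # plus the last line (the one holding the closing quotes).
    indents = []
    for i, line in enumerate(lines):
        is_last = i == len(lines) - 1
        if line.strip(" \t") or is_last:
            indents.append(line[:len(line) - len(line.lstrip(" \t"))])
    indent = os.path.commonprefix(indents)

    # Step 3: strip the common indentation, keeping relative indentation,
    # and reject blank lines whose whitespace conflicts with it.
    result = []
    for line in lines:
        if line.startswith(indent):
            result.append(line[len(indent):])
        elif not line.strip(" \t") and indent.startswith(line):
            result.append("")
        else:
            raise SyntaxError("inconsistent indentation in d-string")
    return "\n".join(result)
```

For example, `dedent_d_string("\n    hello\n      world\n    ")` keeps the two extra spaces before `world` and leaves a trailing newline, because the closing-quotes line is reduced to an empty string.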

https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj

Implements the `d` string prefix, which automatically removes common indentation from triple-quoted strings at compile time. Supports all prefix combinations and orderings: d, dr/rd, db/bd, df/fd, dt/td, and three-prefix variants such as dfr and rdb.

Closes #892

https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj

Add tests for all orderings of the d/r, d/b, d/b/r, d/f, d/f/r, d/t, and d/t/r prefix combinations. Add dt-string tests to specific.coco alongside the existing t-string tests (requires py310+).

https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
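
To make the set of accepted prefix orderings concrete, the sketch below enumerates them with the standard library. It only illustrates the idea of "required letters in any order, with an optional r"; it is not Coconut's `any_len_perm`, and the helper name `prefix_orderings` is hypothetical. Only lowercase letters are considered here.

```python
from itertools import combinations, permutations


def prefix_orderings(required, optional=("r",)):
    """Return every ordering of the required prefix letters combined
    with any subset of the optional ones (hypothetical helper)."""
    result = set()
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            for perm in permutations(tuple(required) + extra):
                result.add("".join(perm))
    return result


# Plain d-strings: d alone, or d with r in either order.
print(sorted(prefix_orderings(("d",))))
# Byte d-strings: two orderings without r, six with it.
print(sorted(prefix_orderings(("d", "b"))))
```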

evhub

@claude try to fix these review comments

evhub · 2026-02-01T05:28:12Z

coconut/compiler/grammar.py

+        d_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d,)) + string_item)
+        db_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d, bit_b)) + string_item)
+        df_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d, format_f)) + string_item)
+        dt_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d, template_t)) + string_item)


Call these _ref instead of _tokens.

evhub · 2026-02-01T05:28:29Z

coconut/compiler/grammar.py

+        fixed_len_string_tokens = OneOrMore(nonbf_string) | OneOrMore(b_string | db_string)
        f_string_atom = Forward()
-        f_string_atom_ref = ZeroOrMore(nonbf_string) + f_string + ZeroOrMore(nonb_string)
+        f_string_atom_ref = ZeroOrMore(nonbf_string) + (f_string | df_string | dt_string) + ZeroOrMore(nonb_string)


Since t_string isn't there, dt_string shouldn't be either.

evhub · 2026-02-01T05:29:54Z

coconut/compiler/compiler.py

+        text, strchar = self.get_ref("str", string[1:-1])
+
+        # must be triple-quoted
+        if len(strchar) == 1:


Do len(strchar) != 3 to be more precise, here and in the other locations.

evhub · 2026-02-01T05:34:37Z

coconut/compiler/compiler.py

+
+        return ("r" if raw else "") + self.wrap_str(text, strchar[0], multiline=True)
+
+    def db_string_handle(self, original, loc, tokens):


d_string_handle and db_string_handle are similar enough that they should just be one function.

evhub · 2026-02-01T05:34:49Z

coconut/compiler/compiler.py

+            string = string[1:]
+        if string.startswith("r"):
+            raw = True
+            string = string[1:]


This doesn't look like it handles all orderings.

evhub · 2026-02-01T05:36:11Z

coconut/compiler/compiler.py

+
+    def df_string_handle(self, original, loc, tokens):
+        """Process d-string with f prefix."""
+        return self._d_f_string_handle(original, loc, tokens, is_t=False)


Instead of wrapping these functions like this, just use one d_f_string_handle function that accepts an is_t parameter that is passed to cls.method.

evhub · 2026-02-01T05:37:07Z

coconut/compiler/compiler.py

+            elif indent.startswith(line_no_ph) and line_no_ph.strip() == "":
+                result_lines.append("")
+            else:
+                raise CoconutDeferredSyntaxError("inconsistent indentation in d-string", loc)


Find a way to use _d_string_dedent here to avoid repeating yourself.

- Rename d-string grammar rules from _tokens to _ref
- Remove dt_string from f_string_atom_ref (matching t_string behavior)
- Use len(strchar) != 3 instead of len(strchar) == 1 for precision
- Merge d_string_handle and db_string_handle into one function with is_b param
- Use _strip_raw_and_b helper to handle all prefix orderings robustly
- Use cls.method with is_t param for dt_string instead of wrapper functions
- Reuse _d_string_dedent with placeholder param to avoid code duplication in d_f_string_handle

https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj

evhub · 2026-02-01T05:52:05Z

coconut/compiler/compiler.py

+        """Process PEP 822 d-strings (dedented strings), with optional b prefix."""
+        string, = tokens
+
+        raw, has_b, string = self._strip_raw_and_b(string)


If we can just use has_b, why do we need is_b, or for that matter to have a separate db_string grammar element at all? What do we do for normal b strings?

evhub · 2026-02-01T05:56:15Z

coconut/compiler/compiler.py

+            raise CoconutDeferredSyntaxError("d-string prefix requires triple-quoted string", loc)
+
+        # apply dedentation to the f-string parts using placeholder for expressions
+        placeholder = "\x00"


Are we sure null bytes can't appear in the string normally? At least we need to assert that they don't.

Also, if there's a placeholder we know can't appear in the string and that naturally doesn't count as whitespace, that would make the implementation easier because we wouldn't need the placeholder logic in _d_string_dedent.
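
A rough sketch of the placeholder idea discussed here: join the literal segments of an f-string with a character that is not whitespace, dedent the result as one ordinary string, then split it back apart. `textwrap.dedent` is used purely as a stand-in for the PR's dedent logic (it does not remove the leading newline), and the names `PLACEHOLDER` and `dedent_f_string_parts` are assumptions for illustration.

```python
import textwrap

# Assumed placeholder; the PR's final choice is the null byte.
PLACEHOLDER = "\x00"


def dedent_f_string_parts(literal_parts):
    """Dedent the literal text segments that surround f-string expressions.

    literal_parts holds the text before, between, and after the
    expressions, in order (hypothetical representation).
    """
    # Safety check requested in review: the placeholder must not
    # already occur in the string contents.
    for part in literal_parts:
        assert PLACEHOLDER not in part, "placeholder found in string contents"

    # The placeholder is not whitespace, so a line that starts with an
    # expression still counts as a non-blank line for indent detection.
    joined = PLACEHOLDER.join(literal_parts)
    dedented = textwrap.dedent(joined)
    return dedented.split(PLACEHOLDER)
```

Because the placeholder is treated like any other non-whitespace character, the dedent step itself needs no knowledge of expressions, which is the simplification suggested in the comment above.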

evhub · 2026-02-01T05:57:02Z

coconut/compiler/compiler.py

+        # blank lines are ignored except the last line (closing quotes line)
+        indent = None
+        for i, line in enumerate(lines):
+            is_last = (i == len(lines) - 1)


Unnecessary parens (and below).

evhub · 2026-02-01T05:58:53Z

coconut/compiler/compiler.py

+        """Strip r and b prefixes from a string token, returning (raw, has_b, string)."""
+        raw = False
+        has_b = False
+        while string and string[0] in "rRbB":


Make this just while string: and then check for bB in an elif, and break in the else.
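
The loop shape requested here might look like the sketch below. It is a standalone function rather than the actual `_strip_raw_and_b` method, and it assumes the `d` prefix has already been removed from the token before this helper runs.

```python
def strip_raw_and_b(string):
    """Strip leading r and b prefix letters in any order.

    Returns (raw, has_b, remaining_string). Hypothetical standalone
    version; assumes the d prefix is no longer part of the token.
    """
    raw = False
    has_b = False
    while string:
        if string[0] in "rR":
            raw = True
        elif string[0] in "bB":
            has_b = True
        else:
            # First character that is not a prefix letter: stop.
            break
        string = string[1:]
    return raw, has_b, string
```

With this shape, `rb` and `br` both yield `raw=True, has_b=True`, so the handler no longer depends on the order in which the prefixes were written.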

- Remove is_b parameter; rely on has_b from _strip_raw_and_b since bit_b is not suppressed in grammar
- Use strwrapper as placeholder instead of null byte (can't appear in string contents since str_proc uses it as delimiter), with assertion
- Refactor _strip_raw_and_b to use while/elif/break pattern
- Remove unnecessary parentheses in prefix construction

https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj

strwrapper can appear in string contents; null bytes cannot appear in Python source code, making them a safe placeholder. Keep the assertion as a safety check.

https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj

- Remove parens around `i == len(lines) - 1` assignments
- Revert prefix construction back to inline ternary style

https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj

evhub · 2026-02-01T06:11:51Z

Resolves #892.

claude added 2 commits February 1, 2026 04:53

claude added 3 commits February 1, 2026 06:05

Use null byte as placeholder instead of strwrapper

2e17970

Remove unnecessary parens and revert prefix to inline style

9f8f4e3

evhub added the feature label Feb 1, 2026

evhub added this to the v3.2.1 milestone Feb 1, 2026

evhub merged commit 86f6019 into develop Feb 1, 2026
14 checks passed

evhub added the resolved label Feb 1, 2026

