Add support for PEP 822 dedented strings (d-strings) #896
Implements the `d` string prefix that automatically removes common indentation from triple-quoted strings at compile time. Supports all prefix combinations and orderings: d, dr/rd, db/bd, df/fd, dt/td, and three-prefix variants like dfr, rdb, etc. Closes #892 https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
Add tests for all orderings of d/r, d/b, d/b/r, d/f, d/f/r, d/t, and d/t/r prefix combinations. Add dt-string tests to specific.coco alongside existing t-string tests (requires py310+). https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
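One way to enumerate the prefix orderings those tests need to cover is sketched below. This is a minimal illustration, not code from the PR: `prefix_orderings` is a hypothetical helper, it assumes `r` is the only optional prefix letter, and it ignores upper-case variants.

```python
from itertools import permutations


def prefix_orderings(required, optional="r"):
    """Return every ordering of the required prefix letters,
    both with and without the single optional letter (hypothetical helper)."""
    orderings = set()
    for letters in (required, required + optional):
        for perm in permutations(letters):
            orderings.add("".join(perm))
    return sorted(orderings)


# List each prefix that a df-string test would have to exercise.
for prefix in prefix_orderings("df"):
    print(prefix + '"""..."""')
```

Generating the orderings rather than writing them by hand makes it harder to miss a case such as `rfd` or `frd`.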
evhub
left a comment
@claude try to fix these review comments
coconut/compiler/grammar.py
Outdated
d_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d,)) + string_item)
db_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d, bit_b)) + string_item)
df_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d, format_f)) + string_item)
dt_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d, template_t)) + string_item)
Call these _ref instead of _tokens.
coconut/compiler/grammar.py
Outdated
  fixed_len_string_tokens = OneOrMore(nonbf_string) | OneOrMore(b_string | db_string)
  f_string_atom = Forward()
- f_string_atom_ref = ZeroOrMore(nonbf_string) + f_string + ZeroOrMore(nonb_string)
+ f_string_atom_ref = ZeroOrMore(nonbf_string) + (f_string | df_string | dt_string) + ZeroOrMore(nonb_string)
Since t_string isn't there, dt_string shouldn't be either.
coconut/compiler/compiler.py
Outdated
text, strchar = self.get_ref("str", string[1:-1])

# must be triple-quoted
if len(strchar) == 1:
Do len(strchar) != 3 to be more precise, here and in the other locations.
coconut/compiler/compiler.py
Outdated
    return ("r" if raw else "") + self.wrap_str(text, strchar[0], multiline=True)

def db_string_handle(self, original, loc, tokens):
d_string_handle and db_string_handle are similar enough that they should just be one function.
coconut/compiler/compiler.py
Outdated
string = string[1:]
if string.startswith("r"):
    raw = True
    string = string[1:]
This doesn't look like it handles all orderings.
coconut/compiler/compiler.py
Outdated
def df_string_handle(self, original, loc, tokens):
    """Process d-string with f prefix."""
    return self._d_f_string_handle(original, loc, tokens, is_t=False)
Instead of wrapping these functions like this, just use one d_f_string_handle function that accepts an is_t parameter that is passed to cls.method.
coconut/compiler/compiler.py
Outdated
elif indent.startswith(line_no_ph) and line_no_ph.strip() == "":
    result_lines.append("")
else:
    raise CoconutDeferredSyntaxError("inconsistent indentation in d-string", loc)
Find a way to use _d_string_dedent here to avoid repeating yourself.
- Rename d-string grammar rules from _tokens to _ref
- Remove dt_string from f_string_atom_ref (matching t_string behavior)
- Use len(strchar) != 3 instead of len(strchar) == 1 for precision
- Merge d_string_handle and db_string_handle into one function with is_b param
- Use _strip_raw_and_b helper to handle all prefix orderings robustly
- Use cls.method with is_t param for dt_string instead of wrapper functions
- Reuse _d_string_dedent with placeholder param to avoid code duplication in d_f_string_handle
https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
"""Process PEP 822 d-strings (dedented strings), with optional b prefix."""
string, = tokens

raw, has_b, string = self._strip_raw_and_b(string)
If we can just use has_b, why do we need is_b, or for that matter to have a separate db_string grammar element at all? What do we do for normal b strings?
    raise CoconutDeferredSyntaxError("d-string prefix requires triple-quoted string", loc)

# apply dedentation to the f-string parts using placeholder for expressions
placeholder = "\x00"
Are we sure null bytes can't appear in the string normally? At least we need to assert that they don't.
Also, if there's a placeholder we know can't appear in the string and that naturally doesn't count as whitespace, that would make the implementation easier because we wouldn't need the placeholder logic in _d_string_dedent.
coconut/compiler/compiler.py
Outdated
# blank lines are ignored except the last line (closing quotes line)
indent = None
for i, line in enumerate(lines):
    is_last = (i == len(lines) - 1)
Unnecessary parens (and below).
coconut/compiler/compiler.py
Outdated
"""Strip r and b prefixes from a string token, returning (raw, has_b, string)."""
raw = False
has_b = False
while string and string[0] in "rRbB":
Make this just while string: and then check for bB in an elif, and break in the else.
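A minimal sketch of the loop shape requested here, written as a standalone function rather than the compiler's actual method. The name `strip_raw_and_b` and the assumption that the token begins with its prefix letters followed by the opening quotes are illustrative only.

```python
def strip_raw_and_b(string):
    """Strip leading r/R and b/B prefix letters in any order.

    Returns (raw, has_b, rest). Hypothetical standalone version of the
    helper under review, not the code from the PR.
    """
    raw = False
    has_b = False
    while string:
        if string[0] in "rR":
            raw = True
        elif string[0] in "bB":
            has_b = True
        else:
            # first non-prefix character (e.g. the opening quote): stop
            break
        string = string[1:]
    return raw, has_b, string
```

Because the loop stops at the first character that is not a prefix letter, an `r` or `b` inside the string body is left alone.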
- Remove is_b parameter; rely on has_b from _strip_raw_and_b since bit_b is not suppressed in grammar
- Use strwrapper as placeholder instead of null byte (can't appear in string contents since str_proc uses it as delimiter), with assertion
- Refactor _strip_raw_and_b to use while/elif/break pattern
- Remove unnecessary parentheses in prefix construction
https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
strwrapper can appear in string contents; null bytes cannot appear in Python source code, making them a safe placeholder. Keep the assertion as a safety check. https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
- Remove parens around `i == len(lines) - 1` assignments
- Revert prefix construction back to inline ternary style
https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
Resolves #892.

Summary
This PR implements support for PEP 822 dedented strings (d-strings) in Coconut, adding four new string prefix variants: `d`, `db`, `df`, and `dt`.

Key Changes
Grammar updates (`grammar.py`):
- `dedent_d` literal and four new string forward declarations: `d_string`, `db_string`, `df_string`, `dt_string`
- `any_len_perm` to handle prefix combinations with optional `r` (raw)

Compiler implementation (`compiler.py`):
- `_d_string_dedent()` static method implementing PEP 822 dedentation logic
- `d_string_handle()`: Basic dedented strings with optional raw prefix
- `db_string_handle()`: Dedented byte strings with optional raw prefix
- `df_string_handle()` / `dt_string_handle()`: Dedented f-strings/t-strings
- `_d_f_string_handle()`: Shared logic for f/t variants that handles dedentation with expression placeholders
- Changes to the `bind()` method

Tests (`primary_2.coco`) covering the `dr`, `db`, and `df` variants.

Implementation Details
The dedentation algorithm treats f-string expressions as non-whitespace placeholders during indentation calculation, ensuring expressions don't affect indent detection. All d-string variants require triple-quoted strings and must have content starting with a newline after the opening quotes, as per PEP 822.
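The placeholder idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's `_d_string_dedent`: the names `dedent_d_string` and `dedent_f_string_parts` are hypothetical, only spaces and tabs count as indentation, the common indent is the shared whitespace prefix of the non-blank lines plus the closing-quotes line, and a whitespace-only line that does not start with that indent becomes empty. The exact PEP 822 rules for mixed or inconsistent indentation are not reproduced here.

```python
import os

PLACEHOLDER = "\x00"  # null bytes cannot appear in Python source code


def dedent_d_string(text):
    """Remove the common indentation from a triple-quoted string body."""
    if not text.startswith("\n"):
        raise ValueError("d-string content must start with a newline")
    lines = text[1:].split("\n")

    # Blank lines are ignored, except the last line (closing-quotes line).
    indent = None
    for i, line in enumerate(lines):
        is_last = i == len(lines) - 1
        stripped = line.lstrip(" \t")
        if not stripped and not is_last:
            continue
        line_indent = line[:len(line) - len(stripped)]
        if indent is None:
            indent = line_indent
        else:
            indent = os.path.commonprefix([indent, line_indent])

    result = []
    for line in lines:
        if line.startswith(indent):
            result.append(line[len(indent):])
        else:
            # only whitespace-only lines can fail the check above
            result.append("")
    return "\n".join(result)


def dedent_f_string_parts(parts):
    """Dedent the literal parts of an f-string, treating each expression
    slot between two parts as a single non-whitespace placeholder."""
    assert all(PLACEHOLDER not in part for part in parts)
    dedented = dedent_d_string(PLACEHOLDER.join(parts))
    return dedented.split(PLACEHOLDER)
```

Since `"\x00"` is not stripped by `lstrip(" \t")`, a line that begins with an expression still contributes its leading whitespace to the indent calculation, and the expression itself never does.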
https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj