PEP 822: d-string draft implementation #108
base: main
Conversation
Actionable comments posted: 5
Fix all issues with AI Agents 🤖
In @Parser/action_helpers.c:
- Around line 1295-1392: unicodewriter_write_line currently writes trailing
backslash+newline sequences verbatim for non-raw d-strings, causing newline
suppression to be ignored; modify unicodewriter_write_line to, when is_raw is
false, detect a final "\\\n" at the end of the provided segment and drop the
backslash and newline before writing (so non-raw d-strings and the dedent logic
in _PyPegen_dedent_string_part preserve line-continuation behavior), while
leaving the raw path unchanged (the early is_raw short-circuit should still
bypass this trimming).
In @Parser/string_parser.c:
- Line 284: Update the comment that currently reads "checks the prefix is
consistant." to fix the typo by changing "consistant" to "consistent" so the
comment reads "checks the prefix is consistent." This change is in the comment
string present in Parser/string_parser.c.
- Around line 337-344: The code passes raw=1 to _PyPegen_decode_string in the
non-raw branch so escapes are not processed and also fails to discard the
PyUnicodeWriter on the error return; change the call to
_PyPegen_decode_string(p, 0, line_start, line_end - line_start, token) (or use
the existing is_raw false variable) so escape sequences are handled, and on the
error path before returning NULL call PyUnicodeWriter_Discard(w) to free writer
state (ensure you still Py_XDECREF(line) as currently done).
- Around line 280-292: The code reads endline[0] without ensuring indent_len>0
which can read past the buffer for cases like d"""\nfoo\n"""; add an early check
for indent_len == 0 before computing indent_char/endline loop and raise a syntax
error using RAISE_ERROR_KNOWN_LOCATION (use PyExc_SyntaxError or equivalent)
with the same token position info (token->end_lineno, token->end_col_offset
offsets) and a message like "d-strings require trailing indentation to determine
dedent level" so the function returns NULL instead of reading out-of-bounds.
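For context on the first item, the continuation rule being restored matches today's string literals: a trailing backslash suppresses the newline in non-raw triple-quoted strings but is kept verbatim in raw ones. A quick check with plain Python (d-strings themselves are not parseable yet):

```python
# A backslash immediately before a newline is a line continuation in a
# non-raw literal: both characters are dropped from the value.
src_nonraw = '"""a\\\nb"""'   # source text: a, backslash, newline, b
assert eval(src_nonraw) == "ab"

# In a raw literal the same two characters are kept verbatim.
src_raw = 'r"""a\\\nb"""'
assert eval(src_raw) == "a\\\nb"
```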
🧹 Nitpick comments (3)
Parser/action_helpers.c (1)
1394-1572: Dedent-aware f-/t-string plumbing is structurally sound, but `is_raw` is overloaded

The integration between `_get_resized_exprs()` and `_PyPegen_decode_fstring_part()` (handling `is_first`, `indent_char`, and `dedent_count`) looks correct: enforcing
- "d-string must start with a newline" via the first Constant,
- "d-string must end with an indent line" and computing a uniform `indent_char`/`indent_count` from the last Constant,
- and reusing that dedent context across all constant pieces of the f-/t-string.

One readability nit is that `is_raw` now encodes both the actual `r`-prefix and the "no backslashes in this piece" optimization:

  is_raw = is_raw || strchr(bstr, '\\') == NULL;

This is fine functionally, but separating these concerns into `has_raw_prefix` vs. `no_backslashes` (and deriving an internal `decode_raw` flag from them) would make the control flow clearer, especially as more d-string edge cases accumulate.

Lib/test/test_dstring.py (1)
4-13: Avoid shadowing built-in `str` in the helper

In `assertAllRaise`, the loop variable is named `str`, which shadows the built-in type. It's harmless here but slightly obscures tracebacks and REPL debugging; renaming it to something like `s` or `expr` would be clearer.

Small naming tweak

- def assertAllRaise(self, exception_type, regex, error_strings):
-     for str in error_strings:
-         with self.subTest(str=str):
-             with self.assertRaisesRegex(exception_type, regex) as cm:
-                 eval(str)
+ def assertAllRaise(self, exception_type, regex, error_strings):
+     for expr in error_strings:
+         with self.subTest(expr=expr):
+             with self.assertRaisesRegex(exception_type, regex) as cm:
+                 eval(expr)

Parser/string_parser.c (1)
381-384: Minor style inconsistency

Line 382 is missing a space after `=` (compare line 371: `quote =(unsigned char)*++s;` vs `quote = (unsigned char)*++s;`). This matches the existing inconsistency on line 371, so it may be intentional to match, but worth noting.
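The suggested separation of `is_raw` can be sketched in a few lines of Python (the names `has_raw_prefix`, `no_backslashes`, and `decode_raw` are the reviewer's proposed identifiers, not code in the PR):

```python
def decode_raw(has_raw_prefix: bool, piece: str) -> bool:
    # Keep the r-prefix flag and the "no backslashes" fast-path separate,
    # then derive the single flag that controls escape decoding.
    no_backslashes = "\\" not in piece
    return has_raw_prefix or no_backslashes

assert decode_raw(False, "plain text") is True   # fast path: nothing to decode
assert decode_raw(False, "tab:\\t") is False     # escapes must be processed
assert decode_raw(True, "tab:\\t") is True       # r-prefix wins regardless
```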
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- Lib/test/test_dstring.py
- Lib/test/test_tokenize.py
- Lib/tokenize.py
- Parser/action_helpers.c
- Parser/lexer/lexer.c
- Parser/string_parser.c
🧰 Additional context used
🧬 Code graph analysis (3)
Parser/action_helpers.c (2)
  Parser/string_parser.c (1)
    _PyPegen_decode_string (241-248)
  Parser/pegen.h (1)
    RAISE_ERROR_KNOWN_LOCATION (187-200)
Parser/string_parser.c (3)
  Parser/pegen.h (1)
    RAISE_ERROR_KNOWN_LOCATION (187-200)
  Objects/unicode_writer.c (6)
    PyUnicodeWriter_Create (157-185)
    PyUnicodeWriter_Discard (188-195)
    PyUnicodeWriter_WriteChar (316-326)
    PyUnicodeWriter_WriteUTF8 (531-548)
    PyUnicodeWriter_WriteStr (360-380)
    PyUnicodeWriter_Finish (625-632)
  Include/refcount.h (1)
    Py_XDECREF (510-515)
Parser/lexer/lexer.c (1)
  Parser/tokenizer/helpers.c (1)
    _PyTokenizer_syntaxerror (65-74)
🪛 GitHub Actions: Tests
Lib/test/test_dstring.py
[error] 41-41: test_simple_dstring failed. Expected ' hello world' but got ' hello world\\n'.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: Sanitizers / TSan
- GitHub Check: Sanitizers / UBSan
- GitHub Check: Sanitizers / TSan (free-threading)
- GitHub Check: Ubuntu (free-threading) / build and test (ubuntu-24.04-arm)
- GitHub Check: Ubuntu / build and test (ubuntu-24.04-arm)
- GitHub Check: Windows / Build and test (Win32)
- GitHub Check: Windows (free-threading) / Build and test (x64)
- GitHub Check: Windows / Build and test (x64)
- GitHub Check: Windows (free-threading) / Build and test (arm64)
- GitHub Check: Windows / Build and test (arm64)
- GitHub Check: WASI / build and test
- GitHub Check: Hypothesis tests on Ubuntu
- GitHub Check: Ubuntu SSL tests with OpenSSL (ubuntu-24.04, 1.1.1w)
- GitHub Check: Ubuntu SSL tests with OpenSSL (ubuntu-24.04, 3.3.5)
- GitHub Check: Ubuntu SSL tests with OpenSSL (ubuntu-24.04, 3.2.6)
- GitHub Check: Ubuntu SSL tests with OpenSSL (ubuntu-24.04, 3.0.18)
- GitHub Check: Ubuntu SSL tests with OpenSSL (ubuntu-24.04, 3.5.4)
- GitHub Check: Check if generated files are up to date
🔇 Additional comments (9)
Lib/tokenize.py (1)
85-92: d-string prefixes list looks consistent with lexer behavior

The extended `_valid_string_prefixes` set (adding `d` and `df`/`dt`/`dr`/`dfr`/`dtr`) matches the new dedent-string semantics and should stay in sync with the prefix handling in `Parser/lexer/lexer.c`. No functional issues spotted here.

Lib/test/test_tokenize.py (1)
3415-3455: Prefix-discovery test correctly updated for multiline-only d-strings

Switching `eval` to use `"""\n"""` ensures `determine_valid_prefixes()` only treats `d`-prefixed forms as valid in the multiline (triple-quoted) case, which matches the d-string requirements and the updated prefix set in `tokenize.StringPrefix`. The change is sound.

Parser/lexer/lexer.c (2)
456-505: Prefix compatibility matrix correctly extended for d-strings

The updated `maybe_raise_syntax_error_for_string_prefixes()`:
- keeps existing constraints (no `u` combos, no `b` with `f`/`t`, no `f`+`t`),
- newly forbids `u`+`d` and `b`+`d`,

which matches the intended valid prefix set (`d`, `df`, `dt`, `dr`, `dfr`, `dtr` plus permutations, but no `ud`, `bd`, etc.) and stays aligned with `Lib/tokenize._all_string_prefixes`. The behavior here looks correct.

747-792: f/t/d-string prefix parsing and "d must be triple-quoted" check look correct

The identifier/prefix scanner now tracks `saw_d` and passes it into `maybe_raise_syntax_error_for_string_prefixes`, then:
- routes only `f`/`t`-containing prefixes into the `f_string_quote` path (`df`, `dt`, `dfr`, `dtr`, plus permutations),
- leaves `d`/`dr` prefixes for plain d-strings to the regular string path.

Inside `f_string_quote`, the new prefix walk:
- sets `string_kind` to `TSTRING` when a `t`/`T` is present,
- sets `raw` when an `r`/`R` is present,
- enforces that any `d`/`D` in the prefix requires `quote_size == 3`, otherwise returning a syntax error ("d-string must be a multiline string").

This neatly enforces the syntactic constraints for dedented f-/t-strings without regressing existing f-string behavior.

Also applies to: 1061-1138
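Both observations are easy to model. The sketch below enumerates prefix spellings the way the tokenize prefix helper does (the `prefix_variants` name is illustrative), and checks that today's compiler already rejects the analogous invalid combinations; the new `u`+`d` and `b`+`d` rejections cannot be exercised until `d` lands:

```python
import itertools

def prefix_variants(prefix):
    # Every ordering of the letters crossed with every per-character casing,
    # mirroring how the accepted-prefix set is built.
    variants = set()
    for perm in itertools.permutations(prefix):
        for casing in itertools.product(*[(c.lower(), c.upper()) for c in perm]):
            variants.add("".join(casing))
    return variants

assert prefix_variants("rb") == {"rb", "rB", "Rb", "RB", "br", "bR", "Br", "BR"}
assert len(prefix_variants("dfr")) == 48  # 3! orderings x 2^3 casings

# Combinations the lexer rejects today; the review extends the same matrix.
for bad in ('ub""', 'bf""', 'fu""'):
    try:
        compile(bad, "<prefix-test>", "eval")
    except SyntaxError:
        pass
    else:
        raise AssertionError(f"{bad} unexpectedly compiled")
```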
Parser/string_parser.c (5)
250-264: LGTM! The initial validation correctly checks for empty string before accessing `s[0]`, and the error message provides clear feedback about the newline requirement.

266-278: LGTM! The backwards search for the last newline is correctly guarded by the assert, and non-whitespace characters after the last newline are properly rejected.

316-323: LGTM! The empty line handling correctly allows lines with fewer characters than `indent_len` if they contain only a newline, which permits blank lines within the d-string content.

424-429: LGTM! Correctly enforces that d-strings must be triple-quoted with a clear error message.

433-454: LGTM! The assert correctly documents that bytes mode and dedent mode are mutually exclusive, and the routing to `_PyPegen_dedent_string` is properly placed.
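The blank-line allowance reviewed at 316-323 matches the behavior of `textwrap.dedent`, which is a useful mental model for the dedent semantics:

```python
import textwrap

# The empty middle line is shorter than the 4-space margin, yet it does not
# break dedenting -- whitespace-only lines are simply passed through.
text = "    a\n\n    b\n"
assert textwrap.dedent(text) == "a\n\nb\n"
```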
Parser/string_parser.c
Outdated
  // Now, prefix is both the dedent indentation and the end of the d-string body.
  Py_ssize_t indent_len = s + len - endline;
  int indent_char = endline[0]; // ' ', '\t', or '\0'.

  // checks the prefix is consistant.
  for (Py_ssize_t i = 1; i < indent_len; i++) {
      if (endline[i] != indent_char) {
          RAISE_ERROR_KNOWN_LOCATION(
              p, PyExc_TabError, token->end_lineno, token->end_col_offset - 3, token->end_lineno, token->end_col_offset - 2,
              "inconsistent use of tabs and spaces in indentation");
          return NULL;
      }
  }
Edge case: indent_len == 0 leads to undefined behavior.
If the d-string ends with just a newline and no trailing whitespace (e.g., d"""\nfoo\n"""), then indent_len is 0 and endline points to s + len. Accessing endline[0] on line 282 reads past the string boundary.
Consider adding a check for indent_len == 0 and raising an appropriate syntax error (d-strings require trailing indentation to determine dedent level).
🔎 Proposed fix
  // Now, prefix is both the dedent indentation and the end of the d-string body.
  Py_ssize_t indent_len = s + len - endline;
+ if (indent_len == 0) {
+     RAISE_ERROR_KNOWN_LOCATION(p, PyExc_SyntaxError,
+         token->end_lineno, token->end_col_offset - 3, token->end_lineno, token->end_col_offset - 2,
+         "d-string must have trailing indentation before closing quotes");
+     return NULL;
+ }
  int indent_char = endline[0]; // ' ', '\t', or '\0'.
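The dedent-level computation and the `indent_len == 0` edge case can be modeled in Python (the `trailing_indent` helper is purely illustrative, not part of the PR):

```python
def trailing_indent(body):
    # Take the run of characters after the last newline; that run is both the
    # dedent prefix and the indentation of the closing quotes.
    last_nl = body.rfind("\n")
    assert last_nl != -1, "d-string body must contain a newline"
    indent = body[last_nl + 1:]
    if not indent:
        return None  # closing quotes at column 0: nothing to read at indent[0]
    assert set(indent) <= {" ", "\t"}
    return indent

assert trailing_indent("\n    foo\n    ") == "    "
assert trailing_indent("\nfoo\n") is None  # the flagged edge case
```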
Parser/string_parser.c
Outdated
  Py_ssize_t indent_len = s + len - endline;
  int indent_char = endline[0]; // ' ', '\t', or '\0'.

  // checks the prefix is consistant.
Typo: "consistant" → "consistent".
- // checks the prefix is consistant.
+ // checks the prefix is consistent.
Force-pushed 5ce8afc to 21cc35a
📝 Walkthrough

Adds support for a new dedented d-string literal: the lexer recognizes the d-string prefixes, and the parser dedents the string body.

Changes

Sequence Diagram(s)

sequenceDiagram
participant L as Lexer (lexer.c)
participant P as Parser (string_parser.c)
participant A as ActionHelpers (action_helpers.c)
participant U as UnicodeHelper (unicodeobject.c)
L->>P: Emit string token with saw_d flag
P->>P: Validate triple-quote & dedent preconditions
alt invalid d-string
P->>P: Raise SyntaxError ("d-string must be triple-quoted" / "must start with a newline" / indentation error)
else valid d-string
P->>U: Call _Py_search_longest_common_leading_whitespace(src, end)
U-->>P: Return common indent start/length
P->>P: Build dedented bytes (_PyPegen_dedent_string)
P->>A: Decode parts via _PyPegen_decode_fstring_part(is_first, indent, indent_len, ...)
A->>U: Use _Py_search_longest_common_leading_whitespace for part lines as needed
A-->>P: Return decoded/concatenated parts
P-->>L: Produce final string token / AST node
end
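A pure-Python model of what `_Py_search_longest_common_leading_whitespace` is described as computing (the real helper works on raw buffers in `Objects/unicodeobject.c`; this sketch only illustrates the idea):

```python
def longest_common_leading_whitespace(lines):
    # Shrink a candidate prefix to the longest leading run of spaces/tabs
    # shared by every non-blank line.
    common = None
    for line in lines:
        if line.strip() == "":
            continue  # blank lines do not constrain the result
        ws = line[:len(line) - len(line.lstrip(" \t"))]
        if common is None:
            common = ws
            continue
        limit = min(len(common), len(ws))
        i = 0
        while i < limit and common[i] == ws[i]:
            i += 1
        common = common[:i]
    return common or ""

assert longest_common_leading_whitespace(["    a", "  b", ""]) == "  "
assert longest_common_leading_whitespace(["\t\ta", "\tb"]) == "\t"
```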
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ❌ 1 failed (1 warning), ✅ 2 passed
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Parser/lexer/lexer.c (1)
483-495: Missing incompatibility check between 'b' and 'd' prefixes

The code adds incompatibility checks for `u` and `d`, but doesn't check for `b` and `d` combinations. The `d` prefix is for dedented strings which should produce `str`, not `bytes`. A `bd` or `db` prefix combination should be invalid.

🐛 Proposed fix

  if (saw_u && saw_d) {
      RETURN_SYNTAX_ERROR("u", "d");
  }
+ if (saw_b && saw_d) {
+     RETURN_SYNTAX_ERROR("b", "d");
+ }
  if (saw_b && saw_f) {
      RETURN_SYNTAX_ERROR("b", "f");
🤖 Fix all issues with AI agents
In `@Parser/action_helpers.c`:
- Around line 1475-1517: After creating temp_bytes via PyBytesWriter_Finish,
check the return value of _PyArena_AddPyObject(p->arena, temp_bytes); if it
fails (non-zero) then DECREF/clean up temp_bytes (e.g., Py_DECREF(temp_bytes) or
PyBytesWriter_Discard equivalent) and return NULL instead of proceeding to use
temp_bytes; ensure you only call PyBytes_AsString(temp_bytes) after a successful
_PyArena_AddPyObject to avoid leaking temp_bytes on arena insertion failure and
to propagate the existing exception.
♻️ Duplicate comments (2)
Parser/action_helpers.c (1)
1295-1311: Process escapes in non-raw dedented lines

This branch runs when `is_raw` is false, but `raw=1` skips escape decoding (e.g., `\n` stays literal).

Proposed fix

- PyObject *line = _PyPegen_decode_string(p, 1, line_start, line_end - line_start, token);
+ PyObject *line = _PyPegen_decode_string(p, 0, line_start, line_end - line_start, token);

Lib/test/test_dstring.py (1)
29-34: Tests correctly verify d-string dedentation semantics

The test expectations align with the d-string specification: dedentation is based on the minimum common leading whitespace. Lines 33-34 correctly distinguish between non-raw (backslash-newline as continuation) and raw (backslash-newline preserved) behavior. The failing test at line 33 reflects the implementation bug noted in previous reviews.
🧹 Nitpick comments (1)
Lib/test/test_dstring.py (1)
5-12: Avoid shadowing built-in `str` and remove debug code

The parameter name `str` shadows Python's built-in `str` type, which can cause confusion and potential issues. Also, the commented debug print statements should be removed before merging.

♻️ Proposed fix

- def assertAllRaise(self, exception_type, regex, error_strings):
-     for str in error_strings:
-         with self.subTest(str=str):
+ def assertAllRaise(self, exception_type, regex, error_strings):
+     for expr in error_strings:
+         with self.subTest(expr=expr):
          with self.assertRaisesRegex(exception_type, regex) as cm:
-             eval(str)
-             # print("Testing expression:", repr(str))
-             # print(repr(cm.exception))
-             # print(repr(cm.exception.text))
+             eval(expr)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@Parser/string_parser.c`:
- Line 392: Rename the local variable _prev_call_invald to _prev_call_invalid in
Parser/string_parser.c and update all its usages (the assignments/reads
currently referencing _prev_call_invald around the earlier usages) so they
reference _prev_call_invalid instead (also update the declaration "int
_prev_call_invald = p->call_invalid_rules;" and every subsequent use at the
three places noted).
- Around line 269-290: The inner loop in Parser/string_parser.c that compares
s[i] to indent (inside the for (; s < end; lineno++) block) can read past the
buffer when the remaining bytes are shorter than indent_len; modify the loop in
the function to check bounds (ensure s + i < end) before accessing s[i] and
break/handle as a short-line case, and also guard the subsequent check that uses
s[i] after the loop (the `if (s[i] == '\n')` branch) so it only runs when i <
remaining_length; on short lines call the same error/empty-line handling path
(using PyBytesWriter_Discard, RAISE_ERROR_KNOWN_LOCATION, or writing the newline
via PyBytesWriter_WriteBytes) as appropriate.
  for (; s < end; lineno++) {
      Py_ssize_t i;
      for (i = 0; i < indent_len; i++) {
          if (s[i] != indent[i]) {
              if (s[i] == '\n') {
                  break; // empty line
              }
              PyBytesWriter_Discard(w);
              RAISE_ERROR_KNOWN_LOCATION(p, PyExc_IndentationError, lineno, i, lineno, i+1,
                  "d-string missing valid indentation");
              return NULL;
          }
      }

      if (s[i] == '\n') { // found an empty line with newline.
          if (PyBytesWriter_WriteBytes(w, "\n", 1) < 0) {
              PyBytesWriter_Discard(w);
              return NULL;
          }
          s += i+1;
          continue;
      }
Potential buffer overread when processing lines shorter than indent_len.
The inner loop accesses s[i] without verifying s + i < end. If the remaining content is shorter than indent_len (e.g., an incomplete last line or edge cases), this reads past the buffer. Similarly, line 283 accesses s[i] where i == indent_len after the loop completes, which could also exceed bounds.
🔎 Proposed fix: Add bounds check in inner loop
for (; s < end; lineno++) {
Py_ssize_t i;
- for (i = 0; i < indent_len; i++) {
+ for (i = 0; i < indent_len && s + i < end; i++) {
if (s[i] != indent[i]) {
if (s[i] == '\n') {
break; // empty line
}
PyBytesWriter_Discard(w);
RAISE_ERROR_KNOWN_LOCATION(p, PyExc_IndentationError, lineno, i, lineno, i+1,
"d-string missing valid indentation");
return NULL;
}
}
- if (s[i] == '\n') { // found an empty line with newline.
+ if (s + i >= end) {
+ break; // reached end of content
+ }
+ else if (s[i] == '\n') { // found an empty line with newline.
if (PyBytesWriter_WriteBytes(w, "\n", 1) < 0) {
PyBytesWriter_Discard(w);
return NULL;
}
s += i+1;
continue;
      }
  /* Avoid invoking escape decoding routines if possible. */
  rawmode = rawmode || strchr(s, '\\') == NULL;

  int _prev_call_invald = p->call_invalid_rules;
Typo: _prev_call_invald → _prev_call_invalid.
- int _prev_call_invald = p->call_invalid_rules;
+ int _prev_call_invalid = p->call_invalid_rules;

Update all usages on lines 404, 452, and 467 accordingly.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@Parser/action_helpers.c`:
- Around line 1295-1311: The bug is that unicodewriter_write_line calls
_PyPegen_decode_string with a hardcoded raw=1, preventing escape processing for
non-raw strings; update the call in unicodewriter_write_line to pass the correct
raw flag (use 0 or the is_raw variable) instead of 1 so escapes (e.g.
backslash-newline) are decoded; ensure the call to
_PyPegen_decode_string(line_start, line_end - line_start, token) uses the proper
raw parameter consistent with is_raw.
In `@Parser/lexer/lexer.c`:
- Around line 483-485: The parser's prefix validation misses the incompatible
combination of byte and dedent prefixes: add a check that if saw_b && saw_d then
call RETURN_SYNTAX_ERROR("b", "d") (place it alongside the existing saw_b &&
saw_f / saw_b && saw_t checks in the same validation function), and update the
nearby comment that lists unsupported prefix combos to include "bd" so the
supported/incompatible list accurately reflects this case; reference the boolean
flags saw_b and saw_d and the macro/function RETURN_SYNTAX_ERROR to locate where
to insert the check and update the comment.
♻️ Duplicate comments (1)
Parser/action_helpers.c (1)
1538-1543: Check `_PyArena_AddPyObject` return value to prevent memory leak

If arena insertion fails, `temp_bytes` leaks while an exception is set.

Proposed fix

  PyObject *temp_bytes = PyBytesWriter_Finish(w);
  if (temp_bytes == NULL) {
      return NULL;
  }
- _PyArena_AddPyObject(p->arena, temp_bytes);
+ if (_PyArena_AddPyObject(p->arena, temp_bytes) < 0) {
+     Py_DECREF(temp_bytes);
+     return NULL;
+ }
  const char *temp_str = PyBytes_AsString(temp_bytes);
🧹 Nitpick comments (1)
Parser/action_helpers.c (1)
1448-1454: Consider moving the extern declaration to a shared header

Declaring `extern` functions inline in `.c` files works but risks signature drift if the definition in `Objects/unicodeobject.c` changes. Consider adding this declaration to `pycore_unicodeobject.h` or a similar internal header for better maintainability.
  static int
  unicodewriter_write_line(Parser *p, PyUnicodeWriter *w, const char *line_start, const char *line_end,
                           int is_raw, Token* token)
  {
      if (is_raw || memchr(line_start, '\\', line_end - line_start) == NULL) {
          return PyUnicodeWriter_WriteUTF8(w, line_start, line_end - line_start);
      }
      else {
          PyObject *line = _PyPegen_decode_string(p, 1, line_start, line_end - line_start, token);
          if (line == NULL || PyUnicodeWriter_WriteStr(w, line) < 0) {
              Py_XDECREF(line);
              return -1;
          }
          Py_DECREF(line);
      }
      return 0;
  }
Critical: Wrong raw parameter passed to _PyPegen_decode_string.
Line 1303 passes 1 (raw=true) to _PyPegen_decode_string, but this is in the else branch where is_raw is known to be false. This means escape sequences like \n, \t, etc. won't be processed for non-raw d-strings - they'll be written as literal characters.
This is likely the root cause of the line continuation issue flagged in past reviews. When escape processing is skipped, backslash-newline continuation won't work.
Proposed fix
static int
unicodewriter_write_line(Parser *p, PyUnicodeWriter *w, const char *line_start, const char *line_end,
int is_raw, Token* token)
{
if (is_raw || memchr(line_start, '\\', line_end - line_start) == NULL) {
return PyUnicodeWriter_WriteUTF8(w, line_start, line_end - line_start);
}
else {
- PyObject *line = _PyPegen_decode_string(p, 1, line_start, line_end - line_start, token);
+ PyObject *line = _PyPegen_decode_string(p, 0, line_start, line_end - line_start, token);
if (line == NULL || PyUnicodeWriter_WriteStr(w, line) < 0) {
Py_XDECREF(line);
return -1;
}
Py_DECREF(line);
}
return 0;
}
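The step the wrong flag skips is essentially `unicode_escape` decoding, which is easy to see in isolation:

```python
import codecs

# With escape processing, the two characters backslash + 'n' collapse into a
# newline; passing raw=1 leaves them untouched, which is the reported bug.
assert codecs.decode("a\\nb", "unicode_escape") == "a\nb"
assert "a\\nb" != "a\nb"  # raw-style passthrough is observably different
```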
  if (saw_u && saw_d) {
      RETURN_SYNTAX_ERROR("u", "d");
  }
🧩 Analysis chain
🏁 Script executed:

  #!/bin/bash
  # Search for any existing tests or documentation about b+d prefix combination
  rg -n "bd\"|db\"" --type py

Repository: methane/cpython
Length of output: 8394

🏁 Script executed:

  #!/bin/bash
  # Search in the lexer.c file for how b prefix is handled in validation
  rg -n "saw_b" Parser/lexer/lexer.c | head -20

Repository: methane/cpython
Length of output: 471

🏁 Script executed:

  #!/bin/bash
  # Look at the full validation function around lines 456-485
  sed -n '450,490p' Parser/lexer/lexer.c

Repository: methane/cpython
Length of output: 1455

🏁 Script executed:

  #!/bin/bash
  # Get the complete validation function - expand the range to see all checks
  sed -n '450,520p' Parser/lexer/lexer.c

Repository: methane/cpython
Length of output: 2017
Add incompatibility check for b and d prefixes.
The validation function checks for u+d, b+f, and b+t incompatibilities, but b+d is missing. A bd"..." combination would silently fall through to regular string handling without dedentation. Since b (bytes) and f/t (formatting) are already marked incompatible, b+d (dedented strings) should also be explicitly validated as incompatible.
Add a check after line 490:
if (saw_b && saw_d) {
RETURN_SYNTAX_ERROR("b", "d");
}
Also update the comment at line 460 to include bd in the unsupported list for clarity.
Force-pushed 57a3451 to 8d6981d
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@Parser/action_helpers.c`:
- Around line 1365-1377: The code reads line_start[i] after the
indentation-match loop without ensuring the pointer is still within bounds; add
a guard after the loop to ensure (line_start + i) < end before any dereference.
Specifically, in the block containing variables line_start, end, i, indent_len
and the calls to PyUnicodeWriter_WriteChar, insert a check like if (line_start +
i >= end) break; (or otherwise handle end-of-buffer) before testing
line_start[i] == '\0' or '\n' so you never read past end.
  while (line_start < end) {
      lineno++;

      Py_ssize_t i = 0;
      while (line_start + i < end && i < indent_len && line_start[i] == indent[i]) {
          i++;
      }

      if (line_start[i] == '\0') { // found an empty line without newline.
          break;
      }
      if (line_start[i] == '\n') { // found an empty line with newline.
          if (PyUnicodeWriter_WriteChar(w, '\n') < 0) {
Add bounds check before reading line_start[i].
If the remaining line is shorter than indent_len, line_start[i] can read past end. Guard the access after the loop.
🔧 Proposed fix
- while (line_start + i < end && i < indent_len && line_start[i] == indent[i]) {
+ while (line_start + i < end && i < indent_len && line_start[i] == indent[i]) {
      i++;
  }
-
- if (line_start[i] == '\0') { // found an empty line without newline.
+ if (line_start + i >= end) {
+     break; // reached end of content
+ }
+ if (line_start[i] == '\0') { // found an empty line without newline.
      break;
  }
Force-pushed 8d6981d to fbaa98e
Force-pushed fbaa98e to 733c2d0
Discourse thread: https://discuss.python.org/t/pep-822-dedented-multiline-string-d-string/105519
PEP PR (not merged yet): python/peps#4768
Summary by CodeRabbit
New Features
Bug Fixes
Tests