re module implementation #157

Open
Embers-of-the-Fire wants to merge 15 commits into pydantic:main from Embers-of-the-Fire:re

@Embers-of-the-Fire (Contributor) commented Feb 13, 2026

This PR introduces a subset of Python’s re module. It might close #132 as a partial implementation.

Full API list

  • re.Match
    • bool, type, __eq__, __repr__
    • .group(), .groups()
    • .span(), .end()
    • .string
  • re.Pattern
    • .pattern
    • .flags
    • Shares logic with module-level functions
  • re.IGNORECASE, re.MULTILINE, re.DOTALL
  • re.compile(), re.match(), re.fullmatch(), re.findall(), re.sub()
  • re.PatternError (see Unresolved questions below)
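As a point of reference, the snippet below exercises the listed subset against CPython's own re module. It is a quick sketch of the target behavior, assuming monty aims to mirror CPython for these calls; it is not taken from the PR's tests.

```python
import re

# A compiled Pattern exposes its source text and flags.
pat = re.compile(r"(\w+)@(\w+)\.com", re.IGNORECASE)
print(pat.pattern)                      # (\w+)@(\w+)\.com
print(bool(pat.flags & re.IGNORECASE))  # True (str patterns also carry re.UNICODE)

# match() anchors at the start of the string; a Match object is truthy.
m = pat.match("Alice@Example.COM is the contact")
print(bool(m))            # True
print(m.group(0))         # Alice@Example.COM
print(m.groups())         # ('Alice', 'Example')
print(m.span(), m.end())  # (0, 17) 17
print(m.string)           # Alice@Example.COM is the contact

# fullmatch() must consume the whole string, unlike match().
print(re.fullmatch(r"\d+", "123abc"))      # None
print(re.match(r"\d+", "123abc").group())  # 123

# findall() returns tuples when the pattern has several groups.
print(re.findall(r"(\w)=(\d)", "a=1, b=2"))  # [('a', '1'), ('b', '2')]

# MULTILINE: ^ matches after each newline; DOTALL: . also matches "\n".
print(re.findall(r"^\w+", "one\ntwo", re.MULTILINE))  # ['one', 'two']
print(bool(re.match(r"a.b", "a\nb", re.DOTALL)))      # True

# sub() with group references in the replacement template.
print(re.sub(r"(\w+)@(\w+)", r"\2 at \1", "bob@site"))  # site at bob
```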

Unresolved questions

re.PatternError behavior

The monty implementation introduces re.PatternError as a plain exception type, covered only by a test.
The monty-python binding falls back to RuntimeError, because raising a real re.PatternError is awkward unless we do something like py.import("re")?.getattr("PatternError")?.call1((msg,)).

Regex behavior

By default, this PR uses the regex crate, which does not support backreferences, look-around assertions, or similar backtracking-only features.
Switching to fancy_regex would match CPython's re semantics more closely, but could introduce ReDoS risk.
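To make the gap concrete: each pattern below relies on a feature that the regex crate rejects at compile time (a backreference, a lookahead, a lookbehind), while CPython's re accepts all three. fancy-regex documents support for these constructs, which is what makes it the closer match. The patterns are illustrative only, not taken from the PR.

```python
import re

# Backreference: \1 must repeat whatever group 1 captured.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")
print(doubled.group(1))  # is

# Lookahead: digits followed by "px", without consuming "px".
print(re.findall(r"\d+(?=px)", "10px 20em 30px"))  # ['10', '30']

# Negative lookbehind: numbers not preceded by "$".
print(re.findall(r"(?<!\$)\b\d+", "$5 and 7"))  # ['7']
```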

@codspeed-hq bot commented Feb 13, 2026

Merging this PR will not alter performance

✅ 13 untouched benchmarks


Comparing Embers-of-the-Fire:re (9b7b706) with main (a07e336)

@samuelcolvin (Member)

Another option is regex-pcre2, which should be the most compatible with Python.

I have no idea about its DoS risks, or whether the C code is harder to compile on all platforms than pure Rust.

@Embers-of-the-Fire (Contributor, Author)

@samuelcolvin ReDoS stems from backtracking features, so switching to yet another backtracking engine won't remove the risk. IIRC, Google's RE2 avoids those constructs entirely, as the regex crate does. So it's a trade-off between a fast, stable engine and full compatibility.

@samuelcolvin (Member) left a comment

Partial review since I have to go, but this looks great overall.

I would suggest we use fancy-regex with backtrack_limit() set to something reasonable; with that, the DoS risk seems acceptable.
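A toy illustration of what a backtrack limit buys: the sketch below is Rob Pike's classic backtracking matcher (literals, '.', 'c*', '^', '$') with a step counter added. It is not how fancy-regex works internally, and BoundedMatcher and BacktrackLimitExceeded are hypothetical names; it only shows how a budget turns a pathological search into a fast, catchable error.

```python
class BacktrackLimitExceeded(Exception):
    """Hypothetical error raised when the matcher exhausts its step budget."""


class BoundedMatcher:
    """Pike-style backtracking matcher for c, '.', 'c*', '^' and '$', with a step limit."""

    def __init__(self, pattern: str, limit: int = 10_000):
        self.pattern = pattern
        self.limit = limit
        self.steps = 0

    def search(self, text: str) -> bool:
        self.steps = 0  # fresh budget for every search
        if self.pattern.startswith("^"):
            return self._here(self.pattern[1:], text)
        # Unanchored: try every start position, including the empty suffix.
        return any(self._here(self.pattern, text[i:]) for i in range(len(text) + 1))

    def _tick(self) -> None:
        # Every attempt to match at a position costs one step.
        self.steps += 1
        if self.steps > self.limit:
            raise BacktrackLimitExceeded(f"exceeded {self.limit} steps")

    def _here(self, p: str, t: str) -> bool:
        self._tick()
        if not p:
            return True
        if len(p) >= 2 and p[1] == "*":
            return self._star(p[0], p[2:], t)
        if p == "$":
            return not t
        if t and p[0] in (".", t[0]):
            return self._here(p[1:], t[1:])
        return False

    def _star(self, c: str, rest: str, t: str) -> bool:
        # Try zero, one, two, ... copies of c; backtrack into `rest` after each.
        i = 0
        while True:
            if self._here(rest, t[i:]):
                return True
            if i < len(t) and c in (".", t[i]):
                i += 1
            else:
                return False


print(BoundedMatcher("^a*b").search("aaab"))  # True
print(BoundedMatcher("^ab$").search("abc"))   # False

# Chained stars over a run of 'a' with no 'b': the search space explodes,
# but the step budget turns it into a quick, catchable error.
try:
    BoundedMatcher("a*a*a*a*a*a*a*a*b", limit=10_000).search("a" * 25)
except BacktrackLimitExceeded as exc:
    print("gave up:", exc)  # gave up: exceeded 10000 steps
```

The limit only needs to be high enough that ordinary patterns never reach it; the value used here is arbitrary.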

@Embers-of-the-Fire (Contributor, Author) commented Feb 13, 2026

IIRC, re.PatternError was added in some Python version to replace re.error; maybe that's why CI is complaining about it :(. Could anyone show me how to create a proper alias?

@samuelcolvin (Member)

A few options:

  • use pyo3 macros (like this) to have different logic for older Python versions
  • try to import re.PatternError and fall back to importing re.error
  • try to import re.PatternError and fall back to RuntimeError
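In pure Python, the second option reduces to a single attribute lookup with a default; the pyo3 binding would perform the equivalent getattr on the imported re module. A minimal sketch, where try_compile is a hypothetical helper:

```python
import re

# Python 3.13+ has re.PatternError (re.error remains as an alias);
# older versions only have re.error, so fall back to it.
PatternError = getattr(re, "PatternError", re.error)


def try_compile(pattern: str):
    """Return a compiled pattern, or None if the pattern is invalid."""
    try:
        return re.compile(pattern)
    except PatternError as exc:
        print(f"invalid pattern {pattern!r}: {exc}")
        return None


try_compile("(unclosed")  # prints an "invalid pattern" message and returns None
```

Because re.error is kept as an alias on newer versions, catching the looked-up name works the same everywhere.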

@Embers-of-the-Fire (Contributor, Author)

@samuelcolvin Sorry to bother you, but I'd like to confirm the Codecov requirements. Some code paths are hard to reach in tests, so I'm not sure whether they need coverage.

Development

Successfully merging this pull request may close these issues.

re module impl in scope?

2 participants