-
Notifications
You must be signed in to change notification settings - Fork 0
feat(codec): integrate SafeCodec into Python bridge for NaN rejection #159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Integrate SafeCodec from safe_codec.py into python_bridge.py to: - Reject NaN/Infinity values at encoding time with clear error messages - Handle numpy scalars via .item() extraction for JSON serialization - Handle pandas NaT, Timestamp, and Timedelta types properly - Convert Python sets to lists for JSON serialization Changes: - Import SafeCodec and CodecError in python_bridge.py - Create _response_codec instance with allow_nan=False - Replace json.dumps with _response_codec.encode in encode_response() - Update adversarial_module.py to use lambda (truly non-serializable) instead of set (now serializable as list) - Update test regex to accept new NaN rejection error message Fixes #95, #45, #41 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughThe Python bridge now encodes responses using a SafeCodec instance configured with Changes
Sequence Diagram(s)(Skipped — changes are localized to encoding logic and tests and do not introduce a new multi-component sequential flow that requires visualization.) Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
📜 Recent review detailsConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (5)📓 Common learnings📚 Learning: 2026-01-19T21:48:45.693ZApplied to files:
📚 Learning: 2026-01-19T21:49:05.612ZApplied to files:
📚 Learning: 2026-01-19T21:14:37.032ZApplied to files:
📚 Learning: 2026-01-20T16:00:49.738ZApplied to files:
🧬 Code graph analysis (1)runtime/python_bridge.py (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (2)
✏️ Tip: You can disable this entire section by setting Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e51c0f7d92
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Address Codex review feedback: The original python_bridge.py had no size limit unless TYWRAP_CODEC_MAX_BYTES was set. Using sys.maxsize preserves this behavior while letting the explicit size check in encode_response() provide the specific error message mentioning the env var name. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
runtime/python_bridge.py (1)
796-804: Consider consolidating UTF-8 encoding to avoid redundant work in the response loop.
SafeCodec.encode()already calls.encode('utf-8')internally to measure payload size (line 162 of safe_codec.py). The second call topayload.encode('utf-8')on line 801 of python_bridge.py repeats this work unnecessarily in the request/response handler loop. Since_response_codecis initialized withmax_payload_bytes=sys.maxsize(effectively disabling SafeCodec's internal check), consider refactoring to avoid the redundant encoding—either by exposing the byte length fromSafeCodec.encode()or by restructuring the size check to reuse the already-encoded bytes.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
runtime/python_bridge.py
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: docs/adr/002-bridge-protocol.md:168-172
Timestamp: 2026-01-20T16:00:49.738Z
Learning: In the tywrap project's BridgeProtocol SafeCodec implementation, Arrow format decoders can produce NaN/Infinity values from binary representations even when the raw JSON payload doesn't contain them. This is why validation for special floats must occur both before encoding (to reject invalid inputs) and after applying decoders (to catch values introduced during Arrow deserialization), protecting downstream consumers from unexpected NaN/Infinity values.
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: src/runtime/bridge-core.ts:260-298
Timestamp: 2026-01-19T21:48:45.693Z
Learning: In `src/runtime/bridge-core.ts`, keep `normalizeErrorPayload` to validate error payloads from the Python subprocess. The subprocess boundary is effectively untrusted, and normalizing error responses prevents `undefined: undefined` errors on malformed payloads. Error responses are not the hot path, so the small validation overhead is acceptable for the added resilience.
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: src/runtime/bridge-core.ts:375-385
Timestamp: 2026-01-19T21:14:37.032Z
Learning: In tywrap (src/runtime/bridge-core.ts and similar), environment variable parsing follows a tolerant/best-effort policy. For example, `TYWRAP_CODEC_MAX_BYTES=1024abc` should be accepted as 1024. Only reject clearly invalid values (non-numeric start or <=0). This avoids surprising failures from minor typos.
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: src/runtime/bridge-core.ts:260-263
Timestamp: 2026-01-19T21:14:40.872Z
Learning: In `src/runtime/bridge-core.ts` and similar hot request/response loop implementations in the tywrap repository, avoid adding extra defensive validation (e.g., runtime shape checks on error payloads) in tight loops unless the protocol boundary is untrusted or there's a concrete bug report. The Python bridge protocol is controlled and validated via tests, so defensive checks would add unnecessary branching overhead without meaningful benefit.
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: test/fixtures/out_of_order_bridge.py:29-48
Timestamp: 2026-01-19T21:00:52.689Z
Learning: In `test/fixtures/out_of_order_bridge.py`, the fixture intentionally leaves a pending request unanswered at EOF to simulate missing/out-of-order responses and validate bridge behavior when requests never complete; this is the exact failure mode being tested and must be preserved.
📚 Learning: 2026-01-19T21:48:45.693Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: src/runtime/bridge-core.ts:260-298
Timestamp: 2026-01-19T21:48:45.693Z
Learning: In `src/runtime/bridge-core.ts`, keep `normalizeErrorPayload` to validate error payloads from the Python subprocess. The subprocess boundary is effectively untrusted, and normalizing error responses prevents `undefined: undefined` errors on malformed payloads. Error responses are not the hot path, so the small validation overhead is acceptable for the added resilience.
Applied to files:
runtime/python_bridge.py
📚 Learning: 2026-01-19T21:49:05.612Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: runtime/python_bridge.py:99-123
Timestamp: 2026-01-19T21:49:05.612Z
Learning: In the tywrap repository, TYWRAP_REQUEST_MAX_BYTES uses strict integer parsing that rejects values with trailing characters (e.g., "1024abc"). This differs from TYWRAP_CODEC_MAX_BYTES, which uses tolerant/best-effort parsing that accepts numeric prefixes. The strict policy for REQUEST_MAX_BYTES ensures explicit integer values and consistent parse behavior across Node/Python implementations.
Applied to files:
runtime/python_bridge.py
📚 Learning: 2026-01-19T21:14:37.032Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: src/runtime/bridge-core.ts:375-385
Timestamp: 2026-01-19T21:14:37.032Z
Learning: In tywrap (src/runtime/bridge-core.ts and similar), environment variable parsing follows a tolerant/best-effort policy. For example, `TYWRAP_CODEC_MAX_BYTES=1024abc` should be accepted as 1024. Only reject clearly invalid values (non-numeric start or <=0). This avoids surprising failures from minor typos.
Applied to files:
runtime/python_bridge.py
📚 Learning: 2026-01-20T16:00:49.738Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: docs/adr/002-bridge-protocol.md:168-172
Timestamp: 2026-01-20T16:00:49.738Z
Learning: In the tywrap project's BridgeProtocol SafeCodec implementation, Arrow format decoders can produce NaN/Infinity values from binary representations even when the raw JSON payload doesn't contain them. This is why validation for special floats must occur both before encoding (to reject invalid inputs) and after applying decoders (to catch values introduced during Arrow deserialization), protecting downstream consumers from unexpected NaN/Infinity values.
Applied to files:
runtime/python_bridge.py
🧬 Code graph analysis (1)
runtime/python_bridge.py (2)
src/index.ts (1)
SafeCodec(17-17)runtime/safe_codec.py (1)
CodecError(19-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: codec-suite
🔇 Additional comments (2)
runtime/python_bridge.py (2)
14-15: LGTM!The import of
SafeCodecandCodecErrorfrom the localsafe_codecmodule is appropriate for the NaN rejection feature.
100-108: LGTM!Good design: using
sys.maxsizefor SafeCodec's internal limit preserves the original "no limit unless env var" behavior, while the explicitCODEC_MAX_BYTEScheck inencode_response()provides the environment-variable-specific error message. Theallow_nan=Falseconfiguration correctly addresses the NaN/Infinity rejection requirement from Issue#95.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Summary
.item()extraction for proper JSON serializationChanges
_response_codecinstance withallow_nan=Falsejson.dumpswith_response_codec.encodeinencode_response()Test plan
Fixes #95, #45, #41
🤖 Generated with Claude Code