@bbopen bbopen commented Jan 21, 2026

Summary

  • Integrate SafeCodec into python_bridge.py for NaN/Infinity rejection at encoding time
  • Handle numpy scalars via .item() extraction for proper JSON serialization
  • Handle pandas NaT, Timestamp, and Timedelta types correctly
  • Convert Python sets to lists for JSON compatibility (improves serialization coverage)

Changes

  • Import SafeCodec and CodecError in python_bridge.py
  • Create _response_codec instance with allow_nan=False
  • Replace json.dumps with _response_codec.encode in encode_response()
  • Update adversarial_module.py to use lambda (truly non-serializable) instead of set (now serializable)
  • Update test regex to accept new NaN rejection error message

Test plan

  • All 1225 tests pass
  • Adversarial playground tests pass (40/40)
  • Build and typecheck pass
  • Python integration test verifies NaN rejection and numpy scalar handling

Fixes #95, #45, #41
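
The behaviors listed in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in safe_codec.py: `SketchCodecError`, `_normalize`, and `encode` are hypothetical names, numpy scalars are detected by duck typing on `.item()` so the sketch runs without numpy, and the pandas NaT/Timestamp/Timedelta handling is left out.

```python
import json
import math


class SketchCodecError(Exception):
    """Hypothetical stand-in for CodecError; not the real class."""


def _normalize(value):
    """Recursively convert a value into JSON-safe builtins."""
    # numpy scalars expose .item(); builtin containers and numbers do not.
    item = getattr(value, "item", None)
    if callable(item):
        value = item()
    # Reject special floats before they can reach the JSON encoder.
    if isinstance(value, float) and not math.isfinite(value):
        raise SketchCodecError(f"Cannot serialize {value!r}: NaN/Infinity not allowed")
    if isinstance(value, (set, frozenset)):
        # Sets have no JSON form; emit a list, sorted by repr for stable output.
        return [_normalize(v) for v in sorted(value, key=repr)]
    if isinstance(value, (list, tuple)):
        return [_normalize(v) for v in value]
    if isinstance(value, dict):
        return {str(k): _normalize(v) for k, v in value.items()}
    return value


def encode(value) -> str:
    """Encode to JSON, reporting any failure as SketchCodecError."""
    try:
        return json.dumps(_normalize(value), allow_nan=False)
    except (TypeError, ValueError) as exc:
        # e.g. a lambda, which stays non-serializable
        raise SketchCodecError(str(exc)) from exc
```

Under these assumptions a set is encoded as a list, while a lambda still fails, which matches the reason the adversarial fixture switched from a set to a lambda.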

🤖 Generated with Claude Code

Integrate SafeCodec from safe_codec.py into python_bridge.py to:
- Reject NaN/Infinity values at encoding time with clear error messages
- Handle numpy scalars via .item() extraction for JSON serialization
- Handle pandas NaT, Timestamp, and Timedelta types properly
- Convert Python sets to lists for JSON serialization

Changes:
- Import SafeCodec and CodecError in python_bridge.py
- Create _response_codec instance with allow_nan=False
- Replace json.dumps with _response_codec.encode in encode_response()
- Update adversarial_module.py to use lambda (truly non-serializable)
  instead of set (now serializable as list)
- Update test regex to accept new NaN rejection error message

Fixes #95, #45, #41

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
coderabbitai bot commented Jan 21, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The Python bridge now encodes responses using a SafeCodec instance configured with allow_nan=False, replacing direct json.dumps usage and converting codec errors into ValueError. Tests and fixtures were updated to reflect new error messages and a non-serializable fixture value.
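
The error translation described above might look something like this. It is a sketch under assumptions: `StubCodec` is a hypothetical placeholder for SafeCodec, the `CodecError` class here only mimics the one imported from safe_codec.py, and the real `encode_response()` does more than shown.

```python
import json


class CodecError(Exception):
    """Stand-in for the CodecError in safe_codec.py (assumed shape)."""


class StubCodec:
    """Hypothetical minimal codec; the real SafeCodec does more than this."""

    def __init__(self, allow_nan: bool = False) -> None:
        self.allow_nan = allow_nan

    def encode(self, value) -> str:
        try:
            # allow_nan=False makes json.dumps raise ValueError on NaN/Infinity.
            return json.dumps(value, allow_nan=self.allow_nan)
        except (TypeError, ValueError) as exc:
            raise CodecError(f"Cannot encode response: {exc}") from exc


# Module-level instance, created once and reused for every response.
_response_codec = StubCodec(allow_nan=False)


def encode_response(out) -> str:
    """Encode a response, surfacing codec failures as ValueError."""
    try:
        return _response_codec.encode(out)
    except CodecError as exc:
        raise ValueError(str(exc)) from exc
```

Raising `ValueError` keeps the exception type that callers of the old `json.dumps` path would already have seen for NaN with `allow_nan=False`.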

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core codec integration: runtime/python_bridge.py | Added SafeCodec/CodecError import and module-level _response_codec (with allow_nan=False and large max_payload_bytes); replaced json.dumps(out) with _response_codec.encode(out) and mapped CodecError to ValueError. |
| Tests & fixtures: test/adversarial_playground.test.ts, test/fixtures/python/adversarial_module.py | Expanded test assertion regex to accept SafeCodec error messages about NaN; changed return_unserializable() return from a set to a lambda to ensure non-JSON-serializable behavior. |

Sequence Diagram(s)

(Skipped — changes are localized to encoding logic and tests and do not introduce a new multi-component sequential flow that requires visualization.)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Hop, hop — I guard the stream so neat,
No NaN or Infinity will sneak or bleat.
Bytes get checked, errors politely told,
JSON stays tidy, robust, and bold. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: integrating SafeCodec into Python bridge for NaN rejection, which is the core objective of the PR. |
| Description check | ✅ Passed | The description is well-related to the changeset, detailing the SafeCodec integration, specific changes made to files, test results, and linked issues. |
| Linked Issues check | ✅ Passed | The PR addresses all requirements from issue #95: it integrates SafeCodec to reject NaN/Infinity at encoding time, surfaces clear errors, and includes test updates to validate the new error handling. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the linked issues: SafeCodec integration in python_bridge.py, test updates for NaN rejection, and adversarial module updates for proper test coverage. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e51c0f7 and 09557ce.

📒 Files selected for processing (1)
  • runtime/python_bridge.py
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: docs/adr/002-bridge-protocol.md:168-172
Timestamp: 2026-01-20T16:00:49.738Z
Learning: In the tywrap project's BridgeProtocol SafeCodec implementation, Arrow format decoders can produce NaN/Infinity values from binary representations even when the raw JSON payload doesn't contain them. This is why validation for special floats must occur both before encoding (to reject invalid inputs) and after applying decoders (to catch values introduced during Arrow deserialization), protecting downstream consumers from unexpected NaN/Infinity values.
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: src/runtime/bridge-core.ts:260-298
Timestamp: 2026-01-19T21:48:45.693Z
Learning: In `src/runtime/bridge-core.ts`, keep `normalizeErrorPayload` to validate error payloads from the Python subprocess. The subprocess boundary is effectively untrusted, and normalizing error responses prevents `undefined: undefined` errors on malformed payloads. Error responses are not the hot path, so the small validation overhead is acceptable for the added resilience.
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: src/runtime/bridge-core.ts:375-385
Timestamp: 2026-01-19T21:14:37.032Z
Learning: In tywrap (src/runtime/bridge-core.ts and similar), environment variable parsing follows a tolerant/best-effort policy. For example, `TYWRAP_CODEC_MAX_BYTES=1024abc` should be accepted as 1024. Only reject clearly invalid values (non-numeric start or <=0). This avoids surprising failures from minor typos.
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: src/runtime/bridge-core.ts:260-263
Timestamp: 2026-01-19T21:14:40.872Z
Learning: In `src/runtime/bridge-core.ts` and similar hot request/response loop implementations in the tywrap repository, avoid adding extra defensive validation (e.g., runtime shape checks on error payloads) in tight loops unless the protocol boundary is untrusted or there's a concrete bug report. The Python bridge protocol is controlled and validated via tests, so defensive checks would add unnecessary branching overhead without meaningful benefit.
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: test/fixtures/out_of_order_bridge.py:29-48
Timestamp: 2026-01-19T21:00:52.689Z
Learning: In `test/fixtures/out_of_order_bridge.py`, the fixture intentionally leaves a pending request unanswered at EOF to simulate missing/out-of-order responses and validate bridge behavior when requests never complete; this is the exact failure mode being tested and must be preserved.
📚 Learning: 2026-01-19T21:49:05.612Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: runtime/python_bridge.py:99-123
Timestamp: 2026-01-19T21:49:05.612Z
Learning: In the tywrap repository, TYWRAP_REQUEST_MAX_BYTES uses strict integer parsing that rejects values with trailing characters (e.g., "1024abc"). This differs from TYWRAP_CODEC_MAX_BYTES, which uses tolerant/best-effort parsing that accepts numeric prefixes. The strict policy for REQUEST_MAX_BYTES ensures explicit integer values and consistent parse behavior across Node/Python implementations.

Applied to files:

  • runtime/python_bridge.py
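
The two parsing policies contrasted in this learning can be illustrated with a short sketch. The function names are hypothetical and the exact rules in the repository may differ; the only behavior taken from the text is that a value such as "1024abc" is accepted as 1024 under the tolerant policy and rejected under the strict one.

```python
import re
from typing import Optional


def parse_tolerant(raw: str) -> Optional[int]:
    """Best-effort parse: accept a leading run of digits, ignore the rest.

    Returns None for clearly invalid values (non-numeric start or <= 0).
    """
    match = re.match(r"\s*(\d+)", raw)
    if match is None:
        return None
    value = int(match.group(1))
    return value if value > 0 else None


def parse_strict(raw: str) -> int:
    """Strict parse: the whole string must be a positive integer."""
    text = raw.strip()  # trimming whitespace is an assumption of this sketch
    if not re.fullmatch(r"\d+", text):
        raise ValueError(f"Expected an integer, got {raw!r}")
    value = int(text)
    if value <= 0:
        raise ValueError(f"Expected a positive integer, got {raw!r}")
    return value
```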
🧬 Code graph analysis (1)
runtime/python_bridge.py (2)
src/index.ts (1)
  • SafeCodec (17-17)
runtime/safe_codec.py (1)
  • CodecError (19-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: codec-suite
🔇 Additional comments (2)
runtime/python_bridge.py (2)

14-15: LGTM!

The import of SafeCodec and CodecError from the local safe_codec module is appropriate for the NaN rejection feature.


100-108: LGTM!

Good design: using sys.maxsize for SafeCodec's internal limit preserves the original "no limit unless env var" behavior, while the explicit CODEC_MAX_BYTES check in encode_response() provides the environment-variable-specific error message. The allow_nan=False configuration correctly addresses the NaN/Infinity rejection requirement from Issue #95.
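
A rough sketch of the two-level limit described in this comment, under stated assumptions: `LimitCodec` is a hypothetical stand-in for SafeCodec, and `codec_max_bytes` stands in for the value the bridge would read from TYWRAP_CODEC_MAX_BYTES (None meaning the variable is unset).

```python
import json
import sys
from typing import Optional


class LimitCodec:
    """Hypothetical codec with an internal payload limit."""

    def __init__(self, max_payload_bytes: int) -> None:
        self.max_payload_bytes = max_payload_bytes

    def encode(self, value) -> str:
        payload = json.dumps(value, allow_nan=False)
        if len(payload.encode("utf-8")) > self.max_payload_bytes:
            raise ValueError("Payload exceeds codec limit")
        return payload


# sys.maxsize means the internal check never fires in practice,
# preserving "no limit unless the env var is set".
_codec = LimitCodec(max_payload_bytes=sys.maxsize)


def encode_response(out, codec_max_bytes: Optional[int] = None) -> str:
    """Encode, then apply the explicit limit with an env-var-specific message."""
    payload = _codec.encode(out)
    if codec_max_bytes is not None:
        size = len(payload.encode("utf-8"))
        if size > codec_max_bytes:
            raise ValueError(
                f"Response is {size} bytes; TYWRAP_CODEC_MAX_BYTES is {codec_max_bytes}"
            )
    return payload
```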


@coderabbitai coderabbitai bot added the area:codec (Area: codecs and serialization) and enhancement (New feature or request) labels Jan 21, 2026
coderabbitai bot previously approved these changes Jan 21, 2026
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e51c0f7d92

Address Codex review feedback: The original python_bridge.py had no size
limit unless TYWRAP_CODEC_MAX_BYTES was set. Using sys.maxsize preserves
this behavior while letting the explicit size check in encode_response()
provide the specific error message mentioning the env var name.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
runtime/python_bridge.py (1)

796-804: Consider consolidating UTF-8 encoding to avoid redundant work in the response loop.

SafeCodec.encode() already calls .encode('utf-8') internally to measure payload size (line 162 of safe_codec.py). The second call to payload.encode('utf-8') on line 801 of python_bridge.py repeats this work unnecessarily in the request/response handler loop. Since _response_codec is initialized with max_payload_bytes=sys.maxsize (effectively disabling SafeCodec's internal check), consider refactoring to avoid the redundant encoding—either by exposing the byte length from SafeCodec.encode() or by restructuring the size check to reuse the already-encoded bytes.
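
One way the suggested consolidation could look, as a sketch only: `encode_with_size` is a hypothetical helper (SafeCodec does not necessarily expose anything like it) that returns the byte length alongside the JSON text so the size check does not have to encode the payload a second time.

```python
import json
from typing import Optional, Tuple


def encode_with_size(value) -> Tuple[str, int]:
    """Encode once and report the UTF-8 byte length with the text."""
    text = json.dumps(value, allow_nan=False)
    return text, len(text.encode("utf-8"))


def encode_response(out, max_bytes: Optional[int] = None) -> str:
    """Reuse the measured size instead of calling .encode('utf-8') again."""
    payload, size = encode_with_size(out)
    if max_bytes is not None and size > max_bytes:
        raise ValueError(f"Response is {size} bytes; limit is {max_bytes}")
    return payload
```

Whether this is worth the API change depends on payload sizes; for small responses the second encode is cheap.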

@bbopen bbopen merged commit cee80c3 into main Jan 21, 2026
19 of 20 checks passed
@bbopen bbopen deleted the feat/python-codec-integration branch January 21, 2026 18:32
Labels

area:codec (Area: codecs and serialization), enhancement (New feature or request)

Development

Successfully merging this pull request may close these issues.

Python bridge should disallow NaN/Infinity in JSON responses
