-
Notifications
You must be signed in to change notification settings - Fork 0
fix(codec): extract Arrow values as plain arrays for ndarray decoding #163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fix two issues in ndarray Arrow decoding: 1. **1D arrays returning raw Arrow Table**: Previously, the code only extracted values from Arrow tables when shape.length > 1, causing 1D arrays to return the raw Table object instead of the values. 2. **Typed arrays in reshaped data**: Arrow's column.toArray() returns typed arrays (Int32Array, BigInt64Array, etc.). When slicing these during reshape, they remained as typed arrays instead of plain JS arrays, causing test failures. Solution: - Add `typedArrayToPlain()` helper to convert typed arrays to plain arrays, with BigInt → Number conversion for safe integer range - Add `extractArrowValues()` helper to extract values from Arrow tables - Update ndarray decoder to always extract values (not just multi-dim) - Convert typed arrays to plain arrays before reshaping This fixes the codec-suite CI failures where tests expected plain arrays but received Arrow Tables or BigInt64Arrays. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. 📝 WalkthroughWalkthroughAdds helpers to extract plain JS arrays from typed arrays and Arrow tables, refactors ndarray decoding to extract values (sync or awaited) then reshape when shape length > 1, and preserves existing fallbacks if extraction fails or no shape is present. Changes
Sequence Diagram(s)(omitted — changes are internal refactors without multi-component sequential interactions) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
📜 Recent review detailsConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (1)
✏️ Tip: You can disable this entire section by setting Warning Tools execution failed with the following error: Failed to run tools: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error) Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/utils/codec.ts`:
- Around line 278-280: The fallback Array.from(arr as Iterable<unknown>) can
throw if arr is null/undefined or a non-iterable plain object; add a guard
before this call (check arr != null and typeof arr === 'object' and
Symbol.iterator in arr) or use an isIterable(arr) helper, and if not iterable
return an empty array or throw a descriptive TypeError; update the fallback in
the same function where Array.from is used and reference the existing
ArrayBuffer.isView check and the arr variable to place the guard.
- Around line 439-454: The scalar handling bug: `shape` check skips scalars so
`reshapeArray` isn't called and single-value ndarrays return the full `values`
array; update both places where you test the shape (the async branch returning
reshapeArray(values, shape) and the sync branch after
extractArrowValues(decoded)) to call reshapeArray when shape exists and
shape.length !== 1 (i.e., include shape.length === 0 and >1) so scalars
(shape.length === 0) and multi-dimensional arrays are reshaped while 1D arrays
are returned as-is; locate the checks around the calls to reshapeArray,
extractArrowValues and the variables decoded/values to make the change.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
src/utils/codec.ts
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: docs/adr/002-bridge-protocol.md:168-172
Timestamp: 2026-01-20T16:00:49.738Z
Learning: In the tywrap project's BridgeProtocol SafeCodec implementation, Arrow format decoders can produce NaN/Infinity values from binary representations even when the raw JSON payload doesn't contain them. This is why validation for special floats must occur both before encoding (to reject invalid inputs) and after applying decoders (to catch values introduced during Arrow deserialization), protecting downstream consumers from unexpected NaN/Infinity values.
📚 Learning: 2026-01-20T16:00:49.738Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: docs/adr/002-bridge-protocol.md:168-172
Timestamp: 2026-01-20T16:00:49.738Z
Learning: In the tywrap project's BridgeProtocol SafeCodec implementation, Arrow format decoders can produce NaN/Infinity values from binary representations even when the raw JSON payload doesn't contain them. This is why validation for special floats must occur both before encoding (to reject invalid inputs) and after applying decoders (to catch values introduced during Arrow deserialization), protecting downstream consumers from unexpected NaN/Infinity values.
Applied to files:
src/utils/codec.ts
📚 Learning: 2026-01-20T01:34:07.064Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 136
File: src/runtime/node.ts:444-458
Timestamp: 2026-01-20T01:34:07.064Z
Learning: When reviewing promise-based polling patterns (e.g., recursive setTimeout in waitForAvailableWorker functions), ensure that any recursive timer is tracked and cleared on timeout or promise resolution to prevent timer leaks. Do not rely on no-op checks after settlement; verify clearTimeout is invoked in all paths and consider using an explicit cancellation (e.g., AbortController) or a guard to cancel pending timers when the promise settles.
Applied to files:
src/utils/codec.ts
📚 Learning: 2026-01-20T18:37:05.670Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: src/runtime/process-io.ts:37-40
Timestamp: 2026-01-20T18:37:05.670Z
Learning: In the tywrap repo, ESLint is used for linting (not Biome). Do not flag regex literals that contain control characters (e.g., \u001b, \u0000-\u001F) as lint errors, since the current ESLint configuration allows them. When reviewing TypeScript files (e.g., src/**/*.ts), rely on ESLint results and avoid raising issues about these specific control-character regex literals unless there is a competing config change or a policy requiring explicit escaping.
Applied to files:
src/utils/codec.ts
📚 Learning: 2026-01-20T16:00:49.829Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: docs/adr/002-bridge-protocol.md:203-211
Timestamp: 2026-01-20T16:00:49.829Z
Learning: In the BridgeProtocol implementation (tywrap), reject Map and Set explicitly before serialization (e.g., in safeStringify or the serialization entrypoint) with a clear error like "Bridge protocol does not support Map/Set values". Do not rely on post-hoc checks of non-string keys at the point of JSON.stringify, since Maps/Sets cannot round-trip. This should be enforced at the boundary where data is prepared for serialization to ensure deterministic errors and prevent invalid data from propagating.
Applied to files:
src/utils/codec.ts
🔇 Additional comments (1)
src/utils/codec.ts (1)
282-301: LGTM!The extraction logic correctly handles the Arrow table structure, extracts from the first column (appropriate for flattened ndarray data), and properly converts typed arrays to plain arrays. The null return for unextractable data provides a clean fallback path for callers.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
1. Add iterable guard before Array.from fallback in typedArrayToPlain() to prevent throwing on null/undefined/non-iterables 2. Fix scalar handling: change condition from shape.length > 1 to shape.length !== 1 so that scalars (shape.length === 0) are properly extracted via reshapeArray returning flat[0] Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Fixes codec-suite CI failures where ndarray Arrow decoding returned raw Arrow Tables or typed arrays instead of plain JavaScript arrays.
Problem
Two issues in ndarray Arrow decoding:
shape.length > 1, so 1D arrays returned the raw Table objectcolumn.toArray()returns typed arrays (BigInt64Array, etc.) which persisted through reshapeCI failure example:
Solution
typedArrayToPlain()helper: converts typed arrays to plain arrays with BigInt → Number for safe integersextractArrowValues()helper: extracts column values from Arrow tablesTest Results
All 1232 tests pass locally. This should fix the codec-suite CI.
🤖 Generated with Claude Code