Skip to content

Conversation

@bbopen
Copy link
Owner

@bbopen bbopen commented Jan 21, 2026

Summary

Fixes codec-suite CI failures where ndarray Arrow decoding returned raw Arrow Tables or typed arrays instead of plain JavaScript arrays.

Problem

Two issues in ndarray Arrow decoding:

  1. 1D arrays returning raw Arrow Table: Code only extracted values when shape.length > 1, so 1D arrays returned the raw Table object
  2. Typed arrays in reshaped data: column.toArray() returns typed arrays (BigInt64Array, etc.) which persisted through reshape

CI failure example:

- Expected: [1, 2, 3, 4, 5]
+ Received: Table { ... }

- Expected: [[1, 2], [3, 4]]
+ Received: [BigInt64Array { "0": 1n, "1": 2n }, BigInt64Array { "0": 3n, "1": 4n }]

Solution

  • Add typedArrayToPlain() helper: converts typed arrays to plain arrays with BigInt → Number for safe integers
  • Add extractArrowValues() helper: extracts column values from Arrow tables
  • Update ndarray decoder to always extract values (not just multi-dimensional)

Test Results

All 1232 tests pass locally. This should fix the codec-suite CI.

🤖 Generated with Claude Code

Fix two issues in ndarray Arrow decoding:

1. **1D arrays returning raw Arrow Table**: Previously, the code only
   extracted values from Arrow tables when shape.length > 1, causing
   1D arrays to return the raw Table object instead of the values.

2. **Typed arrays in reshaped data**: Arrow's column.toArray() returns
   typed arrays (Int32Array, BigInt64Array, etc.). When slicing these
   during reshape, they remained as typed arrays instead of plain JS
   arrays, causing test failures.

Solution:
- Add `typedArrayToPlain()` helper to convert typed arrays to plain
  arrays, with BigInt → Number conversion for safe integer range
- Add `extractArrowValues()` helper to extract values from Arrow tables
- Update ndarray decoder to always extract values (not just multi-dim)
- Convert typed arrays to plain arrays before reshaping

This fixes the codec-suite CI failures where tests expected plain arrays
but received Arrow Tables or BigInt64Arrays.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Jan 21, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

Adds helpers to extract plain JS arrays from typed arrays and Arrow tables, refactors ndarray decoding to extract values (sync or awaited) then reshape when shape length > 1, and preserves existing fallbacks if extraction fails or no shape is present.

Changes

Cohort / File(s) Summary
Arrow value extraction & ndarray decoding
src/utils/codec.ts
Adds typedArrayToPlain() and extractArrowValues() helpers; consolidates Arrow-table → plain-array extraction; updates ndarray decode path to extract values (including awaiting promise-based Arrow decodes) then apply reshape when needed; removes inline Arrow-to-array branches and preserves fallback behavior.

Sequence Diagram(s)

(omitted — changes are internal refactors without multi-component sequential interactions)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

Poem

🐰
I hopped through Arrow's curious maze,
turned typed arrays into plain, bright phase.
I waited for promises, then reshaped the view,
now values stretch tidy and true. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and specifically summarizes the main change: fixing Arrow value extraction for ndarray decoding by converting to plain arrays.
Description check ✅ Passed The description clearly relates to the changeset, detailing the bugs fixed, the solution implemented, and the test results.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 72fd3ca and 2dd95a6.

📒 Files selected for processing (1)
  • src/utils/codec.ts

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Warning

Tools execution failed with the following error:

Failed to run tools: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot added area:codec Area: codecs and serialization bug Something isn't working labels Jan 21, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/utils/codec.ts`:
- Around line 278-280: The fallback Array.from(arr as Iterable<unknown>) can
throw if arr is null/undefined or a non-iterable plain object; add a guard
before this call (check arr != null and typeof arr === 'object' and
Symbol.iterator in arr) or use an isIterable(arr) helper, and if not iterable
return an empty array or throw a descriptive TypeError; update the fallback in
the same function where Array.from is used and reference the existing
ArrayBuffer.isView check and the arr variable to place the guard.
- Around line 439-454: The scalar handling bug: `shape` check skips scalars so
`reshapeArray` isn't called and single-value ndarrays return the full `values`
array; update both places where you test the shape (the async branch returning
reshapeArray(values, shape) and the sync branch after
extractArrowValues(decoded)) to call reshapeArray when shape exists and
shape.length !== 1 (i.e., include shape.length === 0 and >1) so scalars
(shape.length === 0) and multi-dimensional arrays are reshaped while 1D arrays
are returned as-is; locate the checks around the calls to reshapeArray,
extractArrowValues and the variables decoded/values to make the change.
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac735a5 and ab44a0d.

📒 Files selected for processing (1)
  • src/utils/codec.ts
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: docs/adr/002-bridge-protocol.md:168-172
Timestamp: 2026-01-20T16:00:49.738Z
Learning: In the tywrap project's BridgeProtocol SafeCodec implementation, Arrow format decoders can produce NaN/Infinity values from binary representations even when the raw JSON payload doesn't contain them. This is why validation for special floats must occur both before encoding (to reject invalid inputs) and after applying decoders (to catch values introduced during Arrow deserialization), protecting downstream consumers from unexpected NaN/Infinity values.
📚 Learning: 2026-01-20T16:00:49.738Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: docs/adr/002-bridge-protocol.md:168-172
Timestamp: 2026-01-20T16:00:49.738Z
Learning: In the tywrap project's BridgeProtocol SafeCodec implementation, Arrow format decoders can produce NaN/Infinity values from binary representations even when the raw JSON payload doesn't contain them. This is why validation for special floats must occur both before encoding (to reject invalid inputs) and after applying decoders (to catch values introduced during Arrow deserialization), protecting downstream consumers from unexpected NaN/Infinity values.

Applied to files:

  • src/utils/codec.ts
📚 Learning: 2026-01-20T01:34:07.064Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 136
File: src/runtime/node.ts:444-458
Timestamp: 2026-01-20T01:34:07.064Z
Learning: When reviewing promise-based polling patterns (e.g., recursive setTimeout in waitForAvailableWorker functions), ensure that any recursive timer is tracked and cleared on timeout or promise resolution to prevent timer leaks. Do not rely on no-op checks after settlement; verify clearTimeout is invoked in all paths and consider using an explicit cancellation (e.g., AbortController) or a guard to cancel pending timers when the promise settles.

Applied to files:

  • src/utils/codec.ts
📚 Learning: 2026-01-20T18:37:05.670Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: src/runtime/process-io.ts:37-40
Timestamp: 2026-01-20T18:37:05.670Z
Learning: In the tywrap repo, ESLint is used for linting (not Biome). Do not flag regex literals that contain control characters (e.g., \u001b, \u0000-\u001F) as lint errors, since the current ESLint configuration allows them. When reviewing TypeScript files (e.g., src/**/*.ts), rely on ESLint results and avoid raising issues about these specific control-character regex literals unless there is a competing config change or a policy requiring explicit escaping.

Applied to files:

  • src/utils/codec.ts
📚 Learning: 2026-01-20T16:00:49.829Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: docs/adr/002-bridge-protocol.md:203-211
Timestamp: 2026-01-20T16:00:49.829Z
Learning: In the BridgeProtocol implementation (tywrap), reject Map and Set explicitly before serialization (e.g., in safeStringify or the serialization entrypoint) with a clear error like "Bridge protocol does not support Map/Set values". Do not rely on post-hoc checks of non-string keys at the point of JSON.stringify, since Maps/Sets cannot round-trip. This should be enforced at the boundary where data is prepared for serialization to ensure deterministic errors and prevent invalid data from propagating.

Applied to files:

  • src/utils/codec.ts
🔇 Additional comments (1)
src/utils/codec.ts (1)

282-301: LGTM!

The extraction logic correctly handles the Arrow table structure, extracts from the first column (appropriate for flattened ndarray data), and properly converts typed arrays to plain arrays. The null return for unextractable data provides a clean fallback path for callers.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

bbopen and others added 3 commits January 21, 2026 12:33
1. Add iterable guard before Array.from fallback in typedArrayToPlain()
   to prevent throwing on null/undefined/non-iterables

2. Fix scalar handling: change condition from shape.length > 1 to
   shape.length !== 1 so that scalars (shape.length === 0) are properly
   extracted via reshapeArray returning flat[0]

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@bbopen bbopen merged commit 26140c5 into main Jan 21, 2026
19 of 20 checks passed
@bbopen bbopen deleted the fix/arrow-typed-array-conversion branch January 21, 2026 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:codec Area: codecs and serialization bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants