-
Notifications
You must be signed in to change notification settings - Fork 0
Closed
Description
Problem
serialize_ndarray() in runtime/python_bridge.py uses pa.array(obj) which only handles 1-dimensional arrays. Multi-dimensional arrays (e.g., 2D torch tensors) fail with:
pyarrow.lib.ArrowInvalid: only handle 1-dimensional arrays
This causes the codec-suite CI to fail on the torch tensor serialization test.
Root Cause
- PyArrow's
pa.array()is designed for columnar (1D) data - PyArrow has
FixedShapeTensorArrayfor multi-dimensional data, but arrow-js doesn't support it yet (open since 2020) - The test was previously using JSON fallback which worked, but ADR-002 migration removed the fallback to test Arrow encoding
Solution: Flatten + Reshape
Implement a flatten-on-encode, reshape-on-decode strategy:
Python Side (runtime/python_bridge.py)
def serialize_ndarray(obj):
# Flatten multi-dimensional arrays for Arrow compatibility
# pa.array() only handles 1D arrays
flat = obj.flatten() if obj.ndim > 1 else obj
arr = pa.array(flat)
# ... Arrow IPC serialization ...
return {
'__tywrap__': 'ndarray',
'codecVersion': CODEC_VERSION,
'encoding': 'arrow',
'b64': b64,
'shape': list(obj.shape), # Original shape for reconstruction
'dtype': str(obj.dtype),
}JavaScript Side (src/runtime/safe-codec.ts)
// In ndarray decoder, after Arrow decode:
if (envelope.shape && envelope.shape.length > 1) {
// Reshape flat array back to original dimensions
data = reshapeArray(data, envelope.shape);
}Benefits
- Maintains Arrow efficiency - Binary encoding is ~10x smaller than JSON for numeric data
- Works with current arrow-js - No dependency on unimplemented tensor support
- Forward-compatible - Can switch to
FixedShapeTensorArraywhen arrow-js supports it - Zero-copy potential - Arrow IPC supports memory-mapped reads
Future Consideration
Consider contributing FixedShapeTensorArray support to apache/arrow-js as an upstream contribution.
Test Case
// 2D torch tensor that previously failed
const result = await bridge.call('torch', 'tensor', [[[1, 2], [3, 4]]]);
expect(result.shape).toEqual([2, 2]);
expect(result.data).toEqual([[1, 2], [3, 4]]);Affected Files
runtime/python_bridge.py- Add flatten logic toserialize_ndarray()src/runtime/safe-codec.ts- Add reshape logic to ndarray decodertest/runtime_codec_scientific.test.ts- Remove JSON fallback workaround
Metadata
Metadata
Assignees
Labels
No labels