Roughly speaking, a batch is an encoding of multiple Haskell or Java values as a set of primitive arrays. Given a pair in Haskell, (True, 1), a batch for the type (Bool, Int32) has an array of Bool and an array of Int32, and the components of the pair are stored at the same position in each array.
On the Java side this works too: new scala.Tuple2<Boolean, Integer>(true, 1) can be stored in a pair of primitive arrays in the same way. Primitive arrays are cheap to pass from Java to Haskell.
But what do we do if the Java tuple is null or contains null? There is no way to store null in primitive arrays, so we are forced to have a separate boolean array (boolean isnull[]) that tells, for each position in the batch, whether it corresponds to a null value.
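To make the layout concrete, here is a minimal pure-Haskell sketch of the encoding described above. It is only a model: PairBatch and encodePairs are hypothetical names, lists stand in for the primitive arrays, and it covers only the case where the whole tuple is null, not a null component.

```haskell
import Data.Int (Int32)

-- Hypothetical model of a batch of nullable (Bool, Int32) pairs:
-- one list per component plus a null mask, all of the same length.
data PairBatch = PairBatch
  { isNull  :: [Bool]   -- True where the original tuple was null
  , firsts  :: [Bool]   -- first components (placeholder where null)
  , seconds :: [Int32]  -- second components (placeholder where null)
  } deriving (Eq, Show)

-- Encode a list of possibly-null pairs column by column.
-- Null positions get an arbitrary placeholder (False and 0).
encodePairs :: [Maybe (Bool, Int32)] -> PairBatch
encodePairs xs = PairBatch
  { isNull  = map (maybe True (const False)) xs
  , firsts  = map (maybe False fst) xs
  , seconds = map (maybe 0 snd) xs
  }
```

Note that the placeholders written at null positions are exactly the kind of dummy values that the alternatives below have to deal with when decoding.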
This is the interface that we currently have to reify a batch:
class BatchReify a where
  ...
  reifyBatch :: J (Batch a) -> Int32 -> IO (Vector a)
There are a few alternatives to handle nulls.
1. All batches can contain null.
Our interface changes to
class BatchReify a where
  ...
  reifyBatch :: J (Batch a) -> Int32 -> IO (Vector (Nullable a))
where Nullable a is isomorphic to Maybe a. All instances are forced to wrap values with the Nullable type.
2. Only batches of types of the form Nullable a may contain null.
We can have an instance like
type instance Batch (Nullable a)
  = 'Class "scala.Tuple2" <>
      '[ 'Array ('Prim 'PrimBoolean)
       , Batch a
       ]
instance BatchReify a => BatchReify (Nullable a) where
  ...
  reifyBatch jxs n = do
      isnull <- [java| $jxs._1() |]
      -- Reify a batch of values of type `a` and later pick the
      -- non-null values as told by the @isnull@ vector.
      v <- [java| $jxs._2() |] >>= flip reifyBatch n
      return $ V.zipWith toNullable isnull v
    where
      toNullable :: Bool -> a -> Nullable a
      toNullable False a = NotNull a
      toNullable _ _ = Null
Unfortunately, the above scheme requires producing dummy/default Haskell values in the positions of the vector v that correspond to nulls. Ideally, we would find a way to avoid producing these values at all.
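The cost of this scheme is easier to see in a pure sketch. The following is an illustration only, using lists instead of Vector and a stand-in definition of Nullable; reifyNullable is a hypothetical name, not part of the existing interface.

```haskell
-- Stand-in for the Nullable type discussed above.
data Nullable a = Null | NotNull a
  deriving (Eq, Show)

-- Combine one null flag with one already-reified value.
-- When the flag is True, the value is a dummy and is discarded.
toNullable :: Bool -> a -> Nullable a
toNullable False x = NotNull x
toNullable True  _ = Null

-- Both lists must be fully built before zipping, so a value has to
-- exist at every position, including the null ones.
reifyNullable :: [Bool] -> [a] -> [Nullable a]
reifyNullable = zipWith toNullable
```

For example, reifyNullable [False, True, False] [10, 0, 30] needs the 0 in the middle even though it never appears in the result.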
We could change reifyBatch to:
class BatchReify a where
  ...
  reifyBatch :: J (Batch a) -> Int32 -> (Int32 -> Bool) -> IO (Vector (Maybe a))
reifyBatch j sz p produces a vector in which only the positions whose index satisfies p hold a Just value; every other position is Nothing.
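A pure analogue of this signature might look like the sketch below. It assumes the Int32 argument is the batch size and that indices start at 0; reifyWith and its decode argument are hypothetical, with decode standing in for reading one position of the Java batch.

```haskell
import Data.Int (Int32)

-- `decode i` stands in for reading position i of the batch, and
-- `p i` says whether position i holds a non-null value.
-- `decode` is never applied to an index that fails `p`, so no
-- dummy value is needed for null positions.
reifyWith :: (Int32 -> a) -> Int32 -> (Int32 -> Bool) -> [Maybe a]
reifyWith decode sz p =
  [ if p i then Just (decode i) else Nothing | i <- [0 .. sz - 1] ]
```

With this shape, an instance for Nullable a could pass a predicate built from the isnull array, so the underlying instance only decodes the positions that are actually present.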
Any preferences?