Support null values when doing batched marshaling #112

@facundominguez

Roughly speaking, a batch is an encoding of multiple Haskell or Java values as a set of primitive arrays. For the Haskell pair (True, 1), a batch for the type (Bool, Int32) holds an array of Bool and an array of Int32, and the two components of the pair are stored at the same position in their respective arrays.

On the Java side this works too: new scala.Tuple2<Boolean, Integer>(true, 1) can be stored in a pair of primitive arrays in the same way. Primitive arrays are cheap to pass from Java to Haskell.
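As a rough illustration of the encoding described above, here is a minimal sketch that models a batch of pairs as one column per component. Plain lists stand in for the primitive arrays, and the names PairBatch, toBatch and fromBatch are hypothetical, not part of the library.

```haskell
import Data.Int (Int32)

-- Hypothetical model of a batch for (Bool, Int32): one column per component.
-- Lists stand in for the primitive arrays of the real encoding.
data PairBatch = PairBatch
  { firsts  :: [Bool]
  , seconds :: [Int32]
  } deriving (Eq, Show)

-- Store the components of the i-th pair at position i of each column.
toBatch :: [(Bool, Int32)] -> PairBatch
toBatch xs = PairBatch (map fst xs) (map snd xs)

-- Rebuild the pairs by reading both columns at the same position.
fromBatch :: PairBatch -> [(Bool, Int32)]
fromBatch (PairBatch bs is) = zip bs is
```

For batches built with toBatch, fromBatch recovers the original pairs, since both columns have the same length.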

But what do we do if the Java tuple is null or contains null? There is no way to store null in a primitive array, so we are forced to have a separate boolean array (boolean isnull[]) that tells, for each position in the batch, whether it corresponds to a null value.
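The null mask can be pictured with a small pure model. This is only a sketch: lists stand in for the primitive arrays, Maybe stands in for a nullable Java value, and encodeNullable, decodeNullable and the placeholder 0 are assumptions made for illustration.

```haskell
import Data.Int (Int32)
import Data.Maybe (fromMaybe, isNothing)

-- Split nullable values into a null mask and a column of plain values.
-- Null positions still need *some* Int32 in the value column; 0 is an
-- arbitrary placeholder chosen for this sketch.
encodeNullable :: [Maybe Int32] -> ([Bool], [Int32])
encodeNullable xs = (map isNothing xs, map (fromMaybe 0) xs)

-- Recombine the two columns. The value column is ignored wherever the
-- mask says the position is null.
decodeNullable :: [Bool] -> [Int32] -> [Maybe Int32]
decodeNullable = zipWith pick
  where
    pick True  _ = Nothing
    pick False x = Just x
```

Note that the value column carries a placeholder at every null position, which is the cost the rest of this issue tries to avoid on the Haskell side.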

This is the interface that we currently have to reify a batch:

class BatchReify a where
  ...
  reifyBatch :: J (Batch a) -> Int32 -> IO (Vector a)

There are a few alternatives to handle nulls.

1. All batches can contain null.

Our interface changes to

class BatchReify a where
  ...
  reifyBatch :: J (Batch a) -> Int32 -> IO (Vector (Nullable a))

where Nullable a is isomorphic to Maybe a. All instances are forced to wrap values with the Nullable type.
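To make the cost of this option concrete, here is a minimal sketch of what Nullable might look like and what a caller would have to do when it knows a batch has no nulls. The constructor names are borrowed from the instance sketch in this issue; toMaybe, fromMaybeN and allNotNull are hypothetical helpers, and a list stands in for Vector.

```haskell
-- Assumed shape of Nullable; only the constructor names come from this issue.
data Nullable a = Null | NotNull a
  deriving (Eq, Show)

-- The two directions of the isomorphism with Maybe.
toMaybe :: Nullable a -> Maybe a
toMaybe Null        = Nothing
toMaybe (NotNull a) = Just a

fromMaybeN :: Maybe a -> Nullable a
fromMaybeN Nothing  = Null
fromMaybeN (Just a) = NotNull a

-- Hypothetical helper: under option 1 every caller gets wrapped values,
-- so code that expects no nulls has to unwrap the whole result and
-- handle the failure case.
allNotNull :: [Nullable a] -> Maybe [a]
allNotNull = traverse toMaybe
```

Every consumer of a batch pays for this extra unwrapping pass, even for element types that never contain null.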

2. Only batches of types of the form Nullable a may contain null.

We can have an instance like

  type instance Batch (Nullable a)
    = 'Class "scala.Tuple2" <>
         '[ 'Array ('Prim 'PrimBoolean)
          , Batch a
          ]

  instance BatchReify a => BatchReify (Nullable a) where
    ...
    reifyBatch jxs n = do
      isnull <- [java| $jxs._1() |]
      v <- [java| $jxs._2() |]
             -- reify a batch of values of type `a` and later pick the
             -- non-null values as told by the @isnull@ vector.
             >>= flip reifyBatch n
      return $ V.zipWith toNullable isnull v
      where
        toNullable :: Bool -> a -> Nullable a
        toNullable False a = NotNull a
        toNullable _ _ = Null

Unfortunately, the above scheme requires producing dummy/default Haskell values in the positions of the vector v that correspond to nulls. Ideally, we would find a way to avoid producing these values altogether.

We could change reifyBatch to:

class BatchReify a where
  ...
  reifyBatch :: J (Batch a) -> Int32 -> (Int32 -> Bool) -> IO (Vector (Maybe a))

reifyBatch j sz p produces a vector in which only the positions whose index satisfies p hold a Just value. Every other position holds Nothing, so no value has to be produced for it.
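The predicate-driven signature can be modelled without any JVM involvement. The sketch below is an assumption-laden illustration, not the library's implementation: readAt is a hypothetical per-position reader standing in for whatever decodes one element of a batch, a list stands in for Vector, and sz is assumed to be the number of positions in the batch.

```haskell
import Data.Int (Int32)

-- Pure model of the proposed interface. The reader is only invoked for
-- indices that satisfy the predicate; all other positions get Nothing
-- without any value being produced for them.
reifyWith :: (Int32 -> IO a) -> Int32 -> (Int32 -> Bool) -> IO [Maybe a]
reifyWith readAt sz p = mapM step [0 .. sz - 1]
  where
    step i
      | p i       = Just <$> readAt i
      | otherwise = pure Nothing

-- How a Nullable-style instance might build the predicate from a null
-- mask (hypothetical helper; indexing a list is linear, which is fine
-- for a sketch but not for real batches).
notNullAt :: [Bool] -> Int32 -> Bool
notNullAt isnull i = not (isnull !! fromIntegral i)
```

With this shape, an instance for nullable values could pass a predicate derived from the isnull array down to the instance for the underlying type, instead of asking it for placeholder values.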


Any preferences?
