
Feature: support Dask Dataframes for larger than memory returns #569


Description

@ihs-nick

Would it be possible to support not only pandas DataFrame serialization/deserialization for the return value, but also a Dask DataFrame for cases where the v3io result is much larger than memory?

I was thinking that, since we already read chunks of these data sources into in-memory Arrow, we could serialize those chunks to Parquet (I would prefer pure Arrow files, but Dask doesn't support that right now) and then read them into a Dask DataFrame with read_parquet. Would love more thoughts on this, so we could support larger-than-memory dataframe operations.
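A minimal sketch of that flow, purely for illustration: it assumes the chunks arrive as pyarrow Tables (the toy chunk list here stands in for the real chunked v3io read, which is not shown), spills each one to a Parquet part file, and lets Dask pick up the directory lazily.

```python
import os
import pyarrow as pa
import pyarrow.parquet as pq
import dask.dataframe as dd

def chunks_to_dask(chunks, out_dir="/tmp/v3io_chunks"):
    """Spill each in-memory Arrow chunk to a Parquet part file,
    then hand the whole directory to Dask."""
    os.makedirs(out_dir, exist_ok=True)
    for i, table in enumerate(chunks):
        pq.write_table(table, os.path.join(out_dir, f"part-{i:05d}.parquet"))
    # Dask reads the parts lazily, so the full result never has to fit in memory.
    return dd.read_parquet(out_dir)

# Toy chunks standing in for the real v3io batches:
toy_chunks = [pa.table({"x": list(range(i * 10, (i + 1) * 10))}) for i in range(3)]
ddf = chunks_to_dask(toy_chunks)
print(ddf.x.sum().compute())
```

The key point is that only one chunk is materialized at a time on the write side, and Dask partitions the Parquet parts on the read side, so the driver never holds the full dataframe.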
