
Column-wise dask map_blocks #3723

@ilan-gold

Description


What kind of feature would you like to request?

Additional function parameters / changed functionality / changed defaults?

Please describe your wishes

Almost all of our dask functionality relies, implicitly or explicitly, on row-wise chunking, but many of the algorithms can likely be adapted to column-wise chunking.
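
A minimal sketch of the column-wise pattern, assuming a dense dask.array (CSC sparse blocks would need a different per-block function). column_wise_mean_var is a hypothetical helper, not part of scanpy. The array is rechunked so that every block spans all rows but only a subset of columns. map_blocks can then compute each block's per-feature mean and variance independently, with no cross-chunk reduction:

```python
import dask.array as da
import numpy as np


def _mean_var_block(block: np.ndarray) -> np.ndarray:
    # `block` holds every row for a subset of columns, so the per-column
    # statistics computed here are already exact; no cross-block reduction.
    mean = block.mean(axis=0)
    var = block.var(axis=0, ddof=1)
    return np.stack([mean, var])  # shape (2, n_features_in_block)


def column_wise_mean_var(
    x: da.Array, cols_per_chunk: int = 100
) -> tuple[np.ndarray, np.ndarray]:
    # Hypothetical helper: switch from row-wise to column-wise blocks
    # (one chunk along rows, `cols_per_chunk` columns per block).
    x = x.rechunk({0: -1, 1: cols_per_chunk})
    stats = x.map_blocks(
        _mean_var_block,
        chunks=((2,), x.chunks[1]),  # each output block is (2, n_cols_in_block)
        dtype=np.float64,
        meta=np.empty((0, 0), dtype=np.float64),
    )
    mean, var = stats.compute()
    return mean, var
```

The rechunk step itself moves data between workers, so this pays off mainly when the data can be stored or loaded column-wise in the first place.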

For example (not a comprehensive list, but it should highlight the general procedure):

  • (feat): sc.get.aggregate via dask #3700 - sc.get.aggregate can use CSC matrices, split the computation across features (the computations are independent per feature), and then concatenate the results.
  • PCA can likely be done as a multi-pass algorithm over CSC matrices; not efficient, but doable.
  • HVG selection generally operates in feature space and, aside from seurat v3, relies only on a mean-variance calculation, which is already CSC-compatible (seurat v3 relies on one too, but also adds a loess step). The main target here would be seurat v3 / batched HVG selection, where row-wise chunking is actually bad for the computation because it (likely) requires random subsets of rows, e.g., per batch. In this case, processing the data chunk-of-genes by chunk-of-genes probably would not be too bad.
  • top_segment_proportions (i.e., from the percent_top argument in calculate_qc_metrics) could be computed in feature-wise chunks as well, with the per-chunk results combined at the end.
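
For the sc.get.aggregate case, here is a sketch of the feature-wise split. It assumes a dense dask.array and only a sum aggregation. aggregate_sum_by_features is a hypothetical name and is not the implementation in #3700. Each column block receives every row, so multiplying a one-hot group indicator by the block yields that block's per-group sums. map_blocks then stitches the per-block results back together along the feature axis:

```python
import dask.array as da
import numpy as np


def _group_sum_block(block: np.ndarray, codes: np.ndarray, n_groups: int) -> np.ndarray:
    # `block` contains every observation (row) for a subset of features (columns).
    # A one-hot (n_groups, n_obs) indicator turns the group-wise sum into a matmul.
    indicator = np.zeros((n_groups, block.shape[0]), dtype=block.dtype)
    indicator[codes, np.arange(block.shape[0])] = 1
    return indicator @ block  # shape (n_groups, n_features_in_block)


def aggregate_sum_by_features(
    x: da.Array, groups: np.ndarray, cols_per_chunk: int = 100
) -> tuple[np.ndarray, np.ndarray]:
    # Hypothetical helper, not scanpy's API: sum observations per group label.
    labels, codes = np.unique(groups, return_inverse=True)
    # Column-wise chunking: one chunk along rows, `cols_per_chunk` features per block.
    x = x.rechunk({0: -1, 1: cols_per_chunk})
    sums = x.map_blocks(
        _group_sum_block,
        codes=codes,
        n_groups=len(labels),
        chunks=((len(labels),), x.chunks[1]),
        dtype=x.dtype,
        meta=np.empty((0, 0), dtype=x.dtype),
    )
    # The per-block results are concatenated along the feature axis.
    return labels, sums.compute()
```

Because each output block depends on exactly one input block, no cross-chunk reduction is needed. That independence across features is what makes these algorithms a good fit for column-wise chunking.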


Labels: Meta (a meta-request to be implemented in multiple steps)
