Labels
Meta: A meta-request to be implemented in multiple steps
Description
What kind of feature would you like to request?
Additional function parameters / changed functionality / changed defaults?
Please describe your wishes
Almost all of our dask functionality relies, implicitly or explicitly, on row-wise chunking, but many of the algorithms can likely be adapted to column-wise chunking.
For example (not a comprehensive list but should highlight the general procedure):
- (feat): `sc.get.aggregate` via `dask` #3700 - `sc.get.aggregate` can use `csc` matrices and break up the computation across features (since the computations are independent over features), and then concatenate.
- PCA can likely be done as a multi-pass algorithm over CSC matrices - not efficient but doable.
- HVG in general operates as a feature-space algorithm and, aside from seurat v3, really only relies on a mean-var calculation, which is already CSC compatible (seurat v3 relies on it too, but also has an additional `loess` step). The notable case here is seurat v3/batched HVG selection, where row-wise chunking is actually bad for the computation since it (likely) requires random subsets. In this case, proceeding chunk-of-genes by chunk-of-genes would probably not be too bad.
- `top_segment_proportions` (i.e., from the `percent_top` argument in `calculate_qc_metrics`) could be done in feature-wise chunks as well, and then concatenated at the end.
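
To make the general procedure in the list above concrete, here is a minimal sketch of the "compute per feature chunk, then concatenate" pattern for two of the cases: the mean-var calculation and a grouped sum of the kind an aggregation would need. It uses plain `scipy.sparse` rather than dask to stay self-contained. The helper names (`iter_feature_chunks`, `chunked_mean_var`, `chunked_group_sum`) and the `chunk_size` parameter are hypothetical and not part of scanpy's API. The variance is the population variance (`ddof=0`), which is an assumption and may differ from what scanpy computes internally.

```python
import numpy as np
from scipy import sparse


def iter_feature_chunks(n_vars, chunk_size):
    """Yield (start, stop) column ranges that cover all features."""
    for start in range(0, n_vars, chunk_size):
        yield start, min(start + chunk_size, n_vars)


def chunked_mean_var(X, chunk_size=1000):
    """Per-feature mean and population variance, one column chunk at a time.

    Hypothetical helper: each chunk is independent of the others,
    so the per-chunk results can simply be concatenated.
    """
    X = sparse.csc_matrix(X)  # column slicing is cheap in CSC
    means, variances = [], []
    for start, stop in iter_feature_chunks(X.shape[1], chunk_size):
        block = X[:, start:stop]
        mean = np.asarray(block.mean(axis=0)).ravel()
        mean_sq = np.asarray(block.multiply(block).mean(axis=0)).ravel()
        means.append(mean)
        variances.append(mean_sq - mean**2)  # E[x^2] - E[x]^2, ddof=0
    return np.concatenate(means), np.concatenate(variances)


def chunked_group_sum(X, groups, chunk_size=1000):
    """Sum rows within each group, one column chunk at a time.

    Hypothetical helper: returns the sorted group labels and a
    (n_groups x n_vars) sparse matrix of per-group sums.
    """
    X = sparse.csc_matrix(X)
    n_obs, n_vars = X.shape
    labels, codes = np.unique(np.asarray(groups), return_inverse=True)
    codes = codes.ravel()
    # Indicator matrix: row g has a 1 for every observation in group g.
    indicator = sparse.csr_matrix(
        (np.ones(n_obs), (codes, np.arange(n_obs))),
        shape=(len(labels), n_obs),
    )
    blocks = [
        indicator @ X[:, start:stop]
        for start, stop in iter_feature_chunks(n_vars, chunk_size)
    ]
    # Concatenate the per-chunk results along the feature axis.
    return labels, sparse.hstack(blocks, format="csc")
```

In a dask setting, each loop iteration would correspond to one column chunk of the array, and the final concatenation would happen along the feature axis. The sketch only illustrates why such computations do not need row-wise chunks.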