Splitting and Sampling Strategies #59

@selmanozleyen

The interface would be at least like the following: `AnnData -> dict[AnnData]`.

Let's limit the variables we consider to at most two (e.g., cell_type: a/b and split: train/val/test).

Cases:

  • The user wants to split the data into train, val, and test sets such that each set preserves the overall cell type distribution. E.g., if cell type a makes up a quarter of the whole dataset, it should also make up 1/4 of each of train, val, and test. (Here no entry is duplicated, but with sampling this might not be the case.)
  • The user wants to sample each set so that the classes have equal proportions. (Here there can be duplicate entries.)
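
To make the two cases concrete, here is a minimal sketch of both, working on plain label lists and returning observation indices rather than AnnData objects. The function names `stratified_split` and `balanced_sample`, their parameters, and the index-based return type are all hypothetical and only meant as a starting point for discussion, not a proposed final API.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions, seed=0):
    """Case 1 (hypothetical helper): split without duplicates while
    preserving the label proportions in every split.

    labels: one label per observation, e.g. list(adata.obs["cell_type"]).
    fractions: dict such as {"train": 0.5, "val": 0.25, "test": 0.25}.
    Returns a dict mapping split name -> list of observation indices.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, lab in enumerate(labels):
        by_label[lab].append(i)

    names = list(fractions)
    total = sum(fractions.values())
    splits = {name: [] for name in names}
    # Split each class separately so every split gets its share of each class.
    for members in by_label.values():
        rng.shuffle(members)
        start, cum = 0, 0.0
        for k, name in enumerate(names):
            cum += fractions[name] / total
            # The last split takes the remainder, so each index is used once.
            end = len(members) if k == len(names) - 1 else round(cum * len(members))
            splits[name].extend(members[start:end])
            start = end
    return splits


def balanced_sample(indices, labels, n_per_class, seed=0):
    """Case 2 (hypothetical helper): sample with replacement so that every
    class present in `indices` contributes exactly n_per_class entries."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i in indices:
        by_label[labels[i]].append(i)
    sampled = []
    for members in by_label.values():
        # random.choices draws with replacement, so duplicates can occur.
        sampled.extend(rng.choices(members, k=n_per_class))
    return sampled


# Toy data: cell type "a" is a quarter of the dataset.
labels = ["a"] * 40 + ["b"] * 120
splits = stratified_split(labels, {"train": 0.5, "val": 0.25, "test": 0.25})
balanced_val = balanced_sample(splits["val"], labels, n_per_class=50)
```

Assuming the labels come from `adata.obs`, the returned index lists could then be used to subset the object per split (e.g. `{name: adata[idx] for name, idx in splits.items()}`) to get the dict-of-AnnData shape described above.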

@FrancescaDr, do you have any more cases you'd like to discuss? Also, which one would be more important for you?


Labels: documentation (Improvements or additions to documentation)
