Make PDSH benchmarks (or most of them) work #110

@seberg

Some queries already work and some of the fixes are work in progress, but to organize the missing pieces a bit, I am creating an overview issue:

  • q1, q21: missing groupby-agg length (others work). q21 now also fails because slice does not work on string arrays.
  • q2, q9, q20: require (some) string functions.
  • q4, q18: require a "semi" join. (We can probably already map that to a left join, but I suspect we should just implement the semi join.)
    • (Should add a test for duplicate handling on the RHS; I am not quite sure how it is defined for a semi-join.)
  • q7: Requires union (not immediately sure if that can be mapped to a join).
  • q8, q14: need libcudf copy_if_else and the ternary operation enabled in polars.
  • q11, q22: use a conditional join; have to see if that can be done in the polars layer or needs deeper support.
  • q12: (not sure; currently a small bug, but something else may crop up)
  • q13: "boolean not" failing, likely a small issue.
  • q15, q17: "round" missing.
  • q15: running into a broadcast issue now (may or may not be a small one).
  • q19: isin missing.
  • q2: missing mask_nans; should be easy to add.
  • q10, q21: missing string_array.slice(), needed only for the final .head(). Likely being fixed in legate/legion.
  • q20: missing unique (may be able to hack it via a join in a pinch)
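
On the q4/q18 item: to make the RHS-duplicate question concrete, here is a minimal pure-Python sketch of the usual semi-join semantics (as in SQL `EXISTS`), where each left row appears at most once no matter how often its key occurs on the right. It is not how this project implements joins; the helper names (`semi_join`, `left_join_then_filter`, `dedup_keys`) and the toy rows are made up for illustration. It also shows why a naive mapping to a left join would need the RHS keys deduplicated first.

```python
def semi_join(left, right, key):
    """Keep left rows whose key exists in right; never duplicates a left row."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]


def left_join_then_filter(left, right, key):
    """Naive emulation: left join, drop unmatched rows, project left columns.

    Emits one output row per matching (left, right) pair, so duplicate
    keys on the right duplicate the left row.
    """
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        for r in matches or [None]:  # [None] models the unmatched left row
            if r is not None:
                out.append(l)
    return out


def dedup_keys(right, key):
    """Reduce right to one row per distinct key (hypothetical pre-step)."""
    seen = set()
    out = []
    for row in right:
        if row[key] not in seen:
            seen.add(row[key])
            out.append({key: row[key]})
    return out


left = [{"k": 1, "v": "a"}, {"k": 2, "v": "b"}, {"k": 3, "v": "c"}]
right = [{"k": 1}, {"k": 1}, {"k": 3}]  # key 1 is duplicated on the RHS

print(semi_join(left, right, "k"))
print(left_join_then_filter(left, right, "k"))
print(left_join_then_filter(left, dedup_keys(right, "k"), "k"))
```

A test for duplicate handling could assert that the semi join returns the `a` row once, while the naive left-join emulation returns it twice unless the RHS is deduplicated first.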

Still a lot not working (to be honest, more than I thought), but many should not need much.
(Overall, there is some slowness going on, but not sure where yet.)
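
For reference on the q8/q14 item: a rough pure-Python model of the element-wise select that a ternary needs. This is only a sketch of the semantics, not the libcudf API. `None` stands in for a null entry, and treating a null mask entry like false (so the "else" value is taken) is my assumption about the behaviour we want to match.

```python
def copy_if_else(lhs, rhs, mask):
    """Select lhs[i] where mask[i] is True, otherwise rhs[i].

    A null (None) mask entry is treated like False here, which is an
    assumption of this sketch.
    """
    if not (len(lhs) == len(rhs) == len(mask)):
        raise ValueError("lhs, rhs and mask must have the same length")
    return [l if m is True else r for l, r, m in zip(lhs, rhs, mask)]


# Toy data in the spirit of q14: keep revenue for matching rows, else 0.
revenue = [10.0, 20.0, 30.0, 40.0]
zeros = [0.0] * len(revenue)
is_promo = [True, False, None, True]

selected = copy_if_else(revenue, zeros, is_promo)
print(selected)  # [10.0, 0.0, 0.0, 40.0]
```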
