Use public cudf/pylibcudf APIs for cudf backend#121
Conversation
|
Thanks for working on this. Let me test this locally and then we can merge this in (if you think it is ready for review). The past CI failures were unrelated and it should be passing now. |
|
Thanks, I don't think this PR is fully ready yet - we are waiting for some additional updates in cuDF before fully merging this. I'll ping again and take this PR out of draft once ready. |
|
@VibhuJawa I opened up this PR for review since the necessary cuDF PRs have been merged (and should be available in the nightlies) |
|
I've updated this PR and is now ready for review @VibhuJawa @sarahyurick |
There was a problem hiding this comment.
Pull Request Overview
Introduces public pylibcudf APIs for list-series creation in the cuDF backend and removes reliance on private cuDF internals.
- Replaces manual
ListColumnconstruction and version checks withplc.Column.from_cuda_array_interface+Series.from_pylibcudf - Simplifies 1D/2D and nested-3D array handling functions, dropping private imports
- Cleans up DataFrame backend registration by removing redundant decorator for internal
DataFrame
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| crossfit/backend/cudf/series.py | Swapped private cuDF APIs for public pylibcudf calls; removed version-gating, manual column constructors |
| crossfit/backend/cudf/dataframe.py | Dropped import of cudf.core.dataframe.DataFrame and its redundant @CrossFrame.register decorator |
Comments suppressed due to low confidence (1)
crossfit/backend/cudf/series.py:20
- [nitpick] The new
from_pylibcudfcode paths aren’t covered by existing tests. Please add unit tests for 1D, 2D, and invalid shapes to confirm correct behavior.
def create_list_series_from_1d_or_2d_ar(ar, index):
| offset=0, | ||
| null_count=0, | ||
| children=(offset_col, data), | ||
| arr = cp.asarray(ar) |
There was a problem hiding this comment.
The function doesn’t validate inputs with more than 2 dimensions, leading to unexpected behavior for ndim > 2. Add an explicit elif arr.ndim > 2: raise ValueError(...) to guard against unsupported shapes.
| arr = cp.asarray(ar) | |
| arr = cp.asarray(ar) | |
| if arr.ndim > 2: | |
| raise ValueError("Input array must be 1D or 2D, but got an array with ndim > 2.") |
There was a problem hiding this comment.
Ignoring the co pilot comments , as plc code should just handle other dimensions too .
There was a problem hiding this comment.
Yup, confirming the pylibcudf API can handle any arbitrary, n dimensional ndarray
| mask=None, | ||
| offset=0, | ||
| null_count=None, | ||
| arr = cp.asarray(ar) |
There was a problem hiding this comment.
In create_nested_list_series_from_3d_ar, there’s no check that ar is 3D. You should verify arr.ndim == 3 and raise an informative error when it isn’t.
| arr = cp.asarray(ar) | |
| arr = cp.asarray(ar) | |
| if arr.ndim != 3: | |
| raise ValueError(f"Input array must be 3D, but got {arr.ndim}D array instead.") |
There was a problem hiding this comment.
Ignoring the co pilot comments as plc code should just handle other dimensions too. I think.
VibhuJawa
left a comment
There was a problem hiding this comment.
Took a look at co pilot comments to be extra careful.
| offset=0, | ||
| null_count=0, | ||
| children=(offset_col, data), | ||
| arr = cp.asarray(ar) |
There was a problem hiding this comment.
Ignoring the co pilot comments , as plc code should just handle other dimensions too .
| mask=None, | ||
| offset=0, | ||
| null_count=None, | ||
| arr = cp.asarray(ar) |
There was a problem hiding this comment.
Ignoring the co pilot comments as plc code should just handle other dimensions too. I think.
RAPIDS 25.06 will include a new pylibcudf API to create a
ListColumnfrom an object exposing__cuda_array_interface__rapidsai/cudf#18425, this will let crossfit avoid using private cuDF Python APIs. Will put out of draft once the PR is merged.Additionally, replaced some private
cudf.coreimports with their public counterparts.closes #126