Skip to content

Use public cudf/pylibcudf APIs for cudf backend#121

Merged
VibhuJawa merged 7 commits intorapidsai:mainfrom
mroeschke:ref/cudf/public
Jul 2, 2025
Merged

Use public cudf/pylibcudf APIs for cudf backend#121
VibhuJawa merged 7 commits intorapidsai:mainfrom
mroeschke:ref/cudf/public

Conversation

@mroeschke
Copy link
Contributor

@mroeschke mroeschke commented Apr 16, 2025

RAPIDS 25.06 will include a new pylibcudf API to create a ListColumn from an object exposing __cuda_array_interface__ rapidsai/cudf#18425, this will let crossfit avoid using private cuDF Python APIs. Will put out of draft once the PR is merged.

Additionally, replaced some private cudf.core imports with their public counterparts.

closes #126

@VibhuJawa
Copy link
Member

Thanks for working on this. Let me test this locally and then we can merge this in (if you think it is ready for review). The past CI failures were unrelated and it should be passing now.

@mroeschke
Copy link
Contributor Author

Thanks, I don't think this PR is fully ready yet - we are waiting for some additional updates in cuDF before fully merging this. I'll ping again and take this PR out of draft once ready.

@mroeschke mroeschke marked this pull request as ready for review May 12, 2025 19:42
@mroeschke
Copy link
Contributor Author

@VibhuJawa I opened up this PR for review since the necessary cuDF PRs have been merged (and should be available in the nightlies)

@mroeschke
Copy link
Contributor Author

I've updated this PR and is now ready for review @VibhuJawa @sarahyurick

Copy link
Collaborator

@sarahyurick sarahyurick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm!

Copy link
Member

@VibhuJawa VibhuJawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@VibhuJawa VibhuJawa requested a review from Copilot July 2, 2025 19:33
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Introduces public pylibcudf APIs for list-series creation in the cuDF backend and removes reliance on private cuDF internals.

  • Replaces manual ListColumn construction and version checks with plc.Column.from_cuda_array_interface + Series.from_pylibcudf
  • Simplifies 1D/2D and nested-3D array handling functions, dropping private imports
  • Cleans up DataFrame backend registration by removing redundant decorator for internal DataFrame

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
crossfit/backend/cudf/series.py Swapped private cuDF APIs for public pylibcudf calls; removed version-gating, manual column constructors
crossfit/backend/cudf/dataframe.py Dropped import of cudf.core.dataframe.DataFrame and its redundant @CrossFrame.register decorator
Comments suppressed due to low confidence (1)

crossfit/backend/cudf/series.py:20

  • [nitpick] The new from_pylibcudf code paths aren’t covered by existing tests. Please add unit tests for 1D, 2D, and invalid shapes to confirm correct behavior.
def create_list_series_from_1d_or_2d_ar(ar, index):

offset=0,
null_count=0,
children=(offset_col, data),
arr = cp.asarray(ar)
Copy link

Copilot AI Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function doesn’t validate inputs with more than 2 dimensions, leading to unexpected behavior for ndim > 2. Add an explicit elif arr.ndim > 2: raise ValueError(...) to guard against unsupported shapes.

Suggested change
arr = cp.asarray(ar)
arr = cp.asarray(ar)
if arr.ndim > 2:
raise ValueError("Input array must be 1D or 2D, but got an array with ndim > 2.")

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring the co pilot comments , as plc code should just handle other dimensions too .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, confirming the pylibcudf API can handle any arbitrary, n dimensional ndarray

mask=None,
offset=0,
null_count=None,
arr = cp.asarray(ar)
Copy link

Copilot AI Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In create_nested_list_series_from_3d_ar, there’s no check that ar is 3D. You should verify arr.ndim == 3 and raise an informative error when it isn’t.

Suggested change
arr = cp.asarray(ar)
arr = cp.asarray(ar)
if arr.ndim != 3:
raise ValueError(f"Input array must be 3D, but got {arr.ndim}D array instead.")

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring the co pilot comments as plc code should just handle other dimensions too. I think.

Copy link
Member

@VibhuJawa VibhuJawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a look at co pilot comments to be extra careful.

offset=0,
null_count=0,
children=(offset_col, data),
arr = cp.asarray(ar)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring the co pilot comments , as plc code should just handle other dimensions too .

mask=None,
offset=0,
null_count=None,
arr = cp.asarray(ar)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring the co pilot comments as plc code should just handle other dimensions too. I think.

@VibhuJawa VibhuJawa merged commit 76f74d0 into rapidsai:main Jul 2, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update how we use cudf Column API for 25.06+

3 participants