@Andy-Jost
Contributor

Summary

Converts _module.py to Cython (_module.pyx) for improved performance, adding RAII-based resource handle management for CUkernel and CUlibrary driver objects.

  • Converts Kernel, ObjectCode, KernelOccupancy, and KernelAttributes to cdef classes
  • Adds LibraryHandle and KernelHandle to the resource_handles C++ infrastructure
  • Replaces Python-level driver API calls with cydriver calls wrapped in nogil blocks
  • Adds .pxd file for cross-module cimport support

Changes

  • _module.py → _module.pyx with cdef class definitions
  • New _module.pxd with typed attribute and method declarations
  • Extended _cpp/resource_handles.{hpp,cpp} with library/kernel handle types
  • Updated _resource_handles.{pxd,pyx} with new handle functions
  • Updated _launcher.pyx to directly access kernel handles via cimport
  • Minor updates to _linker.py, _program.py to use new factory methods
  • Test updates for API changes
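
For reviewers unfamiliar with the RAII handle idea, here is a minimal, self-contained sketch of how library/kernel handle types could behave. It is not the code in _cpp/resource_handles.{hpp,cpp}: the `FakeLibrary`/`FakeKernel` types, the mock `fake_library_unload`, the `KernelBox` helper, and the factory signatures are all hypothetical stand-ins so that the example runs without a CUDA driver. It also assumes that a kernel handle keeps its parent library alive, which this description does not state explicitly.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the opaque driver objects (CUlibrary, CUkernel).
struct FakeLibrary { int id; };
struct FakeKernel { int id; };

static int g_unload_count = 0;  // number of mock "unload" calls so far

// Mock of a driver unload call; real code would call into the CUDA driver.
void fake_library_unload(FakeLibrary* lib) {
    ++g_unload_count;
    delete lib;
}

// Assumed shape of the handle types: shared ownership with automatic cleanup.
using LibraryHandle = std::shared_ptr<FakeLibrary>;
using KernelHandle = std::shared_ptr<FakeKernel>;

// Factory: take ownership of a "loaded" library; the deleter unloads it
// when the last reference goes away.
LibraryHandle make_library_handle(int library_id) {
    return LibraryHandle(new FakeLibrary{library_id}, fake_library_unload);
}

// Helper that stores a kernel next to a reference to its parent library.
struct KernelBox {
    FakeKernel kernel;
    LibraryHandle library;
};

// Factory: the returned handle points at the kernel but shares ownership of
// the whole box (shared_ptr aliasing constructor), so the library cannot be
// unloaded while any kernel handle is still alive.
KernelHandle make_kernel_handle(const LibraryHandle& library, int kernel_id) {
    auto box = std::make_shared<KernelBox>(
        KernelBox{FakeKernel{kernel_id}, library});
    return KernelHandle(box, &box->kernel);
}

struct DemoResult {
    int unloads_after_library_dropped;
    int unloads_after_kernel_dropped;
};

// Drops the library handle first, then the kernel handle, and records how
// many unloads have happened at each point.
DemoResult run_demo() {
    g_unload_count = 0;
    DemoResult result{};
    KernelHandle kernel;
    {
        LibraryHandle library = make_library_handle(1);
        kernel = make_kernel_handle(library, 7);
    }  // `library` goes out of scope; the kernel handle still references it
    result.unloads_after_library_dropped = g_unload_count;
    kernel.reset();  // last reference released; the library is unloaded now
    result.unloads_after_kernel_dropped = g_unload_count;
    return result;
}
```

In this sketch, calling `run_demo()` shows that the mock unload runs only once the kernel handle is released, regardless of the order in which the Python-side owners drop their references. That is the property the RAII handles are meant to provide.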

Test plan

  • All existing test_module.py tests pass
  • All existing test_program.py tests pass
  • CI passes

Convert Kernel, ObjectCode, and KernelOccupancy to cdef classes with
proper .pxd declarations. This phase establishes the Cython structure
while maintaining Python driver module usage.

Changes:
- Rename _module.py to _module.pyx
- Create _module.pxd with cdef class declarations
- Convert Kernel, ObjectCode, KernelOccupancy to cdef class
- Remove _backend dict in favor of direct driver calls
- Add _init_py() Python-accessible factory for ObjectCode
- Update _program.py and _linker.py to use _init_py()
- Fix test to handle cdef class property descriptors

Phase 2b will convert driver calls to cydriver with nogil blocks.
Phase 2c will add RAII handles to resource_handles.

- Use strong types in .pxd (ObjectCode, KernelOccupancy)
- Remove cdef public - attributes now private to C level
- Add Kernel.handle property for external access
- Add ObjectCode.symbol_mapping property (symmetric with input)
- Update _launcher.pyx, _linker.py, tests to use public APIs
- Module globals: _inited, _py_major_ver, _py_minor_ver, _driver_ver,
  _kernel_ctypes, _paraminfo_supported -> cdef typed
- Module functions: _lazy_init, _get_py_major_ver, _get_py_minor_ver,
  _get_driver_ver, _get_kernel_ctypes, _is_paraminfo_supported,
  _make_dummy_library_handle -> cdef inline with exception specs
- Module constant: _supported_code_type -> cdef tuple
- Kernel._get_arguments_info -> cdef tuple

Note: KernelAttributes remains a regular Python class due to
segfaults when converted to cdef class (likely due to weakref
interaction with cdef class properties).

Follow the _MemPoolAttributes pattern:
- cdef class with inline cdef attributes (_kernel_weakref, _cache)
- _init as @classmethod (not @staticmethod cdef)
- _get_cached_attribute and _resolve_device_id use except? -1
- Explicit cast when dereferencing weakref

Extends the RAII handle system to support CUlibrary and CUkernel driver
objects used in _module.pyx. This provides automatic lifetime management
and proper cleanup for library and kernel handles.

Changes:
- Add LibraryHandle/KernelHandle types with factory functions
- Update Kernel, ObjectCode, KernelOccupancy to use typed handles
- Move KernelAttributes cdef block to .pxd for strong typing
- Update _launcher.pyx to access kernel handle directly via cdef

Replaces Python-level driver API calls with low-level cydriver calls
wrapped in nogil blocks for improved performance. This allows the GIL
to be released during CUDA driver operations.

Changes:
- cuDriverGetVersion, cuKernelGetAttribute, cuKernelGetParamInfo
- cuOccupancy* functions (with appropriate GIL handling for callbacks)
- cuKernelGetLibrary
- Update KernelAttributes._get_cached_attribute to use cydriver types

@Andy-Jost added the enhancement and cuda.core labels on Jan 22, 2026
@Andy-Jost self-assigned this on Jan 22, 2026
@copy-pr-bot
Contributor

copy-pr-bot bot commented Jan 22, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost
Contributor Author

/ok to test 82d92c9


Labels

cuda.core: Everything related to the cuda.core module
enhancement: Any code-related improvements
