Conversation
- Rename _program.py to _program.pyx - Convert Program to cdef class with _program.pxd declarations - Extract _MembersNeededForFinalize to module-level _ProgramMNFF (nested classes not allowed in cdef class) - Add __repr__ method to Program - Keep ProgramOptions as @DataClass (unchanged) - Keep weakref.finalize pattern for handle cleanup
- Move _translate_program_options to Program_translate_options (cdef) - Move _can_load_generated_ptx to Program_can_load_generated_ptx (cdef) - Remove unused TYPE_CHECKING import block - Follow _memory/_buffer.pyx helper function patterns
- Reorganize file structure per developer guide (principal class first) - Add module docstring, __all__, type alias section - Factor long methods into cdef inline helpers - Add proper exception specs to cdef functions - Fix docstrings (use :class: refs, public paths) - Add type annotations to public methods - Inline _nvvm_exception_manager (single use) - Remove Union import, use | syntax - Add public Program.driver_can_load_nvrtc_ptx_output() API - Update tests to use new public API Closes NVIDIA#1082
Add fixtures for different Program backends (NVRTC, PTX, NVVM) and ObjectCode code types (cubin, PTX, LTOIR). Split API_TYPES into more precise HASH_TYPES, EQ_TYPES, and WEAKREF_TYPES lists. Derive DICT_KEY_TYPES and WEAK_KEY_TYPES for collection tests.
- Add NvrtcProgramHandle and NvvmProgramHandle to resource handles module - Add function pointer initialization for nvrtcDestroyProgram and nvvmDestroyProgram - Forward-declare nvvmProgram to avoid nvvm.h dependency - Refactor detail::make_py to accept module name parameter - Remove _ProgramMNFF class from _program.pyx - Program now uses typed handles directly with RAII cleanup - Update handle property to return None when handle is null
- Add NVVMError exception class - Add HANDLE_RETURN_NVRTC for nogil NVRTC error handling with program log - Add HANDLE_RETURN_NVVM for nogil NVVM error handling with program log - Remove vestigial supported_error_type fused type - Simplify HANDLE_RETURN to directly take cydriver.CUresult
- Change cdef function return types from ObjectCode to object (Cython limitation) - Remove unused imports: intptr_t, NvrtcProgramHandle, NvvmProgramHandle, as_intptr - Update as_py(NvvmProgramHandle) to return Python int via PyLong_FromSsize_t - Update test assertions: remove handle checks after close(), test idempotency instead - Update NVVM error message regex to match new unified format
… SPDX dates - Remove Program.driver_can_load_nvrtc_ptx_output() public static method - Make _can_load_generated_ptx a cpdef (callable from Python tests) - Update tests to import _can_load_generated_ptx from cuda.core._program - Update SPDX copyright years to 2024-2026 for files with 2024-2025 - Update get_kernel docstring: name parameter is str | bytes Co-authored-by: Cursor <cursoragent@cursor.com>
…error helpers - Make NVVMError inherit from cuda.bindings.nvvm.nvvmError for compatibility with code catching nvvmError, while also inheriting from CUDAError - Simplify HANDLE_RETURN_NVRTC and HANDLE_RETURN_NVVM to just check success and delegate to helper functions - Move all error handling logic into _raise_nvrtc_error and _raise_nvvm_error Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Rename _linker.py to _linker.pyx with cdef class Linker. Create _linker.pxd with typed attribute declarations. Move inner _MembersNeededForFinalize class to module level. No behavior change; all existing tests pass. Co-authored-by: Cursor <cursoragent@cursor.com>
Add resource handle infrastructure for linker cythonization: - NvJitLinkHandle and CuLinkHandle in resource_handles.hpp/.cpp with shared_ptr RAII ownership and automatic destroy on release. - TaggedHandle<T, int> template to make void*-based handle types (NvvmProgramHandle, NvJitLinkHandle) distinct for C++ overloading. - Factory functions, as_cu/as_intptr/as_py accessors, and function pointer initialization for nvJitLinkDestroy and cuLinkDestroy. - HANDLE_RETURN_NVJITLINK in cuda_utils for nvJitLink error handling with automatic error log retrieval. Pure infrastructure; no changes to _linker.pyx yet. Co-authored-by: Cursor <cursoragent@cursor.com>
The Linker class now holds NvJitLinkHandle and CuLinkHandle shared_ptrs directly as cdef attributes, replacing the _LinkerMembersNeededForFinalize helper class and weakref.finalize destructor pattern. close() simply resets the shared_ptr. Also removes _const_char_keep_alive (unnecessary) and TYPE_CHECKING block. Co-authored-by: Cursor <cursoragent@cursor.com>
Both nvJitLink and driver paths now use cynvjitlink/cydriver C-level calls directly, releasing the GIL around expensive operations. Option arrays built with std::vector (RAII, no manual malloc/free). Removed _exception_manager context manager (was too broad, caught TypeError); driver path errors annotated via targeted except CUDAError. Removed ctypes dependency, _input_type_from_code_type (inlined), and TYPE_CHECKING block. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace _nvjitlink Python module global with _use_nvjitlink_backend flag. Input type dicts now use C-level enum values (cynvjitlink/cydriver) instead of Python enum objects. The cuda.bindings.nvjitlink module is only imported locally for availability detection. Co-authored-by: Cursor <cursoragent@cursor.com>
- Add module docstring and __all__ declaration - Reorder file: Linker (principal) -> LinkerOptions (supporting) -> cdef inline helpers -> private state - Extract cdef inline helpers: Linker_init, Linker_add_code_object, Linker_link, Linker_annotate_error_log - Fix import ordering per developer guide (5 groups) - Apply c_ prefix to cdef variables for clarity - Replace _formatted_options/option_keys with typed log members - Cache decoded log strings after link() for efficiency - Type Linker _linker member in _program.pxd (object -> Linker) Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com> # Conflicts: # cuda_core/cuda/core/_cpp/resource_handles.cpp # cuda_core/cuda/core/_cpp/resource_handles.hpp # cuda_core/cuda/core/_program.pxd # cuda_core/cuda/core/_resource_handles.pxd # cuda_core/cuda/core/_resource_handles.pyx # cuda_core/cuda/core/_utils/cuda_utils.pxd # cuda_core/cuda/core/_utils/cuda_utils.pyx # cuda_core/docs/source/release/0.6.x-notes.rst
Remove unused cimports (CuLinkHandle, NvJitLinkHandle already declared in .pxd) and use __import__ for the availability check to avoid an unused-import lint error. Co-authored-by: Cursor <cursoragent@cursor.com>
|
/ok to test fc8e0c0 |
|
Made-with: Cursor # Conflicts: # cuda_core/cuda/core/_cpp/resource_handles.cpp # cuda_core/cuda/core/_cpp/resource_handles.hpp # cuda_core/cuda/core/_resource_handles.pxd # cuda_core/cuda/core/_resource_handles.pyx # cuda_core/docs/source/release/0.6.x-notes.rst
|
/ok to test a86b235 |
|
@leofang this one is ready to go. Can you take a look? |
|
@Andy-Jost Looks like there are conflicts. |
|
|
||
| .. currentmodule:: cuda.core | ||
|
|
||
| ``cuda.core`` 0.7.x Release Notes |
There was a problem hiding this comment.
Potential docs visibility issue: this file name ends with x-notes, but cuda_core/docs/source/release.rst currently globs release/*[0-9]-notes, which will not include 0.7.x-notes.rst. If this note should publish, consider widening the toctree glob (or renaming the file to match the existing pattern).
| def _nvjitlink_has_version_symbol(inner_nvjitlink) -> bool: | ||
| # This condition is equivalent to testing for version >= 12.3 | ||
| return bool(inner_nvjitlink._inspect_function_pointer("__nvJitLinkVersion")) | ||
| LinkerHandleT = Union["cuda.bindings.nvjitlink.nvJitLinkHandle", "cuda.bindings.driver.CUlinkState"] |
There was a problem hiding this comment.
Type annotation mismatch: Linker.handle returns an int on nvJitLink backend (see as_py(const NvJitLinkHandle&) -> PyLong), and cuda.bindings.nvjitlink does not expose nvJitLinkHandle as a Python type. Consider LinkerHandleT = Union[int, "cuda.bindings.driver.CUlinkState"] (or equivalent) to match runtime behavior and avoid unresolved forward refs in typing tools.
Summary
Cythonize the
_linkermodule, continuing the work from #1565 (Program) to complete #1079. This moves all linker backend calls (nvJitLink and cuLink) to the C level, releases the GIL during linking operations, and replaces the weakref/MNFF cleanup with RAII resource handles.Changes
_linker.pyto_linker.pyxwith acdef class LinkerNvJitLinkHandleandCuLinkHandleRAII wrappers to the resource handle infrastructure, includingHANDLE_RETURN_NVJITLINKfor standardized nvJitLink error handling with log annotationcuda.bindings.nvjitlink/cuda.bindings.drivercalls with C-levelcynvjitlink/cydrivercalls wrapped innogilblocksweakref.finalize/ MNFF cleanup in favor ofshared_ptr-based RAII handlescdef inlinehelpers at the end, proper import ordering,__all__, module docstringLinker _linkermember in_program.pxd(wasobject)cynvjitlinkto the 12.9.0 build requirement rationale)Performance
The Linker constructor + link cycle is dominated by the nvJitLink backend (~3.5 ms), so the Python overhead reduction is not visible in wall-clock benchmarks. The primary benefit is GIL release during those ~3.5 ms backend calls, enabling concurrent work in multithreaded applications.
For reference, the Program constructor (from #1565) saw a ~44% reduction (9.4 -> 5.3 us) where the backend overhead was much smaller.
Test Coverage
test_linker.pyandtest_program.pytests pass (98 passed, 5 skipped, 1 xfailed)Closes #1079.
Made with Cursor