refactor(pathfinder): descriptor-driven library discovery and loading#1685
refactor(pathfinder): descriptor-driven library discovery and loading#1685cpcloud wants to merge 16 commits intoNVIDIA:mainfrom
Conversation
Add a per-library descriptor dataclass that consolidates all metadata (sonames, DLLs, site-packages paths, dependencies, loader flags) into a single frozen object. The registry is built at import time from the existing data tables -- zero behavioral change. 291 parametrized tests verify the registry is a faithful representation of the source dicts. Co-authored-by: Cursor <cursoragent@cursor.com>
Introduce SearchContext and FindStep to replace the monolithic finder class. Each search mechanism (site-packages, conda, CUDA_HOME) becomes a standalone step with a uniform (SearchContext) -> FindResult | None signature. Keep already-loaded handling and dependency loading as orchestration concerns. Delete the old find_nvidia_dynamic_lib module after migrating its logic. Co-authored-by: Cursor <cursoragent@cursor.com>
Add per-platform anchor-relative directory lists to LibDescriptor and use them for CUDA_HOME/conda anchor resolution. This removes special-case branching (e.g. nvvm) from the anchor-point search. Co-authored-by: Cursor <cursoragent@cursor.com>
Update the platform-specific loader code to consume LibDescriptor directly instead of consulting supported_nvidia_libs tables at runtime. This makes the loading path data-driven (desc.linux_sonames/windows_dlls, desc.requires_* flags, desc.dependencies). Co-authored-by: Cursor <cursoragent@cursor.com>
Introduce the platform loader boundary for dlopen calls and fold the immediate wrapper cleanup into the same change so loader dispatch stays straightforward while preserving behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
Introduce search_platform.py, exporting a single PLATFORM instance that implements the per-OS filesystem search behavior. search_steps routes all platform differences through SearchContext.platform, removing OS branching from the search step implementations. Co-authored-by: Cursor <cursoragent@cursor.com>
Inline single-use lib-dir lookup helpers into the platform implementations to reduce helper surface area while keeping shared rel-dir scanning helpers. Co-authored-by: Cursor <cursoragent@cursor.com>
Drop the platform-dispatch convenience properties that became unused after introducing PlatformLoader/SearchPlatform. Co-authored-by: Cursor <cursoragent@cursor.com>
Add a canonical descriptor catalog module that contains one DescriptorSpec per supported dynamic library. Add exhaustive parity tests asserting the catalog matches the current LIB_DESCRIPTORS registry field-for-field before runtime wiring is flipped. Co-authored-by: Cursor <cursoragent@cursor.com>
Switch lib_descriptor.py from "assemble-from-supported tables" to "registry-from-authored catalog". Keep backward-compatible names (`LibDescriptor`, `Strategy`, and `LIB_DESCRIPTORS`) while making descriptor_catalog the canonical source. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace the hand-authored supported_nvidia_libs tables with compatibility constants derived from DESCRIPTOR_CATALOG while preserving historical export names and behaviors. This makes descriptor data the single authored source and keeps supported_nvidia_libs as a derived-views shim for existing imports. Co-authored-by: Cursor <cursoragent@cursor.com>
Consolidate post-refactor fixes for driver-lib test alignment, platform search-path edge cases, and typing/import cleanup so behavior and diagnostics remain stable. Co-authored-by: Cursor <cursoragent@cursor.com>
…variants The previous tests compared catalog entries against LIB_DESCRIPTORS, which is built directly from the same catalog -- always passing by construction. Replace with parametrized checks that verify real properties: name uniqueness, valid identifiers, strategy values, dependency graph integrity, soname/dll format, and driver lib constraints. Co-authored-by: Cursor <cursoragent@cursor.com>
Simplify catalog entries by relying on DescriptorSpec defaults and fold companion import-order cleanup into the same readability-focused change. EOF && git cherry-pick -n 5d5547f a804f26c4 && git commit --trailer "Co-authored-by: Cursor <cursoragent@cursor.com>" -F - <<'EOF' refactor(pathfinder): split add-nv-library core flow from optional UI Keep add-nv-library lightweight by default with prompt/CLI-first behavior while moving Textual chrome behind explicit UI tasks and lockfile feature splits. Co-authored-by: Cursor <cursoragent@cursor.com>
Reinstate CTK-root canary discovery in the refactored loader path and define canary eligibility/anchors on per-library descriptors so fallback policy lives with the rest of library metadata. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
Recommend going per-commit on this one. |
|
/ok to test |
|
That's not entirely fair: there is an existing very large degree of centralization of config-like information in Essentially, this PR is doing the equivalent of a matrix transposition (row-major vs column-major for the matrix case / libname-oriented (new) vs packaging/platform-oriented (old)). That has pros-and-cons either way. My main high-level concern: |
I'd rather keep a single source of truth. That is one of two main goals of this PR besides cleaning up all the comingled search and platform logic. We can adapt the toolshed scripts to the new layout. |
Co-authored-by: Cursor <cursoragent@cursor.com>
|
Just to note a few things:
I'll just adapt the scripts in this PR. |
That works. I assumed it's difficult/messy.
Totally, I was suggesting one source of truth, just with one level of indirection, so that we can still easily auto-populate the equivalent of SUPPORTED_LINUX_SONAMES_CTK and SUPPORTED_WINDOWS_DLLS_CTK. If you can achieve that without the indirection, even better. The indirection I had in mind, roughly: SUPPORTED_LINUX_SONAMES_CTK = ...
SUPPORTED_WINDOWS_DLLS_CTK = ...
...
DescriptorSpec(
name="cudart",
strategy="ctk",
linux_sonames=SUPPORTED_LINUX_SONAMES_CTK["cudart"],
windows_dlls=SUPPORTED_WINDOWS_DLLS_CTK["cudart"],
site_packages_linux=("nvidia/cu13/lib", "nvidia/cuda_runtime/lib"),
site_packages_windows=("nvidia/cu13/bin/x86_64", "nvidia/cuda_runtime/bin"),
),You could even hide that behind a BTW (because I'm copy-pasting that code): Could you please replace |
Yep. I like that much better. |
Summary
Library metadata was scattered across parallel hand-maintained tables, loader internals, and module-level constants. This made it easy for policy to drift between search and load paths, hard to review changes to individual libraries, and fragile to extend. This PR consolidates all of that into a single descriptor model and restructures the search/load pipeline around it.
Why
supported_nvidia_libstables, inline constants, per-platform branching). Changing one without the others caused subtle bugs.nvvmon non-standard installs) was wired through globals that the refactor initially dropped. This PR restores it as descriptor-configured behavior.Change progression
LibDescriptorand an authored catalog as the single metadata source, then derived runtime and legacy compatibility tables from it.SearchPlatform,PlatformLoader) so OS-specific behavior is localized and the cascade is auditable.Functional outcome
load_nvidia_dynamic_lib).Made with Cursor