refactor(tests): replace exec() with importlib in example runner#1628

Open
OfficialSerge wants to merge 4 commits into NVIDIA:main from OfficialSerge:feat/refactor-example-tests

Conversation

@OfficialSerge

Description

This PR is in response to the following TODO item in cuda_core/tests/example_tests/utils.py.

```python
# TODO: Refactor the examples to give them a common callable `main()` to avoid needing to use exec here?
exec(script, env if env else {})
```

Key Changes

  • refactored run_example in utils.py to use importlib, with support for both main()-style and exec()-style scripts. Tracebacks now report real file and line numbers, paving the way to progressively migrate the remaining non-module tests and examples.
  • updated test_basic_example.py to use pathlib.Path instead of os.path.join().
  • removed the redundant manual Device(0) and set_current() calls in the test class, relying instead on the existing deinit_cuda() fixture for consistent teardown (see conftest.py)
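Taken together, the refactored runner might look roughly like this. This is a sketch only: the `run_example` name, the `main()` convention, and the importlib approach come from the description above; the function body and error handling are assumptions, not the PR's exact code.

```python
import importlib.util
import os
import sys


def run_example(samples_path, filename):
    """Run one example script via importlib (sketch, not the PR's exact code)."""
    fullpath = os.path.join(samples_path, filename)
    module_name = filename.removesuffix(".py")

    spec = importlib.util.spec_from_file_location(module_name, fullpath)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {fullpath} as a module")

    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    # Executes the module's top-level code; because the code is loaded
    # from a real file, tracebacks now carry genuine file/line info.
    spec.loader.exec_module(module)

    if hasattr(module, "main"):
        module.main()  # refactored examples expose a main() entry point
    # Legacy plain scripts do all their work at module level, so
    # exec_module above has already run them.
```

Because loading goes through a real module spec, a failing example raises with the example's own filename and line number instead of `<string>`, which is the feedback improvement the description mentions.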

Next Steps

  • go through the remaining examples that still lack main() and refactor them into Python modules; some examples are already refactored while others remain plain executables, so it's currently a mix.

@copy-pr-bot
Contributor

copy-pr-bot bot commented Feb 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

```python
# Collect metadata for the file 'module_name' located at 'fullpath'.
# CASE: file does not exist -> spec is None.
# CASE: file is not .py -> spec is None.
# CASE: file lacks a proper loader (module.spec.__loader__) -> spec.loader is None.
```
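The if-None check being discussed could look something like this. A sketch under assumptions: the helper name and error messages are invented here, and the exact conditions under which the stdlib returns `None` are those documented for `importlib.util.spec_from_file_location`.

```python
import importlib.util


def load_spec(module_name, fullpath):
    """Return a validated module spec for the file at fullpath.

    Guards against the None cases noted in the comment above:
    spec_from_file_location can return None, and the resulting
    spec's loader can also be None, so both must be checked
    before calling spec.loader.exec_module().
    """
    spec = importlib.util.spec_from_file_location(module_name, fullpath)
    if spec is None:
        raise ImportError(f"no importable module at {fullpath}")
    if spec.loader is None:
        raise ImportError(f"no loader available for {fullpath}")
    return spec
```

Raising `ImportError` eagerly keeps the failure at the call site with a readable message, rather than a later `AttributeError: 'NoneType' object has no attribute 'exec_module'` (the runtime failure ruff flags).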
Contributor

I don't think we necessarily need to duplicate information from the Python stdlib docs here, but don't feel strongly either way.

Author

I see. My ruff linter was flagging it as a potential point of runtime failure, and the if-None check fixed those issues. I figured I'd leave an explanation to help people understand the need for the check, in case there are any future revisions.

Member

This seems a bit more complicated than necessary...? Assuming we always have main (#1664), can we simplify the logic here after merging the other PR?

Author

Yes, technically all we need once the examples are refactored into Python modules is the following:

```python
spec = importlib.util.spec_from_file_location(module_name, fullpath)
module = importlib.util.module_from_spec(spec)
sys.modules[module_name] = module

spec.loader.exec_module(module)
module.main()
```

There are two ways an example's code may be invoked:

directly by a user: `__name__` equals `"__main__"`, so the code in the `if` statement runs and executes the example.

via the test runner: `__name__` equals the module stem, e.g. `vector_add`, so the code in the `if` statement doesn't run; calling `module.main()` here is what saves us and actually runs the example code.
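The two invocation paths above can be demonstrated with a tiny stand-in module. A sketch: only the `vector_add` name comes from the comment above; the recording list and file layout are invented for illustration.

```python
import importlib.util
import os
import sys
import tempfile

# Stand-in example following the proposed main() convention.
SOURCE = """\
CALLS = []

def main():
    CALLS.append(__name__)

if __name__ == "__main__":
    main()  # fires only when the file is run directly by a user
"""

path = os.path.join(tempfile.mkdtemp(), "vector_add.py")
with open(path, "w") as f:
    f.write(SOURCE)

# Loaded via importlib, __name__ is the module stem ("vector_add"),
# so the __main__ guard is skipped during exec_module ...
spec = importlib.util.spec_from_file_location("vector_add", path)
module = importlib.util.module_from_spec(spec)
sys.modules["vector_add"] = module
spec.loader.exec_module(module)
assert module.CALLS == []  # guard did not fire

# ... so the runner must invoke the entry point explicitly.
module.main()
assert module.CALLS == ["vector_add"]
```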

```python
def test_example(self, example, deinit_cuda):
    run_example(samples_path, example)
    if Device().device_id != 0:
        Device(0).set_current()
```
Member

We should kick off CI and check if removing this line is OK. IIRC on a multi-GPU system it would fail without this.

Author

Okay, I think I get this, let me know if I mess up:

  • each thread gets only a single reference to any one Device (hence thread-local singleton pattern)
  • a thread can reference multiple Devices
  • a Device can have multiple CUDA Contexts but a Context can only belong to a single GPU Device
  • Contexts on the same Device are mutually exclusive
  • the driver manages the context stack for a given thread

So, in a multi-GPU example, the test thread acquires n devices, runs the kernel, and then deinit_cuda() pops the contexts off the thread's context stack.

The Problem

The driver doesn't reset the current device to 0 when popping multiple shared (cudaDeviceEnablePeerAccess) device contexts, so when a program next asks for a Device, the driver returns the nth device instead of the 0th.

A Possible Solution

Redundantly set Device(0) as the current device before running each example. If the prior example was multi-GPU, we are now back on device 0; otherwise the redundant call is a no-op.

```python
def test_example(self, example_rel_path: str, deinit_cuda) -> None:
    from cuda.core import Device

    Device(0).set_current()
    run_example(str(EXAMPLES_DIR), example_rel_path)
```

@OfficialSerge force-pushed the feat/refactor-example-tests branch from d23ba8c to f025db4 on February 24, 2026
@OfficialSerge force-pushed the feat/refactor-example-tests branch from f025db4 to 0a1aee7 on February 25, 2026