[Perf] Streams 1: Add CUDA stream and event API #407
hughperkins wants to merge 2 commits into main from
Introduces qd.create_stream() and qd.create_event() for launching kernels on separate CUDA streams with event-based synchronization. The qd_stream kwarg on kernel calls routes the launch to a specific stream. Non-CUDA backends return no-op handles (0). Routes kernel launcher memory ops through the active stream.
- Make CUDAContext::stream_ thread_local for thread-safety
- Convert sync memcpy_host_to_device to async on active_stream
- Use weakref in Stream/Event __del__ to safely handle interpreter shutdown
- Add __enter__/__exit__ context manager support for Stream and Event
- Use consistent qd_stream parameter naming in Event.record and Event.wait
- Add handle==0 guard to stream_synchronize
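For readers who want a feel for three of the items above (the weakref-guarded `__del__`, the `__enter__`/`__exit__` support, and the `handle == 0` guard), here is a minimal pure-Python sketch. It is not the code in this PR: `FakeBackend`, its `stream_create`/`stream_destroy` methods, and the `Stream` wrapper below are hypothetical stand-ins that record calls instead of touching CUDA, and the real classes may be structured differently.

```python
import weakref


class FakeBackend:
    """Hypothetical stand-in for the runtime; records calls, never touches CUDA."""

    def __init__(self, is_cuda):
        self.is_cuda = is_cuda
        self.calls = []
        self._next_handle = 1

    def stream_create(self):
        if not self.is_cuda:
            return 0  # non-CUDA backends hand back a no-op handle
        handle = self._next_handle
        self._next_handle += 1
        self.calls.append(("create", handle))
        return handle

    def stream_synchronize(self, handle):
        if handle == 0:  # guard: a no-op handle has nothing to wait on
            return
        self.calls.append(("synchronize", handle))

    def stream_destroy(self, handle):
        self.calls.append(("destroy", handle))


class Stream:
    """Hypothetical wrapper showing the lifetime pattern, not the PR's class."""

    def __init__(self, backend):
        self.handle = 0  # set first so __del__ is safe if creation fails
        # Hold the backend weakly: if it is already gone (for example during
        # interpreter shutdown), cleanup is skipped instead of raising.
        self._backend_ref = weakref.ref(backend)
        self.handle = backend.stream_create()

    def synchronize(self):
        backend = self._backend_ref()
        if backend is not None:
            backend.stream_synchronize(self.handle)

    def destroy(self):
        backend = self._backend_ref()
        if backend is not None and self.handle != 0:
            backend.stream_destroy(self.handle)
        self.handle = 0  # makes repeated destroy() calls harmless

    def __del__(self):
        self.destroy()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.destroy()
        return False  # do not swallow exceptions


backend = FakeBackend(is_cuda=True)
with Stream(backend) as stream:
    stream.synchronize()
print(backend.calls)
```

Because `destroy()` zeroes the handle, the later `__del__` call after the `with` block does not destroy the stream a second time. On the stand-in non-CUDA backend every method falls through as a no-op, which mirrors the handle=0 behaviour described in the PR description.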
Review from Opus (written before the last commit above):
PR Review: Add CUDA Stream and Event API
Branch:
Summary
This PR introduces a CUDA stream and event API to enable concurrent kernel execution on separate GPU streams. It adds:
The design is clean and well-layered. On non-CUDA backends, everything degrades to no-ops (handle=0).
Issues and Concerns
1. Thread-safety of
Lines of code added: +481 - 197 - 4 - 4 = +276
Issue: #
Brief Summary
copilot:summary
Walkthrough
copilot:walkthrough