refactor(core): drain GPU workers on unregister via shared-ref shutdown#329
refactor(core): drain GPU workers on unregister via shared-ref shutdown#329xiaguan wants to merge 2 commits into
Conversation
Instance unregister now closes each GPU worker's channel and joins the thread before returning, so any in-flight save/load finishes touching GPU memory before the caller releases CUDA tensors — no use-after-unmap. Worker shutdown runs through a shared reference (tx/handle behind Mutex<Option<...>>), so an instance can be torn down while other tasks still hold an Arc to its GpuContext. This removes the ownership handoff that previously forced the whole gating edifice: Arc::strong_count polling, active_gpu_refs, InstanceBusy, the take_idle retry loop, and the 20s/50ms timeout constants are all gone. - unregister_all is best-effort: one instance can no longer block teardown of the others (was all-or-nothing) - registry moved to parking_lot::RwLock, dropping the Poisoned error path and the inconsistent .expect() panic in list_ids - document that unregister_* block on join and must run via spawn_blocking - add GPU integration tests for unregister / unregister_all racing with concurrent saves (every interleaving valid; panic/CUDA fault = regression) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The task queue already uses crossbeam (its Sender must be Sync to live behind Arc<GpuContext>). Move the one-shot worker-ready handshake onto crossbeam too and drop the std::sync::mpsc import — one channel abstraction in the file, no 'as std_mpsc' alias. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
I found two lifecycle issues that should be addressed before merging:
The fix probably needs either a per-instance cleanup barrier that prevents same-id registration while teardown is in progress, or an engine/registry generation token so cleanup only releases tensors belonging to the old generation captured at cleanup start.
Please consider returning a richer unregister result such as "removed + shutdown_result", or otherwise ensuring that once an instance has been removed and no worker is still running, query leases and registry tensors are released even if shutdown reports an error. |
What
Instance teardown now drains and joins each GPU worker thread before returning, so any in-flight save/load finishes touching GPU memory before the caller releases CUDA tensors. This closes the use-after-unmap window on the unregister / cleanup path (drain-then-unmap).
Why the rewrite
The earlier approach inferred "in-flight" from
Arc::strong_countand polled with a 20s/50ms retry loop, refusing to unregister a "busy" instance (InstanceBusy). That is fragile (any incidentalArcclone breaks it; it doesn't even cover fire-and-forget loads — only the threadjoin()does) and it only existed to work around one constraint:shutdownneeded owned access (Arc::try_unwrap) tojoin()the worker.The fix makes worker shutdown callable through a shared reference:
tx/handlelive behindMutex<Option<…>>, soshutdown(&self)closes the channel (draining queued tasks) and joins the thread without owning theGpuContext. The real safety guarantee — thejoin()plus "release tensors after unregister" ordering — is unchanged; thestrong_countscaffolding is just deleted.Changes
gpu_worker.rs:Worker<T>shutdown via interior mutability;GpuWorkerPool::shutdown(&self). In-flight tasks drain before the join (crossbeam channel semantics).instance.rs:InstanceContext::shutdown(&self)/GpuContext::shutdown(&self); registryunregisterremoves-then-shuts-down. Removedstrong_countgating,active_gpu_refs,InstanceBusy*,take_idle*retry loops, timeout constants. Registry now usesparking_lot::RwLock(no poison ceremony, no.expect()panic inlist_ids).lib.rs: droppedEngineError::{InstanceBusy, Poisoned};unregister_all_instancesis best-effort and returnsVec<String>again (one stuck instance no longer blocks teardown of the others). Documented thatunregister_*block onjoin→ must run viaspawn_blocking.service.rs/http_server.rs: error-mapping arms for the removed variants dropped; cleanup keeps the "engine unregister, then release CUDA tensors" order.Sync-lock-in-async note
All locks are
parking_lot(sync). Audited: no guard is held across an.awaitor across the blockingjoin(save submit drops its guard before awaiting the reply; registry write guard is a temporary dropped beforeshutdown;metadatais cloned-and-released before joining). Sync is also required —Dropmust be able to callshutdownand cannot.await, andjoinis blocking (hencespawn_blocking).parking_lotguards are!Send, so holding one across an await wouldn't compile anyway.Tests
New GPU integration tests
pegaflow-core/tests/unregister_lifecycle.rs(real engine, real CUDA, public API, no mocks):unregister_while_saving_is_safe— 16 concurrent saves raceunregister_instance. Verified non-vacuous via temporary instrumentation: every run drains some in-flight saves to completion and rejects others cleanly; 10× stable.unregister_all_drains_every_instance_under_load— 3 instances under concurrent save load,unregister_alltears all down and empties the registry.Every interleaving satisfies the invariant (a save either completes or fails cleanly); a panic / CUDA fault is the regression.
Full
cargo test --release --no-default-features --features cuda-13,rdmaworkspace suite passes, includinghttp_cleanup_hang_reproandmock_vllm_rpc_e2e.🤖 Generated with Claude Code