Fix re-entrant routing-tables RwLock deadlock in admin-space handle#2619
Open
lindskogen wants to merge 1 commit into
The admin-space handlers `resources_data`, `linkstate_data`, and `route_successor` hold `zread!(tables.tables)` across `query.reply(...).wait()`. `.wait()` routes the response synchronously via `Face::send_response` → `route_send_response`, which re-acquires the same `zread!(tables.tables)` on the same thread. `std::sync::RwLock` is writer-preferring: once a `register_expr` writer (any peer `declare`) queues between the two reads, the re-entrant second read blocks behind it, the writer blocks behind the first read, and the entire routing layer wedges. The transport acceptor task is independent and keeps running, so the node still accepts connections and looks alive while routing is dead.

The pattern has existed since at least 1.7.0 and is present on `main`.

Root cause
Three admin handlers in `zenoh/src/net/runtime/adminspace.rs` reply while holding the routing-tables read guard:

- `resources_data`: takes `let rtables = zread!(tables.tables);`, then calls `query.reply(...).wait()` inside the loop
- `linkstate_data`: takes `let rtables = zread!(tables.tables);`, then calls `query.reply(...).wait()` inside the loop
- `route_successor`: takes `let rtables = zread!(tables.tables);`, then calls `reply(...)` (which calls `query.reply().wait()`) under the guard, before the explicit `drop(rtables)`

`query.reply(...).wait()` ultimately calls `zenoh/src/net/routing/dispatcher/queries.rs::route_send_response`, which takes `zread!(tables.tables)` again.

Concurrently, `zenoh/src/net/routing/dispatcher/resource.rs::register_expr` takes `zwrite!(tables.tables)` on every `declare`. Sequence that wedges:

1. The handler thread enters `resources_data` and takes read no. 1.
2. Another thread enters `register_expr` and queues a write.
3. The handler thread calls `query.reply(...).wait()`, which re-enters `route_send_response` and tries to take read no. 2.
4. The writer-preferring `RwLock` blocks read no. 2 behind the queued write.

A live 8-thread `gdb thread apply all bt` showed all four states.
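The wedging sequence can be replayed with a toy, single-threaded model. This is only a sketch, not zenoh code and not the gdb output: `LockModel`, `Outcome`, and `replay_wedge` are hypothetical names, and the model hard-codes the writer-preferring policy described above (the standard library documents the `RwLock` priority policy as platform-dependent, so that policy is an assumption here).

```rust
// Toy, single-threaded model of a writer-preferring read-write lock.
// It only tracks who holds or waits for the lock; nothing here is zenoh code.
#[derive(Debug, Default)]
struct LockModel {
    active_readers: usize,
    waiting_writers: usize,
    writer_active: bool,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Acquired,
    Blocked,
}

impl LockModel {
    /// A new reader is admitted only if no writer holds or waits for the lock.
    fn read(&mut self) -> Outcome {
        if self.writer_active || self.waiting_writers > 0 {
            Outcome::Blocked
        } else {
            self.active_readers += 1;
            Outcome::Acquired
        }
    }

    /// A writer needs exclusive access; otherwise it joins the wait queue.
    fn write(&mut self) -> Outcome {
        if self.writer_active || self.active_readers > 0 {
            self.waiting_writers += 1;
            Outcome::Blocked
        } else {
            self.writer_active = true;
            Outcome::Acquired
        }
    }
}

/// Replays the sequence and returns the outcome of each lock request.
fn replay_wedge() -> (Outcome, Outcome, Outcome) {
    let mut lock = LockModel::default();
    let read_1 = lock.read(); // handler takes read no. 1
    let write = lock.write(); // register_expr queues a write behind read no. 1
    let read_2 = lock.read(); // re-entrant read no. 2 waits behind the writer
    (read_1, write, read_2)
}

fn main() {
    let (read_1, write, read_2) = replay_wedge();
    println!("read 1: {read_1:?}, write: {write:?}, read 2: {read_2:?}");
}
```

In the model, read no. 1 is never released because its owner is the thread waiting on read no. 2, so neither the writer nor the second read can make progress.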
Secondary bug fixed in the same change
`resources_data` had `keyexpr::new(res.0.expr()).unwrap()`. Under declare churn a resource can be caught transiently with an expr that is not a valid keyexpr (e.g. empty), causing a panic. zenohd's release profile is `panic = "abort"`, so this took the entire process down once the primary deadlock was patched away. Changed to `.ok()?` (skip the resource).

Suggested fix
Collect the data that needs the lock, drop the guard, then reply:
Applied identically in `linkstate_data` and `route_successor` (the latter also folds both the shortcut and full-successor paths into one collect-then-drop block; stock already had a `drop(rtables)` before the full-successor reply, but the shortcut `reply()` calls above it were still under the guard).

154-line patch, no behavior change beyond the lock-hold window. Replies still go out in the same order; the only externally visible difference is that responses are computed before they are streamed, which for these handlers (small admin queries) is negligible.
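As an illustration of the collect-then-drop shape (not the patch itself), here is a minimal sketch using only `std`. Everything in it is a hypothetical stand-in: `Tables` is a plain `RwLock<BTreeMap<..>>` rather than zenoh's routing tables, `validate` replaces `keyexpr::new`, and `reply` replaces the real reply path. The `reply` stand-in uses `try_write` to check that no guard is still held when it runs.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

// Hypothetical stand-in for the routing tables; not zenoh's real type.
type Tables = RwLock<BTreeMap<String, u32>>;

/// Hypothetical validator standing in for `keyexpr::new`: rejects empty exprs.
fn validate(expr: &str) -> Result<&str, &'static str> {
    if expr.is_empty() {
        Err("empty key expression")
    } else {
        Ok(expr)
    }
}

/// Stand-in for the reply path, which needs the tables lock itself.
/// `try_write` succeeds only when no guard is held, so the returned flag
/// records whether the caller had already released its read guard.
fn reply(tables: &Tables, key: &str, value: u32, sent: &mut Vec<String>) -> bool {
    let lock_was_free = tables.try_write().is_ok();
    sent.push(format!("{key}={value}"));
    lock_was_free
}

/// Collect under the guard, drop the guard, then reply.
fn handle_query(tables: &Tables, sent: &mut Vec<String>) -> bool {
    let guard = tables.read().unwrap();
    let replies: Vec<(String, u32)> = guard
        .iter()
        .filter_map(|(expr, value)| {
            // Skip transiently invalid entries instead of unwrapping.
            let key = validate(expr).ok()?;
            Some((key.to_string(), *value))
        })
        .collect();
    drop(guard); // release the read guard before any reply goes out

    let mut always_free = true;
    for (key, value) in &replies {
        always_free &= reply(tables, key, *value, sent);
    }
    always_free
}

fn main() {
    let tables: Tables = RwLock::new(BTreeMap::from([
        (String::new(), 0), // invalid entry, skipped by the filter
        ("demo/a".to_string(), 1),
        ("demo/b".to_string(), 2),
    ]));
    let mut sent = Vec::new();
    let lock_free = handle_query(&tables, &mut sent);
    println!("{sent:?}, lock free during replies: {lock_free}");
}
```

The trade-off is the one noted above: replies are buffered in a `Vec` before being sent, which costs one allocation per query but keeps the lock-hold window free of any call that might take the lock again.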
Reproduction
Python repro code:

- `get @/**` (admin-query side → `resources_data`)
- declare churn (writer side → `register_expr`)
- `get @/*/router` every 8s; three consecutive failures = wedged.

Run against a stock `zenohd` with no config file, no extra features, only `-l tcp/127.0.0.1:7447` so the harness can reach it. Effective config is zenohd's built-in default: mode=router, gossip on, multicast on, no gateway, no region routing, default features (no `transport_quic`, no `shared-memory`, no `stats`, no `unstable`).

Two independent runs (with and without the bundled repro config that disables multicast) wedged at exactly t+44s, after ~110k declares: the race is very tight and very reliable.
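The "three consecutive failures = wedged" rule used by the probe can be sketched as a small counter. The original harness is Python and is not reproduced here; `WedgeDetector` is a hypothetical helper, written in Rust only to stay in the codebase's language, and the probe results in `main` are made-up inputs rather than measured data.

```rust
/// Hypothetical helper mirroring the repro's rule: the router counts as
/// wedged once `threshold` probes in a row have failed.
struct WedgeDetector {
    threshold: u32,
    consecutive_failures: u32,
}

impl WedgeDetector {
    fn new(threshold: u32) -> Self {
        Self {
            threshold,
            consecutive_failures: 0,
        }
    }

    /// Records one probe result; returns true once the router counts as wedged.
    fn record(&mut self, probe_ok: bool) -> bool {
        if probe_ok {
            // Any successful probe resets the streak.
            self.consecutive_failures = 0;
        } else {
            self.consecutive_failures += 1;
        }
        self.consecutive_failures >= self.threshold
    }
}

fn main() {
    let mut detector = WedgeDetector::new(3);
    // Made-up probe results: two failures, a recovery, then three failures.
    let probes = [true, false, false, true, false, false, false];
    for (index, ok) in probes.iter().enumerate() {
        if detector.record(*ok) {
            println!("wedged at probe {index}");
            break;
        }
    }
}
```

Requiring a streak rather than a single failure keeps an isolated slow reply from being counted as a wedge.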
Operational impact
Until this lands, any peer that can open a session and declare resources can wedge a `zenohd` router in seconds: a remote DoS with no authentication bypass needed (just a trusted peer that churns subscriptions fast enough; the original incident was triggered by accidental gossip churn, not an attack).

Files changed
`zenoh/src/net/runtime/adminspace.rs`: three handlers fixed, ~154 lines.

Related
`Tables` `RwLock` deadlock observed via `route_data` rather than `route_send_response` (same lock, same trigger; this PR adds the concrete re-entrant mechanism + fix).