fix routers unregistering liveness tokens and queryables#2603
Conversation
|
happy to iterate on this more to bring it inline with the rest of the codebase |
- router unregistering doesn't modify the net graph - add cleanup at the end of send_close() to actually update the graph - small refactor to cleanup duplicated logic
There was a problem hiding this comment.
Thanks for your contribution @Evanev7.
I think this is a regression introduced in #2096. I've confirmed that #2602 doesn't occur in Zenoh 1.7.1. I'm yet to check 1.8 but I'm positive that the issue is specific to 1.9.
remove_link is an old API: the root cause of the issue here is that the remove_node_* methods misuse its (undocumented) contract. So the proper fix would be to call it exactly once in a new remove_node_entities method such that:
fn remove_node_entities(&mut self, zid: &ZenohIdProto) -> UnregisterFaceEntitiesResult;and:
struct UnregisterFaceEntitiesResult {
subscribers: HashSet<Arc<Resource>>,
queryables: HashSet<Arc<Resource>>,
tokens: HashSet<Arc<Resource>>,
}This also implies replacing unregister_face_subscribers with a new unregister_face_entities method.
The name UnregisterFaceEntitiesResult is a bit of a mouthful but it is consistent with other names such as UnregisterEntityResult.
fuzzypixelz
left a comment
There was a problem hiding this comment.
We would also need a reproduction test under zenoh/src/net/tests/regions/declare.rs. But the framework has no support for closing connections yet. This can be the subject a separate pull request.
|
ECA signed, ill get to this when I have a minute. thanks for the review! |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2603 +/- ##
==========================================
- Coverage 74.05% 74.04% -0.02%
==========================================
Files 400 400
Lines 60846 60863 +17
==========================================
+ Hits 45060 45065 +5
- Misses 15786 15798 +12 ☔ View full report in Codecov by Sentry. |
Description
Zenoh routers did not unregister liveness tokens or queryables on cold disconnects due to early graph mutation logic
What does this PR do?
this pr removes the graph mutating logic and defers it until after all unregistering operations have completed
Why is this change needed?
after cold disconnecting two routers, they would not register that the remote router had disconnected, and would have to re-build their queries to be informed of the disconnection. after this change, liveliness tokens are more likely to be correctly updated after a network interface goes down
Related Issues
fixes #2602
🏷️ Label-Based Checklist
Based on the labels applied to this PR, please complete these additional requirements:
Labels:
bug🐛 Bug Fix Requirements
Since this PR is labeled as a bug fix, please ensure:
Why this matters: Bugs without tests often reoccur.
Instructions:
- [ ]to- [x])This checklist updates automatically when labels change, but preserves your checked boxes.