-
Notifications
You must be signed in to change notification settings - Fork 353
Remove NACKs from IPC code #4041
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
clumens
wants to merge
20
commits into
ClusterLabs:main
Choose a base branch
from
clumens:remove-naks
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
No other daemon behaves this way on unknown message, and this isn't old code. I just added it in 7d56d46 without mentioning any reasoning behind it. I can't find any clients that are expecting a NACK in this case so we should be good to just change this. Ref T991
It's possible for execd to reply with an ACK message with an error code. This happens if execd receives an invalid message - this could be a programming error, or it could potentially be a problem in transmission. The client code was only incidentally handling these messages. Before, it would fall through to the "Invalid registration message" case and be handled that way. However, because it's part of the protocol for the server to respond with a NACK on errors, it seems more correct to me for us to explicitly handle this message first thing. Note that this doesn't change any behavior. Before the client would get a message it didn't expect and the IPC connection would fail. Now the client gets an error message and the IPC connection will fail. I just like the more defined behavior. Ref T991
It's confusing to have both ACKs and NACKs in the IPC protocol when both of those can also include an error code. Most daemons do not send a NACK anyway, and most (all?) clients aren't set up to handle it. In the interest of simplifying the IPC code a bit, I'm going to remove NACKs so clients can always expect an ACK and read the error code to know what to do. Ref T991
There's no doubt more that could be done along these lines, but for the moment this will have to do. I need to add some additional error handling into this function so it makes sense to unindent to make some room.
It's possible for based to reply with an ACK message with an error code. This happens if based receives an invalid message - this could be a programming error, or it could potentially be a problem in transmission. The client code was only incidentally handling these messages. Before, it would fall through to the "Reply to the CIB registration message has unknown type" case and be handled that way. However, because it's part of the protocol for the server to respond with a NACK on errors, it seems more correct to me for us to explicitly handle this message first thing. Note that this doesn't change any behavior. Before the client would get a message it didn't expect and the IPC connection would fail. Now the client gets an error message and the IPC connection will fail. I just like the more defined behavior. Ref T991
* Change the tag that pcmk__log_xml_trace logs with to be the same as in cib_native_signon. * Use g_clear_pointer when cleaning up the answer variable. * Unindent the code at the end of the function. I'm going to be adding additional error handling here too, so it's nice to have some extra room to do so.
This is just like previous patches, but applies to signing on to a remote CIB instead. Ref T991
It's confusing to have both ACKs and NACKs in the IPC protocol when both of those can also include an error code. Most daemons do not send a NACK anyway, and most (all?) clients aren't set up to handle it. In the interest of simplifying the IPC code a bit, I'm going to remove NACKs so clients can always expect an ACK and read the error code to know what to do. Ref T991
I'm going to be adding some additional error handling in this function, so it'll be nice to have some room to do so. This only unindents the first level of the end of this function. There's more to do, but it's easier to follow if it's broken up into two patches.
It's possible for fenced to reply with an ACK message with an error code. This happens if fenced receives an invalid message - this could be a programming error, or it could potentially be a problem in transmission. The client code was only incidentally handling these messages. Before, it would fall through to the "invalid reply type" case and be handled that way. However, because it's part of the protocol for the server to respond with a NACK on errors, it seems more correct to me for us to explicitly handle this message first thing. Note that this doesn't change any behavior. Before the client would get a message it didn't expect and the IPC connection would fail. Now the client gets an error message and the IPC connection will fail. I just like the more defined behavior. Ref T991
It's confusing to have both ACKs and NACKs in the IPC protocol when both of those can also include an error code. Most daemons do not send a NACK anyway, and most (all?) clients aren't set up to handle it. In the interest of simplifying the IPC code a bit, I'm going to remove NACKs so clients can always expect an ACK and read the error code to know what to do. Ref T991
I've been testing the handling of invalid messages by making sure execd_invalid_msg always returns true. This will trigger for the very first message a daemon receives, which makes it pretty easy to experiment with. In this case, the first message is register: ([email protected]:358) trace: [remote msg] <lrmd_command t="lrmd" lrmd_op="register" lrmd_clientname="pacemaker-remote-rhel9-scratch-4:3121" lrmd_protocol_ version="1.2" lrmd_is_ipc_provider="true" lrmd_remote_msg_id="1" lrmd_remote_msg_type="request"/> In real life, we could be getting an invalid message by something being garbled or a misbehaving client. Tracing through the code that got us to the above log message, you'll see that the remote message was almost certainly received via the TLS channel (it could also be a TCP socket, but I haven't seen anything use that). However, ACKs/NACKs are an IPC mechanism, which happens on the local system only and uses completely different communications channels. There is a way to proxy IPC so that it's retransmitted to another system, but we only proxy IPC if the received message has lrmd_op="ipc_fwd". In this case, the local system is the end point of the invalid message. We would also be originating the ACK/NACK, which would mean we'd need to be able to construct some sort of fake proxied message and put it in the TLS or TCP channel. I guess it's possible we could do this, but it seems like a lot of work when the client isn't even expecting it (see handle_remote_msg). This was just recently added in 7acc4f3 and has not been in any release, so there's no mixed version upgrade concerns here. Ref T991
That way, we don't have to free it if we're just going to return right away.
Due to the if/else block, in all these log messages, op will be "request". This means all these log messages say something along the lines of "Relayed request request 1 from...".
* Make the comment explaining why we're sending a NACK a little more verbose so we can understand it better in the future. * Return after sending the NACK, allowing the code under it to be unindented. There's tons more of this that could be done, but I'm not going to worry about that right now. Ref T991
…_ack_as. All ACKs are now created with PCMK__XE_ACK so this argument serves no purpose. Ref T991
…_ack. All ACKs are now created with PCMK__XE_ACK so this argument serves no purpose. Ref T991
…ck_as. All ACKs are now created with PCMK__XE_ACK so this argument serves no purpose. Ref T991
All ACKs are now created with PCMK__XE_ACK so this argument serves no purpose. Fixes T991
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.