Skip to content

Let exceptions from lifecycle transition callbacks bubble up.#1673

Open
de-vri-es wants to merge 1 commit into
ros2:rollingfrom
de-vri-es:patch-1
Open

Let exceptions from lifecycle transition callbacks bubble up.#1673
de-vri-es wants to merge 1 commit into
ros2:rollingfrom
de-vri-es:patch-1

Conversation

@de-vri-es

@de-vri-es de-vri-es commented May 27, 2026

Copy link
Copy Markdown

This PR removed the try/catch around transition callbacks that silently swallows errors.

Swallowing exceptions makes debugging and error handling extremely difficult. All error details are lost, both to the developer or end-user running the code and to other code that might want to inspect the error details at a higher level.

The function contract of the transistion callbacks also specifies the function should return a TransistionCallbackReturn. Raising exceptions to signal errors is not documented and unexpected.

Python code is difficult enough to debug without errors silently dissapearing :(

/edit: I now have to resort to this to find typos in my code. And it still doesn't allow me to capture the exception at a higher level.

    def on_configure(self, state: LifecycleState):
        try:
            # Do stuff
        except:
            traceback.print_exc()
            sys.exit(1)
        return TransitionCallbackReturn.SUCCESS

@fujitatomoya fujitatomoya left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for flagging this.
the silent swallow is a real problem and the that TODO has sat too long...
but i think the fix should be to log the exception (traceback could included) inside the existing handler and still return ERROR, rather than removing the handler.
that addresses the debugging pain without changing the callback contract, diverging from rclcpp, or risking FSM inconsistency. what do you think?

@de-vri-es

de-vri-es commented May 28, 2026

Copy link
Copy Markdown
Author

I don't think catching the exception is correct behaviour, even if it is logged.

Firstly, it is this is not documented, which makes the behaviour a surprise.

But more importantly: even trivial programming errors (like a mistyped variable name) in python become a runtime exception. The standard behaviour of a python program in these scenarios is to crash. Differing from standard behaviour should require a good reason.

And there's no way for the user code to restore default behaviour. My work-around above is not standard behaviour either: it won't let a high level layer catch the exception anymore.

But also, keeping the program running after a bug triggered due to a programming error is what causes FSM inconsistencies, rather than solve them. Who knows what state the program is in now? It got interrupted at some arbitrary unknown point. There may be a user error handler that could function just fine if the program was correct, but because the bug didn't crash the program, we're in undefined behaviour.

In extreme cases this kind of thing could lead to a robot malfunctioning and causing bodily harm. I know this may sound far fetched, but this is such a low level component, it may be used for literally anything.

@de-vri-es

Copy link
Copy Markdown
Author

Ok, hopefully the last commit can make everyone happy:

By default, exceptions are still caught, but also logged.

Optionally, by calling node.catch_callback_errors(False), this behavior can be completely disabled, letting exceptions bubble up as usual (without logging, because they haven't been caught, don't want to log it more than once).

And log them when they are caught.

Signed-off-by: Maarten de Vries <maarten@de-vri.es>
@de-vri-es

Copy link
Copy Markdown
Author

@fujitatomoya: Is this an acceptable trade-off?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants