Environment
ROS2 rolling CI for Linux, Linux-aarch64, Linux-rhel and Windows
Description
test_executor_immediate_shutdown fails on CI with:
19:36:08 14: ============================= test session starts ==============================
19:36:08 14: platform linux -- Python 3.12.3, pytest-7.4.4, pluggy-1.4.0
19:36:08 14: cachedir: /home/jenkins-agent/workspace/ci_linux/ws/build/rclpy/ament_cmake_pytest/test_executor/.cache
19:36:08 14: rootdir: /home/jenkins-agent/workspace/ci_linux/ws/src/nadavelkabets/rclpy
19:36:08 14: configfile: pytest.ini
19:36:08 14: plugins: launch-testing-3.9.7, ament-mypy-0.20.4, ament-xmllint-0.20.4, ament-copyright-0.20.4, ament-pep257-0.20.4, ament-flake8-0.20.4, ament-lint-0.20.4, mock-3.12.0, rerunfailures-12.0, repeat-0.9.3, timeout-2.2.0, colcon-core-0.20.1, cov-4.1.0
19:36:08 14: collected 36 items
19:36:08 14:
19:36:08 14: ../../src/nadavelkabets/rclpy/rclpy/test/test_executor.py .............. [ 38%]
19:36:08 14: .F.................... [100%]
19:36:08 14:
19:36:08 14: =================================== FAILURES ===================================
19:36:08 14: ________________ TestExecutor.test_executor_immediate_shutdown _________________
19:36:08 14:
19:36:08 14: self = <test.test_executor.TestExecutor testMethod=test_executor_immediate_shutdown>
19:36:08 14:
19:36:08 14: def test_executor_immediate_shutdown(self) -> None:
19:36:08 14: self.assertIsNotNone(self.node.handle)
19:36:08 14: for cls in [SingleThreadedExecutor, EventsExecutor]:
19:36:08 14: with self.subTest(cls=cls):
19:36:08 14: executor = cls(context=self.context)
19:36:08 14: try:
19:36:08 14: got_callback = False
19:36:08 14:
19:36:08 14: def timer_callback() -> None:
19:36:08 14: nonlocal got_callback
19:36:08 14: got_callback = True
19:36:08 14:
19:36:08 14: timer_period = 1
19:36:08 14: tmr = self.node.create_timer(timer_period, timer_callback)
19:36:08 14:
19:36:08 14: self.assertTrue(executor.add_node(self.node))
19:36:08 14: t = threading.Thread(target=executor.spin, daemon=True)
19:36:08 14: start_time = time.perf_counter()
19:36:08 14: t.start()
19:36:08 14: executor.shutdown()
19:36:08 14: t.join()
19:36:08 14: end_time = time.perf_counter()
19:36:08 14:
19:36:08 14: self.node.destroy_timer(tmr)
19:36:08 14: > self.assertLess(end_time - start_time, timer_period / 2)
19:36:08 14: E AssertionError: 1.0002155330148526 not less than 0.5
19:36:08 14:
19:36:08 14: ../../src/nadavelkabets/rclpy/rclpy/test/test_executor.py:107: AssertionError
19:36:08 14: - generated xml file: /home/jenkins-agent/workspace/ci_linux/ws/build/rclpy/test_results/rclpy/test_executor.xunit.xml -
19:36:08 14: =========================== short test summary info ============================
19:36:08 14: FAILED ../../src/nadavelkabets/rclpy/rclpy/test/test_executor.py::TestExecutor::test_executor_immediate_shutdown
19:36:08 14: ======================== 1 failed, 35 passed in 26.79s =========================
19:36:08 14: -- run_test.py: return code 1
19:36:08 14: -- run_test.py: verify result file '/home/jenkins-agent/workspace/ci_linux/ws/build/rclpy/test_results/rclpy/test_executor.xunit.xml'
19:36:08 14/59 Test #14: test_executor .....................***Failed 27.54 sec
Reproduction
I built a rolling source checkout and ran the executor test suite 30 times sequentially with 0 failures.
I managed to reproduce the bug with 2 different code alterations.
Probable race condition - calling executor.shutdown() after entering _wait_for_ready_callbacks() but before adding the executor's guard condition to the wait_set.
Forced context switch
I added a time.sleep() call in _wait_for_ready_callbacks() just after entering the main while loop:
def _wait_for_ready_callbacks():
...
while not yielded_work and not self._is_shutdown and not condition():
time.sleep(0.001) # added line
With this change, the test failed every time.
Inducing more context switches
To simulate a fully loaded CI runner, I decreased the python context switch interval using sys.setswitchinterval(1e-6).
With this change, the test failed 5 times out of 30 sequential runs.
Environment
ROS2 rolling CI for Linux, Linux-aarch64, Linux-rhel and Windows
Description
test_executor_immediate_shutdownfails on CI with:Reproduction
I built a rolling source checkout and ran the executor test suite 30 times sequentially with 0 failures.
I managed to reproduce the bug with 2 different code alterations.
Probable race condition - calling executor.shutdown() after entering
_wait_for_ready_callbacks()but before adding the executor's guard condition to the wait_set.Forced context switch
I added a time.sleep() call in
_wait_for_ready_callbacks()just after entering the main while loop:With this change, the test failed every time.
Inducing more context switches
To simulate a fully loaded CI runner, I decreased the python context switch interval using
sys.setswitchinterval(1e-6).With this change, the test failed 5 times out of 30 sequential runs.