OK-5110 make stop synchronous #59

Merged: GerryMandell merged 2 commits into main from gm/ok-5110-avd-stop-wait-for-exit on Apr 30, 2026

Conversation

@GerryMandell GerryMandell commented Apr 28, 2026

Contributor

Summary

Make the avd module's stop() action synchronous: wait for the run-avd process to actually exit before returning, instead of sending SIGTERM and returning immediately.

Problem

avd.py's stop() previously sent SIGTERM and returned immediately:

os.kill(running_avd["pid"], signal.SIGTERM)
self.result["changed"] = True
return self.result

Since /opt/orka/bin/run-avd is a bash script that manages child processes (the emulator and socat relay), it can take a few seconds to handle SIGTERM and exit cleanly. Callers that immediately query AVD state (via gather_avd_facts or pgrep) see the AVD as still "running" — even though the stop call succeeded.

This caused validation steps in our CI workflow to flake, and would surface as a confusing "AVD is running" state in any subsequent task that depends on a stop having taken effect.

Fix

After sending SIGTERM, poll the process with kill(pid, 0) until it exits or the 30s timeout expires. If the process doesn't exit in time, fail the task explicitly so callers know the stop didn't take effect.
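
A minimal sketch of the polling approach described above, not the actual avd.py code. STOP_WAIT_TIMEOUT and the time.monotonic() deadline appear in the diff; the helper names pid_exists, wait_for_exit and stop_and_wait, and the 0.5s poll interval, are assumptions made for illustration.

```python
import os
import signal
import time

STOP_WAIT_TIMEOUT = 30  # seconds, as stated in the PR description
POLL_INTERVAL = 0.5     # assumed value; the PR does not state one


def pid_exists(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks the PID
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def wait_for_exit(pid, timeout=STOP_WAIT_TIMEOUT, poll_interval=POLL_INTERVAL):
    """Poll until pid is gone. Return True if it exited before the deadline."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not pid_exists(pid):
            return True
        time.sleep(poll_interval)
    # One last check so an exit during the final sleep is not missed.
    return not pid_exists(pid)


def stop_and_wait(pid, timeout=STOP_WAIT_TIMEOUT, poll_interval=POLL_INTERVAL):
    """Send SIGTERM, then wait for exit. Return False on timeout."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # already gone, nothing to stop
    return wait_for_exit(pid, timeout, poll_interval)
```

A caller would treat a False return as a failed stop; in an Ansible module that would typically map to a fail_json call with a timeout message. One caveat: kill(pid, 0) keeps succeeding for an exited child that its parent has not yet reaped, so this sketch assumes the run-avd process is not a direct child of the process doing the polling.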

Test plan

  • Run avd.yml -e desired_state=stopped against a running AVD — confirm task only returns after the process is actually gone (pgrep -f run-avd | grep <name> returns nothing immediately after)
  • Run avd.yml -e desired_state=stopped against a stuck process (e.g. one trapping SIGTERM) — confirm the task fails with a clear timeout message rather than reporting success
  • Run the full deploy → stop → validate cycle in CI — confirm validation no longer races
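
For the stuck-process case in the test plan, one way to get a process that traps SIGTERM is sketched below. It only demonstrates the signal behaviour; the helper name spawn_sigterm_ignoring_child is hypothetical, and making the avd module recognise such a process as a running AVD is outside the scope of this sketch.

```python
import signal
import subprocess
import sys

# Source for a child process that ignores SIGTERM and then idles.
CHILD_SOURCE = """
import signal, time
signal.signal(signal.SIGTERM, signal.SIG_IGN)
print("ready", flush=True)
while True:
    time.sleep(1)
"""


def spawn_sigterm_ignoring_child():
    """Start a child that ignores SIGTERM; return its Popen handle."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.PIPE,
        text=True,
    )
    # Block until the child has installed its handler, so a SIGTERM
    # sent right after this call is guaranteed to be ignored.
    proc.stdout.readline()
    return proc
```

Such a process has to be cleaned up with SIGKILL (proc.kill() followed by proc.wait()) once the timeout path has been exercised.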

@GerryMandell GerryMandell marked this pull request as ready for review April 28, 2026 17:59
@GerryMandell GerryMandell requested a review from a team as a code owner April 28, 2026 17:59
Comment thread library/avd.py Outdated
deadline = time.monotonic() + STOP_WAIT_TIMEOUT
while time.monotonic() < deadline:
    try:
        # Sending signal 0 checks if the process exists without

Contributor

nit: I do not think this comment is needed. This is a common way to check whether a process is running; in fact, we use it in the run-avd script itself.

But it doesn't add much noise, so I am fine with leaving it. Up to you.

Contributor Author

Removed that comment.

@GerryMandell GerryMandell merged commit dbba5f4 into main Apr 30, 2026
2 checks passed
@GerryMandell GerryMandell deleted the gm/ok-5110-avd-stop-wait-for-exit branch April 30, 2026 18:35