feat: [NODE-1810, NODE-1728] Hopefully resolve udev race #8368

Merged: Bownairo merged 6 commits into master from eero/systemd-udev on Jan 23, 2026

@Bownairo (Contributor)

Override the systemd-fsck unit for /var to drop its BindsTo= dependency on the underlying crypt device. We suspect that the device flapping in udev (briefly disappearing and reappearing) can cause the mount to lock up.

@github-actions github-actions Bot added the feat label Jan 15, 2026
@Bownairo Bownairo changed the title feat: Hopefully resolve udev race feat: [NODE-1810, NODE-1728] Hopefully resolve udev race Jan 15, 2026
@Bownairo Bownairo marked this pull request as ready for review January 15, 2026 07:54
@Bownairo Bownairo requested a review from a team as a code owner January 15, 2026 07:54
@github-actions github-actions Bot added the @node label Jan 15, 2026
Comment thread ic-os/components/guestos/init/setup-encryption/retry-var-failure.sh
Comment thread ic-os/components/guestos/init/setup-encryption/retry-var-failure.sh Outdated
Comment thread ic-os/components/guestos/init/setup-encryption/retry-var-failure.sh
@Bownairo Bownairo enabled auto-merge January 23, 2026 04:25
@Bownairo Bownairo added this pull request to the merge queue Jan 23, 2026
Merged via the queue into master with commit 9795661 Jan 23, 2026
37 checks passed
@Bownairo Bownairo deleted the eero/systemd-udev branch January 23, 2026 05:14
pull Bot pushed a commit to mikeyhodl/ic that referenced this pull request Apr 21, 2026
For as long as we've had the fix from
dfinity#8368, we have not seen the issue
recur. At the time, we also added a failsafe to reboot the node if the
issue is detected; that failsafe has never been triggered.

This removes the reboot failsafe and leaves only the fix itself.
basvandijk added a commit that referenced this pull request Apr 22, 2026
…e var.mount

The GuestOS mount-generator currently emits systemd-cryptsetup@var_crypt.service
with BindsTo=${SYSTEMD_DEVICE} on the underlying encrypted partition. A transient
udev removal/reappearance of that device during boot enqueues a stop of the
cryptsetup service which, via Conflicts=umount.target, cascades into stopping
var.mount right after the kernel has mounted the filesystem. This produces a
spurious 'Failed unmounting var.mount - /var.' during boot on
upgrade_downgrade_nns_subnet_test{,_head_nns} runs.

The systemd-fsck@dev-mapper-var_crypt.service override already drops BindsTo for
the same reason (commit 9795661, PR #8368). This change applies the same
treatment to systemd-cryptsetup@var_crypt.service. Ordering via After= is
sufficient; setup-var-encryption.sh will fail loudly if the device is absent.
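
As a rough illustration of the dependency change described above, the fragment below sketches what the relevant part of the generated unit might look like. It is an assumption-laden sketch, not the generator's actual output: "dev-EXAMPLE.device" is a hypothetical placeholder for the unit named by ${SYSTEMD_DEVICE}, and the real unit may contain further directives.

```ini
# Hypothetical sketch of the [Unit] section of systemd-cryptsetup@var_crypt.service.
# "dev-EXAMPLE.device" is a placeholder, not a name taken from the repository.
[Unit]
# Before this change the generator also emitted:
#   BindsTo=dev-EXAMPLE.device
# BindsTo= stops this service whenever the device unit goes away, even briefly.
# After this change only the ordering dependency remains:
After=dev-EXAMPLE.device
# Per the description above, this is the dependency through which a stop of
# the cryptsetup service cascades into stopping var.mount.
Conflicts=umount.target
```

Note that systemd does not allow dependency settings such as BindsTo= to be cleared from a drop-in, so a dependency like this has to be left out where the unit is generated or the whole unit has to be overridden.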

Validated with: bazel test --runs_per_test=3 --jobs=3 \
  //rs/tests/consensus/upgrade:upgrade_downgrade_nns_subnet_test
All 3 runs passed (747-886 s, avg 815 s).

Created following the steps in .claude/skills/fix-flaky-tests/SKILL.md.

5 participants