feat: [NODE-1810, NODE-1728] Hopefully resolve udev race #8368
Merged
Override the systemd-fsck unit for /var to drop its BindsTo= on the underlying crypt device. We suspect that the device flapping under udev can cause the mount to lock up.
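One way to sanity-check an override like this is to confirm that the resulting unit text no longer carries a `BindsTo=` line while still ordering after the device. The snippet below is a minimal sketch, not the check used in this PR: `directive_values` is a hypothetical helper, both unit texts are illustrative stand-ins rather than the files changed here, and the parser ignores backslash line continuations.

```python
def directive_values(unit_text: str, section: str, key: str) -> list[str]:
    """Collect every value assigned to `key` inside `[section]` of a unit file."""
    values = []
    current = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (unit files allow '#' and ';').
        if not line or line.startswith(("#", ";")):
            continue
        # Track which section we are in.
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            continue
        if current == section and "=" in line:
            name, _, value = line.partition("=")
            if name.strip() == key:
                # Dependency directives take a space-separated list of units.
                values.extend(value.split())
    return values


# Illustrative unit texts only; assumed, not copied from the repository.
STOCK_LIKE = """\
[Unit]
Description=File System Check on /dev/mapper/var_crypt
BindsTo=dev-mapper-var_crypt.device
After=dev-mapper-var_crypt.device

[Service]
Type=oneshot
"""

OVERRIDE_LIKE = """\
[Unit]
Description=File System Check on /dev/mapper/var_crypt
After=dev-mapper-var_crypt.device

[Service]
Type=oneshot
"""

print(directive_values(STOCK_LIKE, "Unit", "BindsTo"))     # ['dev-mapper-var_crypt.device']
print(directive_values(OVERRIDE_LIKE, "Unit", "BindsTo"))  # []
print(directive_values(OVERRIDE_LIKE, "Unit", "After"))    # ['dev-mapper-var_crypt.device']
```

As far as systemd.unit(5) documents, dependency settings cannot be reset to an empty list from a drop-in, so removing `BindsTo=` generally means overriding the whole unit rather than adding a drop-in fragment.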
frankdavid reviewed Jan 15, 2026
Bownairo commented Jan 15, 2026
andrewbattat approved these changes Jan 21, 2026
pull Bot pushed a commit to mikeyhodl/ic that referenced this pull request Apr 21, 2026
For as long as we've had the fix from dfinity#8368, we have not seen the issue recur. At the time, we also added a failsafe to reboot the node if the issue was detected; it has never been triggered. This change removes the reboot failsafe and leaves only the fix.
basvandijk added a commit that referenced this pull request Apr 22, 2026
…e var.mount
The GuestOS mount-generator currently emits systemd-cryptsetup@var_crypt.service
with BindsTo=${SYSTEMD_DEVICE} on the underlying encrypted partition. A transient
udev removal/reappearance of that device during boot enqueues a stop of the
cryptsetup service which, via Conflicts=umount.target, cascades into stopping
var.mount right after the kernel has mounted the filesystem. This produces a
spurious 'Failed unmounting var.mount - /var.' during boot on
upgrade_downgrade_nns_subnet_test{,_head_nns} runs.
The systemd-fsck@dev-mapper-var_crypt.service override already drops BindsTo for
the same reason (commit 9795661, PR #8368). This change applies the same
treatment to systemd-cryptsetup@var_crypt.service. Ordering via After= is
sufficient; setup-var-encryption.sh will fail loudly if the device is absent.
Validated with:

    bazel test --runs_per_test=3 --jobs=3 \
      //rs/tests/consensus/upgrade:upgrade_downgrade_nns_subnet_test

All 3 runs passed (747-886 s, avg 815 s).
Created following the steps in .claude/skills/fix-flaky-tests/SKILL.md.
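The difference between `BindsTo=` and plain `After=` ordering in the commit message above can be illustrated with a toy model. This is a rough sketch under stated assumptions, not systemd's job engine: `stopped_on_removal` is a hypothetical function, `dev-example.device` is a made-up name standing in for the encrypted partition's device unit, and only the first hop (device removal stopping the cryptsetup service) is modeled. The later cascade into `var.mount` is not.

```python
def stopped_on_removal(binds_to: dict[str, set[str]], removed: str) -> set[str]:
    """Units stopped, directly or transitively, when `removed` goes away.

    Only BindsTo= edges propagate a stop in this toy model. After= edges are
    ordering-only, so they are deliberately not part of the input.
    """
    stopped = set()
    frontier = [removed]
    while frontier:
        gone = frontier.pop()
        for unit, targets in binds_to.items():
            if gone in targets and unit not in stopped:
                stopped.add(unit)
                frontier.append(unit)
    return stopped


DEVICE = "dev-example.device"  # hypothetical device unit name (assumption)
CRYPTSETUP = "systemd-cryptsetup@var_crypt.service"

with_binds_to = {CRYPTSETUP: {DEVICE}}  # generator emits BindsTo= on the device
ordering_only = {CRYPTSETUP: set()}     # BindsTo= dropped, After= ordering kept

print(stopped_on_removal(with_binds_to, DEVICE))  # {'systemd-cryptsetup@var_crypt.service'}
print(stopped_on_removal(ordering_only, DEVICE))  # set()
```

In the first configuration a transient disappearance of the device is enough to stop the cryptsetup service; in the second, the same flap stops nothing in this model.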