
OCPBUGS-78953: Fix bug with PV not getting provisioned after deletion#595

Open
gnufied wants to merge 3 commits into openshift:main from gnufied:fix-pv-deletion-recreation-bug

Conversation

@gnufied
Member

@gnufied gnufied commented Mar 19, 2026

If preferredSymlink changes, the PV does not get re-provisioned after deletion.

https://redhat.atlassian.net/browse/OCPBUGS-78953

@gnufied gnufied changed the title Fix bug with PV not getting provisioned after deletion {WIP} Fix bug with PV not getting provisioned after deletion Mar 19, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 19, 2026
@coderabbitai

coderabbitai bot commented Mar 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a3e859a6-c789-43c5-8209-5de19c074d23

📥 Commits

Reviewing files that changed from the base of the PR and between 94edc48 and ef6f23b.

📒 Files selected for processing (3)
  • pkg/common/symlink_utils.go
  • test/e2e/hostudev.go
  • test/e2e/localvolume_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/e2e/hostudev.go
  • test/e2e/localvolume_test.go

Walkthrough

Adds a shared symlink lookup helper to pkg/common, updates disk-maker controllers to use it, and extends e2e test tooling to manipulate udev symlinks on nodes via Kubernetes Jobs while simplifying job node-handling to accept hostnames.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Symlink utility + controller updates | pkg/common/symlink_utils.go, pkg/diskmaker/controllers/lv/reconcile.go, pkg/diskmaker/controllers/lvset/reconcile.go | Added exported GetSymlinkedForCurrentSC(symlinkDir string, currentDevice internal.BlockDevice) (string, error) that globs a per-SC symlink dir and returns the first path whose label matches the current device. Reconciler changes call this shared helper; lv reconciler prefers the resolved by-id path via blockDevice.GetPathByID(existingSymlink). Removed file-local helper from lvset reconciler. |
| Node-job API and call sites | test/e2e/jobs.go, test/e2e/cleanup_hostdirs.go, test/e2e/symlink_check.go | Changed newNodeJob signature to accept nodeHostname string (replacing corev1.Node) and updated callers to pass hostname strings; removed label-read/error path formerly in newNodeJob. |
| Udev symlink e2e helpers | test/e2e/hostudev.go | Added new e2e helper file providing addNewUdevSymlink, removeUdevSymlink and job constructors newAddUdevSymlinkJob, newRemoveUdevSymlinksJob to create/remove udev symlinks on a target node via Kubernetes Jobs running small bash scripts with env vars. |
| E2E test flow change | test/e2e/localvolume_test.go | Added a test step that determines the node hostname for a PV, computes the PV device symlink path, and calls addNewUdevSymlink(...) to switch the preferred symlink target before PV deletion checks. Added helper findNodeHostnameForPV. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


@openshift-ci openshift-ci bot requested review from RomanBednar and tsmetana March 19, 2026 16:12
@openshift-ci
Contributor

openshift-ci bot commented Mar 19, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 19, 2026
@gnufied gnufied changed the title {WIP} Fix bug with PV not getting provisioned after deletion OCPBUGS-78953: Fix bug with PV not getting provisioned after deletion Mar 19, 2026
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 19, 2026
@openshift-ci-robot
Contributor

@gnufied: This pull request references Jira Issue OCPBUGS-78953, which is invalid:

  • expected the bug to target only the "4.22.0" version, but multiple target versions were set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 19, 2026
@gnufied
Member Author

gnufied commented Mar 19, 2026

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 19, 2026
@openshift-ci-robot
Contributor

@gnufied: This pull request references Jira Issue OCPBUGS-78953, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/hostudev.go`:
- Around line 80-100: The ls command in newRemoveUdevSymlinkJobs' script will
cause the job to exit when PATTERN matches nothing because set -e is enabled;
make the script resilient by preventing ls from failing (e.g., change the ls
line to a tolerant form such as running ls -la "$PATTERN" || true or temporarily
disable errexit around that command) so the job continues to the rm step; update
the script string in newRemoveUdevSymlinkJobs (refer to the local variable
script and the NodeJobOptions that injects PATTERN) accordingly.
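
The suggestion above can be sketched as follows. The PR's actual script is not shown in this conversation, so removeSymlinksScript and tolerate are hypothetical names and the script body is an assumption; only set -e, ls, rm and the PATTERN variable come from the review comment.

```go
package main

import "fmt"

// tolerate wraps a shell command so a non-zero exit does not trip `set -e`.
func tolerate(cmd string) string {
	return cmd + " || true"
}

// removeSymlinksScript builds a hypothetical cleanup script. PATTERN is left
// unquoted on purpose so the shell expands it as a glob.
func removeSymlinksScript() string {
	return "set -e\n" +
		// Listing is diagnostic only, so an empty match must not abort the job.
		tolerate("ls -la $PATTERN") + "\n" +
		// rm -f exits 0 even when nothing matches the pattern.
		"rm -f $PATTERN\n"
}

func main() {
	fmt.Print(removeSymlinksScript())
}
```

Without the `|| true`, ls exits non-zero when the pattern matches nothing, and errexit stops the script before the rm step runs.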

In `@test/e2e/localvolume_test.go`:
- Around line 167-174: The bug is that currentSymlink is left unset when
selectedDisk.id is empty and the code incorrectly assigns expectedPath to
filepath.Join("/dev", "name"); change that branch so currentSymlink is assigned
filepath.Join("/dev", selectedDisk.name) (use selectedDisk.name, not the literal
"name"), leaving expectedPath untouched, so downstream call addNewUdevSymlink
uses a correct currentSymlink value.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4b8a6088-bc61-475f-b702-21e00021b57a

📥 Commits

Reviewing files that changed from the base of the PR and between d6c65c4 and 9aac475.

📒 Files selected for processing (8)
  • pkg/common/symlink_utils.go
  • pkg/diskmaker/controllers/lv/reconcile.go
  • pkg/diskmaker/controllers/lvset/reconcile.go
  • test/e2e/cleanup_hostdirs.go
  • test/e2e/hostudev.go
  • test/e2e/jobs.go
  • test/e2e/localvolume_test.go
  • test/e2e/symlink_check.go

@gnufied gnufied force-pushed the fix-pv-deletion-recreation-bug branch from 9aac475 to 94edc48 Compare March 20, 2026 13:50

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
test/e2e/localvolume_test.go (1)

167-174: ⚠️ Potential issue | 🔴 Critical

Bug: currentSymlink uses literal "name" instead of selectedDisk.name.

Line 172 uses the string literal "name" instead of selectedDisk.name, resulting in an incorrect path (/dev/name) when selectedDisk.id is empty.

Proposed fix
 		if selectedDisk.id != "" {
 			currentSymlink = filepath.Join("/dev/disk/by-id", selectedDisk.id)
 		} else {
-			currentSymlink = filepath.Join("/dev", "name")
+			currentSymlink = filepath.Join("/dev", selectedDisk.name)
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/localvolume_test.go` around lines 167 - 174, The test builds
currentSymlink incorrectly using the literal string "name" when selectedDisk.id
is empty; update the conditional that sets currentSymlink to use
selectedDisk.name (i.e., filepath.Join("/dev", selectedDisk.name)) instead of
the hardcoded "name" so the path reflects the disk's actual name; keep the
surrounding logic that only sets currentSymlink when len(pvs) > 0 and still
prefer selectedDisk.id when present.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 574583b4-dcfb-4820-8542-81b959c940e1

📥 Commits

Reviewing files that changed from the base of the PR and between d6c65c4 and 94edc48.

📒 Files selected for processing (8)
  • pkg/common/symlink_utils.go
  • pkg/diskmaker/controllers/lv/reconcile.go
  • pkg/diskmaker/controllers/lvset/reconcile.go
  • test/e2e/cleanup_hostdirs.go
  • test/e2e/hostudev.go
  • test/e2e/jobs.go
  • test/e2e/localvolume_test.go
  • test/e2e/symlink_check.go

}
}
newPreferredTarget := "/dev/disk/by-id/scsi-1-local-storage-e2e-test"
addNewUdevSymlink(t, ctx, nodeHostName, currentSymlink, newPreferredTarget)
Contributor


Is there an easy way to test that the newly created PV, many lines below, actually used newPreferredTarget?

Member Author


The idea here isn't that the newly created PV will use the new preferredSymlink; it won't. The bug was that no new PV was getting provisioned at all.

After my fix, the recreated PV will come up with the same symlink it was using before. We aren't changing symlinks in this bug fix.

@gnufied
Member Author

gnufied commented Mar 20, 2026

/retest

@openshift-ci
Contributor

openshift-ci bot commented Mar 20, 2026

@gnufied: all tests passed!



Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.


3 participants