
DRPC API Extension: Add dryRun spec to extend TestFailover action #2416

Merged
BenamarMk merged 11 commits into RamenDR:main from am-agrawa:7937-add-drpc-new-action on Apr 14, 2026

@am-agrawa
Member

No description provided.

@am-agrawa am-agrawa marked this pull request as draft February 11, 2026 13:13
@am-agrawa am-agrawa force-pushed the 7937-add-drpc-new-action branch 4 times, most recently from 577b2df to dcaf8c5 February 12, 2026 10:21
}
}

func (d *DRPCInstance) convertStateForTestIfNeeded(nextState rmn.DRState) rmn.DRState {
Member

Change the function name to be something like adjustPhaseIfTestFailover

Member Author

done

return nextState
}

func getTestFailoverPhase(nextState rmn.DRState) rmn.DRState {
Member

And how about this one? I would rename it to mapPhaseForTestFailover.
That way you would get something like:
adjustPhaseIfTestFailover --> mapPhaseForTestFailover
The first one is conditional; the second one always performs the mapping.

Member Author

done
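The naming agreed above separates a conditional wrapper from an unconditional mapper. A minimal sketch of that split follows. It assumes a boolean telling whether a test failover is in progress and a hypothetical mapping from FailingOver/FailedOver to TestFailover/TestFailedOver phases; the PR's actual mapping is not shown in this thread, and a later commit removed those dedicated DRState constants.

```go
package main

// DRState stands in for rmn.DRState in this self-contained sketch.
type DRState string

const (
	FailingOver DRState = "FailingOver"
	FailedOver  DRState = "FailedOver"
	// Hypothetical test-mode phases used only for illustration.
	TestFailover   DRState = "TestFailover"
	TestFailedOver DRState = "TestFailedOver"
)

// mapPhaseForTestFailover unconditionally maps a failover phase to its
// test counterpart; any other phase is returned unchanged.
func mapPhaseForTestFailover(nextState DRState) DRState {
	switch nextState {
	case FailingOver:
		return TestFailover
	case FailedOver:
		return TestFailedOver
	default:
		return nextState
	}
}

// adjustPhaseIfTestFailover is the conditional wrapper: it maps the phase
// only when a test failover is in progress.
func adjustPhaseIfTestFailover(isTestFailover bool, nextState DRState) DRState {
	if !isTestFailover {
		return nextState
	}

	return mapPhaseForTestFailover(nextState)
}
```

Keeping the "if" in one function and the mapping in the other lets the mapping be unit-tested on its own, which is the point of the reviewer's naming.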

@am-agrawa am-agrawa force-pushed the 7937-add-drpc-new-action branch 2 times, most recently from 24c4987 to 660c4b6 February 16, 2026 11:26
ProgressionDeleting = ProgressionStatus("Deleting")
ProgressionDeleted = ProgressionStatus("Deleted")
ProgressionActionPaused = ProgressionStatus("Paused")
ProgressionTestFailover = ProgressionStatus("TestingFailover")
Member

Keep the same naming pattern: each progression variable is named Progression + progression name,
so use ProgressionTestingFailover instead of ProgressionTestFailover.

Member Author

done

@am-agrawa am-agrawa force-pushed the 7937-add-drpc-new-action branch 4 times, most recently from 42b70e4 to fda537b February 20, 2026 17:42
@BenamarMk
Member

BenamarMk commented Feb 22, 2026

@am-agrawa I pushed two commits. One fixes a bug I ran into, and the other adds support for the TestFailover action, including abort handling for Initial Deploy, Failover, and Relocate.

I tested the following order:

  1. Initial Deployment
  2. TestFailover
  3. Abort
  4. Failover
  5. TestFailover
  6. Abort
  7. Relocate
  8. TestFailover
  9. Abort
  10. Failover
  11. Failover
  12. TestFailover
  13. Abort

In every case, aborting the test returned cleanly to the previous action, with no issues observed.
I disabled a few linter errors for now; we'll fix them later.
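The tested sequence can be replayed against a tiny in-memory model of the expected behavior: a TestFailover remembers the action in effect, and Abort restores it. This is a hypothetical model, not Ramen's controller logic. Treating a Failover issued during a test as a promotion reflects a later commit in this PR that mentions promoting a test failover to a real one; rejecting Deploy or Relocate during a test is purely an assumption of the sketch.

```go
package main

import "fmt"

// drpcModel is a hypothetical model of DRPC actions during test failover.
type drpcModel struct {
	action      string // last committed action: "Deploy", "Failover", "Relocate"
	testing     bool   // true while a TestFailover is in progress
	savedAction string // action to return to when the test is aborted
}

// apply runs one step of a sequence like the one tested above.
func (d *drpcModel) apply(step string) error {
	switch step {
	case "TestFailover":
		if d.testing {
			return fmt.Errorf("test failover already in progress")
		}
		d.savedAction, d.testing = d.action, true
	case "Abort":
		if !d.testing {
			return fmt.Errorf("no test failover to abort")
		}
		d.action, d.testing = d.savedAction, false
	case "Failover":
		// Assumed: a Failover during a test promotes it to a real failover.
		d.action, d.testing = step, false
	case "Deploy", "Relocate":
		if d.testing {
			return fmt.Errorf("abort the test failover before %s", step)
		}
		d.action = step
	default:
		return fmt.Errorf("unknown step %q", step)
	}

	return nil
}
```

Replaying the thirteen steps listed above with this model ends in the Failover action, and every Abort lands on the action that was in effect just before the matching TestFailover.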

@BenamarMk BenamarMk force-pushed the 7937-add-drpc-new-action branch from 0b09a0a to bb49337 February 23, 2026 21:59
@am-agrawa am-agrawa force-pushed the 7937-add-drpc-new-action branch from bb49337 to 0b09a0a February 24, 2026 07:54
@BenamarMk BenamarMk force-pushed the 7937-add-drpc-new-action branch 2 times, most recently from bb49337 to db88f3c February 24, 2026 13:38
@am-agrawa am-agrawa force-pushed the 7937-add-drpc-new-action branch 4 times, most recently from 1abfd99 to 77ce786 March 25, 2026 08:01
rule: self == oldSelf
dryRun:
description: |-
DryRun when set to true, makes the action (Failover or Relocate) non-destructive.
Member Author

Suggested change
- DryRun when set to true, makes the action (Failover or Relocate) non-destructive.
+ DryRun when set to true, makes the Failover action non-destructive.

@am-agrawa am-agrawa force-pushed the 7937-add-drpc-new-action branch 3 times, most recently from 99427cd to 948d474 March 28, 2026 09:54
@am-agrawa am-agrawa force-pushed the 7937-add-drpc-new-action branch 2 times, most recently from 5b7b4ee to 7604973 April 2, 2026 05:03
@am-agrawa am-agrawa changed the title from "DRPC API Extension: Add TestFailover action" to "DRPC API Extension: Add dryRun spec to extend TestFailover action" Apr 2, 2026
@am-agrawa am-agrawa force-pushed the 7937-add-drpc-new-action branch from 7604973 to 9759580 April 2, 2026 08:48
am-agrawa added a commit to am-agrawa/ramenctl that referenced this pull request Apr 6, 2026
Implements issue RamenDR#404 - Add commands to manage dry-run failover testing.

This adds two commands:
- `ramenctl failover dry-run`: Test failover to secondary cluster without
  affecting the primary application
- `ramenctl failover dry-run --abort`: Revert the dry-run and return to
  original state

The implementation:
- Uses Ramen's dryRun field in DRPlacementControlSpec
- Relies on last-action label for safe state restoration
- Supports all three pre-dry-run states: Deployed, FailedOver, Relocated
- Includes comprehensive error handling and user feedback

IMPORTANT: This code requires Ramen PR #2416 to be merged first. The
DryRun field is not yet available in the current Ramen API version.

Reference: RamenDR/ramen#2416
Closes: RamenDR#404

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
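The abort path described in this commit depends on knowing what the DRPC looked like before the dry run. A minimal sketch of that lookup, assuming the recorded value (for example, read from the last-action label) is one of the three supported states and that an abort writes back the matching `spec.action` with dryRun cleared. The function name and the exact mapping are hypothetical, not taken from ramenctl.

```go
package main

import "fmt"

// actionToRestore maps the DRPC state recorded before a dry run to the
// spec action an abort would restore. The mapping is an assumption.
func actionToRestore(preDryRunState string) (string, error) {
	switch preDryRunState {
	case "Deployed":
		return "", nil // initial deployment: no action set (assumed)
	case "FailedOver":
		return "Failover", nil
	case "Relocated":
		return "Relocate", nil
	default:
		return "", fmt.Errorf("unsupported pre-dry-run state %q", preDryRunState)
	}
}
```

Failing on any other recorded value, rather than guessing, matches the commit's emphasis on safe state restoration.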
@am-agrawa am-agrawa force-pushed the 7937-add-drpc-new-action branch 5 times, most recently from 3134fcd to faffdb6 April 8, 2026 08:38
@am-agrawa am-agrawa marked this pull request as ready for review April 8, 2026 08:39
am-agrawa and others added 10 commits April 13, 2026 14:43
Signed-off-by: Aman Agrawal <aman_31dec@yahoo.in>
Signed-off-by: Aman Agrawal <aman_31dec@yahoo.in>
Signed-off-by: Aman Agrawal <aman_31dec@yahoo.in>
…preserve conditions

Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
Introduce non-destructive TestFailover action to verify secondary cluster
readiness without committing to failover.

- Add VRGActionTestFailover and update CRD enums/YAML
- Implement placement logic, cleanup, and action execution refactor
- Exclude test primaries from multi-primary checks
- Restore original placement decisions after test failover
- Treat TestFailover like Failover for resync and VolSync restore
- Skip LastAppDeploymentCluster updates during test failover
- Improve comments, readability, and lint compliance

Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
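The "exclude test primaries from multi-primary checks" item in the commit above can be sketched as a counting rule. Here a hypothetical `vrgSummary` type with a `DryRun` flag stands in for however the hub actually identifies a VRG that is primary only for a test failover.

```go
package main

// vrgSummary is a hypothetical hub-side view of one VRG.
type vrgSummary struct {
	Cluster string
	Primary bool
	DryRun  bool // true for a primary that exists only for a test failover
}

// countNonTestPrimaries counts primary VRGs, ignoring test primaries so a
// test failover does not trip the multiple-primaries check.
func countNonTestPrimaries(vrgs []vrgSummary) int {
	n := 0
	for _, v := range vrgs {
		if v.Primary && !v.DryRun {
			n++
		}
	}

	return n
}

// hasMultiplePrimaries reports whether more than one non-test primary exists.
func hasMultiplePrimaries(vrgs []vrgSummary) bool {
	return countNonTestPrimaries(vrgs) > 1
}
```

During a test failover both clusters briefly report a primary VRG; only the non-test one counts, so the check still catches a genuine double-primary situation.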
Replace the separate ActionTestFailover action type with a simpler attribute-based
approach using a DryRun boolean field. This cleaner design separates concerns:
- Action Failover indicates the operation to perform
- DryRun boolean indicates if the operation should be non-destructive/test mode
- Progression status (TestingFailover) continues to indicate test mode

Changes:
- Remove ActionTestFailover from DRAction and VRGAction enums
- Remove TestFailover and TestFailedOver DRState constants
- Add DryRun field to DRPlacementControlSpec and VolumeReplicationGroupSpec
- Update all references to ActionTestFailover to check DryRun flag instead

The progression status ProgressionTestingFailover is retained as it provides
a unified indicator of test mode across both DRPC and VRG resources.

Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
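With the attribute-based design described in the commit above, a check that used to compare against ActionTestFailover becomes a Failover-plus-DryRun test. A minimal sketch, assuming DryRun affects only the Failover action (as in the suggested CRD description) and, as a simplification, that TestingFailover overrides whatever progression would otherwise be reported; `specView` and `progressionFor` are hypothetical names.

```go
package main

// DRAction and ProgressionStatus stand in for the Ramen types of the same names.
type DRAction string

const (
	ActionFailover DRAction = "Failover"
	ActionRelocate DRAction = "Relocate"
)

type ProgressionStatus string

const ProgressionTestingFailover = ProgressionStatus("TestingFailover")

// specView is a hypothetical subset of DRPlacementControlSpec.
type specView struct {
	Action DRAction
	DryRun bool
}

// isTestFailover replaces a check like Action == ActionTestFailover:
// a test failover is now a Failover with DryRun set.
func isTestFailover(s specView) bool {
	return s.Action == ActionFailover && s.DryRun
}

// progressionFor reports TestingFailover while a test failover is active
// and otherwise keeps the progression computed elsewhere.
func progressionFor(s specView, computed ProgressionStatus) ProgressionStatus {
	if isTestFailover(s) {
		return ProgressionTestingFailover
	}

	return computed
}
```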
- Uploads continue from the primary managed cluster to the S3 stores on both sides
- Uploads from the failoverCluster to both S3 stores stop when the action is Failover and dryRun is set to true

Signed-off-by: Aman Agrawal <aman_31dec@yahoo.in>
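The upload rule in the commit above can be written as a small predicate that returns the S3 profiles a VRG should write to. Treating non-primary VRGs as non-uploaders and modeling the stores as a list of profile names are assumptions of this sketch; `uploadTargets` is a hypothetical helper, not Ramen's code.

```go
package main

// uploadTargets returns the S3 profiles a VRG should upload to. The
// primary keeps uploading to the stores on both sides; a primary VRG on
// the failover cluster uploads nothing while the action is a dry-run
// Failover. Non-primary VRGs never upload in this sketch.
func uploadTargets(vrgIsPrimary, onFailoverCluster bool, action string, dryRun bool, s3Profiles []string) []string {
	if !vrgIsPrimary {
		return nil
	}

	if onFailoverCluster && action == "Failover" && dryRun {
		return nil // test primary: do not upload
	}

	return s3Profiles
}
```

Once the test is promoted (dryRun set to false), the same call returns every profile for the failover cluster, which then uploads like any other primary.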
…pport

- VolumeSnapshots are taken during dryRun for RBD PVCs

- VolumeSnapshots are cleaned up when dryRun is reverted or the test failover is promoted to a real
  failover

- Use a dedicated dry-run-snapshot label to identify and delete those snapshots

- Implement VRG-specific snapshot creation and cleanup during test failover
  with proper workload isolation using labels, ensuring snapshots are managed
  across all namespaces for multi-namespace deployments

- Enable autoResync on Primary VRG during real failover to allow data sync
  from new primary to old primary (now secondary) after promoting from test
  to real failover

- Add support for reverting test failover without FailoverCluster specified,
  keeping VRGs as Primary on both sides and preserving snapshots until
  FailoverCluster is provided for real failover promotion

Assisted by AI

Signed-off-by: Aman Agrawal <aman_31dec@yahoo.in>
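The label-based isolation and RBD-only snapshotting described in the commit above can be sketched as two pure functions. Assumptions: the label key is a placeholder (the commit names a dry-run-snapshot label but not its exact key), the label value is the owning VRG's name, and PVCs are classified by CSI provisioner name. All names here are hypothetical.

```go
package main

import "strings"

// dryRunSnapshotLabel is a placeholder label key for this sketch.
const dryRunSnapshotLabel = "example.com/dry-run-snapshot"

// snapshotRef is a hypothetical, minimal view of a VolumeSnapshot.
type snapshotRef struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// isRBDProvisioner reports whether a provisioner is the Ceph RBD CSI driver,
// including prefixed names such as "openshift-storage.rbd.csi.ceph.com".
// CephFS-backed PVCs return false and are skipped for dry-run snapshots.
func isRBDProvisioner(provisioner string) bool {
	return strings.HasSuffix(provisioner, "rbd.csi.ceph.com")
}

// dryRunSnapshotsFor selects the snapshots this VRG created during a dry run:
// only those labeled with the VRG's name, in any of its protected namespaces.
func dryRunSnapshotsFor(all []snapshotRef, vrgName string, namespaces []string) []snapshotRef {
	protected := make(map[string]bool, len(namespaces))
	for _, ns := range namespaces {
		protected[ns] = true
	}

	var selected []snapshotRef
	for _, s := range all {
		owner, ok := s.Labels[dryRunSnapshotLabel]
		if ok && owner == vrgName && protected[s.Namespace] {
			selected = append(selected, s)
		}
	}

	return selected
}
```

Keying on both the label value and the protected namespaces keeps one workload's cleanup from deleting another workload's dry-run snapshots in a multi-namespace deployment.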
VRG Tests (10 tests - vrg_volrep_dryrun_test.go):
- Snapshot creation during dry-run failover (Primary VRG only)
- Snapshot cleanup on abort (Secondary transition) and promotion (DryRun→false)
- CephFS PVC filtering
- Idempotency and label-based selection

DRPC Tests (6 tests - drplacementcontrol_dryrun_test.go):
- DryRun field defaults and transitions
- DryRun compatibility with Failover and Relocate actions

Test infrastructure includes VolumeReplicationClass, StorageID labels,
and PeerClasses for reliable execution in envtest.

Signed-off-by: Aman Agrawal <aman_31dec@yahoo.in>
…DRPC to report it after failover

Signed-off-by: Aman Agrawal <aman_31dec@yahoo.in>
@am-agrawa am-agrawa force-pushed the 7937-add-drpc-new-action branch from faffdb6 to cfdcdc4 April 13, 2026 09:25
- Optimize dry-run snapshot cleanup to avoid expensive List() calls
- Fix lint issues and other failures

Assisted by AI

Signed-off-by: Aman Agrawal <aman_31dec@yahoo.in>
@am-agrawa am-agrawa force-pushed the 7937-add-drpc-new-action branch from cfdcdc4 to 2ea1128 April 13, 2026 10:28
@BenamarMk
Member

I approved; e2e is failing. We'll merge once it is fixed.

@BenamarMk BenamarMk merged commit d7767d5 into RamenDR:main Apr 14, 2026
25 of 26 checks passed
@am-agrawa am-agrawa deleted the 7937-add-drpc-new-action branch April 14, 2026 13:04

Labels

None yet

Projects

None yet

3 participants