Node Storage Cleanup #613

@defo89

Description

Summary

When a Server transitions from Reserved back to Available, metal-operator should support wiping its storage.

Current behaviour:

  • During a rolling update, the ServerClaim is removed and the Server transitions to Available.
  • If the Server was provisioned via a disk install, data on the disk persists, including potentially sensitive operational data (kubelet certificates, static IP address configuration).
  • When the Server is claimed by another ServerClaim but network boot fails:
    • The Server might re-join the old cluster without the cluster machinery (Gardener, Cluster API) being aware of it.
    • The Server operates with its previously configured IP address, which the IPAM controller has already released.
    • Cluster operations are disrupted by the two issues above (wrong scheduling, duplicate IP addressing).

Possible options:

  • Cleanup via Redfish/BMC
  • Cleanup via ServerBootConfiguration (risk of network boot failing -> the same issues described above)

There should be two policies, though this is open for discussion:

  • Clean up the root disk only. This helps redeploy rook/ceph clusters by preserving the OSDs across node redeployment.
  • Multi-tenancy: wipe all storage disks so the Server can be safely used by the next tenant.
