Summary
When a Server transitions from Reserved to Available, metal-operator should support wiping its storage.
Current behaviour:
- During the rolling update, the ServerClaim is removed and the Server transitions to Available.
- If the Server used a disk install, the data on disk persists, including possible operational data (kubelet certs, static IP address configuration).
- The Server is used by another ServerClaim, but network boot fails:
  - The Server might re-join the cluster without the cluster machinery (Gardener, Cluster-API) being aware of it.
  - The Server operates with its previously configured IP address, which the IPAM controller has already "released".
  - Cluster operations are disrupted by the two issues above (wrong scheduling, duplicate IP addressing).
Possible options:
- Cleanup via Redfish/BMC
- Cleanup via ServerBootConfiguration (risk of network boot failing -> same issues as described above)
There should be 2 policies, but this is open for discussion:
- Clean up the root disk only. This helps when redeploying rook/ceph clusters, because the OSDs persist across node redeployment.
- Multi-tenancy: all storage disks on the Server are wiped, so it can be safely used by the next tenant.
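
To make the two proposed policies concrete for discussion, here is a minimal Go sketch of how the disk selection could look. Everything in it is an assumption for illustration: the names `WipePolicy`, `WipePolicyRootDiskOnly`, `WipePolicyAllDisks`, `Disk` and `DisksToWipe` are hypothetical and do not exist in metal-operator today, and the sketch only decides which disks to wipe, not how the wipe is carried out (Redfish/BMC or ServerBootConfiguration).

```go
package main

import "fmt"

// WipePolicy is a hypothetical type naming the two policies proposed above.
type WipePolicy string

const (
	// WipePolicyRootDiskOnly wipes only the root disk, so data disks
	// (for example rook/ceph OSDs) survive a node redeployment.
	WipePolicyRootDiskOnly WipePolicy = "RootDiskOnly"
	// WipePolicyAllDisks wipes every disk, for the multi-tenancy case.
	WipePolicyAllDisks WipePolicy = "AllDisks"
)

// Disk is a hypothetical, simplified view of a storage device on a Server.
type Disk struct {
	Name   string
	IsRoot bool
}

// DisksToWipe returns the subset of disks that the given policy selects.
// It returns an error for an unknown policy instead of silently wiping nothing.
func DisksToWipe(policy WipePolicy, disks []Disk) ([]Disk, error) {
	switch policy {
	case WipePolicyAllDisks:
		// Copy the slice so callers cannot mutate the input through the result.
		return append([]Disk(nil), disks...), nil
	case WipePolicyRootDiskOnly:
		var selected []Disk
		for _, d := range disks {
			if d.IsRoot {
				selected = append(selected, d)
			}
		}
		return selected, nil
	default:
		return nil, fmt.Errorf("unknown wipe policy %q", policy)
	}
}

func main() {
	disks := []Disk{
		{Name: "sda", IsRoot: true},
		{Name: "sdb", IsRoot: false},
	}
	for _, p := range []WipePolicy{WipePolicyRootDiskOnly, WipePolicyAllDisks} {
		selected, err := DisksToWipe(p, disks)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(p, "selects", len(selected), "disk(s)")
	}
}
```

Returning an error for an unrecognised policy is a deliberate choice in this sketch: a Server that skips the wipe because of a typo in the policy would reintroduce the stale-data problem described above.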