[Feature] LXC cross-cluster replication: support for containers with capability feature flags (mknod, keyctl, fuse, nfs, etc.)

> [!NOTE]
> This limitation was discovered and documented during investigation of issues [#455](https://github.com/PegaProx/project-pegaprox/issues/455) and [#456](https://github.com/PegaProx/project-pegaprox/issues/456). Both of those reports include a `[!CAUTION]` callout referencing this forthcoming feature request.

> [!WARNING]
> **A note from a Security Architect:** I do this for a living, so security is always on my mind, and I want to be upfront that neither of the implementation paths I'm proposing here is what I'd call ideal from a security standpoint. They're the lesser of the available evils, forced on us by Proxmox's decision to hardcode an identity check (`root@pam`) instead of building a proper permission-based model. The correct fix is for Proxmox to implement a grantable permission, something like `VM.ConfigFeatureFlags`, that can be assigned to specific users or tokens with a full audit trail. That request has been sitting in their Bugzilla for years with no resolution. Until it lands, PegaProx is stuck choosing between "silently fail on feature flags" and "use elevated auth with guardrails." I'm proposing the latter because at least it works; however, I'd be doing you a disservice if I didn't flag that it's not the architecture I'd design from scratch.

## Describe the feature

Cross-cluster LXC replication currently fails silently for any container with feature flags other than `nesting=1`. The failure happens at the config step with:

```
403 Permission check failed (changing feature flags (except nesting) is only allowed for root@pam)
```

> [!NOTE]
> This affects any container using `mknod`, `keyctl`, `fuse`, `mount` (e.g. `mount=nfs;cifs`), or any other capability flag except `nesting`. That covers the majority of real-world LXC deployments whose operators are security-conscious enough to use feature flags instead of running privileged containers!
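Whichever fix is chosen, the *silent* part of the failure can be removed by recognizing this specific 403 and surfacing it. A minimal Python sketch, assuming the HTTP client exposes the status code and the error text (Proxmox often carries that text in the HTTP reason phrase rather than the JSON body); `FeatureFlagPermissionError` and `check_config_response` are hypothetical names, not existing PegaProx code:

```python
import re

# Error wording returned by Proxmox for this restriction (see the 403 above).
FEATURE_FLAG_403 = re.compile(
    r"changing feature flags \(except nesting\) is only allowed for root@pam"
)


class FeatureFlagPermissionError(RuntimeError):
    """Hypothetical exception: raised instead of letting the 403 pass silently."""


def check_config_response(status: int, message: str) -> None:
    """Raise a descriptive error when a config update hits the root@pam-only check."""
    if status == 403 and FEATURE_FLAG_403.search(message):
        raise FeatureFlagPermissionError(
            "Proxmox refused to set feature flags: only a root@pam ticket may change "
            "flags other than nesting, so the replica is incomplete."
        )
    if status >= 400:
        # Any other API error is still an error; never treat it as success.
        raise RuntimeError(f"config update failed: HTTP {status}: {message}")
```

A replication job that calls this after every config step fails loudly with a message pointing at the real cause, instead of reporting success with a degraded replica.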

The restriction is hardcoded in Proxmox's `LXC.pm` as an identity check, not a permission check:

```perl
return 1 if $authuser eq 'root@pam';
```

No role, realm, token, or privilege level bypasses it; only a `root@pam` session ticket (not an API token, even one owned by `root@pam`) passes. This has been reported to Proxmox directly: see the open feature requests on bugzilla.proxmox.com, [#6614](https://bugzilla.proxmox.com/show_bug.cgi?id=6614) and [#2582](https://bugzilla.proxmox.com/show_bug.cgi?id=2582).
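To make the identity-versus-permission distinction concrete: Proxmox API tokens authenticate as `user@realm!tokenid`, so even a token belonging to `root@pam` is a different string from `root@pam` and fails the comparison. The Python below only models the Perl line above for illustration; `is_root_pam` and `split_authid` are hypothetical helpers, not Proxmox code.

```python
from typing import Optional, Tuple


def is_root_pam(authid: str) -> bool:
    """Model of `$authuser eq 'root@pam'`: an exact string match, no privilege lookup."""
    return authid == "root@pam"


def split_authid(authid: str) -> Tuple[str, str, Optional[str]]:
    """Split 'user@realm' or 'user@realm!tokenid' into (user, realm, token)."""
    user_realm, _, token = authid.partition("!")
    user, _, realm = user_realm.rpartition("@")
    return user, realm, token or None


for authid in ("root@pam", "root@pam!pegaprox", "pegaprox@pve"):
    user, realm, token = split_authid(authid)
    print(f"{authid:20} user={user} realm={realm} token={token} passes={is_root_pam(authid)}")
```

The token `root@pam!pegaprox` has user `root` and realm `pam`, yet it still fails, because no roles or privileges are consulted at all.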

This feature request proposes two implementation paths to handle this gracefully, plus a minimum viable preflight warning.

## Use Case

Any homelab or production environment running LXC containers with capability flags needs cross-cluster DR to actually work. Specifically:

- **ZeroTier LXC** — requires `mknod=1` for TUN device
- **Docker-in-LXC** — requires `keyctl=1,nesting=1`
- **NFS mounts in LXC** — requires `nfs=1`
- **FUSE filesystems** — requires `fuse=1`

Without this feature, cross-cluster replication is non-functional for the majority of real-world LXC containers. Site Recovery auto-failover fails silently: it brings up a replica that is missing its capability flags and therefore cannot perform its intended function.

## Minimum Viable Fix:  **Preflight Warning**

At minimum, detect feature flags on LXC containers when a replication job is configured and warn the user:

> ⚠️ **Warning:** CT {vmid} has feature flags (`{flags}`) that cannot be restored via the Proxmox API due to a Proxmox hardcoded restriction. The replica will be missing these flags after replication unless feature flag restore is explicitly enabled in the replication job settings.

This way users learn up front that the missing flags are caused by a Proxmox restriction, rather than filing bug reports for a problem that is not PegaProx's fault.
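A minimal sketch of the preflight detection in Python, assuming the container's config has already been fetched as a dict (for example from `GET /nodes/{node}/lxc/{vmid}/config`) and that its `features` value uses Proxmox's comma-separated `key=value` format. The function names are hypothetical, and treating every enabled flag except `nesting` as restricted is a simplification of the actual Proxmox check.

```python
from typing import Dict, List, Optional


def parse_features(features: str) -> Dict[str, str]:
    """Parse an LXC 'features' value such as 'keyctl=1,nesting=1' or 'mount=nfs;cifs'."""
    parsed: Dict[str, str] = {}
    for item in features.split(","):
        item = item.strip()
        if item:
            key, _, value = item.partition("=")
            parsed[key] = value
    return parsed


def restricted_flags(features: str) -> List[str]:
    """Enabled flags other than nesting, i.e. the ones that trigger the root@pam 403."""
    return sorted(
        f"{key}={value}"
        for key, value in parse_features(features).items()
        if key != "nesting" and value not in ("", "0")
    )


def preflight_warning(vmid: int, config: dict) -> Optional[str]:
    """Return the warning text for a replication job, or None if nothing is at risk."""
    flags = restricted_flags(config.get("features", ""))
    if not flags:
        return None
    return (
        f"Warning: CT {vmid} has feature flags ({','.join(flags)}) that cannot be "
        "restored via the Proxmox API due to a Proxmox hardcoded restriction. The "
        "replica will be missing these flags after replication unless feature flag "
        "restore is explicitly enabled in the replication job settings."
    )
```

For a Docker-in-LXC container with `features: keyctl=1,nesting=1`, only `keyctl=1` is reported, since `nesting` is the one flag Proxmox lets non-root callers change.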

## _**Preferred Viable Fix**_:  **Offer Two Paths**

### Path A — root@pam Ticket + SSH Key Auth

**How it works:**
1. PegaProx requests a `root@pam` session ticket (plus its `CSRFPreventionToken`) from the target cluster using stored credentials
2. Uses the ticket for the single `PUT /nodes/{node}/lxc/{vmid}/config` call that restores the feature flags
3. Discards the ticket immediately: never stored, never reused
4. Logs the elevated operation explicitly in the audit trail

**Confirmed working via live test:**

```bash
# Get root@pam ticket; the JSON response carries .data.ticket and .data.CSRFPreventionToken
curl -s -X POST 'https://10.0.0.2:8006/api2/json/access/ticket' \
  -d 'username=root@pam&password=<password>'

# Restore feature flags using ticket (not token) — no 403
curl -s -X PUT 'https://10.0.0.2:8006/api2/json/nodes/pve-source/lxc/101/config' \
  -H 'Cookie: PVEAuthCookie=<ticket>' \
  -H 'CSRFPreventionToken: <csrf-token>' \
  -d 'features=mknod%3D1'
# Returns: {"data":null}  ← success
```
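The four steps above could look roughly like this in the Python backend. This is a sketch under assumptions: the HTTP layer is injected as two hypothetical callables (`post`, `put`) that take a URL, headers, and a form-encoded body and return parsed JSON, so TLS verification and retries stay with whatever client PegaProx already uses. The function and logger names are hypothetical.

```python
import logging
import urllib.parse
from typing import Any, Callable, Dict, Mapping

# Hypothetical transport: (url, headers, form-encoded body) -> parsed JSON response.
Transport = Callable[[str, Mapping[str, str], str], Dict[str, Any]]

audit = logging.getLogger("pegaprox.audit")  # hypothetical audit logger name


def restore_features_with_ticket(
    base_url: str, node: str, vmid: int, features: str, password: str,
    post: Transport, put: Transport,
) -> Dict[str, Any]:
    # Step 1: request a fresh root@pam ticket for this one operation.
    auth = post(
        f"{base_url}/api2/json/access/ticket",
        {},
        urllib.parse.urlencode({"username": "root@pam", "password": password}),
    )["data"]
    ticket, csrf = auth["ticket"], auth["CSRFPreventionToken"]
    try:
        # Step 2: the ticket is used for exactly one config call.
        result = put(
            f"{base_url}/api2/json/nodes/{node}/lxc/{vmid}/config",
            {"Cookie": f"PVEAuthCookie={ticket}", "CSRFPreventionToken": csrf},
            urllib.parse.urlencode({"features": features}),
        )
    finally:
        # Step 3: drop every reference; nothing is cached or persisted.
        del ticket, csrf, auth
    # Step 4: explicit audit entry for the elevated operation.
    audit.warning(
        "Feature flags %s restored on CT %s via root@pam ticket - "
        "replicated from source, not new privileges",
        features, vmid,
    )
    return result
```

`urlencode({"features": "mknod=1"})` produces the same `features=mknod%3D1` body as the curl test. Note that `del` only drops the references; CPython gives no guarantee the ticket string is wiped from memory, which is one more reason to keep the ticket's lifetime to a single call.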

**Security requirements (non-negotiable):**
- SSH key authentication must be configured between PegaProx and Proxmox nodes — password auth must not be accepted for this path. Users deploying feature flags instead of privileged containers are security-conscious by definition. Requiring SSH key auth as a prerequisite is consistent with that posture.
- Feature flag restore must be explicitly opted-in per replication job — disabled by default
- Ticket must be requested fresh per operation, never cached or stored
- Ticket scope must be limited to one specific config call only
- Full audit log entry required: "Feature flags {flags} restored on CT {vmid} via root@pam ticket — replicated from source, not new privileges"
- UI must surface that elevated auth was used for this job
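These requirements are easiest to keep non-negotiable if they are enforced in one place before any ticket is requested. A hedged sketch: `ReplicationJob` and its fields are hypothetical stand-ins for however PegaProx stores job settings, and how `ssh_key_auth_confirmed` actually gets verified is left open here.

```python
from dataclasses import dataclass


@dataclass
class ReplicationJob:
    """Hypothetical job settings; field names are illustrative only."""
    job_id: str
    restore_feature_flags: bool = False   # per-job opt-in, disabled by default
    ssh_key_auth_confirmed: bool = False  # key-based SSH to the nodes verified
    elevated_auth_used: bool = False      # lets the UI show that Path A ran


class PathANotAllowed(PermissionError):
    """Raised before any ticket is requested if a guardrail is not satisfied."""


def authorize_path_a(job: ReplicationJob) -> None:
    """Refuse the root@pam ticket path unless every precondition holds."""
    if not job.restore_feature_flags:
        raise PathANotAllowed(f"job {job.job_id}: feature flag restore is not enabled")
    if not job.ssh_key_auth_confirmed:
        raise PathANotAllowed(f"job {job.job_id}: SSH key authentication not confirmed")


def record_elevated_restore(job: ReplicationJob, vmid: int, flags: str) -> str:
    """Mark the job for the UI and return the audit log text."""
    job.elevated_auth_used = True
    return (
        f"Feature flags {flags} restored on CT {vmid} via root@pam ticket - "
        "replicated from source, not new privileges"
    )
```

Because the defaults are all `False`, a job created without explicit configuration can never reach the elevated path.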

### Path B — vzdump + pct restore --unique 0

**How it works:**
Replace clone+migrate with backup+restore for containers with feature flags. `pct restore` runs as root natively on the target node — no API token, no ticket, no 403.

**Benefits:**
- Solves hostname, MAC address, AND feature flags in one operation
- No SSH key requirement
- No elevated API auth
- No ticket handling

**Trade-offs:**
- Slower — full backup + restore cycle
- Requires more temporary storage during restore
- Better suited for less frequent DR schedules
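The backup+restore sequence can be expressed as two command lines. The sketch below only builds the argument lists; it assumes the backup archive is reachable from the target node (for example via shared storage) and leaves open how PegaProx would execute the commands on each node. `vzdump --mode/--compress/--storage` and `pct restore --storage/--unique/--force` are existing Proxmox CLI options; the function name and its parameters are hypothetical.

```python
from typing import List


def path_b_commands(
    vmid: int, dump_storage: str, archive: str, target_storage: str,
    overwrite: bool = False,
) -> List[List[str]]:
    """Build the vzdump (source node) and pct restore (target node) argument lists."""
    # Source node: back up the container; the archive includes its config and feature flags.
    backup = [
        "vzdump", str(vmid),
        "--mode", "snapshot",
        "--compress", "zstd",
        "--storage", dump_storage,
    ]
    # Target node: restore as root; --unique 0 keeps the source MAC addresses.
    restore = [
        "pct", "restore", str(vmid), archive,
        "--storage", target_storage,
        "--unique", "0",
    ]
    if overwrite:
        # Needed when a previous replica with the same VMID already exists.
        restore += ["--force", "1"]
    return [backup, restore]
```

Keeping the commands as argument lists means they can be passed straight to `subprocess.run` without shell quoting issues, and `shlex.join` can render them for the audit log.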

**Implementation suggestion:**
- Auto-detect feature flags on LXC containers when replication job is configured
- If feature flags present, offer user choice of Path A or Path B
- Default to Path B if SSH key auth is not configured
- Path A available only when SSH key auth is confirmed
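The selection rules above reduce to a small decision function. A sketch: the `Path` enum and `choose_path` are hypothetical, and falling back to Path B when SSH keys are confirmed but the user has not chosen yet is an assumption the proposal leaves open.

```python
from enum import Enum
from typing import Optional


class Path(Enum):
    NONE = "no restricted feature flags: existing clone+migrate flow"
    A = "root@pam ticket + SSH key auth"
    B = "vzdump + pct restore --unique 0"


def choose_path(has_feature_flags: bool, ssh_key_confirmed: bool,
                user_choice: Optional[Path] = None) -> Path:
    if not has_feature_flags:
        return Path.NONE
    if user_choice is Path.A and not ssh_key_confirmed:
        # Path A is only available once SSH key auth is confirmed.
        raise ValueError("Path A requires confirmed SSH key authentication")
    if user_choice in (Path.A, Path.B):
        return user_choice
    # Default: Path B (always the case when SSH key auth is not configured).
    return Path.B
```

Raising on an explicit Path A request without SSH keys, rather than silently switching to Path B, keeps the user's choice visible instead of overriding it.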

## Alternatives Considered

- **Strip features before migration, restore after** — requires rebooting a running production container to apply the feature deletion. Not acceptable.
- **Patch Proxmox LXC.pm** — unsupported, overwritten on every Proxmox update.
- **Run container as privileged** — defeats the entire purpose of using feature flags for security.
- **SSH as root with password auth** — works but violates security best practices. Not recommended.

## How important is this for you?

**Blocking my use case.** Cross-cluster replication is non-functional for any LXC container with capability flags, which covers the majority of real-world LXC deployments. Site Recovery failover silently produces a broken replica.

## Checklist

- [x] I have searched existing issues and discussions to make sure this hasn't been requested before

> [!TIP]
> I'm happy to assist with backend implementation as I have done in previous bug reports. The Python patches for the ticket approach and audit logging are within my wheelhouse. Front-end UI work for the preflight warning and opt-in settings is outside my skill set, so that piece I'll have to leave to you.

[Feature] LXC cross-cluster replication: support for containers with capability feature flags (mknod, keyctl, fuse, nfs, etc.) #457

Describe the feature

Use Case

Minimum Viable Fix: Preflight Warning

Preferred Viable Fix: Offer Two Paths

Path A — root@pam Ticket + SSH Key Auth

Path B — vzdump + pct restore --unique 0

Alternatives Considered

How important is this for you?

Checklist

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[Feature] LXC cross-cluster replication: support for containers with capability feature flags (mknod, keyctl, fuse, nfs, etc.) #457

Description

Describe the feature

Use Case

Minimum Viable Fix: Preflight Warning

Preferred Viable Fix: Offer Two Paths

Path A — root@pam Ticket + SSH Key Auth

Path B — vzdump + pct restore --unique 0

Alternatives Considered

How important is this for you?

Checklist

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions