[Feature] LXC cross-cluster replication: support for containers with capability feature flags (mknod, keyctl, fuse, nfs, etc.) #457

Description

@DarmokNoob

Note

This limitation was discovered and documented during investigation of issues #455 and #456. Both of those reports include a [!CAUTION] callout referencing this forthcoming feature request.

Warning

A note from a Security Architect: I do this for a living, so security is always on my mind, and I want to be upfront that neither of the implementation paths I'm proposing here is what I'd call ideal from a security standpoint. They're the lesser of the available evils, forced on us by Proxmox's decision to hardcode an identity check (root@pam) instead of building a proper permission-based model. The correct fix is for Proxmox to implement a grantable permission, something like VM.ConfigFeatureFlags, that can be assigned to specific users or tokens with a full audit trail. That request has been sitting in their bugzilla for years with no resolution. Until that happens, PegaProx is stuck choosing between "silently fail on feature flags" and "use elevated auth with guardrails." I'm proposing the latter because at least it works; however, I'd be doing you a disservice if I didn't flag that it's not the architecture I'd design from scratch.

Describe the feature

Cross-cluster LXC replication currently fails silently for any container with feature flags other than nesting=1. The failure happens at the config step with:

403 Permission check failed (changing feature flags (except nesting) is only allowed for root@pam)

Note

This affects any container using mknod, keyctl, fuse, mount (for example mount=nfs or mount=cifs), or any other feature flag besides nesting. That covers a large share of real-world LXC deployments that are security-conscious enough to use feature flags instead of running privileged containers.

The restriction is hardcoded in Proxmox's LXC.pm as an identity check, not a permission check:

return 1 if $authuser eq 'root@pam';

No role, realm, token, or privilege level bypasses it. Only a root@pam session ticket (not an API token) passes. This has been reported to Proxmox directly; see the open feature requests #6614 and #2582 at bugzilla.proxmox.com.
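
To make the distinction concrete, here is a small Python illustration of an identity check versus a permission check. This is not Proxmox code (LXC.pm is Perl); the function names and the `grants` mapping are hypothetical, and `VM.ConfigFeatureFlags` is the privilege name suggested earlier in this issue, which does not exist in Proxmox today.

```python
from typing import Dict, Set


def identity_check(authuser: str) -> bool:
    """Mirrors the shape of the hardcoded check: only the exact user passes."""
    return authuser == "root@pam"


def permission_check(authuser: str, grants: Dict[str, Set[str]],
                     privilege: str = "VM.ConfigFeatureFlags") -> bool:
    """Hypothetical alternative: pass if the privilege was granted to the caller."""
    return privilege in grants.get(authuser, set())


# Hypothetical grant table: a dedicated replication token holds the privilege.
grants = {"replicator@pve!pegaprox": {"VM.ConfigFeatureFlags"}}

print(identity_check("root@pam"))                             # session ticket user
print(identity_check("root@pam!pegaprox"))                    # API token id differs
print(permission_check("replicator@pve!pegaprox", grants))    # grantable model
```

An API token authenticates as `user@realm!tokenid`, so even a token owned by root never equals the literal string `root@pam`, which matches the behavior described above.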

This feature request proposes two implementation paths to handle this gracefully, plus a minimum viable preflight warning.

Use Case

Any homelab or production environment running LXC containers with capability flags needs cross-cluster DR to actually work. Specifically:

  • ZeroTier LXC: requires mknod=1 for the TUN device
  • Docker-in-LXC: requires keyctl=1,nesting=1
  • NFS mounts in LXC: requires mount=nfs
  • FUSE filesystems: requires fuse=1

Without this feature, cross-cluster replication is non-functional for the majority of real-world LXC containers. Site Recovery auto-failover silently fails because the replica is missing its capability flags and can't perform its intended function after failover.

Minimum Viable Fix: Preflight Warning

At minimum, detect feature flags on LXC containers when a replication job is configured and warn the user:

⚠️ Warning: CT {vmid} has feature flags ({flags}) that cannot be restored via the Proxmox API due to a Proxmox hardcoded restriction. The replica will be missing these flags after replication unless feature flag restore is explicitly enabled in the replication job settings.

This prevents users from filing bug reports for a problem that is not PegaProx's fault.
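
A minimal sketch of what the preflight detection could look like, assuming the job setup code already has the container's `features` config string (for example `keyctl=1,nesting=1`). The function names are hypothetical and not taken from the PegaProx codebase. As a simplifying assumption, any key other than `nesting` is reported, regardless of its value.

```python
from typing import Dict, Optional


def parse_features(features: str) -> Dict[str, str]:
    """Parse an LXC features string such as 'keyctl=1,nesting=1' into a dict."""
    flags: Dict[str, str] = {}
    for part in features.split(","):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        flags[key.strip()] = value.strip()
    return flags


def restricted_flags(features: str) -> Dict[str, str]:
    """Return every flag other than nesting (assumed to trigger the 403)."""
    return {k: v for k, v in parse_features(features).items() if k != "nesting"}


def preflight_warning(vmid: int, features: str) -> Optional[str]:
    """Build the warning text, or return None when no restricted flag is set."""
    flags = restricted_flags(features)
    if not flags:
        return None
    listed = ",".join(f"{k}={v}" for k, v in flags.items())
    return (
        f"Warning: CT {vmid} has feature flags ({listed}) that cannot be "
        "restored via the Proxmox API due to a Proxmox hardcoded restriction. "
        "The replica will be missing these flags after replication unless "
        "feature flag restore is explicitly enabled in the replication job settings."
    )
```

Calling `preflight_warning(101, "nesting=1")` returns `None`, while `preflight_warning(101, "mknod=1,nesting=1")` returns a message naming `mknod=1`.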

Preferred Viable Fix: Offer Two Paths

Path A — root@pam Ticket + SSH Key Auth

How it works:

  1. PegaProx requests a root@pam session ticket from the target cluster using stored credentials
  2. Uses the ticket for the specific PUT /lxc/{vmid}/config call to restore feature flags
  3. Discards the ticket immediately — never stored, never reused
  4. Logs the elevated operation explicitly in the audit trail

Confirmed working via live test:

# Get root@pam ticket
curl -s -X POST 'https://10.0.0.2:8006/api2/json/access/ticket' \
  -d 'username=root@pam&password=<password>'

# Restore feature flags using ticket (not token) — no 403
curl -s -X PUT 'https://10.0.0.2:8006/api2/json/nodes/pve-source/lxc/101/config' \
  -H 'Cookie: PVEAuthCookie=<ticket>' \
  -H 'CSRFPreventionToken: <csrf-token>' \
  -d 'features=mknod%3D1'
# Returns: {"data":null}  ← success
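
The same two calls can be sketched in Python using only the standard library. This is an illustration under assumptions, not the PegaProx implementation: `restore_features` and `_http` are hypothetical names, the HTTP function is injectable so the flow can be exercised without a live cluster, and TLS verification is left at the `urllib` default (a node with a self-signed certificate would need extra handling).

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict

HttpFn = Callable[[str, str, Dict[str, str], Dict[str, str]], Dict[str, Any]]


def _http(method: str, url: str, data: Dict[str, str],
          headers: Dict[str, str]) -> Dict[str, Any]:
    """Send a form-encoded request and decode the JSON response."""
    body = urllib.parse.urlencode(data).encode()
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())


def restore_features(base_url: str, node: str, vmid: int, features: str,
                     password: str, http: HttpFn = _http) -> Dict[str, Any]:
    """Request a fresh root@pam ticket and use it for exactly one config call."""
    ticket_resp = http(
        "POST",
        f"{base_url}/api2/json/access/ticket",
        {"username": "root@pam", "password": password},
        {},
    )
    data = ticket_resp["data"]
    headers = {
        "Cookie": f"PVEAuthCookie={data['ticket']}",
        "CSRFPreventionToken": data["CSRFPreventionToken"],
    }
    # The ticket lives only in local variables: it is never returned,
    # cached, or written anywhere, and goes out of scope after this call.
    return http(
        "PUT",
        f"{base_url}/api2/json/nodes/{node}/lxc/{vmid}/config",
        {"features": features},
        headers,
    )
```

Keeping the ticket inside one function call is one way to satisfy the "never stored, never reused" step; Python cannot guarantee the string is wiped from memory, so that limit should be documented rather than assumed away.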

Security requirements (non-negotiable):

  • SSH key authentication must be configured between PegaProx and Proxmox nodes — password auth must not be accepted for this path. Users deploying feature flags instead of privileged containers are security-conscious by definition. Requiring SSH key auth as a prerequisite is consistent with that posture.
  • Feature flag restore must be explicitly opted-in per replication job — disabled by default
  • Ticket must be requested fresh per operation, never cached or stored
  • Ticket scope must be limited to one specific config call only
  • Full audit log entry required: "Feature flags {flags} restored on CT {vmid} via root@pam ticket — replicated from source, not new privileges"
  • UI must surface that elevated auth was used for this job
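
As a rough sketch of how the opt-in, audit, and UI requirements could fit together: the `ReplicationJob` fields and the `restore` and `log` callables below are hypothetical stand-ins, not existing PegaProx structures, and the audit text follows the wording proposed above with plain punctuation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReplicationJob:
    """Hypothetical job record; only the fields needed for this sketch."""
    vmid: int
    restore_feature_flags: bool = False   # opt-in, disabled by default
    elevated_auth_used: bool = False      # surfaced in the UI after a run


def audit_message(vmid: int, flags: str) -> str:
    """Audit entry text for an elevated feature flag restore."""
    return (f"Feature flags {flags} restored on CT {vmid} via root@pam ticket; "
            "replicated from source, not new privileges")


def restore_if_opted_in(job: ReplicationJob, flags: str,
                        restore: Callable[[int, str], None],
                        log: Callable[[str], None]) -> bool:
    """Run the elevated restore only when the job explicitly opted in."""
    if not job.restore_feature_flags:
        return False
    restore(job.vmid, flags)
    job.elevated_auth_used = True
    log(audit_message(job.vmid, flags))
    return True
```

A job created with defaults never reaches the elevated call, so enabling the behavior is always a deliberate per-job decision.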

Path B — vzdump + pct restore --unique 0

How it works:
Replace clone+migrate with backup+restore for containers with feature flags. pct restore runs as root natively on the target node — no API token, no ticket, no 403.

Benefits:

  • Solves hostname, MAC address, AND feature flags in one operation
  • No SSH key requirement
  • No elevated API auth
  • No ticket handling

Trade-offs:

  • Slower — full backup + restore cycle
  • Requires more temporary storage during restore
  • Better suited for less frequent DR schedules
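
For illustration, the two commands behind Path B could be assembled like this. The helper names, the snapshot mode, the zstd compression, and the dump directory are assumptions made for the sketch; only `--unique 0` comes from this proposal. How the archive is moved from the source node to the target node is left out.

```python
from typing import List


def vzdump_command(vmid: int, dumpdir: str) -> List[str]:
    """Backup command to run as root on the source node."""
    return ["vzdump", str(vmid), "--mode", "snapshot",
            "--compress", "zstd", "--dumpdir", dumpdir]


def pct_restore_command(vmid: int, archive: str, storage: str) -> List[str]:
    """Restore command to run as root on the target node.

    --unique 0 keeps the MAC address from the backup instead of
    generating a new random one.
    """
    return ["pct", "restore", str(vmid), archive,
            "--storage", storage, "--unique", "0"]


print(" ".join(vzdump_command(101, "/var/tmp/pegaprox")))
print(" ".join(pct_restore_command(101, "/var/tmp/pegaprox/ct101.tar.zst", "local-lvm")))
```

Building argument lists instead of shell strings avoids quoting problems if the commands are later run with `subprocess.run`. If a replica with the same ID already exists on the target, the restore would need additional handling that is not shown here.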

Implementation suggestion:

  • Auto-detect feature flags on LXC containers when replication job is configured
  • If feature flags present, offer user choice of Path A or Path B
  • Default to Path B if SSH key auth is not configured
  • Path A available only when SSH key auth is confirmed
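
The selection rules above could be expressed roughly as follows. `choose_path` and its return values are hypothetical. The list does not say what happens when SSH key auth is configured but the user has not chosen yet, so falling back to Path B in that case is an assumption of this sketch.

```python
from typing import Optional


def choose_path(has_restricted_flags: bool, ssh_key_auth: bool,
                preferred: Optional[str] = None) -> str:
    """Pick the replication strategy for one LXC container."""
    if not has_restricted_flags:
        # Nothing to restore with elevated rights: keep the existing flow.
        return "clone+migrate"
    if not ssh_key_auth:
        # Path A is unavailable without confirmed SSH key auth.
        return "B"
    if preferred in ("A", "B"):
        return preferred
    # Assumption: without an explicit choice, use the path that needs no ticket.
    return "B"
```

For example, `choose_path(True, False, "A")` returns `"B"`, because a request for Path A is overridden when SSH key auth is missing.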

Alternatives Considered

  • Strip features before migration, restore after — requires rebooting a running production container to apply the feature deletion. Not acceptable.
  • Patch Proxmox LXC.pm — unsupported, overwritten on every Proxmox update.
  • Run container as privileged — defeats the entire purpose of using feature flags for security.
  • SSH as root with password auth — works but violates security best practices. Not recommended.

How important is this for you?

Blocking my use case. Cross-cluster replication is non-functional for any LXC container with capability flags, which covers the majority of real-world LXC deployments. Site Recovery failover silently produces a broken replica.

Checklist

  • I have searched existing issues and discussions to make sure this hasn't been requested before

Tip

I'm happy to assist with backend implementation as I have done in previous bug reports. The Python patches for the ticket approach and audit logging are within my wheelhouse. Front-end UI work for the preflight warning and opt-in settings is outside my skill set, so that piece I'll have to leave to you.
