Deployment Guide — policy-gate

Safety Action SA-022 — External Process Watchdog (DA-01)
Safety Action SA-073 — Init Authorization Guard

This document specifies the mandatory deployment requirements for production use of policy-gate. It covers:

External process watchdog (DA-01)
Init authorization and token management (SA-073)
Runtime configuration and environment assumptions

Key distinction: The internal 50 ms watchdog (SA-004) detects FSM-internal hangs. It does not protect against OS preemption, VM live-migration, SIGSTOP, or OOM-kill, where the thread never resumes and the internal watchdog never fires. An external process watchdog is required for full DA-01 coverage.

0. Initialization Security (SA-073 — Critical)

The firewall stores its initialization result in a OnceLock (INIT_RESULT): the first successful init() caller "owns" the initialization, and later calls cannot replace it. This makes initialization a security-critical deployment step.
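The first-caller-wins behavior can be illustrated with a minimal sketch built on std::sync::OnceLock. The names INIT_OWNER and try_claim_init are hypothetical and are not part of the policy-gate API; the sketch only shows why whoever initializes first keeps control.

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for the crate's internal init state.
static INIT_OWNER: OnceLock<String> = OnceLock::new();

/// Returns Ok(()) for the first caller. Every later caller receives the
/// name of the caller that already owns the initialization.
fn try_claim_init(caller: &str) -> Result<(), String> {
    // `set` succeeds only once per process; later calls get an Err.
    INIT_OWNER
        .set(caller.to_string())
        .map_err(|_| INIT_OWNER.get().cloned().unwrap_or_default())
}
```

If an untrusted code path ran first, the intended initialization would be the one rejected, which is the situation token authorization is meant to prevent.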

Production Requirement

Set the POLICY_GATE_INIT_TOKEN environment variable and call init_with_token():

# Generate a random token (store in secret manager)
export POLICY_GATE_INIT_TOKEN=$(openssl rand -hex 32)

// Application code (Rust)
policy_gate::init_with_token(
    std::env::var("POLICY_GATE_INIT_TOKEN").expect("SA-073: POLICY_GATE_INIT_TOKEN required"),
    FirewallProfile::Default
).expect("Firewall init failed");

Deployment Checklist

POLICY_GATE_INIT_TOKEN set in production environment (not in code)
Token stored in secret manager (KMS, Vault, etc.) — not in config files
Init called from trusted, controlled initialization path
Token rotation procedure documented (if compromised)
Monitoring: Alert on FirewallInitError::UnauthorizedInit in logs
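A deployment-side preflight can catch a missing or placeholder token before init_with_token() is reached. The sketch below assumes tokens are generated with `openssl rand -hex 32` (64 hex characters); preflight_token and TokenPreflightError are hypothetical names, and policy-gate itself may accept other token formats.

```rust
#[derive(Debug, PartialEq)]
enum TokenPreflightError {
    Missing,
    Malformed,
}

/// Accepts only a 64-character hex string, the shape produced by
/// `openssl rand -hex 32`. This is a sanity check for the deployment,
/// not the authorization check performed by policy-gate itself.
fn preflight_token(raw: Option<&str>) -> Result<&str, TokenPreflightError> {
    let token = raw.map(str::trim).ok_or(TokenPreflightError::Missing)?;
    if token.len() == 64 && token.chars().all(|c| c.is_ascii_hexdigit()) {
        Ok(token)
    } else {
        Err(TokenPreflightError::Malformed)
    }
}
```

The input would typically come from `std::env::var("POLICY_GATE_INIT_TOKEN").ok()`, borrowed with `as_deref()`.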

Anti-Pattern (Do Not Use in Production)

// ❌ DO NOT: Hardcoded token, predictable, no environment control
policy_gate::init(); // Only for development/testing

1. systemd Deployment (Linux)

Service Unit

# /etc/systemd/system/policy-gate.service
[Unit]
Description=policy-gate safety gate
After=network.target
Requires=network.target

[Service]
Type=notify
ExecStart=/usr/local/bin/your-app-using-firewall
Restart=on-failure
RestartSec=2s

# DA-01: External process watchdog — firewall must respond within 5 s.
# The application must call sd_notify(0, "WATCHDOG=1") at least every 2.5 s
# (WatchdogSec / 2 is the recommended ping interval per systemd documentation).
WatchdogSec=5s
NotifyAccess=main

# Resource limits — prevent OOM-triggered starvation (DA-05)
MemoryMax=512M
CPUQuota=80%

# Isolation
NoNewPrivileges=true
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Application Watchdog Ping (Node.js)

import { notify } from 'sd-notify'; // npm install sd-notify

// Ping systemd every 2 s (well under the 5 s WatchdogSec deadline)
const WATCHDOG_INTERVAL_MS = 2_000;

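// Type=notify: systemd waits for READY=1 before it treats the service as started.
// Send it once firewall initialization has succeeded.
notify(false, 'READY=1');

setInterval(() => {
  notify(false, 'WATCHDOG=1');
}, WATCHDOG_INTERVAL_MS);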
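Instead of hardcoding the interval, a service can derive it from WATCHDOG_USEC, the environment variable systemd exports to services that have WatchdogSec= set. The following is a minimal Rust sketch of the WatchdogSec / 2 rule; ping_interval is a hypothetical helper, and sending the actual notification is left out.

```rust
use std::time::Duration;

/// Derives the ping interval from the value of `WATCHDOG_USEC`.
/// Returns None when the variable is absent, unparsable, or zero,
/// in which case no watchdog pings should be scheduled.
fn ping_interval(watchdog_usec: Option<&str>) -> Option<Duration> {
    let usec: u64 = watchdog_usec?.trim().parse().ok()?;
    if usec == 0 {
        return None;
    }
    // Ping at half the deadline, as the unit file comment recommends.
    Some(Duration::from_micros(usec / 2))
}
```

With WatchdogSec=5s the variable holds 5000000, which yields the 2.5 s interval mentioned in the unit file.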

Verify

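systemctl daemon-reload                       # pick up the new unit file
systemctl start policy-gate
systemctl status policy-gate                  # confirm the service is active
systemctl show policy-gate -p WatchdogUSec    # confirm the 5 s watchdog timeout is applied
journalctl -u policy-gate -f                  # monitor watchdog events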

2. Kubernetes Deployment

Liveness Probe

The application must expose a GET /health endpoint that returns HTTP 200 when init() has succeeded and the firewall is operational.

// Minimal health endpoint (Express example)
import express from 'express';
import { isInitialised } from './firewall'; // wrapper around firewall-core init()

const app = express();

app.get('/health', (_req, res) => {
  if (isInitialised()) {
    res.status(200).json({ status: 'ok' });
  } else {
    res.status(503).json({ status: 'initialising' });
  }
});

app.listen(3000); // port must match containerPort and the probe port

Deployment YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: policy-gate-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: policy-gate-app
  template:
    metadata:
      labels:
        app: policy-gate-app
    spec:
      containers:
        - name: app
          image: your-registry/policy-gate-app:latest
          ports:
            - containerPort: 3000

          # DA-01: Liveness probe. The kubelet restarts the container if the
          # process stops responding. Covers OS-level starvation and deadlocks
          # that the internal Rust watchdog cannot detect.
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10   # allow time for firewall init()
            periodSeconds: 5          # check every 5 s
            failureThreshold: 3       # restart after 3 consecutive failures (15 s)
            timeoutSeconds: 2

          # Readiness probe — only route traffic after init() succeeds (SR-006)
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 3
            failureThreshold: 2

          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"     # DA-05: prevent OOM-triggered starvation
              cpu: "500m"
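The 15 s figure in the liveness probe comment counts probe periods only. A rough upper bound on detection time also includes the timeout of the final probe. The sketch below works through that arithmetic; Probe and worst_case_detection are hypothetical names, and kubelet scheduling jitter is not modeled.

```rust
use std::time::Duration;

/// Probe timing fields, named after the Kubernetes probe settings.
struct Probe {
    period: Duration,
    timeout: Duration,
    failure_threshold: u32,
}

/// Approximate upper bound on the time between a hang and the restart
/// decision: `failure_threshold` probe periods plus one probe timeout.
fn worst_case_detection(p: &Probe) -> Duration {
    p.period * p.failure_threshold + p.timeout
}
```

For the liveness settings above (5 s period, 3 failures, 2 s timeout) this gives about 17 s, which is the number to compare against the DA-01 response budget.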

3. Docker Healthcheck (standalone)

FROM node:22-slim
WORKDIR /app
COPY . .
RUN npm ci && npm run build

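# DA-01: Docker built-in healthcheck as minimal external watchdog.
# Not a replacement for systemd WatchdogSec or Kubernetes livenessProbe
# in production, but suitable for development and testing environments.
# Plain Docker only marks the container "unhealthy"; it does not restart it.
# node:22-slim does not ship curl, so the check uses Node's built-in fetch.
HEALTHCHECK --interval=10s --timeout=3s --start-period=15s --retries=3 \
  CMD node -e "fetch('http://localhost:3000/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"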

CMD ["node", "dist/index.js"]

4. Process Restart on Crash

Regardless of the watchdog mechanism, configure automatic restart:

Environment	Restart mechanism
systemd	`Restart=on-failure`, `RestartSec=2s`
Kubernetes	`restartPolicy: Always` (default)
Docker Compose	`restart: unless-stopped`
PM2 (Node.js)	`pm2 start app.js` (restarts on crash by default; `--watch` only adds restarts on file changes)

5. Operational Checklist

Ref	Requirement	How to satisfy
DA-01	External process watchdog	systemd `WatchdogSec=5s` OR Kubernetes `livenessProbe`
DA-05	OOM protection	`MemoryMax` (systemd) / `resources.limits.memory` (k8s)
OC-01	`init()` before any `evaluate()`	Checked by `INIT_RESULT OnceLock` in `firewall-core` + napi guard
OC-03	Audit entries persisted	Application must persist `AuditEntry` before acting on verdict
OC-04	`DiagnosticDisagreement` alerting within 24 h	Wire `onDisagreement` callback to alerting infrastructure
OC-05	`DiagnosticAgreement` review within 72 h	Wire `onAudit` + filter by `verdict_kind == DiagnosticAgreement`
SR-006	Fail if `init()` returns error	Application must not start evaluation if `firewall_init()` fails
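The OC-03 ordering (persist first, act second) can be sketched as follows. All names here (Verdict, AuditSink, MemorySink, release_verdict) are hypothetical, the audit entry is reduced to a string, and blocking when persistence fails is a choice made by this sketch rather than documented policy-gate behavior.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Verdict {
    Pass,
    Block,
}

/// Hypothetical audit sink; a real deployment would write to durable storage.
trait AuditSink {
    fn persist(&mut self, entry: &str) -> Result<(), String>;
}

/// In-memory sink for illustration; `fail` simulates a storage outage.
struct MemorySink {
    entries: Vec<String>,
    fail: bool,
}

impl AuditSink for MemorySink {
    fn persist(&mut self, entry: &str) -> Result<(), String> {
        if self.fail {
            return Err("storage unavailable".to_string());
        }
        self.entries.push(entry.to_string());
        Ok(())
    }
}

/// Persists the audit entry first and only then releases the verdict.
/// If persistence fails, the request is blocked (fail-closed).
fn release_verdict(sink: &mut dyn AuditSink, entry: &str, verdict: Verdict) -> Verdict {
    match sink.persist(entry) {
        Ok(()) => verdict,
        Err(_) => Verdict::Block,
    }
}
```

The application acts only on the value returned by release_verdict, so no verdict takes effect without a stored audit record.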

6. Safety Evidence

This deployment guidance closes:

SA-022 — External process watchdog specification
DC-GAP-04 — OS-level starvation not covered by internal watchdog
PFH-05 — External process watchdog specification (§9.6)

The internal 50 ms Rust watchdog (SA-004) remains active and provides FSM-internal hang detection orthogonal to the external watchdog. Both mechanisms are complementary.

See SAFETY_MANUAL.md §8.2 (DA-01) for the corresponding safety argumentation.

7. Egress Testing

The egress firewall includes comprehensive test coverage in crates/firewall-core/tests/egress_channel_tests.rs with 37 tests:

Channel E: FSM-based PII/Leakage Detection

Sliding window leakage detection (5 tests): System prompt leakage, partial token leakage, boundary conditions at response start/end, overlapping window matches
Contextual PII detection (8 PII types): Credit Cards, SSN, Email, US Phone, International Phone, IPv4, IPv6, IBAN
False positive prevention (2 tests): Boilerplate code, factual responses
Edge cases (3 tests): Short prompts, empty responses, minimal responses
Unicode/encoding variations (2 tests): International phone detection, normalization handling
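As an illustration of what the sliding window tests exercise, the sketch below flags a response that contains any fixed-size window of the system prompt verbatim. This is an assumed, simplified model; leaks_prompt is a hypothetical function and the actual Channel E FSM may work differently.

```rust
/// Returns true if any `window`-character slice of `system_prompt`
/// appears verbatim in `response`. Prompts shorter than the window
/// are never flagged by this sketch.
fn leaks_prompt(system_prompt: &str, response: &str, window: usize) -> bool {
    let chars: Vec<char> = system_prompt.chars().collect();
    if window == 0 || chars.len() < window {
        return false;
    }
    chars
        .windows(window)
        .map(|w| w.iter().collect::<String>())
        .any(|needle| response.contains(needle.as_str()))
}
```

The short-prompt branch corresponds to the kind of edge case listed above: a prompt smaller than the window cannot produce a match.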

Channel F: Rule-based Entropy/Framing Detection

Entropy detection (4 tests): Base64, Hex encoded data, context-aware detection
Framing detection (5 tests): "The system prompt", "hidden instructions", "secret key", "private_key =", "secret_key ="
Boundary cases (2 tests): Base64 threshold, multiple framing patterns
Pass cases (3 tests): Encoding discussions, system prompt explanations, code without secrets
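Entropy-based detection usually rests on a measure such as Shannon entropy, which is higher for encoded or random-looking data than for natural language. The sketch below assumes that measure; shannon_entropy is a hypothetical helper, and the thresholds and context rules used by Channel F are not shown here.

```rust
use std::collections::HashMap;

/// Shannon entropy of `text` in bits per byte (0.0 for empty input).
fn shannon_entropy(text: &str) -> f64 {
    if text.is_empty() {
        return 0.0;
    }
    // Count how often each byte value occurs.
    let mut counts: HashMap<u8, usize> = HashMap::new();
    for b in text.bytes() {
        *counts.entry(b).or_insert(0) += 1;
    }
    let len = text.len() as f64;
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```

A detector built on this would compare the value for a candidate token against a tuned threshold, which is where the boundary tests listed above come in.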

Combined E + F Integration Tests

Channel interaction (3 tests): Combined PII+framing, framing-only detection, safe response verification

The 1oo2D voter requires both channels to agree on Pass, with either channel able to block independently (fail-closed).
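The voting rule stated above can be written down directly. This is a minimal sketch with hypothetical types (ChannelResult, Vote); it models only the pass/block decision and a disagreement flag, not the verdict kinds used by policy-gate.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ChannelResult {
    Pass,
    Block,
}

#[derive(Debug, PartialEq)]
struct Vote {
    verdict: ChannelResult,
    /// True when the two channels returned different results.
    disagreement: bool,
}

/// Pass only if both channels pass; any Block wins (fail-closed).
fn vote(channel_e: ChannelResult, channel_f: ChannelResult) -> Vote {
    let verdict = match (channel_e, channel_f) {
        (ChannelResult::Pass, ChannelResult::Pass) => ChannelResult::Pass,
        _ => ChannelResult::Block,
    };
    Vote { verdict, disagreement: channel_e != channel_f }
}
```

A single blocking channel is enough to stop a response, and the disagreement flag marks the cases worth a diagnostic review.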

8. Multi-Tenant Deployment (Pillar 5)

Production multi-tenant environments must use the directory-based registry for isolated policy management.

Directory Structure

Each tenant is defined in its own .toml file. The filename becomes the tenant_id by default.

/etc/policy-gate/tenants/
├── default.toml        # Baseline policy for anonymous requests
├── customer-a.toml     # Strict policy for Customer A
└── customer-b.toml     # Relaxed policy for Customer B
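The default filename-to-tenant mapping can be sketched with std::path. tenant_id_from_path is a hypothetical helper; the registry's real loader may apply further validation or allow the id to be overridden inside the file.

```rust
use std::path::Path;

/// Derives a tenant id from a policy file path: the file stem of a
/// `.toml` file. Other extensions and non-UTF-8 names yield None.
fn tenant_id_from_path(path: &Path) -> Option<&str> {
    if path.extension()?.to_str()? != "toml" {
        return None;
    }
    path.file_stem()?.to_str()
}
```

Under this mapping, renaming a file changes the tenant id, so file names in the tenants directory deserve the same change control as the policies themselves.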

Initialization (Pillar 5)

Initialization from a directory is authorized using exactly the same POLICY_GATE_INIT_TOKEN as single-tenant mode:

policy_gate::init_multi_tenant_registry(
    std::env::var("POLICY_GATE_INIT_TOKEN").expect("SA-073: token required"),
    "/etc/policy-gate/tenants/"
).expect("Failed to initialize tenant registry");

Fail-Closed Isolation (SA-048)

The orchestrator enforces strict isolation:

No Config: If no .toml exists for a requested tenant_id, the request is blocked as UnknownTenant.
Anonymous Access: If the default.toml has allow_anonymous_tenants = false (the default), requests without a tenant_id are blocked.
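The two isolation rules can be sketched as a single resolution function. Registry, Resolution, and resolve are hypothetical, as is the "AnonymousNotAllowed" reason string and the choice to map permitted anonymous requests to the default tenant; only UnknownTenant and allow_anonymous_tenants come from this guide.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Resolution {
    Tenant(String),
    Blocked(&'static str),
}

/// Hypothetical registry: known tenant ids plus the anonymous-access
/// flag that the guide places in default.toml.
struct Registry {
    tenants: HashSet<String>,
    allow_anonymous_tenants: bool,
}

fn resolve(registry: &Registry, tenant_id: Option<&str>) -> Resolution {
    match tenant_id {
        Some(id) if registry.tenants.contains(id) => Resolution::Tenant(id.to_string()),
        // No policy file for this tenant: block rather than fall back.
        Some(_) => Resolution::Blocked("UnknownTenant"),
        None if registry.allow_anonymous_tenants => Resolution::Tenant("default".to_string()),
        None => Resolution::Blocked("AnonymousNotAllowed"),
    }
}
```

Note that an unknown tenant never falls through to the default policy; every path that lacks an explicit policy ends in a block.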

Multi-Tenant Audit Trails (OC-03)

In multi-tenant deployments, the application MUST ensure that AuditEntry metadata (including tenant_id and sequence) is persisted to a data store that supports per-tenant query and isolation.

Audit Field	Requirement
`tenant_id`	Mandatory - Used for logical data separation in the audit log.
`sequence`	Mandatory - Monotonically increasing audit counter. Global monotonic process-wide ordering is recommended; tenant scoping is optional at the storage layer, not required by the HMAC chain implementation.
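A process-wide monotonic sequence, as recommended in the table, can be produced with a single atomic counter. AUDIT_SEQUENCE, AuditMeta, and next_audit_meta are hypothetical names for illustration; the real AuditEntry carries more fields and is produced by policy-gate, not by the application.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide audit counter shared by all tenants.
static AUDIT_SEQUENCE: AtomicU64 = AtomicU64::new(0);

/// Hypothetical metadata record carrying the two mandatory fields.
#[derive(Debug)]
struct AuditMeta {
    tenant_id: String,
    sequence: u64,
}

fn next_audit_meta(tenant_id: &str) -> AuditMeta {
    // fetch_add returns the previous value, so sequences start at 1.
    let sequence = AUDIT_SEQUENCE.fetch_add(1, Ordering::SeqCst) + 1;
    AuditMeta { tenant_id: tenant_id.to_string(), sequence }
}
```

Because the counter is global, a per-tenant query sees gaps in the sequence; the storage layer should treat those gaps as expected rather than as missing entries.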

See SAFETY_MANUAL.md for detailed multi-tenant safety argumentation.
