Safety Action SA-022: External Process Watchdog (DA-01)
Safety Action SA-073: Init Authorization Guard
This document specifies the mandatory deployment requirements for production use of policy-gate. It covers:
- External process watchdog (DA-01)
- Init authorization and token management (SA-073)
- Runtime configuration and environment assumptions
Key distinction: The internal 50 ms watchdog (SA-004) detects FSM-internal hangs. It does not protect against OS preemption, VM live-migration, SIGSTOP, or OOM-kill, where the thread never resumes and the internal watchdog never fires. An external process watchdog is required for full DA-01 coverage.
The firewall uses OnceLock<INIT_RESULT> — the first successful init() caller "owns" the initialization. This is a security-critical deployment step.
Set the POLICY_GATE_INIT_TOKEN environment variable and call init_with_token():
# Generate a random token (store in secret manager)
export POLICY_GATE_INIT_TOKEN=$(openssl rand -hex 32)
# Application code
policy_gate::init_with_token(
std::env::var("POLICY_GATE_INIT_TOKEN").expect("SA-073: POLICY_GATE_INIT_TOKEN required"),
FirewallProfile::Default
).expect("Firewall init failed");

- POLICY_GATE_INIT_TOKEN set in production environment (not in code)
- Token stored in secret manager (KMS, Vault, etc.), not in config files
- Init called from trusted, controlled initialization path
- Token rotation procedure documented (if compromised)
- Monitoring: Alert on FirewallInitError::UnauthorizedInit in logs
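The checklist above can be backed by a small pre-flight check on the application side. The following is a minimal sketch, not part of the policy-gate API: `checkInitToken` and `TokenCheck` are hypothetical names, and the 64-character lowercase hex format is an assumption that only holds if the token was generated with `openssl rand -hex 32` as shown earlier.

```typescript
// Hypothetical pre-flight check; not part of the policy-gate API.
// Assumes the token came from `openssl rand -hex 32`, which yields
// 64 lowercase hexadecimal characters.
type TokenCheck =
  | { ok: true; token: string }
  | { ok: false; reason: string };

function checkInitToken(raw: string | undefined): TokenCheck {
  if (raw === undefined || raw.length === 0) {
    return { ok: false, reason: "SA-073: POLICY_GATE_INIT_TOKEN is not set" };
  }
  if (!/^[0-9a-f]{64}$/.test(raw)) {
    return { ok: false, reason: "SA-073: token is not 64 hex characters" };
  }
  return { ok: true, token: raw };
}
```

A caller would typically pass `process.env.POLICY_GATE_INIT_TOKEN`, abort startup when `ok` is false, and log only the `reason`, never the token itself.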
// ❌ DO NOT: Hardcoded token, predictable, no environment control
policy_gate::init(); // Only for development/testing

# /etc/systemd/system/policy-gate.service
[Unit]
Description=policy-gate safety gate
After=network.target
Requires=network.target
[Service]
Type=notify
ExecStart=/usr/local/bin/your-app-using-firewall
Restart=on-failure
RestartSec=2s
# DA-01: External process watchdog — firewall must respond within 5 s.
# The application must call sd_notify(0, "WATCHDOG=1") at least every 2.5 s
# (WatchdogSec / 2 is the recommended ping interval per systemd documentation).
WatchdogSec=5s
NotifyAccess=main
# Resource limits — prevent OOM-triggered starvation (DA-05)
MemoryMax=512M
CPUQuota=80%
# Isolation
NoNewPrivileges=true
PrivateTmp=true
[Install]
WantedBy=multi-user.target

import { notify } from 'sd-notify'; // npm install sd-notify
// Ping systemd every 2 s (well under the 5 s WatchdogSec deadline)
const WATCHDOG_INTERVAL_MS = 2_000;
setInterval(() => {
notify(false, 'WATCHDOG=1');
}, WATCHDOG_INTERVAL_MS);

systemctl start policy-gate
systemctl status policy-gate # confirm WatchdogSec appears
journalctl -u policy-gate -f # monitor watchdog events

The application must expose a GET /health endpoint that returns HTTP 200 when init() has succeeded and the firewall is operational.
// Minimal health endpoint (Express example)
import express from 'express';
import { isInitialised } from './firewall'; // wrapper around firewall-core init()
const app = express();
app.get('/health', (_req, res) => {
if (isInitialised()) {
res.status(200).json({ status: 'ok' });
} else {
res.status(503).json({ status: 'initialising' });
}
});

apiVersion: apps/v1
kind: Deployment
metadata:
  name: policy-gate-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: policy-gate-app
  template:
    metadata:
      labels:
        app: policy-gate-app
    spec:
      containers:
        - name: app
          image: your-registry/policy-gate-app:latest
          ports:
            - containerPort: 3000
          # DA-01: Liveness probe. Kills and restarts the pod if the process
          # stops responding. Covers OS-level starvation and deadlocks that the
          # internal Rust watchdog cannot detect.
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10 # allow time for firewall init()
            periodSeconds: 5 # check every 5 s
            failureThreshold: 3 # restart after 3 consecutive failures (15 s)
            timeoutSeconds: 2
          # Readiness probe: only route traffic after init() succeeds (SR-006)
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 3
            failureThreshold: 2
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "512Mi" # DA-05: prevent OOM-triggered starvation
              cpu: "500m"

FROM node:22-slim
WORKDIR /app
COPY . .
RUN npm ci && npm run build
# DA-01: Docker built-in healthcheck as minimal external watchdog.
# Not a replacement for systemd WatchdogSec or Kubernetes livenessProbe
# in production, but suitable for development and testing environments.
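# node:22-slim does not ship curl, so the probe uses Node's built-in fetch.
HEALTHCHECK --interval=10s --timeout=3s --start-period=15s --retries=3 \
  CMD node -e "fetch('http://localhost:3000/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"
CMD ["node", "dist/index.js"]

Regardless of the watchdog mechanism, configure automatic restart: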
| Environment | Restart mechanism |
|---|---|
| systemd | Restart=on-failure, RestartSec=2s |
| Kubernetes | restartPolicy: Always (default) |
| Docker Compose | restart: unless-stopped |
| PM2 (Node.js) | pm2 start app.js (autorestart is enabled by default; --watch only restarts on file changes) |

| Ref | Requirement | How to satisfy |
|---|---|---|
| DA-01 | External process watchdog | systemd WatchdogSec=5s OR Kubernetes livenessProbe |
| DA-05 | OOM protection | MemoryMax (systemd) / resources.limits.memory (k8s) |
| OC-01 | init() before any evaluate() | Checked by INIT_RESULT OnceLock in firewall-core + napi guard |
| OC-03 | Audit entries persisted | Application must persist AuditEntry before acting on verdict |
| OC-04 | DiagnosticDisagreement alerting within 24 h | Wire onDisagreement callback to alerting infrastructure |
| OC-05 | DiagnosticAgreement review within 72 h | Wire onAudit + filter by verdict_kind == DiagnosticAgreement |
| SR-006 | Fail if init() returns error | Application must not start evaluation if firewall_init() fails |
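Requirement OC-03 in the table above is an ordering constraint, which is easy to get wrong when persistence is added as an afterthought. The sketch below shows one way to enforce it. `AuditEntry` is reduced to two fields for illustration, and `AuditSink` and `persistThenAct` are hypothetical names, not part of policy-gate.

```typescript
// Hypothetical shapes for illustration; the real AuditEntry has more fields.
interface AuditEntry {
  sequence: number;
  verdict_kind: string;
}

interface AuditSink {
  // Must throw if the entry could not be durably written.
  persist(entry: AuditEntry): void;
}

// OC-03: persist first, act second. If persistence throws, the action
// is never invoked and the error propagates to the caller.
function persistThenAct(
  entry: AuditEntry,
  sink: AuditSink,
  act: (entry: AuditEntry) => void,
): void {
  sink.persist(entry);
  act(entry);
}
```

Because `act` only runs after `persist` returns, a storage failure leaves the verdict unacted, which matches the fail-closed posture of the rest of this guide.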
This deployment guidance closes:
- SA-022 — External process watchdog specification
- DC-GAP-04 — OS-level starvation not covered by internal watchdog
- PFH-05 — External process watchdog specification (§9.6)
The internal 50 ms Rust watchdog (SA-004) remains active and provides FSM-internal hang detection orthogonal to the external watchdog. Both mechanisms are complementary.
See SAFETY_MANUAL.md §8.2 DA-01 for the corresponding safety argumentation.
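For the systemd variant of the external watchdog, the ping interval does not have to be hardcoded in the application. systemd exports the configured WatchdogSec to the service as the `WATCHDOG_USEC` environment variable (in microseconds), so the interval can be derived at startup. The helper below is a minimal sketch; `pingIntervalMs` is a hypothetical name, and it follows the "half the timeout" recommendation quoted in the unit file comments.

```typescript
// Returns the recommended ping interval in milliseconds (half the
// watchdog timeout), or null if no usable timeout was provided.
// `watchdogUsec` is the raw WATCHDOG_USEC value, in microseconds.
function pingIntervalMs(watchdogUsec: string | undefined): number | null {
  if (watchdogUsec === undefined || !/^[0-9]+$/.test(watchdogUsec)) {
    return null;
  }
  const usec = Number(watchdogUsec);
  if (usec <= 0) {
    return null;
  }
  return Math.floor(usec / 1000 / 2);
}
```

With `WatchdogSec=5s` the variable is `5000000`, which gives 2500 ms. A `null` result means the process is not running under a systemd watchdog, so no ping timer should be started.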
The egress firewall includes comprehensive test coverage in crates/firewall-core/tests/egress_channel_tests.rs with 37 tests:
- Sliding window leakage detection (5 tests): System prompt leakage, partial token leakage, boundary conditions at response start/end, overlapping window matches
- Contextual PII detection (8 PII types): Credit Cards, SSN, Email, US Phone, International Phone, IPv4, IPv6, IBAN
- False positive prevention (2 tests): Boilerplate code, factual responses
- Edge cases (3 tests): Short prompts, empty responses, minimal responses
- Unicode/encoding variations (2 tests): International phone detection, normalization handling
- Entropy detection (4 tests): Base64, Hex encoded data, context-aware detection
- Framing detection (5 tests): "The system prompt", "hidden instructions", "secret key", "private_key =", "secret_key ="
- Boundary cases (2 tests): Base64 threshold, multiple framing patterns
- Pass cases (3 tests): Encoding discussions, system prompt explanations, code without secrets
- Channel interaction (3 tests): Combined PII+framing, framing-only detection, safe response verification
The 1oo2D voter requires both channels to agree on Pass, with either channel able to block independently (fail-closed).
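The voting rule can be stated in a few lines. This is an illustrative sketch of the rule as described, not the firewall-core implementation: `ChannelVerdict`, `VoteResult`, and `vote1oo2` are hypothetical names, and the `disagreement` flag is an assumption about how a diagnostic could be surfaced.

```typescript
type ChannelVerdict = "Pass" | "Block";

interface VoteResult {
  decision: ChannelVerdict;
  // True when the two channels returned different verdicts.
  disagreement: boolean;
}

// Fail-closed vote: Pass only when both channels pass.
function vote1oo2(a: ChannelVerdict, b: ChannelVerdict): VoteResult {
  const decision = a === "Pass" && b === "Pass" ? "Pass" : "Block";
  return { decision, disagreement: a !== b };
}
```

Of the four input combinations, only Pass/Pass yields Pass. A single blocking channel is enough to block, and the flag records that the channels differed.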
Production multi-tenant environments must use the directory-based registry for isolated policy management.
Each tenant is defined in its own .toml file. The filename becomes the tenant_id by default.
/etc/policy-gate/tenants/
├── default.toml # Baseline policy for anonymous requests
├── customer-a.toml # Strict policy for Customer A
└── customer-b.toml # Relaxed policy for Customer B
Initialization from a directory is authorized using exactly the same POLICY_GATE_INIT_TOKEN as single-tenant mode:
policy_gate::init_multi_tenant_registry(
std::env::var("POLICY_GATE_INIT_TOKEN").expect("SA-073: token required"),
"/etc/policy-gate/tenants/"
).expect("Failed to initialize tenant registry");

The orchestrator enforces strict isolation:
- No Config: If no .toml exists for a requested tenant_id, the request is blocked as UnknownTenant.
- Anonymous Access: If the default.toml has allow_anonymous_tenants = false (the default), requests without a tenant_id are blocked.
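The two isolation rules above can be read as a small decision function. The sketch below is a model of that behavior for reasoning and testing, not the orchestrator's code. `resolveTenant` and `Resolution` are hypothetical names, the `AnonymousNotAllowed` reason is invented for illustration (the source only names `UnknownTenant`), and falling back to default.toml for permitted anonymous requests is an assumption based on the directory listing.

```typescript
type Resolution =
  | { kind: "Resolved"; policyFile: string }
  | { kind: "Blocked"; reason: "UnknownTenant" | "AnonymousNotAllowed" };

function resolveTenant(
  tenantId: string | undefined,
  knownTenants: ReadonlySet<string>,
  allowAnonymousTenants: boolean,
): Resolution {
  if (tenantId === undefined) {
    // Assumption: permitted anonymous requests use the baseline policy.
    return allowAnonymousTenants
      ? { kind: "Resolved", policyFile: "default.toml" }
      : { kind: "Blocked", reason: "AnonymousNotAllowed" };
  }
  if (!knownTenants.has(tenantId)) {
    // No <tenant_id>.toml in the registry directory.
    return { kind: "Blocked", reason: "UnknownTenant" };
  }
  return { kind: "Resolved", policyFile: `${tenantId}.toml` };
}
```

Note that an unknown tenant never falls back to the default policy in this model. Only a request with no tenant_id at all can reach default.toml, and only when anonymous access is explicitly enabled.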
In multi-tenant deployments, the application MUST ensure that AuditEntry metadata (including tenant_id and sequence) is persisted to a data store that supports per-tenant query and isolation.
| Audit Field | Requirement |
|---|---|
| tenant_id | Mandatory - Used for logical data separation in the audit log. |
| sequence | Mandatory - Monotonically increasing audit counter. Global monotonic process-wide ordering is recommended; tenant scoping is optional at the storage layer, not required by the HMAC chain implementation. |
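A storage layer can verify both fields with very little code. The following sketch assumes that "monotonically increasing" means strictly increasing in process-wide storage order. `StoredAudit`, `findSequenceViolations`, and `byTenant` are hypothetical names, and the record is reduced to the two fields from the table.

```typescript
interface StoredAudit {
  tenant_id: string;
  sequence: number;
}

// Returns the indices at which `sequence` fails to increase strictly
// relative to the previous entry (process-wide ordering).
function findSequenceViolations(entries: readonly StoredAudit[]): number[] {
  const violations: number[] = [];
  let previous: number | undefined;
  entries.forEach((entry, index) => {
    if (previous !== undefined && entry.sequence <= previous) {
      violations.push(index);
    }
    previous = entry.sequence;
  });
  return violations;
}

// Groups entries by tenant_id for per-tenant queries; storage order
// is preserved within each group.
function byTenant(entries: readonly StoredAudit[]): Map<string, StoredAudit[]> {
  const groups = new Map<string, StoredAudit[]>();
  entries.forEach((entry) => {
    const list = groups.get(entry.tenant_id) ?? [];
    list.push(entry);
    groups.set(entry.tenant_id, list);
  });
  return groups;
}
```

Checking the global order first and grouping second mirrors the table: ordering is a process-wide property, while tenant scoping is applied afterwards at the storage layer.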
See SAFETY_MANUAL.md for detailed multi-tenant safety argumentation.