Comprehensive API documentation for the agentic-sandbox management server.
The management server exposes three network interfaces:
| Port | Protocol | Purpose |
|---|---|---|
| 8120 | gRPC | Agent bidirectional communication |
| 8121 | WebSocket | Real-time output streaming for dashboard |
| 8122 | HTTP | REST API and web dashboard |
gRPC (Agents): The secure transport authenticates agents with mTLS client
identity material generated during VM provisioning. Plain TCP has no
transport identity and is rejected; the legacy x-agent-secret compatibility
path was retired in #412.
HTTP/WebSocket: No authentication required for local-host operator access. Exception: the AIWG executor-contract route POST /api/v1/sessions/:id/dispatch requires Authorization: Bearer <token> where the token is issued by aiwg serve at executor registration. See AIWG Executor Contract for the full integration.
All HTTP endpoints return JSON. Error responses follow this structure:
{
"error": {
"code": "ERROR_CODE",
"message": "Human-readable error message"
}
}
Base URL: http://localhost:8122 for the default loopback-only development
listener. Production remote access should use the TLS/admin listener or a
trusted tunnel; plaintext non-loopback management TCP is rejected unless the
operator sets an explicit unsafe override.
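A client can normalise the error envelope described above before deciding how to react. The following is a minimal sketch, assuming the body is the JSON structure shown; `ApiError` and `parse_error` are illustrative names, not part of the server or any shipped SDK.

```python
import json
from typing import NamedTuple, Optional


class ApiError(NamedTuple):
    """Hypothetical container for the documented error envelope."""
    code: str
    message: str


def parse_error(body: str) -> Optional[ApiError]:
    """Return the code/message pair from an error body, or None if absent."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    # Operation records carry "error" as a plain string, so only an object
    # is treated as the structured envelope.
    if not isinstance(error, dict):
        return None
    return ApiError(str(error.get("code", "")), str(error.get("message", "")))
```

Returning `None` for anything that is not the structured envelope keeps callers from mistaking a successful payload for a failure.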
Simple liveness probe. Returns 200 if server is running.
Response: 200 OK with JSON body {"status":"alive"}
Example:
curl http://localhost:8122/healthz
Readiness probe. Returns 200 if server is ready to accept traffic.
Response:
{
"ready": true,
"reason": "agents_connected"
}
Status Codes:
- 200 - Ready
- 503 - Not ready (returns reason)
Example:
curl http://localhost:8122/readyz
Detailed health check with metrics.
Response:
{
"status": "healthy",
"uptime_seconds": 0,
"agent_count": 2,
"active_tasks": 0
}
Example:
curl http://localhost:8122/healthz/deep
Bounded libvirt RPC health probe. Returns 200 when libvirt answers within
the read budget, or 503 when libvirt is down, slow, or the fail-fast circuit
is open.
Response:
{
"status": "healthy",
"libvirt": "alive"
}
On timeout the response uses the same structured VM error body as
/api/v1/vms and includes Retry-After.
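A probe client might honour the Retry-After header before asking again. The sketch below is one way to do that, assuming the header carries either a number of seconds or an HTTP date as HTTP allows; the function name and the 5-second fallback are assumptions, not server behaviour.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_delay_seconds(status: int, headers: Mapping[str, str],
                        now: Optional[datetime] = None,
                        default: float = 5.0) -> Optional[float]:
    """Return how long to wait before re-probing, or None when no retry is needed."""
    if status != 503:
        return None
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    raw = lowered.get("retry-after")
    if raw is None:
        return default  # assumed fallback when the header is missing
    raw = raw.strip()
    if raw.isdigit():
        return float(raw)
    try:
        when = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```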
Example:
curl -i http://localhost:8122/healthz/libvirt
Prometheus metrics endpoint.
Response: Prometheus text format
Example:
curl http://localhost:8122/metrics
List all connected agents with their status and metrics.
Response:
{
"agents": [
{
"id": "agent-01",
"hostname": "agent-01",
"ip_address": "192.168.122.201",
"status": "Ready",
"connected_at": 1706572800000,
"last_heartbeat": 1706572830000,
"metrics": {
"cpu_percent": 2.3,
"memory_used_bytes": 536870912,
"memory_total_bytes": 8589934592,
"disk_used_bytes": 2147483648,
"disk_total_bytes": 53687091200,
"load_avg": [0.15, 0.20, 0.18],
"uptime_seconds": 3600
},
"system_info": {
"os": "Ubuntu 24.04",
"kernel": "6.8.0-generic",
"cpu_cores": 4,
"memory_bytes": 8589934592,
"disk_bytes": 53687091200
}
}
]
}
Field Descriptions:
- status: "Starting", "Ready", "Busy", "Error", "ShuttingDown", "Stale", "Disconnected"
- connected_at: Unix timestamp (milliseconds)
- last_heartbeat: Unix timestamp (milliseconds)
- metrics: Optional, current resource usage
- system_info: Optional, VM hardware information
Example:
curl http://localhost:8122/api/v1/agents
VM endpoints are QEMU-specific.
List all VMs managed by libvirt.
Query Parameters:
- state (string, default: "all") - Filter by state: "running", "stopped", "all"
- prefix (string, default: "agent-") - Filter by name prefix. Use "*" for all VMs.
Response:
{
"vms": [
{
"name": "agent-01",
"state": "running",
"uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"vcpus": 4,
"memory_mb": 8192,
"ip_address": "192.168.122.201",
"uptime_seconds": null
}
],
"total": 1
}
States:
"running","stopped","paused","shutdown","crashed","suspended","unknown"
Example:
# List all agent VMs
curl http://localhost:8122/api/v1/vms
# List only running VMs
curl "http://localhost:8122/api/v1/vms?state=running"
# List all VMs (including non-agent VMs); quote the URL so the shell does not expand ? or *
curl "http://localhost:8122/api/v1/vms?prefix=*"
Get detailed information about a specific VM.
Response:
{
"name": "agent-01",
"state": "running",
"uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"vcpus": 4,
"memory_mb": 8192,
"ip_address": "192.168.122.201",
"uptime_seconds": null,
"agent": {
"connected": true,
"connected_at": 1706572800000,
"hostname": "agent-01"
}
}
Status Codes:
- 200 - Success
- 404 - VM not found
Example:
curl http://localhost:8122/api/v1/vms/agent-01
Create a new VM using the provisioning script.
Request Body:
{
"name": "agent-03",
"profile": "agentic-dev",
"vcpus": 4,
"memory_mb": 8192,
"disk_gb": 50,
"agentshare": true,
"start": true,
"ssh_key": "/home/user/.ssh/id_ed25519.pub"
}
Field Descriptions:
- name (string, required) - VM name (must match ^agent-[a-z0-9-]+$)
- profile (string, default: "agentic-dev") - Provisioning profile: "agentic-dev", "basic"
- vcpus (u32, default: 4) - Number of CPU cores
- memory_mb (u64, default: 8192) - Memory in megabytes
- disk_gb (u64, default: 50) - Disk size in gigabytes
- agentshare (bool, default: true) - Enable virtiofs shared storage
- start (bool, default: true) - Start VM after provisioning
- ssh_key (string, optional) - Dev/break-glass direct-runtime SSH public key. This bypasses gateway policy/audit guarantees and is not the managed-profile SSH direction. Managed agentic-dev provisioning omits unmanaged direct runtime SSH keys by default; set AGENTIC_ENABLE_DIRECT_RUNTIME_SSH=1 only for explicit dev/break-glass access. See ADR-029.
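Validating the request locally avoids a round trip that would end in 400 INVALID_VM_NAME. The sketch below builds a request body from the documented fields and defaults; `build_create_vm_request` is an illustrative helper, and the server remains the authority on what it accepts.

```python
import re
from typing import Any, Dict, Optional

# Name pattern taken from the field description above.
VM_NAME_PATTERN = re.compile(r"^agent-[a-z0-9-]+$")
KNOWN_PROFILES = ("agentic-dev", "basic")


def build_create_vm_request(name: str,
                            profile: str = "agentic-dev",
                            vcpus: int = 4,
                            memory_mb: int = 8192,
                            disk_gb: int = 50,
                            agentshare: bool = True,
                            start: bool = True,
                            ssh_key: Optional[str] = None) -> Dict[str, Any]:
    """Return a JSON-serialisable body for POST /api/v1/vms."""
    # fullmatch rejects a trailing newline that a bare "$" would tolerate.
    if not VM_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid VM name: {name!r}")
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    body: Dict[str, Any] = {
        "name": name,
        "profile": profile,
        "vcpus": vcpus,
        "memory_mb": memory_mb,
        "disk_gb": disk_gb,
        "agentshare": agentshare,
        "start": start,
    }
    # ssh_key is optional and break-glass only, so it is sent only on request.
    if ssh_key is not None:
        body["ssh_key"] = ssh_key
    return body
```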
Response: 202 Accepted
{
"operation": {
"id": "op-12345678-1234-1234-1234-123456789abc",
"type": "vm_create",
"status": "pending",
"target": "agent-03",
"created_at": "2024-01-30T12:00:00Z",
"progress_percent": 0
},
"vm": null
}
Status Codes:
- 202 - Accepted (provisioning started)
- 400 - Invalid request (e.g., invalid VM name)
- 409 - VM already exists
Error Codes:
- INVALID_VM_NAME - Name doesn't match required pattern
- VM_ALREADY_EXISTS - VM with this name already exists
- PROVISIONING_ERROR - Provisioning script failed
Example:
curl -X POST http://localhost:8122/api/v1/vms \
-H "Content-Type: application/json" \
-d '{
"name": "agent-03",
"profile": "agentic-dev",
"vcpus": 4,
"memory_mb": 8192,
"disk_gb": 50,
"agentshare": true,
"start": true
}'
# Minimal request (uses all defaults)
curl -X POST http://localhost:8122/api/v1/vms \
-H "Content-Type: application/json" \
-d '{"name": "agent-04"}'
Start a stopped VM.
Response:
{
"vm": {
"name": "agent-01",
"state": "running"
},
"message": null
}
Status Codes:
- 200 - Success (idempotent - returns 200 even if already running)
Example:
curl -X POST http://localhost:8122/api/v1/vms/agent-01/start
Gracefully stop a running VM (ACPI shutdown).
Response:
{
"vm": {
"name": "agent-01",
"state": "shutdown"
},
"message": "Graceful shutdown initiated"
}
Status Codes:
- 200 - Success (idempotent)
Example:
curl -X POST http://localhost:8122/api/v1/vms/agent-01/stop
Force stop a running VM (immediate termination).
Response:
{
"vm": {
"name": "agent-01",
"state": "stopped"
},
"message": "VM destroyed"
}
Status Codes:
- 200 - Success (idempotent)
Example:
curl -X POST http://localhost:8122/api/v1/vms/agent-01/destroy
Restart a running VM.
Request Body:
{
"mode": "graceful",
"timeout_seconds": 60
}
Field Descriptions:
- mode (string, default: "graceful") - Restart mode: "graceful" (ACPI shutdown) or "hard" (force destroy)
- timeout_seconds (u64, default: 60) - Timeout for graceful shutdown before forcing
Response: 202 Accepted
{
"operation": {
"id": "op-12345678-1234-1234-1234-123456789abc",
"type": "vm_restart",
"status": "pending",
"target": "agent-01",
"created_at": "2024-01-30T12:00:00Z",
"progress_percent": 0
},
"vm": null
}
Status Codes:
- 202 - Accepted
- 404 - VM not found
- 409 - VM not running
Example:
# Graceful restart with default timeout
curl -X POST http://localhost:8122/api/v1/vms/agent-01/restart \
-H "Content-Type: application/json" \
-d '{"mode": "graceful", "timeout_seconds": 60}'
# Hard restart (immediate)
curl -X POST http://localhost:8122/api/v1/vms/agent-01/restart \
-H "Content-Type: application/json" \
-d '{"mode": "hard"}'
Delete a VM definition from libvirt.
Query Parameters:
- delete_disk (bool, default: false) - Also delete VM disk image
- force (bool, default: false) - Force delete even if running
Response:
{
"deleted": true,
"name": "agent-01",
"disk_deleted": true
}
Status Codes:
- 200 - Success
- 404 - VM not found
- 409 - VM is running and force=false
Error Codes:
- VM_NOT_FOUND - VM doesn't exist
- VM_RUNNING - VM is running and force not set
Example:
# Delete VM (keep disk)
curl -X DELETE http://localhost:8122/api/v1/vms/agent-01
# Delete VM and disk
curl -X DELETE "http://localhost:8122/api/v1/vms/agent-01?delete_disk=true"
# Force delete running VM
curl -X DELETE "http://localhost:8122/api/v1/vms/agent-01?force=true&delete_disk=true"
Deploy agent binary to a running VM.
Response: 202 Accepted
{
"operation": {
"id": "op-12345678-1234-1234-1234-123456789abc",
"type": "vm_create",
"status": "pending",
"target": "agent-01",
"created_at": "2024-01-30T12:00:00Z",
"progress_percent": 0
},
"vm": null
}
Status Codes:
- 202 - Accepted
- 404 - VM not found
- 409 - VM not running
Example:
curl -X POST http://localhost:8122/api/v1/vms/agent-01/deploy-agent
Long-running operations (VM create, restart, deploy) return operation IDs that can be polled for status.
Get operation status.
Response:
{
"id": "op-12345678-1234-1234-1234-123456789abc",
"type": "vm_create",
"status": "completed",
"target": "agent-03",
"created_at": "2024-01-30T12:00:00Z",
"completed_at": "2024-01-30T12:05:00Z",
"progress_percent": 100,
"result": {
"vm": {
"name": "agent-03",
"state": "running"
}
}
}
Field Descriptions:
- type: "vm_create", "vm_delete", "vm_restart"
- status: "pending", "running", "completed", "failed"
- progress_percent: 0-100
- result: Operation-specific result data (only on completion)
Failed Operation Response:
{
"id": "op-12345678-1234-1234-1234-123456789abc",
"type": "vm_create",
"status": "failed",
"error": "Provisioning script failed with exit code 1",
"target": "agent-03",
"created_at": "2024-01-30T12:00:00Z",
"completed_at": "2024-01-30T12:02:00Z",
"progress_percent": 20
}
Status Codes:
- 200 - Success
- 404 - Operation not found
Example:
curl http://localhost:8122/api/v1/operations/op-12345678-1234-1234-1234-123456789abc
VM lifecycle and agent events are tracked and available for querying.
Receive event from the Rust vm-event-bridge service (internal use).
Request Body:
{
"event_type": "vm.started",
"vm_name": "agent-01",
"timestamp": "2024-01-30T12:00:00Z",
"details": {
"reason": "manual"
},
"agent_id": "agent-01",
"trace_id": null
}
Response:
{
"received": true
}
List recent events across all VMs and agents.
Response:
{
"events": [
{
"event_type": "vm.started",
"vm_name": "agent-01",
"timestamp": "2024-01-30T12:00:00Z",
"details": {
"reason": "manual"
},
"agent_id": "agent-01",
"trace_id": null
}
],
"total_count": 42,
"last_event_id": 42
}
Event Types:
VM Lifecycle:
- vm.started, vm.stopped, vm.crashed, vm.shutdown, vm.rebooted
- vm.suspended, vm.resumed, vm.defined, vm.undefined, vm.pmsuspended
Agent Events:
- agent.connected, agent.disconnected, agent.registered, agent.heartbeat
- agent.command.started, agent.command.completed
- agent.pty.created, agent.pty.closed
Session Reconciliation:
- session.query_sent, session.report_received
- session.reconcile_started, session.reconcile_complete
- session.killed, session.preserved, session.reconcile_failed
Example:
curl http://localhost:8122/api/v1/events
The HTTP credential proxy is the ADR-028 backend for web/API integrations that
can use a broker instead of receiving raw upstream secrets. A workload submits
a lease reference, matching agent/instance/session scope, and target HTTP
request. The server validates the active lease and proxy_policy, injects the
credential only into the outbound upstream request, and redacts that credential
from returned headers and body.
See Credential Proxy for the full policy and runtime guidance.
Proxy one HTTP request through an active credential lease.
Request Body:
{
"lease_id": "lease_...",
"agent_id": "agent-01",
"instance_id": "agent-01",
"session_id": "session-01",
"method": "GET",
"url": "https://api.example.test/v1/resource",
"headers": {
"accept": "application/json"
},
"body": null
}
Response:
{
"status": 200,
"headers": {
"content-type": "application/json"
},
"body": "{\"ok\":true}"
}
The proxy denies missing, revoked, expired, scope-mismatched, or policyless leases. It also denies targets outside the lease policy's allowed hosts, path prefixes, methods, and workload-supplied header allowlist.
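The target checks described above can be pictured as a sequence of allowlist tests. This is a minimal sketch only: `ProxyPolicy`, its field names, and the reason strings are hypothetical and do not reflect the server's actual proxy_policy schema, which is covered in the Credential Proxy guide.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional, Tuple
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ProxyPolicy:
    """Hypothetical in-memory view of a lease's proxy policy."""
    allowed_hosts: FrozenSet[str]
    allowed_path_prefixes: Tuple[str, ...]
    allowed_methods: FrozenSet[str]
    allowed_headers: FrozenSet[str] = frozenset()


def denial_reason(policy: ProxyPolicy, method: str, url: str,
                  headers: Mapping[str, str]) -> Optional[str]:
    """Return why a target request would be denied, or None if it passes."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in policy.allowed_hosts:
        return "host_not_allowed"
    path = parts.path or "/"
    if not any(path.startswith(prefix) for prefix in policy.allowed_path_prefixes):
        return "path_not_allowed"
    if method.upper() not in policy.allowed_methods:
        return "method_not_allowed"
    # Workload-supplied headers must each appear in the allowlist.
    extra = sorted(name for name in headers if name.lower() not in policy.allowed_headers)
    if extra:
        return "header_not_allowed: " + ", ".join(extra)
    return None
```

Checking every dimension independently means a request must satisfy all four allowlists; passing one does not relax another.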
The gateway SSH lease API is the #531 credential contract for gateway-mediated SSH access. It requires an authenticated operator identity and issues short-lived, principal-scoped lease records for the SSH connector and CLI path. It does not proxy SSH bytes and it does not persist private keys, certificate bodies, command payloads, or transcript data.
When AGENTIC_GATEWAY_SSH_CA_KEY points at an OpenSSH CA private key, lease
issuance signs the submitted public key and returns the OpenSSH user
certificate in the POST response only. List/get/audit paths retain only
metadata and fingerprints.
Runtime trust is opt-in during VM provisioning. Set
AGENTIC_GATEWAY_SSH_CA_PUBLIC_KEY_HOST_PATH to the OpenSSH CA public key, or
set AGENTIC_GATEWAY_SSH_CA_KEY and keep the matching .pub file beside it.
Cloud-init writes only the public CA key into the guest, configures
TrustedUserCAKeys, and restricts accepted certificate principals through
AuthorizedPrincipalsFile /etc/ssh/agentic-authorized-principals/%u.
By default the provisioner authorizes only the service user principal
(agent) for the service account. Operators may override the target user with
AGENTIC_GATEWAY_SSH_AUTHORIZED_USER and the accepted certificate principals
with AGENTIC_GATEWAY_SSH_AUTHORIZED_PRINCIPALS (comma or space separated).
Private CA key material is never written into cloud-init user-data.
The gateway SSH connector is the #530 point-to-point byte-stream backend for gateway-mediated SSH. It is opt-in and separate from the terminal WebSocket and PTY fanout paths. The connector reads one newline-delimited JSON prelude from the client, resolves the requested instance to a configured runtime SSH endpoint, records session audit events, and then proxies the remaining SSH stream without retaining or rebroadcasting payload bytes.
Enable the listener with AGENTIC_GATEWAY_SSH_LISTEN, for example
127.0.0.1:8124. Provide explicit runtime targets with
AGENTIC_GATEWAY_SSH_TARGETS as a comma-separated map, and provide explicit
routing policy with AGENTIC_GATEWAY_SSH_ALLOWLIST as actor=instance rules.
The instance side may be * for a controlled break-glass actor, and the actor
side may be * for a controlled target-wide rule:
AGENTIC_GATEWAY_SSH_LISTEN=127.0.0.1:8124
AGENTIC_GATEWAY_SSH_TARGETS=agent-01=127.0.0.1:2222,agent-02=127.0.0.1:2223
AGENTIC_GATEWAY_SSH_ALLOWLIST=operator@example.test=agent-01
The client prelude format is:
{"actor":"operator@example.test","instance_id":"agent-01","access_mode":"ssh"}
The prelude is followed by a newline and then the raw SSH stream. Operators
normally use sandboxctl ssh, which hides this framing behind an OpenSSH
ProxyCommand and requests a short-lived gateway SSH lease when a local public
key is available:
sandboxctl ssh agent-01
By default the CLI connects to the gateway connector at 127.0.0.1:8124, or
the address in AGENTIC_GATEWAY_SSH_CONNECT. Use --gateway to override it
per call. The connector prelude actor comes from --actor,
AGENTIC_GATEWAY_SSH_ACTOR, the active context role, or $USER, in that order,
and must match the connector allowlist. Lease API actor metadata is derived from
the authenticated operator identity rather than the request body.
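The connector settings and prelude framing described above can be modelled in a few lines. This is a sketch under stated assumptions: the helper names are illustrative, the allowlist is assumed to separate multiple rules with commas like the targets map, and none of this reproduces the sandboxctl implementation.

```python
import json
from typing import Dict, List, Mapping, Optional, Tuple


def parse_targets(raw: str) -> Dict[str, str]:
    """Parse AGENTIC_GATEWAY_SSH_TARGETS into {instance_id: host:port}."""
    targets = {}
    for item in filter(None, (part.strip() for part in raw.split(","))):
        instance, _, endpoint = item.partition("=")
        targets[instance] = endpoint
    return targets


def parse_allowlist(raw: str) -> List[Tuple[str, str]]:
    """Parse actor=instance rules (comma separation is an assumption)."""
    rules = []
    for item in filter(None, (part.strip() for part in raw.split(","))):
        actor, _, instance = item.rpartition("=")
        rules.append((actor, instance))
    return rules


def is_allowed(rules: List[Tuple[str, str]], actor: str, instance_id: str) -> bool:
    """Either side of a rule may be the * wildcard."""
    return any(a in ("*", actor) and i in ("*", instance_id) for a, i in rules)


def resolve_actor(flag: Optional[str], env: Mapping[str, str],
                  context_role: Optional[str]) -> str:
    """Apply the order: --actor, AGENTIC_GATEWAY_SSH_ACTOR, context role, $USER."""
    for candidate in (flag, env.get("AGENTIC_GATEWAY_SSH_ACTOR"), context_role, env.get("USER")):
        if candidate:
            return candidate
    raise ValueError("no actor could be resolved")


def build_prelude(actor: str, instance_id: str) -> bytes:
    """Serialise the one-line JSON prelude, terminated by a newline."""
    payload = {"actor": actor, "instance_id": instance_id, "access_mode": "ssh"}
    return json.dumps(payload, separators=(",", ":")).encode("utf-8") + b"\n"
```

A client would write the bytes from `build_prelude` to the connector socket and then hand the same socket to its SSH transport.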
Advanced OpenSSH tools can use generated config:
sandboxctl ssh-config agent-01 > /tmp/agent-01.ssh_config
ssh -F /tmp/agent-01.ssh_config agent-01
scp -F /tmp/agent-01.ssh_config ./artifact.txt agent-01:/tmp/artifact.txt
sftp -F /tmp/agent-01.ssh_config agent-01
Generated config routes through sandboxctl ssh-proxy; it is point-to-point
SSH through the gateway connector, not pty-ws fanout, replay, observers, or
multi-controller session sharing.
Managed agentic-dev VM provisioning omits direct-runtime authorized_keys
by default. The basic profile remains the dev/break-glass direct SSH profile,
and operators may explicitly opt back into direct runtime SSH keys with
AGENTIC_ENABLE_DIRECT_RUNTIME_SSH=1.
Issue a metadata-only SSH access lease.
Request Body:
{
"actor": "operator@example.test",
"instance_id": "agent-01",
"principal": "agent",
"access_mode": "ssh",
"public_key": "ssh-ed25519 AAAA... operator@example.test",
"ttl_seconds": 900
}
Response: 201 Created
{
"id": "sshlease_...",
"actor": "operator@example.test",
"instance_id": "agent-01",
"principal": "agent",
"access_mode": "ssh",
"public_key_sha256": "sha256:...",
"issued_at": "2026-06-22T01:00:00Z",
"expires_at": "2026-06-22T01:15:00Z",
"ttl_seconds": 900,
"state": "active",
"certificate_key_id": "sshlease_...",
"certificate_sha256": "sha256:...",
"certificate": "ssh-ed25519-cert-v01@openssh.com AAAA...",
"revoked_at": null,
"revocation_effect": "metadata_only_until_certificate_expiry"
}
The submitted public_key is hashed to public_key_sha256; callers must not
expect the key body in any response. certificate is present only on
successful issuance when a gateway SSH CA key is configured; it is omitted from
list/get responses and is not written to audit records. Lease issue and revoke
operations emit gateway_ssh_lease security audit records when the audit
logger is configured.
Revocation marks the gateway lease metadata as revoked. OpenSSH certificates that have already been returned to clients remain governed by their short certificate validity window until runtime-enforced revocation, such as KRL or a policy-backed principals command, is added.
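A caller holding a lease record may want to check whether the gateway still considers it usable before opening a session. The sketch below reads only the metadata fields shown in the response above; `lease_is_usable` is an illustrative helper, and, as noted, an already-issued certificate stays valid until its own expiry regardless of this result.

```python
from datetime import datetime, timezone
from typing import Any, Mapping, Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 timestamp with a trailing Z into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def lease_is_usable(lease: Mapping[str, Any], now: Optional[datetime] = None) -> bool:
    """True when the lease metadata is active, not revoked, and not expired."""
    now = now or datetime.now(timezone.utc)
    if lease.get("state") != "active" or lease.get("revoked_at") is not None:
        return False
    return now < parse_timestamp(lease["expires_at"])
```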
List gateway SSH lease metadata.
Get one gateway SSH lease.
Revoke a gateway SSH lease. Revoked records remain visible as metadata and
return revocation_effect: "metadata_only_until_certificate_expiry".
Task orchestration endpoints for submitting and managing Claude Code tasks.
Submit a new task from a manifest.
Request Body:
{
"manifest_yaml": "name: example-task\nrepository:\n url: https://github.com/user/repo\nprompt: 'Fix the bug in main.rs'"
}
OR
{
"manifest": {
"name": "example-task",
"repository": {
"url": "https://github.com/user/repo"
},
"prompt": "Fix the bug in main.rs"
}
}
Response: 202 Accepted
{
"task_id": "task-12345678-1234-1234-1234-123456789abc",
"accepted": true,
"error": null
}
Status Codes:
- 202 - Accepted
- 400 - Invalid manifest
- 503 - Orchestrator not available
Example:
curl -X POST http://localhost:8122/api/v1/tasks \
-H "Content-Type: application/json" \
-d '{
"manifest": {
"name": "fix-bug",
"repository": {
"url": "https://github.com/user/repo"
},
"prompt": "Fix the authentication bug"
}
}'
List all tasks with optional filtering.
Query Parameters:
- state (string, optional) - Comma-separated states: pending, staging, provisioning, ready, running, completing, completed, failed, failed_preserved, cancelled
- limit (usize, default: 50) - Max results
- offset (usize, default: 0) - Pagination offset
Response:
{
"tasks": [
{
"id": "task-12345678-1234-1234-1234-123456789abc",
"name": "fix-bug",
"state": "running",
"state_message": "Claude Code executing",
"created_at": "2024-01-30T12:00:00Z",
"started_at": "2024-01-30T12:01:00Z",
"state_changed_at": "2024-01-30T12:01:30Z",
"vm_name": "agent-task-abc123",
"vm_ip": "192.168.122.220",
"exit_code": null,
"error": null,
"progress": {
"output_bytes": 4096,
"tool_calls": 5,
"current_tool": "bash",
"last_activity_at": "2024-01-30T12:05:00Z"
}
}
],
"total_count": 1
}
Example:
# List all tasks
curl http://localhost:8122/api/v1/tasks
# List only running tasks
curl "http://localhost:8122/api/v1/tasks?state=running"
# List completed and failed tasks
curl "http://localhost:8122/api/v1/tasks?state=completed,failed"
Get task status.
Response:
{
"id": "task-12345678-1234-1234-1234-123456789abc",
"name": "fix-bug",
"state": "completed",
"state_message": "Task completed successfully",
"created_at": "2024-01-30T12:00:00Z",
"started_at": "2024-01-30T12:01:00Z",
"state_changed_at": "2024-01-30T12:10:00Z",
"vm_name": "agent-task-abc123",
"vm_ip": "192.168.122.220",
"exit_code": 0,
"error": null,
"progress": {
"output_bytes": 102400,
"tool_calls": 23,
"current_tool": null,
"last_activity_at": "2024-01-30T12:10:00Z"
}
}
Status Codes:
- 200 - Success
- 404 - Task not found
Example:
curl http://localhost:8122/api/v1/tasks/task-12345678-1234-1234-1234-123456789abc
Cancel a running task.
Request Body:
{
"reason": "User cancelled via dashboard"
}
Response:
{
"success": true,
"error": null
}
Status Codes:
- 200 - Success
- 400 - Cannot cancel (e.g., already completed)
- 404 - Task not found
Example:
curl -X DELETE http://localhost:8122/api/v1/tasks/task-12345678-1234-1234-1234-123456789abc \
-H "Content-Type: application/json" \
-d '{"reason": "User requested cancellation"}'
Stream task logs via Server-Sent Events (SSE).
Response: SSE stream
Event Types:
- stdout - Standard output from Claude Code
- stderr - Standard error from Claude Code
- event - Structured event (JSON)
- completed - Task finished (data: exit code)
- error - Task error (data: error message)
Status Codes:
- 200 - Success (streaming)
- 404 - Task not found
Example:
curl -N http://localhost:8122/api/v1/tasks/task-12345678-1234-1234-1234-123456789abc/logs
SSE Output:
event: stdout
data: Analyzing codebase...
event: stdout
data: Running tests...
event: completed
data: 0
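A consumer of the log stream needs to group the event and data lines into events. The following sketch parses standard SSE framing, in which a blank line terminates each event on the wire; the function names are illustrative, and the parser deliberately also flushes a final event that lacks the terminating blank line.

```python
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keepalive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if name == "event":
            event = value
        elif name == "data":
            data.append(value)
    if data:
        yield event, "\n".join(data)


def exit_code_from(events: Iterable[Tuple[str, str]]) -> Optional[int]:
    """Return the exit code carried by the first completed event, if any."""
    for name, data in events:
        if name == "completed":
            return int(data)
    return None
```

With an HTTP client that exposes the response as decoded lines, `iter_sse_events` can be fed directly from the streaming body.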
List JSON artifacts persisted for an A2A task, including stdout/stderr chunks
captured from messages:send dispatch. This route reads the executor
TaskStore; it is separate from the legacy filesystem artifact route under
/api/v1/tasks/{id}/artifacts.
Response:
{
"task_id": "task-123",
"artifacts": [
{
"artifact_id": "task-123-stdout-0001",
"task_id": "task-123",
"created_at": "2026-05-21T00:00:00Z",
"artifact": {
"kind": "output_chunk",
"stream": "stdout",
"data": "hello\n",
"seq": 1
}
}
]
}
Status Codes:
- 200 - Success
- 404 - Task not found for that instance
Return one persisted A2A task artifact JSON blob.
Status Codes:
- 200 - Success
- 404 - Task or artifact not found
List artifacts produced by a task.
Response:
{
"artifacts": [
{
"name": "summary.md",
"path": "summary.md",
"size_bytes": 2048,
"content_type": "text/markdown",
"checksum": ""
}
]
}
Status Codes:
- 200 - Success
- 404 - Task not found
Example:
curl http://localhost:8122/api/v1/tasks/task-12345678-1234-1234-1234-123456789abc/artifacts
Download a specific artifact.
Response: File download with appropriate Content-Type and Content-Disposition headers.
Status Codes:
- 200 - Success
- 404 - Task or artifact not found
Example:
curl -O http://localhost:8122/api/v1/tasks/task-12345678-1234-1234-1234-123456789abc/artifacts/summary.md
Address: localhost:8120
The gRPC API is used for bidirectional communication between agents and the management server. See proto/agent.proto for complete protocol definitions.
Establishes a persistent connection for agent-management communication.
Agent → Management Messages:
- AgentRegistration - Initial registration with system info
- Heartbeat - Periodic status updates (every 30s)
- OutputChunk - stdout/stderr/log streams
- CommandResult - Command execution results
- Metrics - Resource usage snapshots
- SessionReport - Active sessions for reconciliation
- SessionReconcileAck - Reconciliation confirmation
Management → Agent Messages:
- RegistrationAck - Accept registration
- CommandRequest - Execute command
- ConfigUpdate - Update configuration
- ShutdownSignal - Graceful shutdown request
- Ping - Keepalive
- StdinChunk - Input for running command
- PtyControl - PTY resize/signal
- SessionQuery - Request session report
- SessionReconcile - Session cleanup instructions
Agent authentication metadata: secure transport listeners bind the verified
peer identity to x-agent-instance-id. Plain TCP metadata-only authentication
is no longer accepted. For mTLS, the verified certificate's SPIFFE URI-SAN is
the peer identity, and the /agent/<instance_id> component must match
x-agent-instance-id.
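The identity binding described above amounts to comparing a path component of the verified SPIFFE ID with a metadata value. The sketch below illustrates that comparison only; it assumes /agent/<instance_id> forms the last two path segments, uses a made-up trust domain in its examples, and performs no certificate verification of its own.

```python
from typing import Mapping
from urllib.parse import urlsplit


def instance_id_from_spiffe(uri: str) -> str:
    """Extract <instance_id> from spiffe://<trust-domain>/.../agent/<instance_id>."""
    parts = urlsplit(uri)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError("not a SPIFFE ID")
    segments = [segment for segment in parts.path.split("/") if segment]
    # Assumption: the agent component is the final pair of path segments.
    if len(segments) < 2 or segments[-2] != "agent":
        raise ValueError("SPIFFE ID has no /agent/<instance_id> component")
    return segments[-1]


def identity_matches(spiffe_uri: str, metadata: Mapping[str, str]) -> bool:
    """True when x-agent-instance-id equals the instance in the verified SPIFFE ID."""
    try:
        expected = instance_id_from_spiffe(spiffe_uri)
    except ValueError:
        return False
    return metadata.get("x-agent-instance-id") == expected
```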
Execute a one-shot command and stream output.
Request:
{
"agent_id": "agent-01",
"command": "ls",
"args": ["-la", "/tmp"],
"working_dir": "/home/agent",
"env": {"DEBUG": "1"},
"timeout_seconds": 60
}
Response Stream:
{"stream": "STREAM_STDOUT", "data": "dG90YWwgNAo=", "exit_code": 0, "complete": false}
{"stream": "STREAM_STDOUT", "data": "ZHJ3eHJ3eHJ3eCA=", "exit_code": 0, "complete": false}
{"stream": "STREAM_STDOUT", "data": "", "exit_code": 0, "complete": true}
Stream Types:
- STREAM_STDOUT (1) - Standard output
- STREAM_STDERR (2) - Standard error
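The data values in the response stream above are base64 text, which is how the protobuf JSON mapping renders bytes fields. A small sketch, assuming one JSON object per line as shown, can reassemble the output; `collect_exec_output` is an illustrative name, and fields are read with defaults because JSON renderers may omit zero values.

```python
import base64
import json
from typing import Iterable, Optional, Tuple


def collect_exec_output(lines: Iterable[str]) -> Tuple[str, str, Optional[int]]:
    """Decode JSON-rendered Exec chunks into (stdout, stderr, exit_code)."""
    stdout, stderr = bytearray(), bytearray()
    exit_code: Optional[int] = None
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        payload = base64.b64decode(chunk.get("data", ""))
        if chunk.get("stream") == "STREAM_STDERR":
            stderr += payload
        else:
            stdout += payload
        # The exit code is only meaningful on the final chunk.
        if chunk.get("complete"):
            exit_code = chunk.get("exit_code", 0)
            break
    return (stdout.decode("utf-8", "replace"),
            stderr.decode("utf-8", "replace"),
            exit_code)
```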
Example using grpcurl:
grpcurl -plaintext \
-d '{
"agent_id": "agent-01",
"command": "echo",
"args": ["Hello, World!"],
"timeout_seconds": 10
}' \
localhost:8120 agentic.sandbox.v1.AgentService/Exec
message AgentRegistration {
string agent_id = 1; // VM name (e.g., "agent-01")
string ip_address = 2; // Agent's IP
string hostname = 3; // Hostname
string profile = 4; // Profile used (basic, agentic-dev)
map<string, string> labels = 5;
SystemInfo system = 6;
}
message SystemInfo {
string os = 1; // e.g., "Ubuntu 24.04"
string kernel = 2; // e.g., "6.8.0-generic"
int32 cpu_cores = 3;
int64 memory_bytes = 4;
int64 disk_bytes = 5;
}
message CommandRequest {
string command_id = 1; // Unique ID for correlation
string command = 2; // Command to execute
repeated string args = 3; // Arguments
string working_dir = 4; // Working directory
map<string, string> env = 5; // Environment variables
int32 timeout_seconds = 6; // Execution timeout (0 = no timeout)
bool capture_output = 7; // Stream stdout/stderr back
string run_as = 8; // User to run as (default: agent)
// PTY terminal options
bool allocate_pty = 9; // Spawn in pseudo-terminal
uint32 pty_cols = 10; // Terminal width (default: 80)
uint32 pty_rows = 11; // Terminal height (default: 24)
string pty_term = 12; // TERM env var (default: xterm-256color)
}
message Heartbeat {
string agent_id = 1;
int64 timestamp_ms = 2;
AgentStatus status = 3; // STARTING, READY, BUSY, ERROR, SHUTTING_DOWN, STALE, DISCONNECTED
float cpu_percent = 4;
int64 memory_used_bytes = 5;
int64 uptime_seconds = 6;
}
Used for post-restart session cleanup.
message SessionReport {
string agent_id = 1;
repeated ActiveSession sessions = 2;
int64 timestamp_ms = 3;
}
message ActiveSession {
string command_id = 1; // UUID assigned by server
string session_name = 2; // e.g., "main", "claude"
SessionType session_type = 3; // INTERACTIVE, HEADLESS, BACKGROUND
string command = 4; // Original command
int64 started_at_ms = 5;
int32 pid = 6;
bool is_pty = 7;
}
message SessionReconcile {
repeated string keep_session_ids = 1; // Sessions to keep
repeated string kill_session_ids = 2; // Sessions to terminate
bool kill_unrecognized = 3; // Kill all not in keep list
int32 grace_period_seconds = 4; // Grace period before SIGKILL
}
Address: ws://localhost:8121 on the default loopback-only listener.
Real-time streaming of agent output, metrics, and events to dashboard clients.
Connect to ws://localhost:8121 using any WebSocket client on the local host.
Do not expose this legacy plaintext WebSocket listener on untrusted networks;
use the authenticated pty-ws/v1/WSS path or a trusted tunnel for remote
access.
Example (JavaScript):
const ws = new WebSocket('ws://localhost:8121');
ws.onopen = () => {
console.log('Connected to WebSocket');
};
ws.onmessage = (event) => {
const message = JSON.parse(event.data);
console.log('Received:', message);
};
ws.onclose = () => {
console.log('Disconnected');
};
Messages are JSON with a type field indicating the message type.
Stdout, stderr, and log streams from agents.
{
"type": "output",
"agent_id": "agent-01",
"stream_id": "cmd-12345",
"stream_type": "stdout",
"data": "SGVsbG8sIFdvcmxkIQo=",
"timestamp": 1706572800000
}
Stream Types: "stdout", "stderr", "log"
Periodic resource usage updates.
{
"type": "metrics",
"agent_id": "agent-01",
"cpu_percent": 2.3,
"memory_used_bytes": 536870912,
"memory_total_bytes": 8589934592,
"disk_used_bytes": 2147483648,
"disk_total_bytes": 53687091200,
"load_avg": [0.15, 0.20, 0.18],
"timestamp": 1706572800000
}
Agent connection state changes.
{
"type": "agent_status",
"agent_id": "agent-01",
"status": "Ready",
"timestamp": 1706572800000
}
Status Values: "Starting", "Ready", "Busy", "Error", "ShuttingDown", "Stale", "Disconnected"
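A dashboard-style consumer typically dispatches on the type field of each message. The sketch below formats the three message types shown above into one-line summaries; it assumes output data is base64-encoded, as the sample value suggests, and `describe_message` is an illustrative name rather than part of any client library.

```python
import base64
import json
from datetime import datetime, timezone


def describe_message(raw: str) -> str:
    """Render one WebSocket JSON message as a short human-readable line."""
    msg = json.loads(raw)
    # Timestamps are Unix milliseconds.
    when = datetime.fromtimestamp(msg.get("timestamp", 0) / 1000, tz=timezone.utc)
    stamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    kind = msg.get("type")
    if kind == "output":
        text = base64.b64decode(msg["data"]).decode("utf-8", "replace")
        return f"{stamp} {msg['agent_id']} {msg['stream_type']}: {text.rstrip()}"
    if kind == "metrics":
        mem = 100 * msg["memory_used_bytes"] / msg["memory_total_bytes"]
        return f"{stamp} {msg['agent_id']} cpu={msg['cpu_percent']:.1f}% mem={mem:.2f}%"
    if kind == "agent_status":
        return f"{stamp} {msg['agent_id']} status={msg['status']}"
    return f"{stamp} unknown message type: {kind}"
```

Unknown types are reported rather than raised so that a newer server does not break an older consumer.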
import time

import requests


class AgenticClient:
    """Thin wrapper around the management server REST API."""

    def __init__(self, base_url: str = "http://localhost:8122"):
        self.base_url = base_url
        self.session = requests.Session()

    def list_agents(self):
        """List all connected agents."""
        resp = self.session.get(f"{self.base_url}/api/v1/agents")
        resp.raise_for_status()
        return resp.json()["agents"]

    def list_vms(self, state: str = "all"):
        """List VMs with optional state filter."""
        resp = self.session.get(
            f"{self.base_url}/api/v1/vms",
            params={"state": state},
        )
        resp.raise_for_status()
        return resp.json()["vms"]

    def create_vm(
        self,
        name: str,
        profile: str = "agentic-dev",
        vcpus: int = 4,
        memory_mb: int = 8192,
        disk_gb: int = 50,
        start: bool = True,
    ):
        """Create a new VM and return the operation ID to poll."""
        resp = self.session.post(
            f"{self.base_url}/api/v1/vms",
            json={
                "name": name,
                "profile": profile,
                "vcpus": vcpus,
                "memory_mb": memory_mb,
                "disk_gb": disk_gb,
                "agentshare": True,
                "start": start,
            },
        )
        resp.raise_for_status()
        return resp.json()["operation"]["id"]

    def get_operation(self, op_id: str):
        """Fetch the current status of an operation."""
        resp = self.session.get(f"{self.base_url}/api/v1/operations/{op_id}")
        resp.raise_for_status()
        return resp.json()

    def wait_for_operation(self, op_id: str, timeout: int = 300):
        """Poll every 2 seconds until the operation completes, fails, or times out."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            op = self.get_operation(op_id)
            if op["status"] == "completed":
                return op
            if op["status"] == "failed":
                raise RuntimeError(f"Operation failed: {op.get('error')}")
            time.sleep(2)
        raise TimeoutError("Operation timed out")

    def start_vm(self, name: str):
        """Start a VM."""
        resp = self.session.post(f"{self.base_url}/api/v1/vms/{name}/start")
        resp.raise_for_status()
        return resp.json()

    def stop_vm(self, name: str):
        """Stop a VM gracefully."""
        resp = self.session.post(f"{self.base_url}/api/v1/vms/{name}/stop")
        resp.raise_for_status()
        return resp.json()

    def delete_vm(self, name: str, delete_disk: bool = False, force: bool = False):
        """Delete a VM."""
        # requests would serialise Python booleans as "True"/"False", so send
        # the lowercase form used in the curl examples.
        resp = self.session.delete(
            f"{self.base_url}/api/v1/vms/{name}",
            params={
                "delete_disk": str(delete_disk).lower(),
                "force": str(force).lower(),
            },
        )
        resp.raise_for_status()
        return resp.json()


# Usage
client = AgenticClient()

# List agents
agents = client.list_agents()
print(f"Connected agents: {len(agents)}")

# Create VM and wait for completion
op_id = client.create_vm("agent-05")
print(f"Provisioning started: {op_id}")
result = client.wait_for_operation(op_id)
print(f"VM created: {result['result']}")

# Start/stop VM
client.stop_vm("agent-05")
client.start_vm("agent-05")

# Delete VM
client.delete_vm("agent-05", delete_disk=True, force=True)
const axios = require('axios');
class AgenticClient {
constructor(baseUrl = 'http://localhost:8122') {
this.baseUrl = baseUrl;
this.client = axios.create({ baseURL: baseUrl });
}
async listAgents() {
const resp = await this.client.get('/api/v1/agents');
return resp.data.agents;
}
async listVMs(state = 'all') {
const resp = await this.client.get('/api/v1/vms', {
params: { state }
});
return resp.data.vms;
}
async createVM(options) {
const {
name,
profile = 'agentic-dev',
vcpus = 4,
memoryMb = 8192,
diskGb = 50,
start = true
} = options;
const resp = await this.client.post('/api/v1/vms', {
name,
profile,
vcpus,
memory_mb: memoryMb,
disk_gb: diskGb,
agentshare: true,
start
});
return resp.data.operation.id;
}
async getOperation(opId) {
const resp = await this.client.get(`/api/v1/operations/${opId}`);
return resp.data;
}
async waitForOperation(opId, timeout = 300000) {
const start = Date.now();
while (Date.now() - start < timeout) {
const op = await this.getOperation(opId);
if (op.status === 'completed') {
return op;
} else if (op.status === 'failed') {
throw new Error(`Operation failed: ${op.error}`);
}
await new Promise(resolve => setTimeout(resolve, 2000));
}
throw new Error('Operation timed out');
}
async startVM(name) {
const resp = await this.client.post(`/api/v1/vms/${name}/start`);
return resp.data;
}
async stopVM(name) {
const resp = await this.client.post(`/api/v1/vms/${name}/stop`);
return resp.data;
}
async deleteVM(name, options = {}) {
const { deleteDisk = false, force = false } = options;
const resp = await this.client.delete(`/api/v1/vms/${name}`, {
params: { delete_disk: deleteDisk, force }
});
return resp.data;
}
}
// Usage
(async () => {
const client = new AgenticClient();
// List agents
const agents = await client.listAgents();
console.log(`Connected agents: ${agents.length}`);
// Create VM
const opId = await client.createVM({ name: 'agent-06' });
console.log(`Provisioning started: ${opId}`);
const result = await client.waitForOperation(opId);
console.log(`VM created:`, result.result);
})();
# Health check
curl http://localhost:8122/healthz
# List agents
curl http://localhost:8122/api/v1/agents | jq
# List running VMs
curl "http://localhost:8122/api/v1/vms?state=running" | jq
# Get VM details
curl http://localhost:8122/api/v1/vms/agent-01 | jq
# Create VM
curl -X POST http://localhost:8122/api/v1/vms \
-H "Content-Type: application/json" \
-d '{"name":"agent-07"}' | jq
# Poll operation status
curl http://localhost:8122/api/v1/operations/op-12345 | jq
# Start VM
curl -X POST http://localhost:8122/api/v1/vms/agent-07/start | jq
# Stop VM
curl -X POST http://localhost:8122/api/v1/vms/agent-07/stop | jq
# Restart VM
curl -X POST http://localhost:8122/api/v1/vms/agent-07/restart \
-H "Content-Type: application/json" \
-d '{"mode":"graceful","timeout_seconds":60}' | jq
# Delete VM
curl -X DELETE "http://localhost:8122/api/v1/vms/agent-07?delete_disk=true&force=true" | jq
# List events
curl http://localhost:8122/api/v1/events | jq
# Submit task
curl -X POST http://localhost:8122/api/v1/tasks \
-H "Content-Type: application/json" \
-d '{
"manifest": {
"name": "analyze-repo",
"repository": {"url": "https://github.com/user/repo"},
"prompt": "Analyze code quality"
}
}' | jq
# List tasks
curl http://localhost:8122/api/v1/tasks | jq
# Stream task logs
curl -N http://localhost:8122/api/v1/tasks/task-12345/logs
| Code | Meaning |
|---|---|
| 200 | OK - Request successful |
| 202 | Accepted - Async operation started |
| 400 | Bad Request - Invalid input |
| 404 | Not Found - Resource doesn't exist |
| 409 | Conflict - Resource state conflict |
| 500 | Internal Server Error - Server error |
| 503 | Service Unavailable - Service not ready |
| Code | Description |
|---|---|
| VM_NOT_FOUND | VM doesn't exist in libvirt |
| VM_RUNNING | VM is running (when stopped required) |
| VM_STOPPED | VM is stopped (when running required) |
| VM_NOT_RUNNING | VM is not running |
| VM_ALREADY_EXISTS | VM name already in use |
| INVALID_VM_NAME | VM name doesn't match pattern |
| PROVISIONING_ERROR | VM provisioning failed |
| LIBVIRT_ERROR | libvirt operation failed |
| OPERATION_NOT_FOUND | Operation ID not found |
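Because every error response uses the same `error` envelope, a client can branch on the machine-readable `code` instead of matching message text. The following is a minimal sketch of that idea; `ApiError` and `parse_error` are hypothetical names chosen for illustration, and the `UNKNOWN` fallback code is an assumption, not something the server emits.

```python
import json
from typing import Optional


class ApiError(Exception):
    """Hypothetical client-side wrapper for the documented error envelope."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def parse_error(status: int, body: str) -> Optional[ApiError]:
    """Return an ApiError for 4xx/5xx responses, or None on success."""
    if status < 400:
        return None
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = {}
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        err = {}
    # "UNKNOWN" is a local placeholder for bodies that lack the envelope.
    return ApiError(status, err.get("code", "UNKNOWN"), err.get("message", body))
```

A caller could then treat `VM_ALREADY_EXISTS` on create as a no-op while still raising on `LIBVIRT_ERROR`.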
The following routes are wired up in management/src/http/server.rs but were
absent from the canonical reference. They are documented here in summary form
so callers can discover them; the reference sections above will absorb these
on the next documentation pass.
Triggers a reprovision of the named agent VM via reprovision-vm.sh.
Response: 202 Accepted with {"operation_id": "...", "status": "queued"}
Retired with the legacy shared-secret path in #412.
Response: 410 Gone with an error explaining that agents must use
transport identity credentials.
Returns current AIWG bridge connection state.
Response:
{ "connected": true, "session_count": 3, "last_event_secs": 12 }
Forces a reconnect of the AIWG bridge.
Response: 200 OK with {"ok": true}
The companion manifest and aiwg exec routes remain wired for
dev/break-glass diagnostics only:
- GET /api/v1/agents/{id}/manifests/{platform}
- GET /api/v1/agents/{id}/manifests/{platform}/{name}
- POST /api/v1/agents/{id}/manifests/{platform}/{name}
- POST /api/v1/agents/{id}/aiwg/exec
These routes shell out to direct runtime SSH and bypass the
gateway-mediated SSH policy/audit boundary from ADR-029 (now available via the
SSH certificate lease API above). They are disabled by
default and return 403 Forbidden unless
AGENTIC_ENABLE_DIRECT_SSH_AIWG_PROXY=1 is set for a dev/break-glass
diagnostic session. Managed-profile SSH access should use the
gateway-mediated path tracked by #531.
Creates a new interactive PTY session on the agent. The response preserves the legacy websocket fields for older clients and also includes the current v2 PTY and orchestrator attach metadata for #321-style TUI orchestration.
Request body:
{
"command": "bash",
"session_name": "codex-tui",
"session_backend": "tmux",
"session_class": "managed"
}
All fields are optional. When omitted, the server launches bash with a
generated terminal-* session name using the current managed tmux session
host. session_backend is the #461 session-host selector. This endpoint
currently supports tmux only; screen, zellij, and native are rejected
with 501 session_backend.not_implemented until their backends land.
session_class currently supports managed only; direct ad-hoc session
control is exposed through the pty-ws/v1 capability surface first.
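Since every field is optional and unsupported backends are rejected with 501, a client may prefer to validate locally before sending the request. The sketch below assumes that approach; `build_session_request` and the two constant sets are hypothetical, and the sets only mirror the values the text above lists as currently supported.

```python
# Values taken from the text above; they would need updating as backends land.
SUPPORTED_BACKENDS = {"tmux"}
SUPPORTED_CLASSES = {"managed"}


def build_session_request(command=None, session_name=None,
                          session_backend=None, session_class=None) -> dict:
    """Build the JSON request body, leaving out any field that was not set."""
    if session_backend is not None and session_backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"session_backend {session_backend!r} would be rejected with "
            "501 session_backend.not_implemented"
        )
    if session_class is not None and session_class not in SUPPORTED_CLASSES:
        raise ValueError(
            f"session_class {session_class!r} is not supported by this endpoint"
        )
    fields = {
        "command": command,
        "session_name": session_name,
        "session_backend": session_backend,
        "session_class": session_class,
    }
    # Omitted fields fall back to the server defaults (bash, generated name).
    return {key: value for key, value in fields.items() if value is not None}
```

Calling `build_session_request()` with no arguments yields an empty body, which relies entirely on the server defaults described above.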
Response:
{
"session_id": "<stable-session-id>",
"instance_id": "<routable-a2a-instance-id>",
"command_id": "<agent-command-correlation-id>",
"session_name": "codex-tui",
"ws_endpoint": "ws://{host}:8121/",
"join_message": {
"type": "join_session",
"session_id": "<stable-session-id>",
"role": "controller"
},
"pty_ws_url": "wss://{host}/agents/<instance_id>/sessions/<session_id>/attach",
"pty_ws_subprotocol": "pty-ws.v1",
"orchestrator_observer_url": "/ws/sessions/<session_id>/orchestrate?role=observer",
"orchestrator_controller_url": "/ws/sessions/<session_id>/orchestrate?role=controller",
"default_role": "observer",
"controller_policy": "controller input is policy-gated",
"session_backend": "tmux",
"session_class": "managed",
"supported_session_backends": ["tmux"],
"supported_session_classes": ["managed"],
"observe_supported": true,
"drive_supported": true,
"reattach_supported": true
}
For new orchestration clients, use default_role: observer first. Controller
attachment is intended only for policy-approved bounded input. The legacy
ws_endpoint / join_message fields remain for compatibility with older
path-agnostic websocket clients.
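The observer-first guidance can be expressed as a small selection helper over the response fields. This is a sketch under assumptions: `select_attach_url` and its `policy_approved` flag are hypothetical, and treating `drive_supported` as the precondition for controller attachment is an interpretation of the field name rather than documented behavior.

```python
def select_attach_url(session: dict, policy_approved: bool = False) -> str:
    """Pick an orchestrator attach URL, defaulting to the observer role."""
    wants_controller = policy_approved and session.get("drive_supported", False)
    if wants_controller:
        return session["orchestrator_controller_url"]
    # Fall back to read-only observation, matching default_role: observer.
    return session["orchestrator_observer_url"]


# Trimmed copy of the response fields this helper reads.
example = {
    "orchestrator_observer_url": "/ws/sessions/s1/orchestrate?role=observer",
    "orchestrator_controller_url": "/ws/sessions/s1/orchestrate?role=controller",
    "default_role": "observer",
    "drive_supported": True,
}
```

With `policy_approved` left at its default, the helper never returns the controller URL, so an orchestration client has to opt in explicitly.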
The pty-ws/v1 binding_hello frame also includes session_host capability
metadata from the active PTY bridge. The default no-op bridge reports
native/direct observe, drive, and reattach support. The real agent PTY bridge
reports native/direct plus screen/managed, zellij/managed, and tmux/managed
support. Clients select the backend on pty.join_session:
{
"op": "pty.join_session",
"payload": {
"session_backend": "native",
"session_class": "direct",
"argv": ["/bin/bash", "-l"],
"cwd": "/workspace",
"env": { "TERM": "xterm-256color" },
"terminal_size": { "cols": 132, "rows": 43 }
}
}
Unsupported join selections fail closed with
session_backend.not_implemented or session_class.not_implemented before a
PTY bridge session is started.
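A client can avoid a fail-closed rejection by checking the advertised capabilities before it sends pty.join_session. The sketch below assumes the client has already reduced the binding_hello capability metadata to a set of (backend, class) pairs; the exact layout of that metadata is not specified here, so the set representation and the `build_join_frame` helper are hypothetical.

```python
import json

# Pairs the text above attributes to the real agent PTY bridge.
AGENT_BRIDGE_CAPS = {
    ("native", "direct"),
    ("screen", "managed"),
    ("zellij", "managed"),
    ("tmux", "managed"),
}


def build_join_frame(capabilities, backend, session_class, argv, cwd,
                     env=None, cols=80, rows=24) -> str:
    """Serialize a pty.join_session frame after a local capability check."""
    if (backend, session_class) not in capabilities:
        raise ValueError(f"{backend}/{session_class} is not advertised by the bridge")
    frame = {
        "op": "pty.join_session",
        "payload": {
            "session_backend": backend,
            "session_class": session_class,
            "argv": list(argv),
            "cwd": cwd,
            "env": dict(env or {}),
            "terminal_size": {"cols": cols, "rows": rows},
        },
    }
    return json.dumps(frame)
```

The local check is only a convenience; the server still enforces the same rule and remains the source of truth.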
Kills a session.
Query params:
- signal (optional, default TERM): one of TERM | KILL | INT | HUP.
Response: 200 OK with {"killed": true}
Returns the curated agent-image catalog used to populate the dashboard's
Create Instance image picker (#179). The list mirrors the Dockerfiles under
images/container/ and is updated when new images land in CI.
Response:
{
"images": [
{ "ref": "agentic/claude:latest", "label": "Claude", "description": "Anthropic Claude Code agent", "default": true },
{ "ref": "agentic/codex:latest", "label": "Codex", "description": "OpenAI Codex agent" },
{ "ref": "agentic/opencode:latest","label": "OpenCode","description": "OpenCode agent" },
{ "ref": "agentic/automation-control:latest", "label": "Automation Control", "description": "Orchestrator-ready control image with Codex, Aider, dev tools, and credential-free probes" }
]
}
AIWG aiwg serve calls this route to dispatch a mission to this sandbox.
See AIWG Executor Contract for the full integration
(registration, capabilities, event vocabulary, persistence, lifecycle).
Auth: Authorization: Bearer <token>, where the token is issued at executor
registration. The server compares the presented token in constant time.
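Constant-time comparison matters because an ordinary string equality check can return sooner on an early mismatch, which leaks timing information about the expected token. The sketch below illustrates the technique with Python's standard `hmac.compare_digest`; it is not the management server's implementation, and the `extract_bearer` and `authorize` helpers are hypothetical.

```python
import hmac
from typing import Optional


def extract_bearer(header_value: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' value."""
    scheme, _, token = (header_value or "").partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def authorize(header_value: Optional[str], expected_token: str) -> bool:
    """Compare the presented token with the expected one in constant time."""
    token = extract_bearer(header_value)
    if token is None:
        return False
    # compare_digest avoids short-circuiting on the first differing byte.
    return hmac.compare_digest(token.encode("utf-8"), expected_token.encode("utf-8"))
```

A request that fails this check corresponds to the 401 case for this route.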
Request body:
{
"mission_id": "<UUID>",
"objective": "<command/prompt>",
"completion": "<optional completion criteria>",
"long_running": false,
"executor_filter": { "agent_id": "agent-01" },
"metadata": { }
}
Response: 202 Accepted
{
"mission_id": "<echo>",
"executor_id": "<sandbox instance_id>",
"status": "assigned",
"estimated_start": "<RFC3339>"
}
Failure: 401 (bad token), 404 (agent not found), 503 (no agents
available / executor not registered), 500 (dispatcher error, which emits
mission.failed with reason).
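On the caller side, the request body and the documented status codes can be handled with two small helpers. This is a sketch, not the aiwg serve implementation: `build_dispatch_body` and `classify_dispatch_status` are hypothetical, treating `completion` and `executor_filter` as omittable is an assumption, and marking only 503 as retryable is a design choice made for illustration.

```python
import uuid

# Status meanings copied from the failure list above.
DISPATCH_OUTCOMES = {
    202: "assigned",
    401: "bad token",
    404: "agent not found",
    500: "dispatcher error",
    503: "no agents available or executor not registered",
}
RETRYABLE_STATUSES = {503}  # assumption: capacity may return later


def build_dispatch_body(objective, agent_id=None, completion=None,
                        long_running=False, metadata=None) -> dict:
    """Build a dispatch request body with a freshly generated mission_id."""
    body = {
        "mission_id": str(uuid.uuid4()),
        "objective": objective,
        "long_running": long_running,
        "metadata": dict(metadata or {}),
    }
    if completion is not None:
        body["completion"] = completion
    if agent_id is not None:
        body["executor_filter"] = {"agent_id": agent_id}
    return body


def classify_dispatch_status(status: int):
    """Return (description, retryable) for a dispatch response status."""
    return DISPATCH_OUTCOMES.get(status, "unexpected status"), status in RETRYABLE_STATUSES
```

Retrying a 401 or 404 without changing the request would fail again, which is why the sketch keeps them out of the retryable set.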
These complement the upload/list endpoints already documented under Storage:
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/storage/global/_download | GET | Stream a file from the read-only global share |
| /api/v1/storage/inbox/{agent_id}/_download | GET | Stream a file from a per-agent inbox |
| /api/v1/storage/outbox/{task_id}/_download | GET | Stream a file from a per-task outbox |
All three accept ?path=<relative-path> and respond with the raw file bytes
(Content-Type inferred from extension).
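Because the file is selected through a query parameter, the relative path needs URL encoding before it is appended to one of the three routes. The helper below is a minimal sketch; `download_url` is a hypothetical name, and the client-side rejection of absolute paths and `..` segments is an added precaution rather than documented server behavior.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8122"  # default loopback development listener


def download_url(area: str, path: str, owner_id: str = None,
                 base_url: str = BASE_URL) -> str:
    """Build a download URL for the global share, an inbox, or an outbox."""
    if area == "global":
        route = "/api/v1/storage/global/_download"
    elif area in ("inbox", "outbox"):
        if not owner_id:
            raise ValueError(f"{area} downloads need an agent_id or task_id")
        route = f"/api/v1/storage/{area}/{quote(owner_id, safe='')}/_download"
    else:
        raise ValueError(f"unknown storage area: {area!r}")
    # Local precaution only: keep the path relative and free of parent hops.
    if path.startswith("/") or ".." in path.split("/"):
        raise ValueError("path must be relative and must not contain '..'")
    return f"{base_url}{route}?{urlencode({'path': path})}"
```

For example, `download_url("outbox", "result.json", owner_id="task-12345")` targets the per-task outbox route with `path=result.json`.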
Currently no rate limiting is enforced. For production deployments, consider implementing:
- Per-IP rate limits on HTTP endpoints
- Connection limits on WebSocket
- gRPC flow control for agent streams
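For the per-IP limit in the list above, one common approach is a token bucket keyed by client address. The sketch below shows that technique in isolation; it is not part of the management server, and the `PerIpLimiter` class with its default rate and capacity is hypothetical.

```python
import time


class PerIpLimiter:
    """Token bucket per client IP: `rate` tokens/second, burst of `capacity`."""

    def __init__(self, rate: float = 5.0, capacity: int = 10, clock=time.monotonic):
        self.rate = rate
        self.capacity = float(capacity)
        self.clock = clock
        self._buckets = {}  # ip -> (tokens_remaining, last_seen_timestamp)

    def allow(self, ip: str) -> bool:
        """Consume one token for `ip`; return False when the bucket is empty."""
        now = self.clock()
        tokens, last = self._buckets.get(ip, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[ip] = (tokens, now)
        return allowed
```

A production version would also evict idle entries so the bucket map does not grow without bound, and a rejected request would typically map to HTTP 429.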
API version is included in the path: /api/v1/...
Current version: v1
Breaking changes will increment the version number. Legacy endpoints are maintained for backwards compatibility where possible.