Design Reconciliation Engine for Deployment Lifecycle #424

@ilackarms

Purpose

Replace the imperative deployment lifecycle with a KRT-driven reconciliation engine. The system moves from "API handler calls adapter synchronously on the request path" to "API writes intent to DB, async reconciler converges actual state toward desired state."

Architecture Overview

Three layers

┌─────────────────────────────────────────────────────┐
│  API Server — dumb CRUD + watch                     │
│  Writes intent to DB, reads state, streams changes. │
│  Zero business logic.                               │
└──────────────────────┬──────────────────────────────┘
                       │ reads/writes
                       ▼
┌─────────────────────────────────────────────────────┐
│  PostgreSQL — source of truth + event bus            │
│  pg LISTEN/NOTIFY triggers on all watched tables.   │
│  DB is the ONLY communication channel between       │
│  API server and reconciler.                         │
└──────────┬───────────────────────────┬──────────────┘
           │ NOTIFY                    │ NOTIFY
           ▼                           ▼
┌──────────────────────┐  ┌───────────────────────────┐
│  KRT Reconciler      │  │  Discovery Writers        │
│  All logic here.     │  │  Per-platform, async.     │
│  DB watches feed     │  │  Write raw actual state   │
│  KRT collections.    │  │  to platform tables.      │
│  Transformations     │  │  No normalization.        │
│  derive reconcile    │  │                           │
│  requests. Handlers  │  │                           │
│  execute adapters.   │  │                           │
└──────────────────────┘  └───────────────────────────┘
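A minimal sketch of how a DB watch might feed a collection, assuming each trigger emits a small JSON payload naming the table, operation, and row ID. The payload shape, `ChangeEvent`, `Loader`, and `Collection` are all hypothetical; `Collection` is a plain in-memory stand-in, not the KRT API, and the actual LISTEN connection is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ChangeEvent is a hypothetical NOTIFY payload; the issue does not define one.
type ChangeEvent struct {
	Table string `json:"table"`
	Op    string `json:"op"` // "INSERT", "UPDATE", or "DELETE"
	ID    string `json:"id"`
}

// Loader re-reads a row from the source of truth. The payload in this sketch
// carries only the key, so the row itself is fetched separately.
type Loader func(table, id string) (row string, found bool)

// Collection is a plain in-memory stand-in for a KRT StaticCollection.
type Collection struct {
	load Loader
	rows map[string]string
}

func NewCollection(load Loader) *Collection {
	return &Collection{load: load, rows: make(map[string]string)}
}

// Apply folds one notification into the collection.
func (c *Collection) Apply(payload string) error {
	var ev ChangeEvent
	if err := json.Unmarshal([]byte(payload), &ev); err != nil {
		return fmt.Errorf("decode notify payload: %w", err)
	}
	switch ev.Op {
	case "INSERT", "UPDATE":
		row, found := c.load(ev.Table, ev.ID)
		if !found {
			// Row vanished between NOTIFY and read; treat it as a delete.
			delete(c.rows, ev.ID)
			return nil
		}
		c.rows[ev.ID] = row
	case "DELETE":
		delete(c.rows, ev.ID)
	default:
		return fmt.Errorf("unknown op %q", ev.Op)
	}
	return nil
}

// Len reports how many rows the collection currently holds.
func (c *Collection) Len() int { return len(c.rows) }

func main() {
	db := map[string]string{"d1": "desired_state=deployed"}
	c := NewCollection(func(_, id string) (string, bool) {
		row, ok := db[id]
		return row, ok
	})
	for _, p := range []string{
		`{"table":"deployments","op":"INSERT","id":"d1"}`,
		`{"table":"deployments","op":"DELETE","id":"d1"}`,
	} {
		if err := c.Apply(p); err != nil {
			fmt.Println("apply failed:", err)
		}
		fmt.Println("rows:", c.Len())
	}
}
```

Sending only the key and re-reading the row keeps the DB as the single source of truth: a dropped or reordered notification can at worst delay convergence, not corrupt the collection.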

Data flow

User intent (API)
    │
    ▼
deployments table (desired_state: deployed/undeployed)
    │
    │ pg NOTIFY
    ▼
KRT StaticCollection[Deployment]
    │
    │ + StaticCollection[Provider]
    │ + StaticCollection[Agent/Server]  (dependency resolution)
    │ + StaticCollection[Discovered*]   (per-platform actual state)
    │
    ▼
KRT Transformation: desired vs actual → ReconcileRequest
    │
    │ (only items where desired ≠ actual)
    ▼
KRT Leaf Handler (Register)
    │
    ├─ Execute adapter.Deploy() / adapter.Undeploy()
    ├─ Write ReconcileEvent to DB
    ├─ Update deployment status in DB
    │
    │ pg NOTIFY (from DB writes)
    ▼
API watch clients see updated status
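The "desired vs actual → ReconcileRequest" step can be sketched as a pure function. This is an illustration under assumptions, not the KRT transformation itself: the struct fields, the `Action` values, and the reduction of actual state to a single `running` flag are all hypothetical.

```go
package main

import "fmt"

// DesiredState mirrors the desired_state values named in the data flow.
type DesiredState string

const (
	Deployed   DesiredState = "deployed"
	Undeployed DesiredState = "undeployed"
)

// Deployment is a hypothetical, trimmed-down row of the deployments table.
type Deployment struct {
	ID           string
	ProviderID   string
	DesiredState DesiredState
}

// Action is what the leaf handler should ask the adapter to do.
type Action string

const (
	ActionDeploy   Action = "deploy"
	ActionUndeploy Action = "undeploy"
)

// ReconcileRequest is emitted only when desired and actual state differ.
type ReconcileRequest struct {
	DeploymentID string
	ProviderID   string
	Action       Action
}

// Diff compares one deployment against discovered state. running reports
// whether discovery currently sees the workload on its platform.
func Diff(d Deployment, running bool) (ReconcileRequest, bool) {
	switch {
	case d.DesiredState == Deployed && !running:
		return ReconcileRequest{d.ID, d.ProviderID, ActionDeploy}, true
	case d.DesiredState == Undeployed && running:
		return ReconcileRequest{d.ID, d.ProviderID, ActionUndeploy}, true
	default:
		// Desired already matches actual: nothing to reconcile.
		return ReconcileRequest{}, false
	}
}

func main() {
	discovered := map[string]bool{"a": true, "c": true}
	deployments := []Deployment{
		{ID: "a", ProviderID: "local", DesiredState: Deployed},
		{ID: "b", ProviderID: "local", DesiredState: Deployed},
		{ID: "c", ProviderID: "k8s", DesiredState: Undeployed},
	}
	for _, d := range deployments {
		if req, ok := Diff(d, discovered[d.ID]); ok {
			fmt.Printf("%s: %s via %s\n", req.DeploymentID, req.Action, req.ProviderID)
		}
	}
}
```

Keeping the comparison free of side effects means it can be re-run whenever any input collection changes, which is what lets the handler see only the items that still need work.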

System diagram

flowchart TB
    subgraph api [API Server — dumb CRUD + watch]
        apiWrite[Write intent to DB]
        apiRead[Read state from DB]
        apiWatch[Watch via pg LISTEN/NOTIFY]
    end

    subgraph db [PostgreSQL — source of truth + event bus]
        deployments[(deployments)]
        reconcileEvents[(reconcile_events)]
        providers[(providers)]
        artifacts[(agents / servers)]
        discoveredLocal[(discovered_local)]
        discoveredK8s[(discovered_kubernetes)]
        pgNotify[pg LISTEN/NOTIFY]
    end

    subgraph reconciler [KRT Reconciler]
        collections[StaticCollections\nfed by DB watches]
        transforms[Transformations\ndesired vs actual]
        handlers[Leaf handlers\nexecute adapters]
    end

    subgraph discovery [Discovery Writers]
        localDisc[Local]
        k8sDisc[Kubernetes]
    end

    subgraph adapters [Platform Adapters]
        localAdapt[Local]
        k8sAdapt[Kubernetes]
    end

    apiWrite --> deployments
    apiRead --> deployments
    apiWatch --> pgNotify

    deployments --> pgNotify
    reconcileEvents --> pgNotify
    providers --> pgNotify
    artifacts --> pgNotify
    discoveredLocal --> pgNotify
    discoveredK8s --> pgNotify

    pgNotify --> collections
    collections --> transforms
    transforms --> handlers

    handlers --> localAdapt
    handlers --> k8sAdapt
    handlers --> reconcileEvents
    handlers --> deployments

    localDisc --> discoveredLocal
    k8sDisc --> discoveredK8s

What Exists Today

Current deploy flow (imperative, synchronous)

POST /v0/deployments
  → Handler validates request
  → Service.LaunchDeployment()
    → CreateManagedDeploymentRecord() → DB status="deploying"
    → ResolveDeploymentAdapterByProviderID()
    → adapter.Deploy(ctx, deployment)         ← BLOCKS on request path
    → ApplyDeploymentActionResult() → DB status="deployed"
  → Return deployment to client

Problems: no retry on failure, restart loses in-flight state, adapter failure = permanent failure, no separation of intent from execution.
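For contrast, a rough sketch of the proposed request path: the handler records intent and returns 202 without touching an adapter. `Intent`, `IntentStore`, `CreateDeployment`, and the JSON field names are hypothetical, and an in-memory map stands in for the deployments table.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// Intent is a hypothetical request/row shape; field names are assumptions.
type Intent struct {
	ID           string `json:"id"`
	ProviderID   string `json:"provider_id"`
	DesiredState string `json:"desired_state"`
}

// IntentStore is an in-memory stand-in for the deployments table.
type IntentStore struct {
	mu   sync.Mutex
	rows map[string]Intent
}

func NewIntentStore() *IntentStore {
	return &IntentStore{rows: make(map[string]Intent)}
}

func (s *IntentStore) Put(in Intent) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rows[in.ID] = in
}

func (s *IntentStore) Get(id string) (Intent, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	in, ok := s.rows[id]
	return in, ok
}

// CreateDeployment writes intent and returns 202 Accepted. No adapter call
// happens on the request path; convergence is left to the reconciler.
func CreateDeployment(store *IntentStore) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var in Intent
		if err := json.NewDecoder(r.Body).Decode(&in); err != nil || in.ID == "" {
			http.Error(w, "invalid deployment", http.StatusBadRequest)
			return
		}
		in.DesiredState = "deployed"
		store.Put(in)
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusAccepted)
		_ = json.NewEncoder(w).Encode(in)
	}
}

// submit drives the handler in-process, without opening a socket.
func submit(store *IntentStore, body string) *httptest.ResponseRecorder {
	req := httptest.NewRequest(http.MethodPost, "/v0/deployments", strings.NewReader(body))
	rec := httptest.NewRecorder()
	CreateDeployment(store).ServeHTTP(rec, req)
	return rec
}

func main() {
	store := NewIntentStore()
	rec := submit(store, `{"id":"d1","provider_id":"local"}`)
	in, _ := store.Get("d1")
	fmt.Println(rec.Code, in.DesiredState)
}
```

Because the only side effect is a row write, a crash after the response loses nothing: the intent is already durable and the reconciler can pick it up on restart.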

Current discovery flow (implicit, on-demand)

GET /v0/deployments
  → Query DB for managed deployments
  → For each provider:
      → adapter.Discover(ctx, providerID)     ← BLOCKS on request path
  → Merge managed + discovered
  → Return to client

Problems: discovery is request-scoped (blocks every list call), discovered deployments are ephemeral (not persisted), no caching.

What Changes vs What Stays

Changes

| Component | Before | After |
| --- | --- | --- |
| Deploy request path | Synchronous adapter execution | Write intent, return 202 |
| Deployment status | Set directly from adapter result | Derived from reconcile events |
| Discovery | On-demand per ListDeployments call | Async writers → per-platform tables |
| Discovered deployments | Ephemeral (not persisted) | Persisted in platform tables |
| Cancelation | adapter.Cancel() on request path | desired_state: undeployed |
| Business logic location | Service layer (deployment/service.go) | KRT reconciler |
| Event backbone | None | pg LISTEN/NOTIFY |

Stays the same

| Component | Why |
| --- | --- |
| DeploymentPlatformAdapter interface | Adapters still implement Deploy/Undeploy/Discover. Reconciler is the new caller instead of service layer. |
| ProviderPlatformAdapter interface | Provider CRUD unchanged. Health check is additive. |
| Platform adapter implementations (local, k8s) | Internal logic unchanged. Called by reconciler instead of service. |
| Deployment DB schema (mostly) | Adding desired_state column. Existing columns stay. |
| API endpoints (mostly) | Same routes, deploy returns 202 instead of blocking. Watch mode is additive. |
| Static resource services (agent, server, skill, prompt) | No reconciliation needed. CRUD only. |
