OmoiOS Architecture: Dynamic Swarm Orchestration

This is the critical reference document for understanding OmoiOS. Read this before diving into any specific component.

Each section links to a deep-dive doc with full implementation details.

Executive Summary

OmoiOS is an autonomous engineering platform that orchestrates multiple AI agents through a spec-driven, discovery-enabled, self-adjusting workflow. The system handles three core capabilities:

Capability	System	Purpose
Plans	Spec-Sandbox State Machine	Convert feature ideas into structured requirements, designs, and atomic tasks
Discoveries	DiscoveryService + Analyzer	Detect new work during execution and spawn adaptive branch tasks
Readjustments	MonitoringLoop + Guardian + Conductor	Monitor agent trajectories and intervene when goals drift

Production URLs:

Frontend: https://omoios.dev
Backend API: https://api.omoios.dev

Architecture Overview

                                 ┌─────────────────────────────────────────┐
                                 │              User / Frontend            │
                                 │  (Specs, Tickets, Tasks, Monitoring)    │
                                 └────────────────────┬────────────────────┘
                                                      │
                                                      ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                                       API Layer                                          │
│                         FastAPI Routes + WebSocket Events                                │
└───────────────┬─────────────────────────────┬───────────────────────────┬───────────────┘
                │                             │                           │
                ▼                             ▼                           ▼
┌───────────────────────────┐   ┌─────────────────────────────┐   ┌─────────────────────────┐
│     PLANNING SYSTEM       │   │    EXECUTION SYSTEM         │   │   READJUSTMENT SYSTEM   │
│  ┌─────────────────────┐  │   │  ┌─────────────────────┐    │   │  ┌─────────────────────┐│
│  │  Spec-Sandbox       │  │   │  │ OrchestratorWorker  │    │   │  │   MonitoringLoop    ││
│  │  State Machine      │  │   │  │ (task dispatch)     │    │   │  │  ┌───────────────┐  ││
│  │                     │  │   │  └──────────┬──────────┘    │   │  │  │   Guardian    │  ││
│  │ EXPLORE → PRD →     │  │   │             │               │   │  │  │  (trajectory) │  ││
│  │ REQUIREMENTS →      │  │   │             ▼               │   │  │  └───────────────┘  ││
│  │ DESIGN → TASKS →    │  │   │  ┌─────────────────────┐    │   │  │  ┌───────────────┐  ││
│  │ SYNC               │  │   │  │  DaytonaSpawner     │    │   │  │  │   Conductor   │  ││
│  │                     │  │   │  │  (sandbox creation) │    │   │  │  │  (coherence)  │  ││
│  └─────────────────────┘  │   │  └──────────┬──────────┘    │   │  │  └───────────────┘  ││
│           │               │   │             │               │   │  └─────────────────────┘│
│           ▼               │   │             ▼               │   │            │            │
│  ┌─────────────────────┐  │   │  ┌─────────────────────┐    │   │            ▼            │
│  │  Phase Evaluators   │  │   │  │ ClaudeSandboxWorker │    │   │  ┌─────────────────────┐│
│  │  (quality gates)    │  │   │  │ + EventReporter     │    │   │  │  Steering           ││
│  └─────────────────────┘  │   │  │ + MessagePoller     │    │   │  │  Interventions      ││
│           │               │   │  │ + FileChangeTracker │    │   │  └─────────────────────┘│
│           ▼               │   │  └──────────┬──────────┘    │   └─────────────────────────┘
│  ┌─────────────────────┐  │   │             │               │
│  │  HTTPReporter       │  │   │             │ (discoveries) │
│  │  (event streaming)  │  │   │             ▼               │
│  └─────────────────────┘  │   │  ┌─────────────────────┐    │
└───────────────────────────┘   │  │  DiscoveryService   │    │
                                │  │  (adaptive branch)  │    │
                                │  └─────────────────────┘    │
                                └─────────────────────────────┘
                                              │
                                              ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                                   DATA LAYER                                             │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐     │
│  │  MemoryService  │  │  ContextService │  │ SynthesisService│  │  DAG Merge      │     │
│  │  (hybrid search)│  │  (phase context)│  │ (parallel merge)│  │  System         │     │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  └─────────────────┘     │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐     │
│  │  SpecDedup      │  │ EmbeddingService│  │  BillingService │  │ CostTracking    │     │
│  │  (hash+semantic)│  │  (pgvector 1536)│  │  (Stripe+tiers) │  │ (per-task LLM)  │     │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  └─────────────────┘     │
│                                                                                          │
│                          PostgreSQL + pgvector + Redis                                   │
└─────────────────────────────────────────────────────────────────────────────────────────┘

System Deep-Dives

1. Planning System

Convert a high-level feature idea into structured, executable work units through a 7-phase pipeline: EXPLORE → PRD → REQUIREMENTS → DESIGN → TASKS → SYNC → COMPLETE. Each phase has an LLM evaluator (quality gate) that scores output and retries on failure. All phases use incremental writing to prevent data loss.

Key components: Spec-Sandbox State Machine, Phase Evaluators, HTTPReporter

Deep-dive: Planning System →

2. Execution System

Execute tasks in isolated Daytona sandboxes with full audit trail. The OrchestratorWorker polls for tasks, spawns sandboxes via DaytonaSpawner, and ClaudeSandboxWorker runs inside the sandbox using the Claude Agent SDK. Three execution modes: exploration (read-only), implementation (full access), and validation (test execution). Background workers handle task dispatch, stale cleanup, idle sandbox termination, and health monitoring.

Key components: OrchestratorWorker, DaytonaSpawner, ClaudeSandboxWorker, ContinuousSandboxWorker, IdleSandboxMonitor

Deep-dive: Execution System →

3. Discovery System

Enable adaptive workflow branching when agents discover new requirements during execution. The Hephaestus pattern allows discovery-based branching to bypass normal phase transitions — a Phase 3 validation agent can spawn Phase 1 investigation tasks. The DiscoveryAnalyzerService uses LLM-powered analysis to find recurring patterns, predict blockers, and recommend agent types.

Key components: DiscoveryService, DiscoveryAnalyzerService

Deep-dive: Discovery System →

4. Readjustment System

Monitor agent trajectories and system coherence, intervening when agents drift from goals. The MonitoringLoop runs three background loops: Guardian (60s, per-agent trajectory analysis), Conductor (5min, system-wide coherence + duplicate detection), and Health Check (30s, critical state alerts). Steering interventions can redirect, refocus, or stop agents.

Key components: MonitoringLoop, IntelligentGuardian, ConductorService, ValidationOrchestrator

Deep-dive: Readjustment System → | Monitoring Architecture →

5. Frontend Architecture

Next.js 15 App Router with ShadCN UI, dual state management (Zustand + React Query), and real-time WebSocket integration. Route groups separate auth flows from the dashboard shell. ~67 pages covering organization management, spec workspaces, Kanban boards, dependency graphs, statistics dashboards, and agent management.

Key components: Route groups, Zustand middleware stack, React Query cache, WebSocket bridge

Deep-dive: Frontend Architecture → | Design: Frontend Architecture → | Page Architecture →

6. Real-Time Event System

Redis pub/sub EventBus for all inter-service communication with WebSocket forwarding to the frontend. Events cover agent lifecycle, task status, sandbox health, coordination, and monitoring updates. React Query cache invalidation and Zustand state sync both trigger from WebSocket events.

Key components: EventBusService, WebSocket server, Frontend event bridge

Deep-dive: Real-Time Events → | Design: React Query + WebSocket →

7. Authentication & Security

Self-hosted JWT authentication with bcrypt password hashing, refresh tokens, OAuth (GitHub/Google), and agent-scoped API keys. Organization-level RBAC with owner/admin/member/viewer roles. (Note: Supabase auth exists as a design alternative but was not implemented.)

Key components: AuthService, JWT middleware, OAuth routes, API key management

Deep-dive: Auth & Security → | Design: Auth System Plan →

8. Billing & Subscriptions

Full Stripe integration with 6 subscription tiers (Free/Pro/Team/Enterprise/Lifetime/BYO Platform), prepaid credit purchases, usage-based billing, automated dunning (3 retries before suspension), promo codes, and per-task LLM cost tracking. Workflow execution is gated by check_and_reserve_workflow() before sandbox spawning. Background tasks handle monthly invoice generation, payment reminders, and low credit warnings.

Key components: BillingService, StripeService, SubscriptionService, CostTrackingService, BudgetEnforcerService

Deep-dive: Billing & Subscriptions →

9. MCP Integration

Model Context Protocol integration with circuit breakers, retry logic, authorization enforcement, and vector-based tool discovery. Per server+tool circuit breakers with exponential backoff retries.

Key components: MCPIntegrationService, Tool Registry, Authorization Engine, Circuit Breaker

Deep-dive: MCP Integration → | Design: MCP Server Integration →

10. GitHub Integration

Branch management, commit tracking, pull request workflows, and webhook processing. Branches are created before sandbox spawning. Commits are automatically linked to tickets via regex matching in commit messages.

Key components: GitHubIntegrationService, DaytonaSpawner (branch creation), sandbox_git_operations

Deep-dive: GitHub Integration →

11. Database Schema

PostgreSQL 16 with pgvector for semantic search and deduplication, ~77 model classes across 61 model files, and 73 Alembic migrations. 5 tables use Vector(1536) columns with IVFFlat indexes for cosine similarity search; 3 additional tables use ARRAY(Float) for embedding storage. PostgreSQL tsvector provides full-text keyword search for hybrid retrieval. Models cover core resources, workflow state, agent execution, monitoring, auth/billing, version control, memory, and quality.

Key components: SQLAlchemy models, Alembic migrations, pgvector embeddings, tsvector full-text search

Deep-dive: Database Schema →

12. Configuration System

Dual configuration: YAML files (version-controlled) for application settings with Pydantic validation, .env files (gitignored) for secrets. Priority: init args > env vars > YAML defaults. All config classes extend OmoiBaseSettings with @lru_cache factories.

Key components: OmoiBaseSettings, YAML loaders, environment-specific overrides

Deep-dive: Configuration System → | Design: Configuration Architecture →

13. API Route Catalog

FastAPI REST API with 38 route files organized by domain. All protected routes require JWT authentication. Auto-generated Swagger UI at /docs and ReDoc at /redoc.

Key components: FastAPI routes, auth middleware, service injection

Deep-dive: API Route Catalog →

Memory & Context System

Learn from execution history and provide relevant context to new tasks using pgvector-powered semantic search combined with PostgreSQL full-text search.

Component	File	Purpose
MemoryService	`services/memory.py` (~910 lines)	Pattern learning + hybrid search (semantic + keyword with RRF)
EmbeddingService	`services/embedding.py`	Multi-provider embeddings (Fireworks, OpenAI, FastEmbed local)
ContextService	`services/context_service.py`	Cross-phase context aggregation
TaskContextBuilder	`services/task_context_builder.py`	Full context assembly for task execution
SynthesisService	`services/synthesis_service.py`	Parallel task result merging (combine/union/intersection)

Memory Types: ERROR_FIX, DECISION, LEARNING, WARNING, CODEBASE_KNOWLEDGE, DISCOVERY (LLM-classified with rule-based fallback)

Hybrid Search (Reciprocal Rank Fusion)

Memories are searched using three modes:

Semantic: pgvector cosine similarity on 1536-dim embeddings
Keyword: PostgreSQL tsvector full-text search with ts_rank
Hybrid (default): RRF fusion — score = 0.6 * (1/(60 + sem_rank)) + 0.4 * (1/(60 + kw_rank))

ACE Workflow

When a task completes, the ACE (Analyze-Curate-Extract) workflow runs:

Executor: Parse tool usage, classify memory type, generate embeddings, create TaskMemory
Reflector: Analyze feedback for errors, search playbook, extract insights
Curator: Propose playbook updates, generate deltas, validate and apply changes

Storage: TaskMemory (execution records), LearnedPattern (extracted patterns), PlaybookEntry (knowledge bullets per ticket)

Design: Memory System → | Requirements: Memory System →

Deduplication System

Prevent duplicate work across specs, requirements, tasks, and tickets using dual-path deduplication: SHA256 content hashing (fast exact match) + pgvector embedding similarity (semantic near-duplicate detection).

Component	File	Purpose
SpecDeduplicationService	`services/spec_dedup.py` (~855 lines)	Dedup specs, requirements, tasks within project/spec scope
TaskDeduplicationService	`services/task_dedup.py`	Prevent runaway diagnostic task spawning
TicketDeduplicationService	`services/ticket_dedup.py`	Global ticket duplicate detection
EmbeddingService	`services/embedding.py`	Shared embedding generation for all dedup services

Similarity Thresholds

Entity	Threshold	Scope	Method
Spec	0.92	Within project	Hash + semantic
Requirement	0.88	Within spec	Hash + semantic
Task (spec)	0.85	Within spec	Hash + semantic
Task (queue)	0.85	Within ticket	Semantic only
Ticket	0.85	Global	Semantic only
Acceptance Criteria	1.0 (exact)	Within requirement	Hash only

pgvector Usage

All vector columns use Vector(1536) with IVFFlat indexes (vector_cosine_ops, lists=100). Similarity queries use cosine distance: 1 - (embedding <=> query_vector). Default embedding provider: Fireworks (qwen3-embedding-8b), with OpenAI and local FastEmbed as alternatives. 5 tables use Vector(1536) for pgvector similarity search; 3 additional tables use ARRAY(Float) for embedding storage.

KS|| Table | Vector Column | Purpose | Notes | YW||-------|--------------|---------|-------| ZN|| `specs` | `embedding_vector` | Spec-level dedup within project | Vector(1536) | YY|| `spec_requirements` | `embedding_vector` | Requirement dedup within spec | Vector(1536) | HJ|| `spec_tasks` | `embedding_vector` | Task dedup within spec | Vector(1536) | RN|| `tasks` | `embedding_vector` | Queue task dedup within ticket | Vector(1536) | ZS|| `tickets` | `embedding_vector` | Global ticket dedup | Vector(1536) | NN|| `task_memories` | `context_embedding` | Memory similarity search | ARRAY(Float) | NT|| `learned_patterns` | `embedding` | Pattern matching | ARRAY(Float) | WK|| `playbook_entries` | `embedding` | Playbook entry search (ACE) | ARRAY(Float) |

DAG Merge System

Coordinate parallel work branches and merge them back together.

Component	File	Purpose
DependencyGraphService	`services/dependency_graph.py`	DAG visualization + critical path
CoordinationService	`services/coordination.py`	SYNC, SPLIT, JOIN, MERGE primitives
ConflictScorer	`services/conflict_scorer.py`	git merge-tree dry-run scoring
ConvergenceMergeService	`services/convergence_merge_service.py`	Multi-branch merge orchestration
AgentConflictResolver	`services/agent_conflict_resolver.py`	LLM-powered conflict resolution

See Coordination Patterns → for pattern definitions.

Service Initialization Map (CRITICAL REFERENCE)

OmoiOS has two separate service initialization points that don't share state:

Location	File	Function	Purpose	Services
PS		API Server	`api/main.py`	`lifespan()`
RV		Orchestrator Worker	`workers/orchestrator_worker.py`	`init_services()`

These run as separate processes. Services initialized in one are NOT available in the other.

Service Availability Matrix

Service	API Server	Orchestrator	Notes
DatabaseService	yes	yes	Both have own connections
EventBusService	yes	yes	Both have own Redis connections
TaskQueueService	yes	yes	Both have own instances
AgentRegistryService	yes	yes	Both have own instances
PhaseGateService	yes	yes	Both have own instances
PhaseManager	yes	no	API-only (routes use it)
RP		PhaseProgressionService	no
YQ		TaskRequirementsAnalyzer	no
YM		TicketWorkflowOrchestrator	no
TJ		MonitoringLoop	yes

See Integration Gaps → for the full gap analysis and fix instructions.

Testing Requirements

What Needs Testing

Spec-Sandbox state machine (EXPLORE through SYNC)
Phase evaluators and retry logic
DaytonaSpawner sandbox creation
ClaudeSandboxWorker execution modes
Full spec → tickets → tasks → execution flow
Discovery branching during execution
MonitoringLoop with real agents
Parallel task execution with SynthesisService
DAG merge with conflict resolution

Glossary

Term	Definition
Spec	A feature specification that goes through EXPLORE → SYNC phases
Ticket	A work grouping (TKT-NNN) containing multiple tasks
Task	An atomic work unit (TSK-NNN) executable by an agent
Discovery	New work found during execution that spawns branch tasks
Trajectory	An agent's conversation and tool usage history
Alignment	How well an agent's trajectory matches its goal
Coherence	System-wide measure of agent coordination
Synthesis	Merging results from parallel predecessor tasks
Phase Gate	Quality evaluation that must pass to proceed
EARS	"Easy Approach to Requirements Syntax" format

Additional Documentation

Document	Content
Agent System Overview	Multi-agent orchestration framework, agent lifecycle, heartbeat protocol
App Overview	User-centric product perspective, 12-step user flow, page catalog
Spec-Sandbox Subsystem Strategy	Extraction strategy for independent spec-sandbox subsystem
Architecture: Current vs Target	Gap analysis between implemented and designed architecture
Integration Gaps & Known Issues	Orphaned services, event gaps, DAG wiring, TODOs, fix instructions
LLM Service Layer	LLMService, PydanticAIService, structured outputs, Fireworks.ai
Service Catalog	Complete catalog of all ~94 backend services by domain

Next Steps

Fix Integration Gaps: Address critical gaps first — DAG wiring complete, services initialized in orchestrator
Verify Event Flow: Use checklist in Integration Gaps doc to verify each event; ~135 orphaned events remain
Test DAG System: Services are wired; needs integration tests with parallel tasks
Complete Live Preview: ~80% done, needs HMR WebSocket connection (see #87)
Add Frontend Tests: No test infrastructure exists yet (see #86)
Implement Alerting Channels: Slack/email routers are stubs (see #78)
Update This Document: Keep the Service Initialization Map current as services change

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OmoiOS Architecture: Dynamic Swarm Orchestration

Executive Summary

Architecture Overview

System Deep-Dives

1. Planning System

2. Execution System

3. Discovery System

4. Readjustment System

5. Frontend Architecture

6. Real-Time Event System

7. Authentication & Security

8. Billing & Subscriptions

9. MCP Integration

10. GitHub Integration

11. Database Schema

12. Configuration System

13. API Route Catalog

Memory & Context System

Hybrid Search (Reciprocal Rank Fusion)

ACE Workflow

Deduplication System

Similarity Thresholds

pgvector Usage

DAG Merge System

Service Initialization Map (CRITICAL REFERENCE)

Service Availability Matrix

Testing Requirements

What Needs Testing

Glossary

Additional Documentation

Next Steps

FilesExpand file tree

ARCHITECTURE.md

Latest commit

History

ARCHITECTURE.md

File metadata and controls

OmoiOS Architecture: Dynamic Swarm Orchestration

Executive Summary

Architecture Overview

System Deep-Dives

1. Planning System

2. Execution System

3. Discovery System

4. Readjustment System

5. Frontend Architecture

6. Real-Time Event System

7. Authentication & Security

8. Billing & Subscriptions

9. MCP Integration

10. GitHub Integration

11. Database Schema

12. Configuration System

13. API Route Catalog

Memory & Context System

Hybrid Search (Reciprocal Rank Fusion)

ACE Workflow

Deduplication System

Similarity Thresholds

pgvector Usage

DAG Merge System

Service Initialization Map (CRITICAL REFERENCE)

Service Availability Matrix

Testing Requirements

What Needs Testing

Glossary

Additional Documentation

Next Steps