
Hermes Agent vs OpenClaw: Technical Architecture Comparison

A Side-by-Side Breakdown for Software Architects


Source Metadata

| Agent | Repo | Commit | Branch | Clone date |
|---|---|---|---|---|
| OpenClaw | github.com/openclaw/openclaw | 8b2d24b | main | 2026-04-01 |
| Hermes Agent | github.com/NousResearch/hermes-agent | HEAD at review | main | 2026-04-01 |

1. Executive Summary

OpenClaw and Hermes Agent solve the same fundamental problem — making stateless LLMs behave like persistent, tool-using assistants — but they diverge sharply in implementation language, storage architecture, memory philosophy, extensibility model, and security posture. OpenClaw is a TypeScript/Node.js system built around a WebSocket gateway daemon with file-based memory and Docker sandboxing. Hermes is a Python system built around a synchronous agent loop with SQLite-backed sessions, bounded tool-managed memory, and multiple terminal backends. Both support multi-channel messaging, skills, sub-agents, cron scheduling, and context compression — but the internals are architecturally distinct.

This document is a technical comparison. It assumes familiarity with both systems or the companion OpenClaw Architecture Deep Dive.


2. Foundational Differences

| Dimension | OpenClaw | Hermes Agent |
|---|---|---|
| Language | TypeScript / Node.js | Python 3.11+ |
| Runtime model | Single async daemon (Gateway) | Synchronous agent loop + async gateway |
| Primary storage | Markdown files (JSONL transcripts) | SQLite (WAL mode) + Markdown memory files |
| Package | npm (openclaw) | pip/uv (hermes-agent[all]) |
| Binary distribution | Node.js required | Python required; Node.js optional (for gateway platforms) |
| License | Source-available (custom) | MIT |
| Origin | Independent project | Nous Research (Hermes is an explicit OpenClaw-inspired fork/reimplementation) |

3. Agent Loop Architecture

OpenClaw: Gateway-Mediated, Queue-Serialized

OpenClaw's agent loop is mediated by the Gateway daemon. Messages flow through a lane-aware command queue before reaching the embedded pi-mono runtime:

Inbound → Channel Bridge → Session Resolution → Command Queue → pi-mono Runtime → LLM
                                                    │
                                    ┌───────────────┴───────────────┐
                                    │ Lane-based FIFO               │
                                    │  • Global: maxConcurrent=4    │
                                    │  • Per-session: concurrency=1 │
                                    │  • Sub-agent: concurrency=8   │
                                    │  • Cron: parallel lane        │
                                    └───────────────────────────────┘

Queue modes (collect, steer, followup, steer-backlog) control how inbound messages interact with active runs. The Gateway is the single writer for each session — no concurrent writes to transcripts, deterministic ordering guaranteed.
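The lane model can be sketched as per-lane FIFO admission with independent concurrency caps. This is a minimal illustration in Python, not OpenClaw's actual TypeScript implementation; the lane names and limits are taken from the diagram above:

```python
import asyncio

# Per-lane concurrency caps, mirroring the diagram above.
LANE_LIMITS = {"global": 4, "session": 1, "subagent": 8}

class LaneQueue:
    """FIFO admission per lane; each lane runs at most `limit` jobs at once."""
    def __init__(self, limits):
        self._sems = {lane: asyncio.Semaphore(n) for lane, n in limits.items()}

    async def run(self, lane, coro_fn, *args):
        async with self._sems[lane]:   # blocks when the lane is saturated
            return await coro_fn(*args)

async def demo():
    q = LaneQueue(LANE_LIMITS)

    async def job(name):
        await asyncio.sleep(0)
        return name

    # Two jobs on the per-session lane (concurrency=1) never overlap,
    # which is how a single-writer guarantee per session falls out.
    return await asyncio.gather(
        q.run("session", job, "first"),
        q.run("session", job, "second"),
    )

print(asyncio.run(demo()))  # ['first', 'second']
```

The per-session semaphore of size 1 is what makes "single writer per session" a structural property rather than a convention.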

Hermes: Direct Synchronous Loop

Hermes uses a direct synchronous orchestration pattern. AIAgent.run_conversation() is the core loop:

User message → AIAgent.run_conversation()
                → Build system prompt (cached)
                → Maybe preflight-compress
                → Build api_messages
                → Inject ephemeral layers
                → Apply prompt caching
                → Interruptible API call
                → If tool_calls: execute → append results → loop
                → If final text: persist → return

The loop is explicitly interruptible — the CLI or gateway can cancel mid-flight. Tool execution uses either sequential or concurrent strategies depending on the tool mix.
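The loop above condenses to a familiar tool-calling pattern. A structural sketch only — `call_llm` and `execute_tool` stand in for Hermes' actual internals, and the message shapes are illustrative:

```python
def run_conversation(messages, call_llm, execute_tool, max_iterations=10):
    """Structural sketch of a tool-calling agent loop: call the model,
    execute any requested tools, append their results, and repeat until
    the model returns plain text (or the iteration budget runs out)."""
    for _ in range(max_iterations):
        reply = call_llm(messages)                      # interruptible API call
        if not reply.get("tool_calls"):
            messages.append({"role": "assistant", "content": reply["content"]})
            return reply["content"]                     # final text: persist & return
        messages.append({"role": "assistant", "tool_calls": reply["tool_calls"]})
        for call in reply["tool_calls"]:                # sequential strategy
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": execute_tool(call["name"], call.get("args", {})),
            })
    raise RuntimeError("iteration budget exhausted")

# Toy stubs: one tool round, then a final answer.
script = iter([
    {"tool_calls": [{"id": "1", "name": "add", "args": {"a": 2, "b": 3}}]},
    {"content": "2 + 3 = 5", "tool_calls": None},
])
answer = run_conversation(
    [{"role": "user", "content": "what is 2+3?"}],
    call_llm=lambda msgs: next(script),
    execute_tool=lambda name, args: str(args["a"] + args["b"]),
)
print(answer)  # 2 + 3 = 5
```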

Key Difference

OpenClaw separates orchestration (Gateway) from inference (pi-mono runtime). The Gateway owns all state and connection management; the runtime is an embedded component. Hermes combines both into a single AIAgent class — the gateway is a separate process that dispatches to AIAgent instances but doesn't own the agent loop itself.

Concurrency model: OpenClaw's lane-based queue is a formal concurrency primitive. Hermes handles write contention at the SQLite level with application-level retries, random jitter, and BEGIN IMMEDIATE transactions.


4. Session Storage

OpenClaw: File-Based (JSONL + JSON)

~/.openclaw/agents/<agentId>/sessions/
├── <sessionId>.jsonl          # Full transcript (append-only)
└── sessions.json              # Session key → {sessionId, updatedAt, ...} map
  • Transcripts are append-only JSONL — each line is a message/event
  • Compaction summaries are persisted in the JSONL (permanent)
  • Session metadata stored in a flat JSON file
  • No database dependency (SQLite used only for vector search)
  • Migration = git clone + config update
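The append-only layout is simple to reproduce: each event is one JSON object per line, and reading a transcript is a line-by-line parse. A minimal sketch of the pattern, not OpenClaw's code:

```python
import json
import tempfile
from pathlib import Path

def append_event(transcript: Path, event: dict) -> None:
    # Append-only: one JSON object per line, never rewritten in place.
    with transcript.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

def read_transcript(transcript: Path) -> list:
    with transcript.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

sessions_dir = Path(tempfile.mkdtemp()) / "sessions"
sessions_dir.mkdir(parents=True)
t = sessions_dir / "abc123.jsonl"
append_event(t, {"role": "user", "content": "hi"})
append_event(t, {"role": "assistant", "content": "hello"})
print(len(read_transcript(t)))  # 2
```

Because writes only ever append whole lines, a crash mid-write corrupts at most the final line, and transcripts diff cleanly under git.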

Hermes: SQLite Database (WAL Mode)

~/.hermes/state.db (SQLite, WAL mode)
├── sessions          # Metadata, token counts, billing, lineage
├── messages          # Full message history per session
├── messages_fts      # FTS5 virtual table for full-text search
└── schema_version    # Migration tracking
  • Full relational schema with token/cost accounting per session
  • FTS5 full-text search across all sessions (no external vector DB)
  • Session lineage via parent_session_id chains (compression-triggered splits)
  • Schema migrations tracked and applied incrementally (currently v6)
  • Write contention: 15 retries, 20-150ms random jitter, BEGIN IMMEDIATE
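The retry discipline in the last bullet can be sketched as follows. The retry count and jitter window come from the list above; the function itself is illustrative, not Hermes' actual code:

```python
import os
import random
import sqlite3
import tempfile
import time

def write_with_retry(db_path, sql, params=(), retries=15):
    """BEGIN IMMEDIATE acquires the write lock up front; if another
    writer holds it, back off with 20-150 ms random jitter and retry."""
    for attempt in range(retries):
        # isolation_level=None puts the connection in autocommit mode,
        # so the BEGIN/COMMIT below are fully manual.
        conn = sqlite3.connect(db_path, timeout=0, isolation_level=None)
        try:
            conn.execute("BEGIN IMMEDIATE")   # fail fast on a busy database
            conn.execute(sql, params)
            conn.execute("COMMIT")
            return True
        except sqlite3.OperationalError:
            if attempt == retries - 1:
                raise
            time.sleep(random.uniform(0.020, 0.150))  # jitter, then retry
        finally:
            conn.close()
    return False

db = os.path.join(tempfile.mkdtemp(), "state.db")
init = sqlite3.connect(db)
init.execute("PRAGMA journal_mode=WAL")  # WAL: readers never block the writer
init.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
init.commit()
init.close()

write_with_retry(db, "INSERT INTO messages (body) VALUES (?)", ("hello",))
count = sqlite3.connect(db).execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(count)  # 1
```

BEGIN IMMEDIATE matters here: a deferred transaction would only discover lock contention at commit time, whereas IMMEDIATE surfaces it at the start, where the retry loop can handle it cheaply.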

Key Difference

OpenClaw's file approach is simpler, human-readable, and git-backable. Hermes' SQLite approach enables structured queries (cost analytics, session lineage, FTS5 search) without external tools. OpenClaw needs a separate vector index (sqlite-vec) for semantic search, while Hermes gets full-text search for free via FTS5 triggers. For semantic recall the two diverge again: Hermes uses session_search with LLM summarization, while OpenClaw uses an embedding-based hybrid (BM25 + vector).


5. Memory Architecture

This is the most significant architectural divergence.

OpenClaw: Unbounded File-Based Memory

┌─────────────────────────────────────────┐
│ Layer 4: Semantic Vector Search         │ ← BM25 + vector hybrid (70/30)
│          (SQLite + embeddings)          │
├─────────────────────────────────────────┤
│ Layer 3: MEMORY.md (curated long-term)  │ ← No size limit, manually maintained
├─────────────────────────────────────────┤
│ Layer 2: memory/YYYY-MM-DD.md (daily)   │ ← Append-only daily notes
├─────────────────────────────────────────┤
│ Layer 1: Session context (JSONL)        │ ← Current context window
└─────────────────────────────────────────┘
  • No hard size limits on MEMORY.md — grows organically
  • Memory is plain Markdown the model reads/writes via read/write/edit tools
  • Daily logs provide temporal structure
  • Vector search (BM25 + embeddings) enables semantic recall across all files
  • Pre-compaction memory flush: silent agentic turn writes durable notes before context is summarized
  • Memory search exposed as memory_search + memory_get tools

Hermes: Bounded Tool-Managed Memory

┌─────────────────────────────────────────┐
│ Layer 3: Session Search (FTS5 + LLM)    │ ← Cross-session recall with summarization
├─────────────────────────────────────────┤
│ Layer 2: MEMORY.md (~800 tokens, 2.2K)  │ ← Hard-capped, tool-managed
│          USER.md (~500 tokens, 1.4K)    │ ← Hard-capped, tool-managed
├─────────────────────────────────────────┤
│ Layer 1: Session context (SQLite)       │ ← Current context window
└─────────────────────────────────────────┘
  + Optional: Honcho (dialectic user modeling)
  • Hard character limits: MEMORY.md = 2,200 chars (~800 tokens), USER.md = 1,375 chars (~500 tokens)
  • Memory managed via a dedicated memory tool (add/replace/remove actions)
  • Frozen snapshot pattern: injected at session start, never mutated mid-session
  • Capacity shown in prompt header (67% — 1,474/2,200 chars) so the model self-manages
  • Cross-session recall via session_search (FTS5 + Gemini Flash summarization)
  • Optional Honcho integration for dialectic user modeling across sessions
  • Security scanning on memory writes (injection/exfiltration detection)
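The hard-cap behavior can be illustrated with a tiny bounded store. The capacity numbers come from the list above; the add/replace/remove semantics are a sketch, not Hermes' exact tool contract:

```python
MEMORY_CAP = 2200  # chars (~800 tokens), per the limits listed above

class BoundedMemory:
    """Fixed-capacity memory: writes that would exceed the cap are
    rejected, forcing curation (replace/remove) instead of growth."""
    def __init__(self, cap=MEMORY_CAP):
        self.cap, self.entries = cap, []

    def _used(self):
        return sum(len(e) for e in self.entries)

    def header(self):
        # The capacity line surfaced in the prompt so the model self-manages.
        used = self._used()
        return f"Memory capacity: {used / self.cap:.0%} ({used:,}/{self.cap:,} chars)"

    def add(self, text):
        if self._used() + len(text) > self.cap:
            return False                      # over cap: curate first
        self.entries.append(text)
        return True

    def replace(self, old_substring, new_text):
        for i, e in enumerate(self.entries):  # substring matching, first hit wins
            if old_substring in e:
                candidate = e.replace(old_substring, new_text)
                if self._used() - len(e) + len(candidate) <= self.cap:
                    self.entries[i] = candidate
                    return True
        return False

    def remove(self, substring):
        before = len(self.entries)
        self.entries = [e for e in self.entries if substring not in e]
        return len(self.entries) < before

mem = BoundedMemory()
mem.add("User prefers dark mode.")
print(mem.header())  # Memory capacity: 1% (23/2,200 chars)
```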

Key Difference

OpenClaw treats memory as an unbounded knowledge base — write anything, search semantically later. The model uses standard file tools. Hermes treats memory as a bounded cache — fixed capacity forces the model to curate aggressively. The model uses a dedicated memory tool with add/replace/remove semantics and substring matching.

OpenClaw's approach scales better for knowledge-heavy agents but risks prompt bloat. Hermes' approach guarantees bounded token cost (~1,300 tokens/session) but requires the model to make curation decisions. Both do pre-compaction memory flush; OpenClaw uses a silent agentic turn, Hermes uses gateway-level session hygiene.

For cross-session recall: OpenClaw uses vector embeddings (BM25+vector hybrid, 70/30 weighting); Hermes uses FTS5 keyword search + LLM summarization. Different tradeoffs — vector catches semantic matches but needs an embedding pipeline; FTS5 is zero-setup but misses paraphrased recall.
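OpenClaw's hybrid score is a straightforward weighted blend. A one-line sketch, assuming the 70/30 split assigns 0.7 to BM25 and 0.3 to vector similarity (the document gives the ratio but not the assignment) and that both scores are pre-normalized to [0, 1]:

```python
def hybrid_score(bm25: float, cosine: float, bm25_weight: float = 0.7) -> float:
    """Blend lexical (BM25) and semantic (cosine) relevance.
    Assumes both inputs are already normalized to [0, 1]."""
    return bm25_weight * bm25 + (1 - bm25_weight) * cosine

# A document with strong keyword overlap but weak semantic similarity
# still ranks highly under a lexically-weighted blend:
print(round(hybrid_score(0.9, 0.2), 2))  # 0.69
```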


6. Prompt Assembly

OpenClaw: Dynamic Assembly Per-Turn

System prompt assembled from multiple sources every turn:

  • Tooling descriptions + Safety guardrails + Skills metadata
  • Workspace files: AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, HEARTBEAT.md
  • Runtime metadata (host/OS/model/thinking level)
  • Bootstrap files truncated at bootstrapMaxChars (default 20K)
  • Time is timezone-only (no dynamic clock) for cache stability

Hermes: Cached + Ephemeral Split

Hermes deliberately separates cached and ephemeral prompt layers:

Cached layers (stable across turns):

  1. Agent identity (SOUL.md or default)
  2. Tool-aware behavior guidance
  3. Honcho static block
  4. Frozen MEMORY snapshot
  5. Frozen USER profile snapshot
  6. Skills index
  7. Context files (AGENTS.md, .cursorrules, etc.)
  8. Timestamp + platform hint

Ephemeral layers (API-call-time only, never persisted):

  • ephemeral_system_prompt
  • Prefill messages
  • Gateway session context overlays
  • Honcho recall injected into current-turn user message

Anthropic Prompt Caching (Hermes)

Hermes implements Anthropic's cache_control breakpoints:

  • System prompt = breakpoint 1 (stable)
  • 3rd/2nd/last messages = rolling breakpoints 2-4
  • ~75% input token cost reduction on multi-turn conversations
  • TTL configurable: 5m default, 1h for long sessions
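The breakpoint placement can be sketched as request shaping. The `cache_control: {"type": "ephemeral"}` marker is Anthropic's documented caching primitive, but the helper below (`apply_cache_breakpoints`) and its exact message shapes are illustrative, not Hermes' code, and no API call is made:

```python
def apply_cache_breakpoints(system_text: str, messages: list) -> dict:
    """Mark the system prompt (breakpoint 1) plus up to three trailing
    message positions (rolling breakpoints 2-4) with cache_control.
    Assumes message contents are plain strings."""
    system = [{"type": "text", "text": system_text,
               "cache_control": {"type": "ephemeral"}}]       # breakpoint 1
    marked = [dict(m) for m in messages]
    for idx in (-1, -2, -3):          # last / 2nd-last / 3rd-last messages
        if len(marked) >= -idx:
            m = marked[idx]
            m["content"] = [{"type": "text", "text": m["content"],
                             "cache_control": {"type": "ephemeral"}}]
    return {"system": system, "messages": marked}

req = apply_cache_breakpoints(
    "You are Hermes.",
    [{"role": "user", "content": "hi"},
     {"role": "assistant", "content": "hello"},
     {"role": "user", "content": "what's new?"}],
)
marked_count = len([m for m in req["messages"] if isinstance(m["content"], list)])
print(marked_count)  # 3
```

The rolling breakpoints are what make the scheme work across turns: the prefix up to the previous turn's breakpoint stays byte-identical, so the provider can reuse the cached prefix even as the conversation grows.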

OpenClaw's time-as-timezone-only design achieves similar cache stability but without explicit provider-level caching primitives.


7. Context Compression

OpenClaw: Compaction + Pruning (Two Mechanisms)

  • Compaction: summarizes older messages, persists summary in JSONL transcript (permanent)
  • Pruning: trims old tool results in-memory per request (non-destructive)
  • Pre-compaction memory flush via silent agentic turn

Hermes: Dual-Layer Compression

Two independent compression systems:

Gateway Session Hygiene (85% threshold) → Safety net, rough estimate
         ↓
Agent ContextCompressor (50% threshold) → Primary, real token counts

4-phase algorithm:

  1. Prune old tool results (cheap, no LLM call — replaces >200 char outputs)
  2. Determine boundaries (head/middle/tail with tool-group alignment)
  3. Generate structured summary (using auxiliary LLM, template-based)
  4. Assemble compressed messages (head + summary + tail, orphan cleanup)
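Phase 4 can be sketched as follows. The head/tail sizes are arbitrary illustration values, and the real boundary logic additionally aligns to tool-call groups (phase 2), which this sketch omits:

```python
def assemble_compressed(messages: list, summary_text: str,
                        head: int = 2, tail: int = 4) -> list:
    """Keep the first `head` and last `tail` messages verbatim and
    replace everything in between with one structured-summary message."""
    if len(messages) <= head + tail:
        return list(messages)                 # nothing worth compressing
    summary = {"role": "user",
               "content": f"[Conversation summary]\n{summary_text}"}
    return messages[:head] + [summary] + messages[-tail:]

msgs = [{"role": "user", "content": f"m{i}"} for i in range(10)]
out = assemble_compressed(msgs, "Goal: demo. Progress: middle steps done.")
print(len(out))  # 7  (2 head + 1 summary + 4 tail)
```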

Iterative re-compression: previous summary passed to LLM with "update, don't re-summarize" instructions. Items move from "In Progress" to "Done" across compressions.

Session lineage: compression triggers a new session ID linked via parent_session_id, creating traceable compression chains.

Key Difference

Both use LLM-generated summaries, but Hermes' structured summary template (Goal/Progress/Decisions/Files/Next Steps) produces more consistently useful compressions than OpenClaw's freeform approach. Hermes' dual-layer (gateway safety net + agent compressor) provides defense in depth against sessions that escape normal compression. OpenClaw's pruning (trim tool results without LLM) is cheaper for the common case of verbose tool output.


8. Tool Systems

OpenClaw: Policy-Layered, Plugin-Extensible

Core tools built into the runtime:

  • File ops (read/write/edit), Shell (exec/process), Browser (CDP), Web, Message, Cron, Memory, Sessions, Nodes, Canvas, TTS, Gateway self-management

Tool policy: layered allow/deny system:

Global deny → Per-agent deny → Global allow → Per-agent allow → Default

Extensibility via plugins (in-process TypeScript modules with lifecycle hooks) and hooks (event-driven scripts).

Hermes: Registry-Based, Self-Registering

Tools self-register via registry.register() at import time. Central dispatch through ToolRegistry:

registry.register(
    name="terminal", toolset="terminal",
    schema={...}, handler=handle_terminal,
    check_fn=check_terminal,        # Availability gate
    requires_env=["SOME_VAR"],      # For UI display
    is_async=False,
)

Toolset system: named bundles resolved via platform presets (hermes-cli, hermes-telegram, etc.) + explicit enable/disable.

40+ tools organized into toolsets:

  • terminal, file, web, browser, vision, image_generation, skills, cron, tts, todo, memory, session_search, delegate, send_message, honcho, homeassistant, code_execution, clarify, etc.

DANGEROUS_PATTERNS approval system: regex-based detection of destructive commands → interactive approval (CLI) or async approval (gateway). Smart approval via auxiliary LLM for false positives. Per-session state + permanent allowlist.
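The detection side of that flow reduces to a regex sweep. The patterns below are illustrative examples only — Hermes' real DANGEROUS_PATTERNS list is more extensive and tuned against false positives (which is what the smart LLM approval layer exists to absorb):

```python
import re

# Illustrative patterns only; not Hermes' actual list.
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+-[a-z]*r[a-z]*f"),    # recursive force delete
    re.compile(r"\bmkfs\b"),                  # filesystem format
    re.compile(r"\bdd\b.*\bof=/dev/"),        # raw writes to a device
    re.compile(r">\s*/dev/sd[a-z]\b"),        # redirect onto a disk
]

def needs_approval(command: str) -> bool:
    """True when a shell command matches a destructive pattern and should
    be routed to interactive (CLI) or async (gateway) approval."""
    return any(p.search(command) for p in DANGEROUS_PATTERNS)

print(needs_approval("rm -rf /tmp/build"))  # True
print(needs_approval("ls -la"))             # False
```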

Key Difference

OpenClaw's tool policy is declarative (JSON config) with a formal layered resolution order. Hermes' tool system is imperative (Python registration with check functions) with a toolset/preset resolution model. OpenClaw has richer extensibility (plugins with lifecycle hooks, in-process); Hermes has richer built-in tools (vision, image generation, Honcho, Home Assistant, code execution sandbox).

OpenClaw's exec approval is sender-based (authorized senders can approve); Hermes has a more sophisticated approval flow with pattern-based detection, smart LLM approval, and permanent allowlisting.


9. Skills System

Both implement the agentskills.io open standard with near-identical patterns:

| Aspect | OpenClaw | Hermes |
|---|---|---|
| Format | SKILL.md with YAML frontmatter | SKILL.md with YAML frontmatter |
| Loading | Lazy (model reads SKILL.md on demand) | Progressive disclosure (list → view → reference) |
| Discovery | Workspace → Managed → Bundled | Local ~/.hermes/skills/ (single source) + external dirs |
| Gating | requires.bins, requires.env, requires.config, os | platforms, fallback_for_toolsets, requires_toolsets |
| Self-creation | Agent can write skills via file tools | Agent creates skills autonomously after complex tasks |
| Self-improvement | Manual updates | Skills self-improve during use (learning loop) |
| Hub | ClawHub (clawhub.ai) | Skills Hub (agentskills.io) |
| Slash commands | No | Yes (/skill-name triggers skill loading) |

Key Difference

Hermes has autonomous skill creation — after solving a complex task, it can automatically create a skill from the experience. It also has a learning loop where skills self-improve during use. OpenClaw's skills are static unless manually edited. Hermes also has conditional activation (fallback skills that appear only when premium tools are unavailable) and secure setup on load (prompts for API keys only when the skill is accessed).


10. Messaging & Gateway

OpenClaw: WebSocket-First Daemon

  • Single long-lived Node.js daemon
  • Typed WebSocket API (JSON frames with req/res/event protocol)
  • All clients connect over one WebSocket: macOS app, CLI, web UI, mobile nodes
  • Idempotency keys for side-effecting methods
  • Channel bridges: WhatsApp (Baileys), Telegram (grammY), Discord, Slack, Signal, iMessage, WebChat
  • Device pairing with challenge-nonce trust model
  • DM scope options: main, per-peer, per-channel-peer

Hermes: Orchestration Process

  • Long-running Python process (hermes gateway)
  • Platform adapters (Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant)
  • Multi-source config: env vars + gateway.json + bridged values from config.yaml
  • DM pairing flows with platform-specific authorization
  • Session routing by platform + user/chat identity + thread/topic
  • Delivery routing: home channel, explicit targets, mirroring

Key Difference

OpenClaw's WebSocket-first protocol enables richer client integration (real-time streaming, device pairing, node system). Hermes' gateway is simpler — it's a dispatch layer that routes to AIAgent instances. OpenClaw's node system (camera, screen, location, commands on paired devices) has no Hermes equivalent.


11. Sub-Agent Architecture

| Aspect | OpenClaw | Hermes |
|---|---|---|
| Isolation | Dedicated session (subagent:<uuid>) | Spawned via delegate_task |
| Nesting | Prohibited (no fan-out) | Shared iteration budget across parent/child |
| Tool restrictions | No session tools by default | Budget pressure hints near limit |
| Result delivery | Announce pattern (post back to requester) | Return to parent context |
| Concurrency | Dedicated lane (max 8) | Thread-based parallel execution |
| Model selection | Can use cheaper models | Can use different providers |
| ACP integration | Spawn coding agents (Codex, Claude Code) via sessions_spawn | ACP editor integration (stdio/JSON-RPC) |

12. Terminal / Execution Environments

OpenClaw: Host + Docker Sandbox

  • Modes: off (host), non-main (sandbox non-main sessions), all
  • Scope: per-session, per-agent, or shared container
  • Workspace access: none, read-only, or read-write mount
  • Elevated execution bypass for authorized senders

Hermes: Six Terminal Backends

  • Local: direct host execution
  • Docker: containerized execution
  • SSH: remote host execution
  • Daytona: serverless persistence (hibernates when idle)
  • Modal: serverless cloud execution
  • Singularity: HPC container execution

Key Difference

Hermes offers significantly more execution environments. Daytona and Modal provide serverless persistence — the agent's environment hibernates between sessions, costing nothing when idle. OpenClaw's sandboxing is focused on security isolation; Hermes' backends are focused on deployment flexibility.


13. Security Model

| Layer | OpenClaw | Hermes |
|---|---|---|
| Auth | Gateway token (all connections) | Platform allowlists + DM pairing |
| Device trust | Challenge-nonce pairing, device tokens | Platform-specific pairing flows |
| Tool control | Layered allow/deny policy (JSON) | Toolset enable/disable + per-tool check_fn |
| Exec approval | Sender-based authorization | Pattern-based detection + smart LLM approval + permanent allowlist |
| Sandbox | Docker (off/non-main/all) | Six backends (Docker, Daytona, Modal, etc.) |
| Outbound | Send policy (outbound gates) | Not explicitly documented |
| Prompt safety | Advisory guardrails in system prompt | Advisory guardrails in system prompt |
| Memory safety | MEMORY.md only in private sessions | Security scanning on memory writes (injection/exfiltration) |

Key Difference

OpenClaw has a 7-layer formal security model with explicit outbound send policy. Hermes has more sophisticated command approval (regex patterns, smart LLM approval, session state, permanent allowlist). Hermes scans memory writes for injection attempts; OpenClaw protects memory by scope (private sessions only).


14. Research & Training (Hermes Only)

Hermes includes infrastructure that has no OpenClaw equivalent:

  • Batch trajectory generation for SFT data
  • Atropos RL environments for reinforcement learning
  • Trajectory compression for training tool-calling models
  • Environment framework for benchmarks and evaluation

This reflects Hermes' origin at Nous Research — the agent is both a product and a research platform for improving the next generation of models.


15. Summary Matrix

| Capability | OpenClaw | Hermes |
|---|---|---|
| Language/Runtime | TypeScript / Node.js | Python |
| Session Storage | JSONL files | SQLite (WAL) |
| Memory Model | Unbounded files + vector search | Bounded tool-managed + FTS5 |
| Memory Recall | BM25 + vector embeddings (hybrid) | FTS5 + LLM summarization |
| Prompt Caching | Cache-stable timestamps | Anthropic cache_control breakpoints |
| Compression | Compaction + pruning | Dual-layer (gateway + agent) with structured templates |
| Tool Extension | Plugins (in-process) + Hooks (event) | Registry (self-registering) + Toolsets |
| Exec Approval | Sender-based | Pattern + smart LLM + allowlist |
| Terminal Backends | Local + Docker | Local + Docker + SSH + Daytona + Modal + Singularity |
| Skills Self-Creation | No | Yes (autonomous + self-improving) |
| Node/Device Control | Yes (camera, screen, location, run) | No |
| User Modeling | USER.md (manual) | USER.md + Honcho (AI dialectic) |
| RL/Training | No | Atropos + trajectories + benchmarks |
| OpenClaw Migration | N/A | Built-in (hermes claw migrate) |
| Gateway Protocol | Typed WebSocket (req/res/event) | Platform adapters (dispatch) |
| Concurrent Writes | Lane-based command queue | SQLite WAL + application-level retries |

16. Architectural Philosophy

OpenClaw is infrastructure-first: a robust daemon that owns all state, with formal concurrency primitives, a typed wire protocol, and defense-in-depth security. It's optimized for always-on deployment with rich client integration (desktop apps, mobile nodes, device pairing). The workspace-as-git-repo philosophy makes everything human-readable and version-controllable.

Hermes is research-first: a flexible agent loop designed for experimentation, with multiple execution backends, autonomous skill creation, RL training infrastructure, and user modeling. It's optimized for the developer who wants to customize everything — switch models, add terminal backends, generate training data, run benchmarks. The bounded memory system forces efficient curation.

Both are converging: Hermes has an explicit OpenClaw migration path, compatible skills format, and similar prompt assembly patterns. The architectures are different enough to represent genuine alternative approaches to the same problem — not just language ports.


Comparison authored by Loki@FastStart ⚡ — April 2026

Sources: OpenClaw Architecture Deep Dive, Hermes Agent developer documentation (hermes-agent.nousresearch.com/docs)