
GBrain

Your AI agent is smart, but it doesn't know anything about your life. GBrain fixes that. Meetings, emails, tweets, calendar events, voice calls, original ideas... all of it flows into a searchable knowledge base that your agent reads before every response and writes to after every conversation. The agent gets smarter every day.

~30 minutes to a fully working brain. Your agent does the work. Database ready in 2 seconds (PGLite, no server). Schema, import, embeddings, and integrations take 15-30 minutes depending on brain size. You just answer questions about API keys.

Requires a frontier model. Tested with Claude Opus 4.6 and GPT-5.4 Thinking. Likely to break with smaller models.

Need an AI agent first?

GBrain is designed to be installed and operated by an AI agent. If you don't have one running yet:

Start here

https://github.com/garrytan/gbrain — clone this repo into your agent's filesystem. It's home base for docs, skills, upgrades, and recipes. To upgrade later: git pull origin main && bun install.

Copy this block into OpenClaw, Hermes, or any persistent AI agent. The agent reads the docs, does the work, and asks you for API keys. ~30 minutes.

INSTALL:

  git clone https://github.com/garrytan/gbrain.git ~/gbrain && cd ~/gbrain
  curl -fsSL https://bun.sh/install | bash
  export PATH="$HOME/.bun/bin:$PATH"
  bun install && bun link
  Verify: gbrain --version
  (If gbrain is not found, restart your shell or add the PATH export
  to your shell profile.)

API KEYS — ask the user for these:

  export OPENAI_API_KEY=sk-...          # required for vector search
  export ANTHROPIC_API_KEY=sk-ant-...   # optional, improves search quality
  Save to shell profile or .env. Without OpenAI, keyword search still
  works. Without Anthropic, search works but skips query expansion.

SET UP THE BRAIN:

  gbrain init                           # PGLite, no server needed
  Verify: gbrain doctor --json          # all checks pass

  The user's markdown files (notes, docs, brain repo) are SEPARATE from
  this tool repo. Ask the user where their files are, or create a new
  brain repo: mkdir -p ~/brain && cd ~/brain && git init

  Read ~/gbrain/docs/GBRAIN_RECOMMENDED_SCHEMA.md — set up the MECE
  directory structure (people/, companies/, concepts/, etc.) inside the
  user's brain repo, NOT inside ~/gbrain.

  gbrain import ~/brain/ --no-embed     # import markdown files
  gbrain embed --stale                  # generate vector embeddings
  gbrain query "key themes across these documents?"

GO LIVE — read and permanently adopt these docs from ~/gbrain/:

  docs/GBRAIN_SKILLPACK.md — the full agent playbook. Save to your
    memory: brain-first lookup on every message, entity detection,
    back-linking. This changes how you behave permanently.

  Set up recurring jobs using your platform's scheduler (crontab,
  OpenClaw cron, Railway cron, or a process manager):
    - Live sync (every 15 min): gbrain sync --repo ~/brain && gbrain embed --stale
    - Auto-update (daily): gbrain check-update --json (tell user, never auto-install)
    - Dream cycle (nightly): read docs/guides/cron-schedule.md for the
      full protocol — entity sweep, citation fixes, memory consolidation.
      This is what makes the brain compound. Do not skip it.
    - Weekly: gbrain doctor --json && gbrain embed --stale

  docs/GBRAIN_VERIFY.md — run all 6 verification checks. #4 (live sync
    actually works) is the most important.

INTEGRATIONS — run gbrain integrations list. Each recipe in
  ~/gbrain/recipes/ is a self-contained installer: it tells you what
  credentials to ask for, how to validate, and what cron to register.
  Ask the user which integrations they want (email, calendar, voice,
  Twitter). Set up the ones they have credentials for.
  Verify: gbrain integrations doctor (after at least one is configured)

UPGRADE: cd ~/gbrain && git pull origin main && bun install

Without an agent (standalone CLI)

git clone https://github.com/garrytan/gbrain.git && cd gbrain && bun install && bun link
gbrain init                     # local brain, ready in 2 seconds
gbrain import ~/notes/          # index your markdown
gbrain query "what themes show up across my notes?"

Getting Data In

Once GBrain is installed, your agent needs data flowing in. GBrain ships integration recipes that your agent sets up for you. It reads the recipe, asks for API keys, validates each one, and runs a smoke test. Markdown is code... the recipe IS the installer.

| Recipe | Requires | What It Does |
| --- | --- | --- |
| Public Tunnel | — | Fixed URL for MCP + voice (ngrok Hobby $8/mo) |
| Credential Gateway | — | Gmail + Calendar access (ClawVisor or Google OAuth) |
| Voice-to-Brain | ngrok-tunnel | Phone calls → brain pages (Twilio + OpenAI Realtime) |
| Email-to-Brain | credential-gateway | Gmail → entity pages (deterministic collector) |
| X-to-Brain | — | Twitter → brain pages (timeline + mentions + deletions) |
| Calendar-to-Brain | credential-gateway | Google Calendar → searchable daily pages |
| Meeting Sync | — | Circleback transcripts → brain pages with attendees |

Run gbrain integrations to see status. Dependencies resolve automatically. See Getting Data In for the full guide.

The Compounding Thesis

Most tools help you find things. GBrain makes you smarter over time.

Signal arrives (meeting, email, tweet, link)
  → Agent detects entities (people, companies, ideas)
  → READ: check the brain first (gbrain search, gbrain get)
  → Respond with full context
  → WRITE: update brain pages with new information
  → Sync: gbrain indexes changes for next query

Every cycle through this loop adds knowledge. The agent enriches a person page after a meeting. Next time that person comes up, the agent already has context. You never start from zero.

An agent without this loop answers from stale context. An agent with it gets smarter every conversation. The difference compounds daily.

"Who should I invite to dinner who knows both Pedro and Diana?" — cross-references the social graph across 3,000+ people pages

"What have I said about the relationship between shame and founder performance?" — searches YOUR thinking, not the internet

"Prep me for my meeting with Jordan in 30 minutes" — pulls dossier, shared history, recent activity, open threads

Voice: "Her" Out of the Box

The voice integration is the strongest demonstration of why a personal brain matters. Call a phone number. Your AI answers. It knows who's calling, pulls their full context from thousands of people pages, references your last meeting, and responds like someone who actually knows your world. When the call ends, a structured brain page appears with the transcript, entity detection, and cross-references.

This isn't a demo. It runs on a real phone number, screens unknown callers, and gets smarter with every call. Your agent picks its own name and personality. WebRTC works in a browser tab with zero setup. A real phone number is optional.


See it in action

The voice recipe ships with GBrain: Voice-to-Brain. Your agent installs it, sets up the voice server, and you have a working AI phone line in 30 minutes. It includes 25 production patterns from a real deployment.

How this happened

I was setting up my OpenClaw agent and started a markdown brain repo. One page per person, one page per company, compiled truth on top, append-only timeline on the bottom. The agent got smarter the more it knew, so I kept feeding it. Within a week I had 10,000+ markdown files, 3,000+ people with compiled dossiers, 13 years of calendar data, 280+ meeting transcripts, and 300+ captured original ideas.

The agent runs while I sleep. The dream cycle scans every conversation, enriches missing entities, fixes broken citations, and consolidates memory. I wake up and the brain is smarter than when I went to sleep. See the cron schedule guide for setup.

PGLite runs locally by default. gbrain init gives you embedded Postgres with pgvector, hybrid search, and all 37 operations. No server, no subscription. When your brain outgrows local (1000+ files, multi-device access, remote MCP), gbrain migrate --to supabase moves everything to managed Postgres.

Architecture

┌──────────────────┐    ┌───────────────┐    ┌──────────────────┐
│   Brain Repo     │    │    GBrain     │    │    AI Agent      │
│   (git)          │    │  (retrieval)  │    │  (read/write)    │
│                  │    │               │    │                  │
│  markdown files  │───>│  Postgres +   │<──>│  skills define   │
│  = source of     │    │  pgvector     │    │  HOW to use the  │
│    truth         │    │               │    │  brain           │
│                  │<───│  hybrid       │    │                  │
│  human can       │    │  search       │    │  entity detect   │
│  always read     │    │  (vector +    │    │  enrich          │
│  & edit          │    │   keyword +   │    │  ingest          │
│                  │    │   RRF)        │    │  brief           │
└──────────────────┘    └───────────────┘    └──────────────────┘

The repo is the system of record. GBrain is the retrieval layer. The agent reads and writes through both. Human always wins — you can edit any markdown file directly and gbrain sync picks up the changes.

What a Production Agent Looks Like

The numbers above aren't theoretical. They come from a real deployment documented in GBRAIN_SKILLPACK.md — a reference architecture for how a production AI agent uses gbrain as its knowledge backbone.

Read the skillpack. It's the most important doc in this repo. It tells your agent HOW to use gbrain, not just what commands exist:

  • The brain-agent loop — the read-write cycle that makes knowledge compound
  • Entity detection — spawn on every message, capture people/companies/original ideas
  • Enrichment pipeline — 7-step protocol with tiered API spend
  • Meeting ingestion — transcript to brain pages with entity propagation
  • Source attribution — every fact traceable to where it came from
  • Reference cron schedule — 20+ recurring jobs that keep the brain alive

Without the skillpack, your agent has tools but no playbook. With it, the agent knows when to read, when to write, how to enrich, and how to keep the brain alive autonomously. It's a pattern book, not a tutorial. "Here's what works, here's why."

How gbrain fits with OpenClaw/Hermes

GBrain is world knowledge — people, companies, deals, meetings, concepts, your original thinking. It's the long-term memory of what you know about the world.

OpenClaw agent memory (memory_search) is operational state — preferences, decisions, session context, how the agent should behave.

They're complementary:

| Layer | What it stores | How to query |
| --- | --- | --- |
| gbrain | People, companies, meetings, ideas, media | gbrain search, gbrain query, gbrain get |
| Agent memory | Preferences, decisions, operational config | memory_search |
| Session context | Current conversation | (automatic) |

All three should be checked. GBrain for facts about the world. Memory for agent config. Session for immediate context. Install via openclaw skills install gbrain.

The compounding effect

The real value isn't search. It's what happens after a few weeks of use.

You take a meeting with someone. The agent writes a brain page for them, links it to their company, tags it with the deal. Next week someone mentions that company in a different context. The agent already has the full picture: who you talked to, what you discussed, what threads are open. You didn't do anything. The brain already had it.

Install

Prerequisites

Zero-config start (PGLite). gbrain init creates a local embedded Postgres brain. No accounts, no server, no API keys. Keyword search works immediately. Add API keys later for vector search and LLM-powered features.

For production scale (Supabase). When your brain outgrows local, gbrain migrate --to supabase moves everything to managed Postgres:

| Dependency | What it's for | How to get it |
| --- | --- | --- |
| Supabase account | Postgres + pgvector database | supabase.com (Pro tier, $25/mo for 8GB) |
| OpenAI API key | Embeddings (text-embedding-3-large) | platform.openai.com/api-keys |
| Anthropic API key | Multi-query expansion + LLM chunking (Haiku) | console.anthropic.com |

Set the API keys as environment variables:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

The Supabase connection URL is configured during gbrain init --supabase. The OpenAI and Anthropic SDKs read their keys from the environment automatically.

Without an OpenAI key, search still works (keyword only, no vector search). Without an Anthropic key, search still works (no multi-query expansion, no LLM chunking).

GBrain without OpenClaw

GBrain works with any AI agent, any MCP client, or no agent at all. Three paths:

Standalone CLI

Install globally and use gbrain from the terminal:

bun add -g github:garrytan/gbrain
gbrain init                     # PGLite (local, no server needed)
gbrain import ~/git/brain/      # index your markdown
gbrain query "what themes show up across my notes?"

Run gbrain --help for the full list of commands.

MCP server (Claude Code, Cursor, Windsurf, etc.)

GBrain exposes 30 MCP tools via stdio. Add this to your MCP client config:

Claude Code (~/.claude/server.json):

{
  "mcpServers": {
    "gbrain": {
      "command": "gbrain",
      "args": ["serve"]
    }
  }
}

Cursor (Settings > MCP Servers):

{
  "gbrain": {
    "command": "gbrain",
    "args": ["serve"]
  }
}

This gives your agent get_page, put_page, search, query, add_link, traverse_graph, sync_brain, file_upload, and 22 more tools. All generated from the same operation definitions as the CLI.

Remote MCP Server (Claude Desktop, Cowork, Perplexity)

Access your brain from any device, any AI client. Run gbrain serve behind an HTTP server with a public tunnel:

# Set up a public tunnel (see recipes/ngrok-tunnel.md)
ngrok http 8787 --url your-brain.ngrok.app

# Create a bearer token for your client
bun run src/commands/auth.ts create "claude-desktop"

Then add to your AI client:

  • Claude Code: claude mcp add gbrain -t http https://your-brain.ngrok.app/mcp -H "Authorization: Bearer TOKEN"
  • Claude Desktop: Settings > Integrations > Add (not JSON config; see docs/mcp/)
  • Perplexity: Settings > Connectors > Add remote MCP (see docs/mcp/)

Per-client setup guides: docs/mcp/

ChatGPT support requires OAuth 2.1 (not yet implemented). Self-hosted alternatives (Tailscale Funnel, ngrok) documented in docs/mcp/ALTERNATIVES.md.

The tools are not enough. Your agent also needs the playbook: read GBRAIN_SKILLPACK.md and paste the relevant sections into your agent's system prompt or project instructions. The skillpack tells the agent WHEN and HOW to use each tool: read before responding, write after learning, detect entities on every message, back-link everything.

The skill markdown files in skills/ are standalone instruction sets. Copy them into your agent's context:

| Skill file | What the agent learns |
| --- | --- |
| skills/ingest/SKILL.md | How to import meetings, docs, articles |
| skills/query/SKILL.md | 3-layer search with synthesis and citations |
| skills/maintain/SKILL.md | Periodic health: stale pages, orphans, dead links |
| skills/enrich/SKILL.md | Enrich pages from external APIs |
| skills/briefing/SKILL.md | Daily briefing with meeting prep |
| skills/migrate/SKILL.md | Migrate from Obsidian, Notion, Logseq, etc. |

As a TypeScript library

bun add github:garrytan/gbrain
import { createEngine } from 'gbrain';

// PGLite (local, no server)
const engine = createEngine('pglite');
await engine.connect({ database_path: '~/.gbrain/brain.pglite' });
await engine.initSchema();

// Or Postgres (Supabase / self-hosted)
// const engine = createEngine('postgres');
// await engine.connect({ database_url: process.env.DATABASE_URL });
// await engine.initSchema();

// Search
const results = await engine.searchKeyword('startup growth');

// Read
const page = await engine.getPage('people/pedro-franceschi');

// Write
await engine.putPage('concepts/superlinear-returns', {
  type: 'concept',
  title: 'Superlinear Returns',
  compiled_truth: 'Paul Graham argues that returns in many fields are superlinear...',
  timeline: '- 2023-10-01: Published on paulgraham.com',
});

The BrainEngine interface is pluggable. createEngine() accepts 'pglite' or 'postgres'. See docs/ENGINES.md for details.

PGLite (default) requires no external database. For production scale (7K+ pages, multi-device, remote MCP), use Supabase Pro ($25/mo).

Upgrade

cd ~/gbrain && git pull origin main && bun install

Then run gbrain init to apply any schema migrations (idempotent, safe to re-run).

Setup details

gbrain init defaults to PGLite (embedded Postgres 17.5 via WASM). No accounts, no server. Config saved to ~/.gbrain/config.json.

gbrain init                     # PGLite (default)
gbrain init --supabase          # guided wizard for Supabase
gbrain init --url <conn>        # any Postgres with pgvector

Import is idempotent. Re-running skips unchanged files (SHA-256 content hash). ~30s for text import of 7,000 files, ~10-15 min for embedding.
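The idempotency check amounts to comparing content hashes. A sketch, with `storedHashes` standing in for the content_hash column (the function name is illustrative, not gbrain's API):

```typescript
import { createHash } from "node:crypto";

// Skip a file on re-import when its SHA-256 matches what's already stored.
// `storedHashes` stands in for the content_hash column on the pages table.
function needsImport(
  content: string,
  slug: string,
  storedHashes: Map<string, string>,
): boolean {
  const hash = createHash("sha256").update(content).digest("hex");
  return storedHashes.get(slug) !== hash; // true for new or changed files
}
```

A file is re-imported only when it is new or its content changed, which is why re-running the import over 7,000 unchanged files finishes quickly.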

File storage and migration

Brain repos accumulate binary files: images, PDFs, audio recordings, raw API responses. A repo with 3,000 markdown pages might carry 2GB of binaries that make git clone painful.

GBrain has a three-stage migration lifecycle that moves binaries to cloud storage while preserving every reference:

Local files in git repo
  │
  ▼  gbrain files mirror <dir>
Cloud copy exists, local files untouched
  │
  ▼  gbrain files redirect <dir>
Local files replaced with .redirect breadcrumbs (tiny YAML pointers)
  │
  ▼  gbrain files clean <dir>
Breadcrumbs removed, cloud is the only copy
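A .redirect breadcrumb is a tiny YAML pointer left in place of the migrated file. A hypothetical layout (field names are illustrative; inspect a breadcrumb gbrain actually writes for the real schema):

```yaml
# attachments/photo.jpg.redirect — hypothetical breadcrumb, not gbrain's exact format
original: attachments/photo.jpg
storage: supabase
url: https://xyz.supabase.co/storage/v1/object/brain/attachments/photo.jpg
sha256: 9f2c...  # content hash of the migrated file
```

Because the breadcrumb carries the cloud URL, anything that resolves the original path can follow it without touching the database.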

Every stage is reversible until clean:

# Stage 1: Copy to cloud (git repo unchanged)
gbrain files mirror ~/git/brain/attachments/ --dry-run   # preview first
gbrain files mirror ~/git/brain/attachments/

# Stage 2: Replace local files with breadcrumbs
gbrain files redirect ~/git/brain/attachments/ --dry-run
gbrain files redirect ~/git/brain/attachments/
# Your git repo just dropped from 2GB to 50MB

# Undo: download everything back from cloud
gbrain files restore ~/git/brain/attachments/

# Stage 3: Remove breadcrumbs (irreversible, cloud is the only copy)
gbrain files clean ~/git/brain/attachments/ --yes

Storage backends: S3-compatible (AWS S3, Cloudflare R2, MinIO), Supabase Storage, or local filesystem. Configured during gbrain init.

Additional file commands:

gbrain files list [slug]           # list files for a page (or all)
gbrain files upload <file> --page <slug>  # upload file linked to page
gbrain files sync <dir>            # bulk upload directory
gbrain files verify                # verify all uploads match local
gbrain files status                # show migration status of directories
gbrain files unmirror <dir>        # remove mirror marker (files stay in cloud)

The file resolver (src/core/file-resolver.ts) handles fallback automatically: if a local file is missing, it checks for a .redirect breadcrumb, then a .supabase marker, and resolves to the cloud URL. Code that references files by path keeps working after migration.
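The fallback chain can be sketched like this. The filesystem is injected so the logic stays visible and testable; the breadcrumb field name (`url`) and marker contents are assumptions, not gbrain's actual formats:

```typescript
// Fallback chain from the text: local file → .redirect breadcrumb →
// .supabase marker → cloud URL. FS is injected to keep the sketch
// self-contained; field/marker formats are assumptions.
type FS = { exists(p: string): boolean; read(p: string): string };

function resolveFile(path: string, fs: FS): string {
  if (fs.exists(path)) return path; // local copy still on disk
  const redirect = `${path}.redirect`;
  if (fs.exists(redirect)) {
    // breadcrumb is a tiny YAML pointer; grab its url: line
    const m = fs.read(redirect).match(/^url:\s*(\S+)/m);
    if (m) return m[1];
  }
  const marker = `${path}.supabase`;
  if (fs.exists(marker)) return fs.read(marker).trim();
  throw new Error(`unresolvable file: ${path}`);
}

// In-memory stand-in for the real filesystem:
const mem = new Map([["img.png.redirect", "url: https://cloud.example/img.png\n"]]);
const memFs: FS = { exists: (p) => mem.has(p), read: (p) => mem.get(p)! };
// resolveFile("img.png", memFs) → "https://cloud.example/img.png"
```

The point of the chain is that code referencing files by path keeps working at every migration stage.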

The knowledge model

Every page in the brain follows the compiled truth + timeline pattern:

---
type: concept
title: Do Things That Don't Scale
tags: [startups, growth, pg-essay]
---

Paul Graham's argument that startups should do unscalable things early on.
The most common: recruiting users manually, one at a time. Airbnb went
door to door in New York photographing apartments. Stripe manually
installed their payment integration for early users.

The key insight: the unscalable effort teaches you what users actually
want, which you can't learn any other way.

---

- 2013-07-01: Published on paulgraham.com
- 2024-11-15: Referenced in batch W25 kickoff talk
- 2025-02-20: Cited in discussion about AI agent onboarding strategies

Above the --- separator: compiled truth. Your current best understanding. Gets rewritten when new evidence changes the picture. Below: timeline. Append-only evidence trail. Never edited, only added to.

The compiled truth is the answer. The timeline is the proof.
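Parsing that layout is mechanical. A minimal sketch, assuming pages follow the example above exactly (function name and return shape are illustrative, not gbrain's API; a compiled truth containing its own `---` rule would need a smarter split):

```typescript
// Split a brain page into frontmatter, compiled truth, and timeline.
// YAML frontmatter sits between the first two `---` lines; a lone `---`
// later separates compiled truth from the append-only timeline.
function parsePage(md: string) {
  const parts = md.split(/^---$/m);
  // parts[0] is empty (page starts with ---), parts[1] is frontmatter,
  // parts[2] is compiled truth, parts[3] (if present) is the timeline.
  return {
    frontmatter: (parts[1] ?? "").trim(),
    compiledTruth: (parts[2] ?? "").trim(),
    timeline: (parts[3] ?? "").trim(),
  };
}

const page = parsePage(`---
type: concept
title: Example
---

The current best understanding goes here.

---

- 2025-01-01: First evidence entry
`);
// page.compiledTruth holds the rewritable answer;
// page.timeline holds the append-only proof.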

How search works

Query: "when should you ignore conventional wisdom?"
         |
    Multi-query expansion (Claude Haiku)
    "contrarian thinking startups", "going against the crowd"
         |
    +----+----+
    |         |
  Vector    Keyword
  (HNSW     (tsvector +
  cosine)    ts_rank)
    |         |
    +----+----+
         |
    RRF Fusion: score = sum(1/(60 + rank))
         |
    4-Layer Dedup
    1. Best chunk per page
    2. Cosine similarity > 0.85
    3. Type diversity (60% cap)
    4. Per-page chunk cap
         |
    Stale alerts (compiled truth older than latest timeline)
         |
    Results

Keyword search alone misses conceptual matches. "Ignore conventional wisdom" won't find an essay titled "The Bus Ticket Theory of Genius" even though it's exactly about that. Vector search alone misses exact phrases when the embedding is diluted by surrounding text. RRF fusion gets both right. Multi-query expansion catches phrasings you didn't think of.
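The RRF step itself is a few lines. A sketch of the k = 60 fusion formula above, with results reduced to bare id lists rather than gbrain's SearchResult shape:

```typescript
// Reciprocal Rank Fusion: merge ranked result lists from vector and
// keyword search. Implements the formula above: score = sum(1/(60 + rank)).
function rrfFuse(lists: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, i) => {
      const rank = i + 1; // 1-based rank within this list
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// An item ranked well in both lists beats one ranked first in only one:
const fused = rrfFuse([
  ["a", "b", "c"], // vector results
  ["b", "c", "d"], // keyword results
]);
// "b" (ranks 2 and 1) outscores "a" (rank 1 in one list only)
```

This is why RRF needs no score normalization: only ranks matter, so vector cosine scores and tsvector ranks never have to be put on the same scale.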

Database schema

10 tables in Postgres + pgvector:

pages                    The core content table
  slug (UNIQUE)          e.g. "concepts/do-things-that-dont-scale"
  type                   person, company, deal, yc, civic, project, concept, source, media
  title, compiled_truth, timeline
  frontmatter (JSONB)    Arbitrary metadata
  search_vector          Trigger-based tsvector (title + compiled_truth + timeline + timeline_entries)
  content_hash           SHA-256 for import idempotency

content_chunks           Chunked content with embeddings
  page_id (FK)           Links to pages
  chunk_text             The chunk content
  chunk_source           'compiled_truth' or 'timeline'
  embedding (vector)     1536-dim from text-embedding-3-large
  HNSW index             Cosine similarity search

links                    Cross-references between pages
  from_page_id, to_page_id
  link_type              knows, invested_in, works_at, founded, references, etc.

tags                     page_id + tag (many-to-many)

timeline_entries         Structured timeline events
  page_id, date, source, summary, detail (markdown)

page_versions            Snapshot history for compiled_truth
  compiled_truth, frontmatter, snapshot_at

raw_data                 Sidecar JSON from external APIs
  page_id, source, data (JSONB)

files                    Binary attachments in Supabase Storage
  page_slug (FK)         Links to pages (ON UPDATE CASCADE)
  storage_path, content_hash, mime_type, metadata (JSONB)

ingest_log               Audit trail of import/ingest operations

config                   Brain-level settings (embedding model, chunk strategy, sync state)

Indexes: B-tree on slug/type, GIN on frontmatter/search_vector, HNSW on embeddings, pg_trgm on title for fuzzy slug resolution.
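In Postgres terms, that index set corresponds roughly to DDL like this. It is a sketch based on the listing above, not gbrain's actual migrations; exact names and parameters may differ:

```sql
-- Sketch only: index types follow the description above.
CREATE UNIQUE INDEX ON pages (slug);                  -- B-tree, slug lookups
CREATE INDEX ON pages (type);                         -- B-tree, type filters
CREATE INDEX ON pages USING gin (frontmatter);        -- JSONB queries
CREATE INDEX ON pages USING gin (search_vector);      -- keyword (tsvector) search
CREATE INDEX ON pages USING gin (title gin_trgm_ops); -- pg_trgm fuzzy slug resolution
CREATE INDEX ON content_chunks
  USING hnsw (embedding vector_cosine_ops);           -- pgvector similarity search
```

The HNSW index is what keeps cosine-similarity search fast as the chunk count grows.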

Chunking

Three strategies, dispatched by content type:

Recursive (timeline, bulk import): 5-level delimiter hierarchy (paragraphs, lines, sentences, clauses, words). 300-word chunks with 50-word sentence-aware overlap. Fast, predictable, lossless.

Semantic (compiled truth): Embeds each sentence, computes adjacent cosine similarities, applies Savitzky-Golay smoothing to find topic boundaries. Falls back to recursive on failure. Best quality for intelligence assessments.

LLM-guided (high-value content, on request): Pre-splits into 128-word candidates, asks Claude Haiku to identify topic shifts in sliding windows. 3 retries per window. Most expensive, best results.
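The recursive strategy's windowing can be reduced to a sketch. This simplified version chunks by raw word count only, using the 300/50 numbers above; the real implementation walks the 5-level delimiter hierarchy and keeps the overlap sentence-aware:

```typescript
// Word-window sketch of the recursive strategy: 300-word chunks with a
// 50-word overlap. Shows only the windowing arithmetic, not the
// delimiter hierarchy or sentence-aware overlap.
function chunkWords(text: string, size = 300, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

Each chunk then gets its own embedding row in content_chunks, with the overlap ensuring a sentence near a boundary appears in at least one chunk with its surrounding context.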

Commands

SETUP
  gbrain init [--supabase|--url <conn>]     Create brain (PGLite default, or Supabase)
  gbrain migrate --to supabase|pglite       Migrate between engines (bidirectional)
  gbrain upgrade                            Self-update

PAGES
  gbrain get <slug>                         Read a page (supports fuzzy slug matching)
  gbrain put <slug> [< file.md]             Write/update a page (auto-versions)
  gbrain delete <slug>                      Delete a page
  gbrain list [--type T] [--tag T] [-n N]   List pages with filters

SEARCH
  gbrain search <query>                     Keyword search (tsvector)
  gbrain query <question>                   Hybrid search (vector + keyword + RRF + expansion)

IMPORT/EXPORT
  gbrain import <dir> [--no-embed]          Import markdown directory (idempotent)
  gbrain sync [--repo <path>] [flags]       Git-to-brain incremental sync
  gbrain export [--dir ./out/]              Export to markdown (round-trip)

FILES
  gbrain files list [slug]                  List stored files
  gbrain files upload <file> --page <slug>  Upload file to storage
  gbrain files sync <dir>                   Bulk upload directory
  gbrain files verify                       Verify all uploads

EMBEDDINGS
  gbrain embed [<slug>|--all|--stale]       Generate/refresh embeddings

LINKS + GRAPH
  gbrain link <from> <to> [--type T]        Create typed link
  gbrain unlink <from> <to>                 Remove link
  gbrain backlinks <slug>                   Incoming links
  gbrain graph <slug> [--depth N]           Traverse link graph (recursive CTE, default depth 5)

TAGS
  gbrain tags <slug>                        List tags
  gbrain tag <slug> <tag>                   Add tag
  gbrain untag <slug> <tag>                 Remove tag

TIMELINE
  gbrain timeline [<slug>]                  View timeline entries
  gbrain timeline-add <slug> <date> <text>  Add timeline entry

ADMIN
  gbrain doctor [--json]                    Health checks (pgvector, RLS, schema, embeddings)
  gbrain stats                              Brain statistics
  gbrain health                             Health dashboard (embed coverage, stale, orphans)
  gbrain history <slug>                     Page version history
  gbrain revert <slug> <version-id>         Revert to previous version
  gbrain config [get|set] <key> [value]     Brain config
  gbrain serve                              MCP server (stdio, local)
  gbrain upgrade                            Self-update with feature discovery
  bun run src/commands/auth.ts              Token management (create/list/revoke/test)
  gbrain call <tool> '<json>'               Raw tool invocation
  gbrain --tools-json                       Tool discovery (JSON)

Library and MCP details

See GBrain without OpenClaw above for library usage examples, MCP server config, and skill file loading.

The BrainEngine interface is pluggable. See docs/ENGINES.md for how to add backends. 30 MCP tools are generated from the contract-first operations.ts. Parity tests verify structural identity between CLI, MCP, and tools-json.

Skills

Fat markdown files that tell AI agents HOW to use gbrain. No skill logic in the binary.

| Skill | What it does |
| --- | --- |
| ingest | Ingest meetings, docs, articles. Updates compiled truth (rewrite, not append), appends timeline, creates cross-reference links across all mentioned entities. |
| query | 3-layer search (keyword + vector + structured) with synthesis and citations. Says "the brain doesn't have info on X" rather than hallucinating. |
| maintain | Periodic health: find contradictions, stale compiled truth, orphan pages, dead links, tag inconsistency, missing embeddings, overdue threads. |
| enrich | Enrich pages from external APIs. Raw data stored separately, distilled highlights go to compiled truth. |
| briefing | Daily briefing: today's meetings with participant context, active deals with deadlines, time-sensitive threads, recent changes. |
| migrate | Universal migration from Obsidian (wikilinks to gbrain links), Notion (stripped UUIDs), Logseq (block refs), plain markdown, CSV, JSON, Roam. |
| setup | Set up GBrain from scratch: auto-provision Supabase via CLI, AGENTS.md injection, import, sync. Target TTHW < 2 min. |

Engine Architecture

CLI / MCP Server
     (thin wrappers, identical operations)
              |
      BrainEngine interface
       (pluggable backend)
              |
      engine-factory.ts
       (dynamic imports)
              |
     +--------+--------+
     |                  |
PGLiteEngine       PostgresEngine
  (ships v0.7)       (ships v0)
     |                  |
~/.gbrain/brain.pglite  Supabase Pro ($25/mo)
  embedded PG 17.5    Postgres + pgvector + pg_trgm
  via @electric-sql    connection pooling via Supavisor
  /pglite

     gbrain migrate --to supabase/pglite
         (bidirectional migration)

Embedding, chunking, and search fusion are engine-agnostic. Only raw keyword search (searchKeyword) and raw vector search (searchVector) are engine-specific. RRF fusion, multi-query expansion, and 4-layer dedup run above the engine on SearchResult[] arrays. Both engines use the same SQL (PGLite runs real Postgres, not a separate dialect).

Storage estimates

For a brain with ~7,500 pages:

| Component | Size |
| --- | --- |
| Page text (compiled_truth + timeline) | ~150MB |
| JSONB frontmatter + indexes | ~70MB |
| Content chunks (~22K, text) | ~80MB |
| Embeddings (22K x 1536 floats) | ~134MB |
| HNSW index overhead | ~270MB |
| Links, tags, timeline, versions | ~50MB |
| Total | ~750MB |

Supabase free tier (500MB) won't fit a large brain. Supabase Pro ($25/mo, 8GB) is the starting point.

Initial embedding cost: ~$4-5 for 7,500 pages via OpenAI text-embedding-3-large.

Docs

Reference:

  • GBRAIN_V0.md -- Full product spec, all architecture decisions
  • ENGINES.md -- Pluggable engine interface: PGLite (default) + Postgres, capability matrix, migration

Contributing

See CONTRIBUTING.md. Run bun test for unit tests. For E2E tests against real Postgres+pgvector: docker compose -f docker-compose.test.yml up -d then DATABASE_URL=postgresql://postgres:postgres@localhost:5434/gbrain_test bun run test:e2e.

Welcome PRs for:

  • New enrichment API integrations
  • Performance optimizations
  • Docker Compose for self-hosted Postgres
  • Additional engine backends (DuckDB, Turso, etc.)

License

MIT

About

Garry's Opinionated OpenClaw/Hermes Agent Brain
