saml212/harness-builder

harness-builder

Scaffold a Claude Code (or Codex) harness that lets one LLM agent work unsupervised for hours and refuse to ship slop.

harness-builder is a Claude Code skill that bootstraps the infrastructure underneath an autonomous coding session — hooks, gates, capture mechanisms, fresh-context evaluators — distilled from production harnesses for coding, research, and hybrid workflows.

It doesn't orchestrate teams of agents (see revfactory/harness). It doesn't give you a Plan→Work→Review cycle (see Chachamaru127/claude-code-harness). It builds the substrate underneath either: the boring discipline that makes long autonomous runs not end in slop committed at 3am.

=== session-report ===

Conventions (non-negotiable):
  - Author: human author on every commit. NO AI trailers. No --no-verify.
  - Sub-agents: cascade, one at a time, never parallel.
  - Backpressure: after non-trivial changes, invoke /backpressure (subagent
    evaluates as the user) before declaring done.

Recent learnings:
  - fresh-context evaluation > self-evaluation; backpressure must be a separate context
  - record.sh must use pwd -P; macOS /tmp symlinks /private/tmp
  - codify-learnings should write to project_root, not pwd

That is the SessionStart orientation a fresh agent reads in under 60 seconds, once harness-builder has scaffolded the project.
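As a rough illustration of how such an orientation could be assembled, here is a minimal sketch. It is not the shipped session-report.sh. It assumes learnings are stored one JSON object per line in memory/learnings.jsonl (the path named later in this README) and that each object has a `text` field; that field name and the function name `print_session_report` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a SessionStart orientation printer.
# Assumption: each line of the learnings file is a JSON object with a "text" field.

print_session_report() {
  local learnings_file="${1:-memory/learnings.jsonl}"
  local count="${2:-3}"

  echo "=== session-report ==="
  echo
  echo "Recent learnings:"
  if [ -s "$learnings_file" ]; then
    # Show only the newest entries so the report stays short.
    tail -n "$count" "$learnings_file" | jq -r '"  - " + .text'
  else
    echo "  (none recorded yet)"
  fi
}
```

Calling `print_session_report memory/learnings.jsonl 3` from a SessionStart hook would print the three most recent learnings. Capping the count is one way to keep the report readable in well under a minute.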

Why this exists

Three failure modes collapse autonomous agent work into slop:

  1. Conventions in prose decay. A 200-line CLAUDE.md the agent reads at session start is forgotten three turns later. Hooks make rules non-bypassable.
  2. Self-evaluation drifts. An agent grading its own diff is invested. Fresh-context subagents grade with no skin in the game.
  3. Mid-session insights vanish. The agent learns something useful and never writes it down. [LEARN] blocks are captured automatically, then deliberately codified at session end.
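To make the first point concrete, here is a minimal sketch of a fail-close commit guard. It is not the shipped enforce-author.sh. It assumes the hook receives a JSON payload on stdin whose `.tool_input.command` field holds the shell command the agent wants to run, and that exit status 2 blocks the call; check both against the current Claude Code hooks documentation. The function name `check_commit_command` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a fail-close commit guard (not the shipped enforce-author.sh).
# Assumption: stdin carries a JSON payload with .tool_input.command,
# and exit status 2 tells the caller to block the command.

check_commit_command() {
  local payload cmd
  payload="$(cat)"

  # Fail closed: if the payload cannot be parsed, block instead of allowing.
  if ! cmd="$(printf '%s' "$payload" | jq -er '.tool_input.command' 2>/dev/null)"; then
    echo "blocked: could not parse hook payload" >&2
    return 2
  fi

  # Only git commit invocations are inspected; everything else passes.
  case "$cmd" in
    *"git commit"*) ;;
    *) return 0 ;;
  esac

  case "$cmd" in
    *--no-verify*) echo "blocked: --no-verify is not allowed" >&2; return 2 ;;
    *--amend*)     echo "blocked: --amend is not allowed" >&2; return 2 ;;
  esac
  return 0
}
```

The sketch deliberately ignores harder cases such as `git -C <path> commit` or the short flag `-n`. The point is the shape: an unparseable payload is treated as a reason to block, not as a reason to let the command through.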

The 5 principles, mapped to enforcement

| Principle | Enforced by |
| --- | --- |
| Programmatic > remembered | hooks fail-close on git commit, Write, Edit |
| Fresh-context evaluation > self-evaluation | /backpressure skill + user-perspective-evaluator agent |
| Capture continuously, codify deliberately | Stop hooks scan for [LEARN] / [DEAD-END] blocks → memory/*.jsonl |
| Cascade subagents, never parallel | convention in CLAUDE.md, audited at backpressure-time |
| Substrate-portable means scripts+schema, not prompts | hooks + scripts in templates/; CLAUDE.md is why, not what |

Full reasoning: references/principles.md.
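The "capture continuously" row can be pictured with a small sketch. This is not the shipped codify-learnings.sh. It assumes a learning is a single line of the form `[LEARN] some text` in a plain-text transcript, and it writes objects with hypothetical `kind` and `text` fields; the real block format and JSONL schema may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of [LEARN] capture (not the shipped codify-learnings.sh).
# Assumption: a learning is a single line of the form "[LEARN] some text".

capture_learnings() {
  local transcript="$1" out="$2"
  mkdir -p "$(dirname "$out")"

  # grep -F treats "[LEARN] " as a literal string, not a bracket expression.
  grep -F '[LEARN] ' "$transcript" | while IFS= read -r line; do
    # Keep only the text after the marker; jq handles JSON escaping.
    jq -cn --arg text "${line#*\[LEARN\] }" '{kind: "learn", text: $text}' >> "$out"
  done
}
```

Building each record with `jq --arg` rather than string concatenation keeps quotes and backslashes in the learning text from corrupting the JSONL file.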

60-second install

# 1. Clone into your skills directory.
git clone https://github.com/<your-org>/harness-builder ~/.claude/skills/harness-builder

# 2. Verify it's discoverable.
ls ~/.claude/skills/harness-builder/SKILL.md   # should exist

# 3. From any project where you want a harness:
cd /path/to/your/project
bash ~/.claude/skills/harness-builder/scripts/scaffold.sh "$(pwd)" coding

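Invoking /harness-builder from inside a Claude Code session, to walk through the 4 classification questions interactively, is planned but not available yet (see Limitations). Until that wrapper lands, run scaffold.sh directly as shown above.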

Dependencies: bash, jq, git. Optional but recommended: complexipy (Python), gocognit (Go), eslint-plugin-sonarjs (TS/JS) for the complexity gate.
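Before scaffolding, it can help to confirm the required tools are on your PATH. The following is a small preflight sketch, not part of the skill; the function name `missing_deps` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a preflight check for the required tools listed above.

missing_deps() {
  local dep
  for dep in "$@"; do
    # command -v succeeds only if the name resolves to something runnable.
    command -v "$dep" >/dev/null 2>&1 || printf '%s\n' "$dep"
  done
}
```

For example, `missing_deps bash jq git` prints one line per tool that is not installed and prints nothing when all three are present.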

What gets scaffolded

A coding harness for coding+claude-code+no-long-running+blocking:

your-project/
├── CLAUDE.md                              # 76-line process doc
├── HARNESS-PLAN.md                        # what was decided + dogfood checklist
├── memory/MEMORY.md                       # the index — entries grow over time
└── .claude/
    ├── settings.json                      # registers all hooks
    ├── hooks/                             # 8 hooks that enforce the rules
    │   ├── enforce-author.sh              #   no AI trailers, no --no-verify, no --amend
    │   ├── pre-commit-quality-gate.sh     #   blocks commit unless backpressure ran
    │   ├── complexity-gate.sh             #   cognitive-complexity per language
    │   ├── doc-bloat-gate.sh              #   warns on new docs/*.md
    │   ├── codify-learnings.sh            #   Stop: extracts [LEARN] → memory/learnings.jsonl
    │   ├── deadend-capture.sh             #   Stop: extracts [DEAD-END] → memory/dead-ends.jsonl
    │   ├── codify-prompt.sh               #   Stop: nudges /codify-learnings on substantive sessions
    │   ├── session-report.sh              #   SessionStart: orientation in <60s
    │   └── lib/git-helpers.sh             #   shared is_git_commit / extract_git_C_path
    ├── skills/
    │   ├── backpressure/                  # the load-bearing skill
    │   ├── clean/                         # slop reduction
    │   ├── codify-learnings/              # session-end loop closer
    │   └── propose-harness-change/        # for editing the harness itself
    └── agents/
        ├── user-perspective-evaluator.md  # fresh-context backpressure subagent
        └── learnings-extractor.md         # fresh-context codify-learnings subagent

HARNESS-PLAN.md, at your project root, records the classification, the patterns installed, and the 6-step dogfood checklist you should run immediately to verify the harness actually works. (If you skip dogfood, you've built theatre. Don't.)

Worked example

See examples/coding-quickstart/ for a before/after walkthrough showing the harness blocking a slop commit, demanding /backpressure, and accepting the rewrite.

Audit mode

For an existing repo where you're not sure if the harness is complete:

bash ~/.claude/skills/harness-builder/scripts/audit.sh /path/to/repo

Writes HARNESS-AUDIT.md with a per-pattern checklist (✅/⚠️/❌) and remediation hints. Read-only — never modifies your repo.
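The idea of a per-pattern checklist can be sketched as follows. This is not the shipped audit.sh: it only checks that a handful of paths from the scaffold tree in this README exist, and it prints to stdout instead of writing HARNESS-AUDIT.md. The function name `audit_harness` and the choice of paths are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a read-only harness audit (not the shipped audit.sh).
# The paths checked are taken from the scaffold tree in this README.

audit_harness() {
  local repo="$1" path failures=0
  for path in \
    CLAUDE.md \
    .claude/settings.json \
    .claude/hooks/enforce-author.sh \
    .claude/hooks/pre-commit-quality-gate.sh \
    .claude/skills/backpressure \
    .claude/agents/user-perspective-evaluator.md
  do
    if [ -e "$repo/$path" ]; then
      echo "✅ $path"
    else
      echo "❌ $path"
      failures=$((failures + 1))
    fi
  done
  # Non-zero status when anything is missing, so callers can act on it.
  [ "$failures" -eq 0 ]
}
```

Because the function only tests for existence and prints, it never touches the audited repo, which matches the read-only guarantee described above.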

Comparison

| | harness-builder | revfactory/harness | Chachamaru127/claude-code-harness |
| --- | --- | --- | --- |
| What it builds | the substrate underneath any agent workflow | a team of specialized agents | a Plan→Work→Review cycle |
| Optimizes for | unsupervised hours-long sessions, slop refusal | multi-agent decomposition | single-agent disciplined iteration |
| Hooks | 8 install-and-go (commit, complexity, capture, orientation) | mostly skills | a few + PreCompact handler |
| Backpressure | fresh-context subagent, hook-blocked at commit | n/a | n/a |
| Memory | [LEARN]/[DEAD-END] capture + active codification | n/a | per-session |
| Substrate | Claude Code (Codex coming) | Claude Code | Claude Code |

This skill is the boring infrastructure underneath either of those — drop it in first, then layer their team-orchestration or P→W→R cycle on top.

Limitations (be honest)

  • Codex substrate is stubbed — scaffold prints an actionable error pointing you at references/harness-types.md for manual translation. Auto-generation is v2.
  • Research-harness extensions (budget gate, autopilot, calibration, ZCM) are described in references/pattern-catalog.md but not bundled. They're harder to ship as turnkey templates because they integrate with provider APIs (RunPod, Vast, etc.). v1 ships a coding-harness; research is 🔜.
  • Client-side hooks are bypassable by a determined adversary (git -c core.hooksPath=/dev/null, git commit-tree, shell aliases). For defense in depth, mirror the rules in CI. The hooks stop the common case (forgetful agent, hurried commit) — not malicious circumvention.
  • No interactive /harness-builder UI yet. The skill works via the script directly today. Interactive classification (read 4 answers, scaffold) is a thin wrapper away.
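For the advice above to mirror the rules in CI, a server-side check might look like the sketch below. It assumes an "AI trailer" means a Co-Authored-By line naming an AI tool; this README does not spell out the exact patterns the shipped hooks reject, so the tool names in the regex and the function name `has_ai_trailer` are illustrative guesses.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a CI-side trailer check.
# Assumption: an "AI trailer" is a Co-Authored-By line naming an AI tool;
# the patterns the shipped hooks actually reject are not specified here.

has_ai_trailer() {
  # Reads commit messages on stdin; succeeds if a suspicious trailer is found.
  grep -Eiq '^co-authored-by:.*(claude|codex|copilot|chatgpt)'
}
```

In a CI job you could pipe the messages of the pushed commits into it, for example `git log --format=%B origin/main..HEAD | has_ai_trailer`, and fail the build when it succeeds. The `origin/main..HEAD` range is an assumption about your branch layout.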

Contributing — and why this matters

This skill is a learning artifact. Every person who uses it sees something the next person should inherit. The contribution loop is wired into the skill.

If you used harness-builder and discovered a better hook, a missing pattern, a portability bug, or a doc that confused your fresh agent — the upstream wants the fix. From your local clone:

cd ~/.claude/skills/harness-builder
bash scripts/contribute.sh <short-slug>

contribute.sh forks the repo to your account, creates an improvement/<slug> branch off main, and prints the next-step commands (edit, commit, push, gh pr create). It doesn't try to guess what you want to fix — it removes the ceremony so you can spend the time on the actual change.

When /codify-learnings runs at session end and extracts a generalizable pattern from your work, the skill points you at this script. Your [LEARN] block becomes the upstream PR's Provenance section — that's the trust signal future readers will use to evaluate your contribution.

Full contribution philosophy + what counts as generalizable: see references/contributing.md.

Quality bar: shellcheck-clean, dogfood-receipt-attached, scaffold-and-audit-itself green. The skill audits to 15/15 on its own templates — that's what your contribution should preserve.

License

MIT. See LICENSE.

Credits

Distilled from production harnesses for autonomous LLM work — coding, research, and hybrid. The patterns are battle-tested; the OSS extraction sands off the project-specific edges.
