Scaffold a Claude Code (or Codex) harness that lets one LLM agent work unsupervised for hours and refuse to ship slop.
harness-builder is a Claude Code skill that bootstraps the infrastructure underneath an autonomous coding session — hooks, gates, capture mechanisms, fresh-context evaluators — distilled from production harnesses for coding, research, and hybrid workflows.
It doesn't orchestrate teams of agents (see revfactory/harness). It doesn't give you a Plan→Work→Review cycle (see Chachamaru127/claude-code-harness). It builds the substrate underneath either: the boring discipline that makes long autonomous runs not end in slop committed at 3am.
=== session-report ===
Conventions (non-negotiable):
- Author: human author on every commit. NO AI trailers. No --no-verify.
- Sub-agents: cascade, one at a time, never parallel.
- Backpressure: after non-trivial changes, invoke /backpressure (subagent
evaluates as the user) before declaring done.
Recent learnings:
- fresh-context evaluation > self-evaluation; backpressure must be a separate context
- record.sh must use pwd -P; macOS /tmp symlinks /private/tmp
- codify-learnings should write to project_root, not pwd
That's the SessionStart orientation a fresh agent gets in 60 seconds, after harness-builder scaffolds.
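The second learning in the report above (record.sh must use pwd -P) comes down to logical versus physical paths. A minimal sketch, assuming a hypothetical helper named physical_dir that is not part of the shipped scripts:

```shell
#!/usr/bin/env bash
# Hypothetical helper; not taken from record.sh.

# Print the physical (symlink-resolved) path of a directory.
# On macOS, /tmp is a symlink to /private/tmp, so the logical and
# physical paths of the same directory can differ as strings.
physical_dir() {
  ( cd "$1" && pwd -P )
}
```

Comparing two paths only after passing both through the same resolver avoids treating /tmp/x and /private/tmp/x as different directories.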
Three failure modes collapse autonomous agent work into slop:
- Conventions in prose decay. A 200-line CLAUDE.md the agent reads at session start is forgotten three turns later. Hooks make rules non-bypassable.
- Self-evaluation drifts. An agent grading its own diff is invested. Fresh-context subagents grade with no skin in the game.
- Mid-session insights vanish. The agent learns something useful and never writes it down. [LEARN] blocks are captured automatically and deliberately codified at session end.
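The capture step for the third failure mode could look roughly like the sketch below. It is not the shipped codify-learnings.sh. It assumes each learning is a single line starting with [LEARN], writes plain text instead of JSONL to avoid a jq dependency, and uses the hypothetical names extract_learnings, capture_learnings, and learnings.txt. How the hook obtains the transcript text is outside the sketch.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real [LEARN] block format may differ.

# Read transcript text on stdin and print the text that follows
# each line-leading [LEARN] marker, one learning per line.
extract_learnings() {
  sed -n 's/^[[:space:]]*\[LEARN\][[:space:]]*//p'
}

# Append extracted learnings under an explicit project root,
# not under whatever the current directory happens to be.
capture_learnings() {
  local project_root="$1"
  mkdir -p "$project_root/memory"
  extract_learnings >> "$project_root/memory/learnings.txt"
}
```

Passing the project root as an argument matches the report's note that learnings should be written to project_root, not pwd.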
| Principle | Enforced by |
|---|---|
| Programmatic > remembered | hooks fail-close on git commit, Write, Edit |
| Fresh-context evaluation > self-evaluation | /backpressure skill + user-perspective-evaluator agent |
| Capture continuously, codify deliberately | Stop hooks scan for [LEARN] / [DEAD-END] blocks → memory/*.jsonl |
| Cascade subagents, never parallel | convention in CLAUDE.md, audited at backpressure-time |
| Substrate-portable means scripts+schema, not prompts | hooks + scripts in templates/; CLAUDE.md is why, not what |
Full reasoning: references/principles.md.
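To make "fail-close" in the first row concrete, here is a minimal sketch of a command check that blocks unless it can positively approve. The function name check_commit_command and the string matching are assumptions for illustration; the shipped enforce-author.sh may work differently, and how a non-zero result is turned into a block depends on the host.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a fail-closed check; not the shipped hook.

# Return 0 (allow) only when the command is judged safe.
# A forbidden flag or an empty input returns 1 (block).
check_commit_command() {
  local cmd="$1"
  # Fail closed: nothing to inspect means nothing to approve.
  [ -n "$cmd" ] || return 1
  case "$cmd" in
    *"git commit"*|*"git "*" commit"*) ;;  # commit-like: inspect flags below
    *) return 0 ;;                         # not a commit: out of scope
  esac
  # Pad with spaces so flags match as whole words.
  case " $cmd " in
    *" --no-verify "*|*" --amend "*) return 1 ;;
  esac
  return 0
}
```

The substring match can produce false blocks (for example, a commit message that mentions --amend), which is the direction a fail-closed gate is meant to err in. Short flag forms are not handled here.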
# 1. Clone into your skills directory.
git clone https://github.com/<your-org>/harness-builder ~/.claude/skills/harness-builder
# 2. Verify it's discoverable.
ls ~/.claude/skills/harness-builder/SKILL.md # should exist
# 3. From any project where you want a harness:
cd /path/to/your/project
bash ~/.claude/skills/harness-builder/scripts/scaffold.sh "$(pwd)" coding
Or invoke /harness-builder from inside a Claude Code session; it walks you through the 4 classification questions interactively.
Dependencies: bash, jq, git. Optional but recommended: complexipy (Python), gocognit (Go), eslint-plugin-sonarjs (TS/JS) for the complexity gate.
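A quick way to check these before scaffolding is sketched below. The helper name missing_commands is hypothetical, and the sketch assumes the optional tools are installed as commands named complexipy and gocognit (eslint-plugin-sonarjs is an npm package, so it is not checked this way).

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the skill's scripts.

# Print each named command that is not found on PATH, one per line.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Example usage:
#   missing_commands bash jq git          # required
#   missing_commands complexipy gocognit  # optional, complexity gate
```

An empty result means every listed command was found.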
A coding harness for coding+claude-code+no-long-running+blocking:
your-project/
├── CLAUDE.md # 76-line process doc
├── HARNESS-PLAN.md # what was decided + dogfood checklist
├── memory/MEMORY.md # the index — entries grow over time
└── .claude/
├── settings.json # registers all hooks
├── hooks/ # 8 hooks that enforce the rules
│ ├── enforce-author.sh # no AI trailers, no --no-verify, no --amend
│ ├── pre-commit-quality-gate.sh # blocks commit unless backpressure ran
│ ├── complexity-gate.sh # cognitive-complexity per language
│ ├── doc-bloat-gate.sh # warns on new docs/*.md
│ ├── codify-learnings.sh # Stop: extracts [LEARN] → memory/learnings.jsonl
│ ├── deadend-capture.sh # Stop: extracts [DEAD-END] → memory/dead-ends.jsonl
│ ├── codify-prompt.sh # Stop: nudges /codify-learnings on substantive sessions
│ ├── session-report.sh # SessionStart: orientation in <60s
│ └── lib/git-helpers.sh # shared is_git_commit / extract_git_C_path
├── skills/
│ ├── backpressure/ # the load-bearing skill
│ ├── clean/ # slop reduction
│ ├── codify-learnings/ # session-end loop closer
│ └── propose-harness-change/ # for editing the harness itself
└── agents/
├── user-perspective-evaluator.md # fresh-context backpressure subagent
└── learnings-extractor.md # fresh-context codify-learnings subagent
Plus a HARNESS-PLAN.md at your project root with the classification, the patterns installed, and the 6-step dogfood checklist you should run immediately to verify the harness actually works. (If you skip dogfood, you've built theatre. Don't.)
See examples/coding-quickstart/ for a before/after walkthrough showing the harness blocking a slop commit, demanding /backpressure, and accepting the rewrite.
For an existing repo where you're not sure if the harness is complete:
bash ~/.claude/skills/harness-builder/scripts/audit.sh /path/to/repo
Writes HARNESS-AUDIT.md with a per-pattern checklist (✅/
| | harness-builder | revfactory/harness | Chachamaru127/claude-code-harness |
|---|---|---|---|
| What it builds | the substrate underneath any agent workflow | a team of specialized agents | a Plan→Work→Review cycle |
| Optimizes for | unsupervised hours-long sessions, slop refusal | multi-agent decomposition | single-agent disciplined iteration |
| Hooks | 8 install-and-go (commit, complexity, capture, orientation) | mostly skills | a few + PreCompact handler |
| Backpressure | fresh-context subagent, hook-blocked at commit | n/a | n/a |
| Memory | [LEARN]/[DEAD-END] capture + active codification | n/a | per-session |
| Substrate | Claude Code (Codex coming) | Claude Code | Claude Code |
This skill is the boring infrastructure underneath either of those — drop it in first, then layer their team-orchestration or P→W→R cycle on top.
- Codex substrate is stubbed: scaffold prints an actionable error pointing you at references/harness-types.md for manual translation. Auto-generation is v2.
- Research-harness extensions (budget gate, autopilot, calibration, ZCM) are described in references/pattern-catalog.md but not bundled. They're harder to ship as turnkey templates because they integrate with provider APIs (RunPod, Vast, etc.). v1 ships a coding-harness; research is 🔜.
- Client-side hooks are bypassable by a determined adversary (git -c core.hooksPath=/dev/null, git commit-tree, shell aliases). For defense in depth, mirror the rules in CI. The hooks stop the common case (forgetful agent, hurried commit), not malicious circumvention.
- No interactive /harness-builder UI yet. The skill works via the script directly today. Interactive classification (read 4 answers, scaffold) is a thin wrapper away.
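Mirroring the author rule in CI, as the third limitation suggests, could start from something like the sketch below. The function names message_is_clean and check_range and the trailer patterns are assumptions, not the rules the shipped hooks enforce; adjust the patterns to whatever your project treats as an AI trailer.

```shell
#!/usr/bin/env bash
# Hypothetical CI-side check; the trailer patterns are assumptions.

# Succeed only if the commit message on stdin has no AI attribution line.
message_is_clean() {
  ! grep -Eiq \
    -e '^co-authored-by:.*(claude|codex)' \
    -e 'generated with.*(claude|codex)'
}

# Check every commit in a range such as "origin/main..HEAD".
# Returns non-zero if any commit message fails the check.
check_range() {
  local range="$1" sha status=0
  for sha in $(git rev-list "$range"); do
    if ! git log -1 --format=%B "$sha" | message_is_clean; then
      printf 'AI trailer found in %s\n' "$sha" >&2
      status=1
    fi
  done
  return "$status"
}
```

Because this runs on the server side of a push, local bypasses such as core.hooksPath overrides do not affect it.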
This skill is a learning artifact. Every person who uses it sees something the next person should inherit. The contribution loop is wired into the skill.
If you used harness-builder and discovered a better hook, a missing pattern, a portability bug, or a doc that confused your fresh agent — the upstream wants the fix. From your local clone:
cd ~/.claude/skills/harness-builder
bash scripts/contribute.sh <short-slug>
contribute.sh forks the repo to your account, creates an improvement/<slug> branch off main, and prints the next-step commands (edit, commit, push, gh pr create). It doesn't try to guess what you want to fix; it removes the ceremony so you can spend the time on the actual change.
When /codify-learnings runs at session end and extracts a generalizable pattern from your work, the skill points you at this script. Your [LEARN] block becomes the upstream PR's Provenance section — that's the trust signal future readers will use to evaluate your contribution.
Full contribution philosophy + what counts as generalizable: see references/contributing.md.
Quality bar: shellcheck-clean, dogfood-receipt-attached, scaffold-and-audit-itself green. The skill audits to 15/15 on its own templates — that's what your contribution should preserve.
MIT. See LICENSE.
Distilled from production harnesses for autonomous LLM work — coding, research, and hybrid. The patterns are battle-tested; the OSS extraction sands off the project-specific edges.