harness-builder

Scaffold a Claude Code harness (Codex support is stubbed for now) that lets one LLM agent work unsupervised for hours and refuse to ship slop.

harness-builder is a Claude Code skill that bootstraps the infrastructure underneath an autonomous coding session — hooks, gates, capture mechanisms, fresh-context evaluators — distilled from production harnesses for coding, research, and hybrid workflows.

It doesn't orchestrate teams of agents (see revfactory/harness). It doesn't give you a Plan→Work→Review cycle (see Chachamaru127/claude-code-harness). It builds the substrate underneath either: the boring discipline that makes long autonomous runs not end in slop committed at 3am.

=== session-report ===

Conventions (non-negotiable):
  - Author: human author on every commit. NO AI trailers. No --no-verify.
  - Sub-agents: cascade, one at a time, never parallel.
  - Backpressure: after non-trivial changes, invoke /backpressure (subagent
    evaluates as the user) before declaring done.

Recent learnings:
  - fresh-context evaluation > self-evaluation; backpressure must be a separate context
  - record.sh must use pwd -P; macOS /tmp symlinks /private/tmp
  - codify-learnings should write to project_root, not pwd

That is the SessionStart orientation a fresh agent gets, in under 60 seconds, once harness-builder has scaffolded the project.
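Two of the learnings in that sample report concern paths: use pwd -P, and write to the project root rather than the current directory. The sketch below shows one way a hook script might resolve such a root. `resolve_project_root` is a hypothetical helper, not a function from harness-builder, and it assumes the project root is the git top-level directory when one exists.

```shell
# Sketch only: resolve_project_root is a hypothetical helper, not part of
# harness-builder.

# Print the physical (symlink-free) path of the project containing DIR.
# Uses the git top-level directory when DIR is inside a repository and
# falls back to DIR itself otherwise.
resolve_project_root() {
  (
    cd "${1:-.}" || exit 1
    if root="$(git rev-parse --show-toplevel 2>/dev/null)" && [ -n "$root" ]; then
      cd "$root" || exit 1
    fi
    pwd -P   # -P resolves symlinks such as macOS /tmp -> /private/tmp
  )
}
```

On macOS, calling it from a non-repository directory under /tmp prints a path under /private/tmp, so two scripts that resolve the same directory this way agree on its name.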

Why this exists

Three failure modes collapse autonomous agent work into slop:

Conventions in prose decay. A 200-line CLAUDE.md the agent reads at session start is forgotten three turns later. Hooks enforce the rules at every commit and file edit, whether or not the agent still remembers them.
Self-evaluation drifts. An agent grading its own diff is invested. Fresh-context subagents grade with no skin in the game.
Mid-session insights vanish. The agent learns something useful and never writes it down. [LEARN] blocks are captured automatically, then deliberately codified at session end.
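To make the capture step concrete, here is a minimal sketch of turning marked lines into JSONL records. The one-line `[LEARN] ...` marker format, the single `text` field, and the helper names are assumptions for illustration; the real hooks may parse multi-line blocks and store more metadata.

```shell
# Sketch only: the one-line "[MARKER] text" format and the JSON shape are
# assumptions, not harness-builder's actual schema.

# Escape backslashes and double quotes so the argument is safe inside a JSON
# string. Control characters such as tabs are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Read transcript text on stdin and print one JSON object per line that
# starts with "[MARKER] ", where MARKER is the first argument.
extract_marker() {
  marker="[$1] "
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "$marker"*)
        text=${line#"$marker"}
        printf '{"text":"%s"}\n' "$(json_escape "$text")"
        ;;
    esac
  done
}
```

A Stop hook built on this could run `extract_marker LEARN < transcript.txt >> memory/learnings.jsonl`, where transcript.txt stands in for wherever the session text is available.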

The 5 principles, mapped to enforcement

| Principle | Enforced by |
| --- | --- |
| Programmatic > remembered | hooks fail-close on `git commit`, `Write`, `Edit` |
| Fresh-context evaluation > self-evaluation | `/backpressure` skill + `user-perspective-evaluator` agent |
| Capture continuously, codify deliberately | Stop hooks scan for `[LEARN]` / `[DEAD-END]` blocks → `memory/*.jsonl` |
| Cascade subagents, never parallel | convention in CLAUDE.md, audited at backpressure-time |
| Substrate-portable means scripts+schema, not prompts | hooks + scripts in `templates/`; CLAUDE.md is `why`, not `what` |

Full reasoning: references/principles.md.
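As an illustration of the first row, the sketch below checks a `git commit` command line against rules of the kind the harness enforces. `commit_violation` is a hypothetical helper, and the trailer patterns are assumed examples of what an "AI trailer" might look like. It inspects only the literal text `git commit`, so forms such as `git -C path commit` or the short flag `-n` would need extra handling.

```shell
# Sketch only: commit_violation is a hypothetical helper, and the trailer
# patterns are assumed examples, not harness-builder's actual rules.

# Print the reason a command line should be blocked and return 1;
# print nothing and return 0 when the command is acceptable.
commit_violation() {
  cmd="$1"
  case "$cmd" in
    *"git commit"*) ;;        # only commit commands are inspected
    *) return 0 ;;
  esac
  case "$cmd" in
    *--no-verify*) echo "blocked: --no-verify is not allowed"; return 1 ;;
    *--amend*)     echo "blocked: --amend is not allowed"; return 1 ;;
  esac
  if printf '%s' "$cmd" | grep -qiE 'co-authored-by: *claude|generated with .*claude'; then
    echo "blocked: AI trailer in commit message"
    return 1
  fi
  return 0
}
```

A hook wrapper would pass the pending command to this function and exit non-zero whenever it returns non-zero, so that an error in the check blocks the commit rather than letting it through.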

60-second install

# 1. Clone into your skills directory.
git clone https://github.com/<your-org>/harness-builder ~/.claude/skills/harness-builder

# 2. Verify it's discoverable.
ls ~/.claude/skills/harness-builder/SKILL.md   # should exist

# 3. From any project where you want a harness:
cd /path/to/your/project
bash ~/.claude/skills/harness-builder/scripts/scaffold.sh "$(pwd)" coding

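Invoking /harness-builder from inside a Claude Code session is meant to walk you through the 4 classification questions interactively, but that wrapper is not available yet (see Limitations); for now, run the script as shown above.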

Dependencies: bash, jq, git. Optional but recommended: complexipy (Python), gocognit (Go), eslint-plugin-sonarjs (TS/JS) for the complexity gate.
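To check which of the optional tools matter for a given project, a small lookup like the following can help. It is a sketch under assumptions: `complexity_tool_for` and `complexity_tool_status` are hypothetical helpers, the extension mapping is inferred from the tool list above, and the code only reports availability without running any checker.

```shell
# Sketch only: both helpers are hypothetical; the mapping mirrors the optional
# tools listed under Dependencies.

# Print the complexity checker that applies to FILE, or return 1 if none does.
complexity_tool_for() {
  case "$1" in
    *.py)                  echo "complexipy" ;;
    *.go)                  echo "gocognit" ;;
    *.ts|*.tsx|*.js|*.jsx) echo "eslint" ;;   # used with eslint-plugin-sonarjs
    *) return 1 ;;
  esac
}

# Report whether the checker for FILE is on PATH, without running it.
complexity_tool_status() {
  tool="$(complexity_tool_for "$1")" || {
    echo "skip: no complexity tool for $1"
    return 0
  }
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool available for $1"
  else
    echo "missing: $tool not found on PATH for $1"
    return 1
  fi
}
```

Note that eslint is often installed per project and run through a package runner, so a PATH lookup can report it missing even when the project has it.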

What gets scaffolded

A coding harness for coding+claude-code+no-long-running+blocking:

your-project/
├── CLAUDE.md                              # 76-line process doc
├── HARNESS-PLAN.md                        # what was decided + dogfood checklist
├── memory/MEMORY.md                       # the index — entries grow over time
└── .claude/
    ├── settings.json                      # registers all hooks
    ├── hooks/                             # 8 hooks that enforce the rules
    │   ├── enforce-author.sh              #   no AI trailers, no --no-verify, no --amend
    │   ├── pre-commit-quality-gate.sh     #   blocks commit unless backpressure ran
    │   ├── complexity-gate.sh             #   cognitive-complexity per language
    │   ├── doc-bloat-gate.sh              #   warns on new docs/*.md
    │   ├── codify-learnings.sh            #   Stop: extracts [LEARN] → memory/learnings.jsonl
    │   ├── deadend-capture.sh             #   Stop: extracts [DEAD-END] → memory/dead-ends.jsonl
    │   ├── codify-prompt.sh               #   Stop: nudges /codify-learnings on substantive sessions
    │   ├── session-report.sh              #   SessionStart: orientation in <60s
    │   └── lib/git-helpers.sh             #   shared is_git_commit / extract_git_C_path
    ├── skills/
    │   ├── backpressure/                  # the load-bearing skill
    │   ├── clean/                         # slop reduction
    │   ├── codify-learnings/              # session-end loop closer
    │   └── propose-harness-change/        # for editing the harness itself
    └── agents/
        ├── user-perspective-evaluator.md  # fresh-context backpressure subagent
        └── learnings-extractor.md         # fresh-context codify-learnings subagent

HARNESS-PLAN.md at your project root records the classification, the patterns installed, and the 6-step dogfood checklist, which you should run immediately to verify the harness actually works. (If you skip dogfood, you've built theatre. Don't.)
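The tree above says pre-commit-quality-gate.sh blocks a commit unless backpressure ran. One way such a gate could work is sketched here: when backpressure approves a diff, a fingerprint of that diff is written to a marker file, and the gate later accepts only a diff with the same fingerprint. The marker file, the use of cksum, and all function names are assumptions; the real hook may track backpressure differently.

```shell
# Sketch only: the marker file and checksum scheme are assumptions; the real
# pre-commit-quality-gate.sh may work differently.

# Print a checksum of whatever diff text arrives on stdin.
diff_fingerprint() {
  cksum
}

# Record that backpressure approved the diff on stdin.
# usage: git diff --cached | record_backpressure MARKER_FILE
record_backpressure() {
  diff_fingerprint > "$1"
}

# Succeed only if MARKER_FILE exists and matches the diff on stdin.
# usage: git diff --cached | backpressure_is_current MARKER_FILE
backpressure_is_current() {
  [ -f "$1" ] || return 1
  [ "$(diff_fingerprint)" = "$(cat "$1")" ]
}
```

Tying the approval to the diff content rather than to a timestamp means that any change staged after the evaluation invalidates the approval.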

Worked example

See examples/coding-quickstart/ for a before/after walkthrough showing the harness blocking a slop commit, demanding /backpressure, and accepting the rewrite.

Audit mode

For an existing repo where you're not sure if the harness is complete:

bash ~/.claude/skills/harness-builder/scripts/audit.sh /path/to/repo

Writes HARNESS-AUDIT.md with a per-pattern checklist (✅/⚠️/❌) and remediation hints. Read-only — never modifies your repo.
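The read-only checklist idea can be sketched in a few lines. `audit_paths` is a hypothetical helper that tests for only a handful of the scaffolded paths shown earlier; the real audit.sh covers more patterns and also emits ⚠️ results and remediation hints, which this sketch omits.

```shell
# Sketch only: audit_paths is a hypothetical helper and checks just a few of
# the scaffolded paths; it never writes to the repository.

# Print one checklist line per expected path under REPO and return 1 if any
# path is missing.
audit_paths() {
  repo="$1"
  missing=0
  for path in CLAUDE.md .claude/settings.json .claude/hooks/enforce-author.sh \
              .claude/skills/backpressure memory/MEMORY.md; do
    if [ -e "$repo/$path" ]; then
      echo "✅ $path"
    else
      echo "❌ $path"
      missing=1
    fi
  done
  return "$missing"
}
```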

Comparison

|  | harness-builder | revfactory/harness | Chachamaru127/claude-code-harness |
| --- | --- | --- | --- |
| What it builds | the substrate underneath any agent workflow | a team of specialized agents | a Plan→Work→Review cycle |
| Optimizes for | unsupervised hours-long sessions, slop refusal | multi-agent decomposition | single-agent disciplined iteration |
| Hooks | 8 install-and-go (commit, complexity, capture, orientation) | mostly skills | a few + PreCompact handler |
| Backpressure | fresh-context subagent, hook-blocked at commit | n/a | n/a |
| Memory | `[LEARN]`/`[DEAD-END]` capture + active codification | n/a | per-session |
| Substrate | Claude Code (Codex coming) | Claude Code | Claude Code |

This skill is the boring infrastructure underneath either of those — drop it in first, then layer their team-orchestration or P→W→R cycle on top.

Limitations (be honest)

Codex substrate is stubbed: scaffold prints an actionable error pointing you at references/harness-types.md for manual translation. Auto-generation is planned for v2.
Research-harness extensions (budget gate, autopilot, calibration, ZCM) are described in references/pattern-catalog.md but not bundled. They're harder to ship as turnkey templates because they integrate with provider APIs (RunPod, Vast, etc.). v1 ships a coding-harness; research is 🔜.
Client-side hooks are bypassable by a determined adversary (git -c core.hooksPath=/dev/null, git commit-tree, shell aliases). For defense in depth, mirror the rules in CI. The hooks stop the common case (forgetful agent, hurried commit) — not malicious circumvention.
No interactive /harness-builder UI yet. For now, run the scaffold script directly. Interactive classification (read 4 answers, then scaffold) is a thin wrapper that has not been written yet.
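For the CI mirroring suggested in the limitation on client-side hooks, a server-side check can re-run the same rule on commit messages after the fact. The sketch below assumes AI trailers look like the patterns shown; `reject_ai_trailers` is a hypothetical helper, and choosing the commit range and failing the build are left to your CI setup.

```shell
# Sketch only: the trailer patterns are assumptions, and reject_ai_trailers is
# a hypothetical helper, not part of harness-builder.

# Read commit messages on stdin. Print each offending line with its line
# number and return 1 if any line looks like an AI trailer.
reject_ai_trailers() {
  if grep -niE 'co-authored-by: *claude|generated with .*claude'; then
    return 1
  fi
  return 0
}
```

In a CI job this could be fed with `git log --format=%B origin/main..HEAD | reject_ai_trailers`, assuming main is the branch being merged into.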

Contributing — and why this matters

This skill is a learning artifact. Every person who uses it sees something the next person should inherit. The contribution loop is wired into the skill.

If you used harness-builder and discovered a better hook, a missing pattern, a portability bug, or a doc that confused your fresh agent — the upstream wants the fix. From your local clone:

cd ~/.claude/skills/harness-builder
bash scripts/contribute.sh <short-slug>

contribute.sh forks the repo to your account, creates an improvement/<slug> branch off main, and prints the next-step commands (edit, commit, push, gh pr create). It doesn't try to guess what you want to fix — it removes the ceremony so you can spend the time on the actual change.
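The flow described above could look roughly like this. It is a sketch, not the contents of contribute.sh: both function names are hypothetical, the slug character rule is an assumption, and it relies on the GitHub CLI (gh) being installed and authenticated.

```shell
# Sketch only: both helpers are hypothetical and the slug rule is an
# assumption; the real contribute.sh may differ.

# Validate SLUG (lowercase letters, digits, hyphens) and print the branch name.
improvement_branch() {
  case "$1" in
    ""|*[!a-z0-9-]*)
      echo "invalid slug: '$1'" >&2
      return 1
      ;;
  esac
  echo "improvement/$1"
}

# Fork the current repository, create the branch off main, and print the
# remaining manual steps. Requires gh and git.
start_contribution() {
  branch="$(improvement_branch "$1")" || return 1
  gh repo fork --remote || return 1
  git switch -c "$branch" main || return 1
  echo "Next: edit, commit, push $branch to your fork, then run gh pr create"
}
```

Validating the slug before touching the network keeps a typo from leaving a half-created fork or an oddly named branch behind.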

When /codify-learnings runs at session end and extracts a generalizable pattern from your work, the skill points you at this script. Your [LEARN] block becomes the upstream PR's Provenance section — that's the trust signal future readers will use to evaluate your contribution.

Full contribution philosophy + what counts as generalizable: see references/contributing.md.

Quality bar: shellcheck-clean, dogfood-receipt-attached, scaffold-and-audit-itself green. The skill audits to 15/15 on its own templates — that's what your contribution should preserve.

License

MIT. See LICENSE.

Credits

Distilled from production harnesses for autonomous LLM work — coding, research, and hybrid. The patterns are battle-tested; the OSS extraction sands off the project-specific edges.
