bc1plainview · Mar 13, 2026
diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "buidl",
-  "version": "3.6.0",
-  "description": "Full dev lifecycle for OP_NET Bitcoin L1 projects: idea → challenge → spec → build → review → ship. Self-learning across sessions with pattern extraction, agent performance scoring, score-based finding routing, project-type profiles, cross-layer validation, and starter templates. Includes shell-enforced E2E testing gates, frontend runtime smoke checks, PUA problem-solving methodology, and the OP_NET Bible (2000+ lines). Agents get smarter with every project.",
+  "version": "4.0.0",
+  "description": "Full dev lifecycle for OP_NET Bitcoin L1 projects: idea → challenge → spec → build → review → ship. Self-learning across sessions with pattern extraction, agent performance scoring, score-based finding routing, project-type profiles, cross-layer validation, and starter templates. Includes shell-enforced E2E testing gates, frontend runtime smoke checks, PUA problem-solving methodology, the OP_NET Bible (2000+ lines), agent self-critique, incremental audits, dry-run mode, execution tracing, and dynamic re-planning from learned patterns. Agents get smarter with every project.",
   "author": {
     "name": "dannyplainview + bob"
   }

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,25 @@
 # Changelog
 
+## [4.0.0] - 2026-03-13
+
+### Added
+- **Agent self-critique (Reflexion)**: All 4 builder agents (opnet-contract-dev, opnet-frontend-dev, opnet-backend-dev, loop-builder) now re-read their changes against `requirements.md` before declaring done. Each writes a `self-critique.md` artifact with spec compliance checklist, issues found and fixed, and remaining concerns. Any unmet criterion blocks completion until fixed.
+- **Incremental audit mode**: On cycle 2+, the auditor receives a `git diff` of changes since the last audit plus previous findings, instead of re-scanning the entire codebase. Focuses on the diff, blast radius, and verifying previous findings are resolved.
+- **Dry-run mode** (`--dry-run` flag): Challenge, Specify, and Explore phases run normally. Phase 4 prints the full execution plan (agents, knowledge, tasks, max_turns) without dispatching any agents, then stops.
+- **Agent execution tracing** (`scripts/trace-event.sh`): Appends structured JSONL events (dispatch, complete, route, finding, error, replan, checkpoint, state) to `artifacts/trace.jsonl`. New `/buidl-trace` command renders the trace as a formatted timeline.
+- **Dynamic re-planning** (`scripts/query-pattern.sh`): When an agent fails after retry, queries `learning/patterns.yaml` for known fix patterns matching the failure category. If found, presents a 5th option ("Apply known fix") alongside the existing 4 error-handling options.
+- **Trace command** (`commands/buidl-trace.md`): New `/buidl-trace` slash command that reads `trace.jsonl` and renders agent dispatch timeline, grouped by cycle.
+
+### Changed
+- **Orchestrator error handling** (`commands/buidl.md`): Agent failure flow now queries `query-pattern.sh` before presenting options. If a matching pattern exists, 5 options are shown (apply known fix, retry differently, skip, amend spec, cancel). Otherwise the existing 4 options are shown.
+- **Orchestrator Phase 4 Step 2** (`commands/buidl.md`): Each agent dispatch and completion now logs trace events via `trace-event.sh`. Phase transitions log checkpoint trace events. Review findings and routing decisions are traced.
+- **Auditor Step 2c** (`commands/buidl.md`): Cycle 2+ audits now pass `git diff` and previous findings to the auditor with incremental audit instructions.
+- **Auditor agent** (`agents/opnet-auditor.md`): New "Incremental Audit Mode" section documents the diff-based review process for cycle 2+.
+- **Plugin version**: 3.6.0 -> 4.0.0
+
+### Why
+Five features that close gaps in the agent intelligence loop. Self-critique catches spec drift before the reviewer does, saving entire review cycles. Incremental audits avoid re-scanning unchanged code, cutting audit time on fix cycles. Dry-run mode lets users preview the execution plan before committing to a full build. Execution tracing provides observability into agent dispatch ordering and timing. Dynamic re-planning applies lessons from past failures automatically instead of requiring manual intervention.
+
 ## [3.6.0] - 2026-03-13
 
 ### Added
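
The dynamic re-planning entry above describes a choice between four and five error-handling options. A minimal sketch of that decision, assuming each learned pattern is a mapping with hypothetical `category` and `fix` keys (the real schema of `learning/patterns.yaml` and the behavior of `query-pattern.sh` are not shown in this diff):

```python
# Option labels follow the CHANGELOG wording; the pattern layout is an assumption.
BASE_OPTIONS = ["retry differently", "skip", "amend spec", "cancel"]


def find_known_fix(patterns, failure_category):
    """Return the first stored pattern whose category matches, else None."""
    for pattern in patterns:
        if pattern.get("category") == failure_category:
            return pattern
    return None


def build_options(patterns, failure_category):
    """Offer 5 options when a known fix exists, otherwise the usual 4."""
    known = find_known_fix(patterns, failure_category)
    if known is None:
        return list(BASE_OPTIONS)
    return ["apply known fix: " + known["fix"]] + BASE_OPTIONS


# Hypothetical pattern store, for illustration only.
EXAMPLE_PATTERNS = [
    {"category": "build-failure", "fix": "regenerate ABI before compiling"},
]

if __name__ == "__main__":
    for option in build_options(EXAMPLE_PATTERNS, "build-failure"):
        print(option)
```

Prepending the known fix keeps the existing four options in their original order, so a flow that already handles them would only need to handle one extra leading entry.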

diff --git a/README.md b/README.md
@@ -67,6 +67,7 @@ alias claudeyproj="claude --dangerously-skip-permissions --plugin-dir /path/to/b
 | `/buidl-cancel` | Cancel a running loop (preserves worktree for manual work) |
 | `/buidl-resume` | Resume an interrupted loop from last checkpoint |
 | `/buidl-clean` | Cancel + remove worktree and branch |
+| `/buidl-trace` | Show agent execution trace timeline for the current session |
 
 ### Flags
 
@@ -78,6 +79,7 @@ alias claudeyproj="claude --dangerously-skip-permissions --plugin-dir /path/to/b
 | `--builder-model opus\|sonnet` | inherit | Override model for builder agents |
 | `--reviewer-model opus\|sonnet` | inherit | Override model for reviewer agent |
 | `--max-tokens N` | unlimited | Token budget with advisory enforcement |
+| `--dry-run` | off | Run Challenge + Specify + Explore, print execution plan, stop |
 
 ## Agents
 
@@ -306,15 +308,20 @@ If the loop is interrupted (context exhaustion, wall-clock timeout, manual cance
 | E2E hard gate | v3.4 | Shell-level enforcement: loop cannot exit until on-chain tests pass. |
 | Frontend smoke check | v3.4 | Playwright runtime verification before declaring frontend success. |
 | Pre-flight scan | v3.4 | 10 anti-pattern grep checks block completion on known bad patterns. |
+| Agent self-critique | v4.0 | Builder agents re-check output against spec before declaring done. Writes self-critique.md artifact. |
+| Incremental audit | v4.0 | Cycle 2+ audits focus on git diff + blast radius instead of full codebase re-scan. |
+| Dry-run mode | v4.0 | Preview execution plan without dispatching agents. |
+| Execution tracing | v4.0 | JSONL trace log of all agent dispatches, completions, routing, and errors. |
+| Dynamic re-planning | v4.0 | Queries learned patterns for known fixes when agents fail. |
 
 ## Project Structure
 
 ```
 buidl/
 +-- .claude-plugin/
-|   +-- plugin.json              # Plugin manifest (v3.6.0)
+|   +-- plugin.json              # Plugin manifest (v4.0.0)
 +-- agents/                      # 12 agent definitions (incl. cross-layer-validator)
-+-- commands/                    # 7 slash commands
++-- commands/                    # 8 slash commands (incl. buidl-trace)
 +-- hooks/                       # Stop hook + state guards
 |   +-- scripts/
 +-- knowledge/                   # OPNet reference + domain slices
@@ -323,14 +330,14 @@ buidl/
 |   +-- patterns.yaml            # Structured pattern store (auto-updated)
 |   +-- agent-scores.yaml        # Agent performance metrics (auto-updated)
 |   +-- profiles/                # Auto-generated project-type profiles
-+-- scripts/                     # Setup + state writer + learning + routing scripts
++-- scripts/                     # Setup + state writer + learning + routing + tracing scripts
 +-- skills/                      # 3 triggerable skills
 |   +-- audit-from-bugs/
 |   +-- loop-guide/
 |   +-- pua/
 +-- templates/                   # Domain agent, knowledge slice, starter templates
 |   +-- starters/                # Project scaffolds (op20-token, more planned)
-+-- tests/                       # 303 structural + functional + integration tests
++-- tests/                       # 330+ structural + functional + integration tests
 ```
 
 ## Testing
@@ -339,7 +346,7 @@ buidl/
 bash tests/plugin-tests.sh
 ```
 
-303 tests across 28 categories:
+330+ tests across 34 categories:
 
 | Category | What it checks |
 |----------|----------------|
@@ -371,6 +378,12 @@ bash tests/plugin-tests.sh
 | Starter templates | Template manifest, contract template, frontend template, hook files |
 | Score-based routing | Taxonomy, keyword matching, candidate validation, functional routing tests |
 | Project-type profiles | Schema, threshold generation, profile YAML validation, functional profile tests |
+| Self-critique | Self-Critique step in all 4 builder agents, self-critique.md artifact reference |
+| Incremental audit | Incremental Audit Mode in auditor, git diff in buidl.md cycle 2 section |
+| Dry-run mode | --dry-run flag parsing, execution plan output |
+| Agent tracing | trace-event.sh exists, syntax, executable, functional JSON append test |
+| Dynamic re-planning | query-pattern.sh exists, syntax, executable, functional pattern query test |
+| Version 4.0.0 | plugin.json version matches CHANGELOG first entry |
 
 Tests run automatically on every push and PR via GitHub Actions.
 

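The "Version 4.0.0" test row says the `plugin.json` version must match the first CHANGELOG entry. The actual check lives in `tests/plugin-tests.sh`, which this diff does not show; the sketch below is one way such a check could look, assuming CHANGELOG headings of the form `## [X.Y.Z] - date` as seen above. The function names are illustrative.

```python
import json
import re


def manifest_version(plugin_json_text):
    """Read the "version" field from the plugin manifest text."""
    return json.loads(plugin_json_text)["version"]


def first_changelog_version(changelog_text):
    """Return the version in the first '## [X.Y.Z]' heading, or None."""
    match = re.search(r"^## \[(\d+\.\d+\.\d+)\]", changelog_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def versions_match(plugin_json_text, changelog_text):
    return manifest_version(plugin_json_text) == first_changelog_version(changelog_text)


# Trimmed copies of the two files as they appear in this diff.
EXAMPLE_MANIFEST = '{"name": "buidl", "version": "4.0.0"}'
EXAMPLE_CHANGELOG = "# Changelog\n\n## [4.0.0] - 2026-03-13\n\n## [3.6.0] - 2026-03-13\n"

if __name__ == "__main__":
    print(versions_match(EXAMPLE_MANIFEST, EXAMPLE_CHANGELOG))
```

Anchoring the pattern to the start of a line with `re.MULTILINE` avoids matching a version string that merely appears inside an entry's prose.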
diff --git a/agents/loop-builder.md b/agents/loop-builder.md
@@ -218,6 +218,29 @@ If you detect that you've used most of your context window (responses getting tr
 - Write a clear summary of what's done and what remains to the session artifacts
 - A partial summary that enables clean resumption is more valuable than one more half-finished step
 
+### Step 3.7: Self-Critique (Reflexion)
+
+Before declaring your build complete, re-read ALL your changes against the requirements:
+
+1. Read `requirements.md` from the spec directory
+2. For each acceptance criterion, verify your implementation satisfies it
+3. Write `self-critique.md` to your artifacts directory:
+   ```markdown
+   # Self-Critique: loop-builder
+
+   ## Spec Compliance
+   - [x/space] [acceptance criterion 1] -- [status/notes]
+   - [x/space] [acceptance criterion 2] -- [status/notes]
+
+   ## Issues Found and Fixed
+   - [what was caught and fixed, or "None"]
+
+   ## Remaining Concerns
+   - [anything uncertain -- reviewer should check]
+   ```
+4. If any criterion is NOT met: fix it now, re-run verify, update self-critique.md
+5. Only proceed to export/completion after all criteria are checked
+
 ### Step 4: Addressing Reviewer Findings (cycles 2+)
 When you receive findings from a previous review cycle:
 - Address EVERY critical and major finding explicitly.
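
The self-critique step above blocks completion while any acceptance criterion is unmet. The sketch below shows how that gate could be checked mechanically from `self-critique.md`, assuming the `[x/space]` placeholder in the template means standard Markdown checkboxes (`- [x]` or `- [ ]`). The sample criteria are invented for illustration and the plugin itself may enforce this differently.

```python
import re

# Matches "- [x] text" or "- [ ] text"; group 1 is the box, group 2 the text.
CHECKBOX = re.compile(r"^\s*- \[( |x|X)\]\s*(.*)$")


def unmet_criteria(self_critique_text):
    """List unchecked items found under the '## Spec Compliance' heading."""
    unmet = []
    in_section = False
    for line in self_critique_text.splitlines():
        if line.startswith("## "):
            in_section = line.strip() == "## Spec Compliance"
            continue
        if not in_section:
            continue
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            unmet.append(match.group(2).strip())
    return unmet


# Hypothetical artifact content, for illustration only.
SAMPLE = """# Self-Critique: loop-builder

## Spec Compliance
- [x] Mint is owner-only -- covered by unit test
- [ ] Transfer emits an event -- not implemented yet

## Issues Found and Fixed
- None

## Remaining Concerns
- [ ] not a criterion, outside the compliance section
"""

if __name__ == "__main__":
    remaining = unmet_criteria(SAMPLE)
    print("blocked" if remaining else "ok to proceed", remaining)
```

Only the Spec Compliance section is scanned, so unchecked notes under other headings do not block completion.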

diff --git a/agents/opnet-auditor.md b/agents/opnet-auditor.md
@@ -55,9 +55,24 @@ Before auditing ANY code:
 - For every finding: verify it by reading the actual code. No false positives.
 - After the 27-pattern scan, proactively check: are there patterns NOT in the checklist that this specific codebase is vulnerable to?
 
+## Incremental Audit Mode (Cycle 2+)
+
+When you are dispatched on cycle 2 or later, the orchestrator provides:
+1. A `git diff` of changes since the last audit
+2. Previous audit findings from `artifacts/audit/findings.md`
+
+In incremental mode:
+- **Focus on the diff + blast radius.** Prioritize reviewing changed lines and any code they interact with.
+- **Verify previous findings resolved.** For each CRITICAL/HIGH finding from the previous audit, confirm the fix is correct and complete.
+- **Check for regressions.** Fixes sometimes introduce new issues -- scan the blast radius of each change.
+- **Still run the full 27-pattern scan** on changed files only (not the entire codebase).
+- **Output format is the same** as a full audit -- VERDICT, findings by severity, audit summary.
+
+If the diff is empty or trivial, state that and issue a PASS verdict with a note that no material changes were found.
+
 ## Process
 
-### Step 1: Real-Bug Pattern Scan (MANDATORY — 27 Checks)
+### Step 1: Real-Bug Pattern Scan (MANDATORY -- 27 Checks)
 
 Before any domain-specific audit, systematically scan ALL code against these 27 confirmed vulnerability patterns from real OPNet bugs. For each finding, cite the pattern ID and the original bug PR.
 

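Incremental mode runs the pattern scan on changed files only and passes on an empty diff. A minimal sketch of deriving that file list from `git diff` output follows. The `audit_plan` return shape is hypothetical, paths containing spaces are not handled, and judging whether a non-empty diff is "trivial" is left to the auditor.

```python
import re

# Header line that git emits once per changed file.
DIFF_HEADER = re.compile(r"^diff --git a/(.+) b/(.+)$")


def changed_files(diff_text):
    """Return post-change paths from 'diff --git' headers, in order, de-duplicated."""
    files = []
    for line in diff_text.splitlines():
        match = DIFF_HEADER.match(line)
        if match and match.group(2) not in files:
            files.append(match.group(2))
    return files


def audit_plan(diff_text):
    """Empty diff: PASS with a note. Otherwise: scan only the changed files."""
    files = changed_files(diff_text)
    if not files:
        return {"verdict": "PASS", "note": "no material changes found", "scan": []}
    return {"verdict": None, "note": "", "scan": files}


# Hypothetical diff, for illustration only.
EXAMPLE_DIFF = """diff --git a/src/Token.ts b/src/Token.ts
@@ -1,2 +1,2 @@
-old line
+new line
diff --git a/README.md b/README.md
@@ -1 +1 @@
-a
+b
"""

if __name__ == "__main__":
    print(audit_plan(EXAMPLE_DIFF)["scan"])
```

Blast-radius analysis (code that the changed lines interact with) is deliberately out of scope here, since it needs knowledge of the codebase rather than the diff alone.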
diff --git a/agents/opnet-backend-dev.md b/agents/opnet-backend-dev.md
@@ -120,6 +120,29 @@ If any step fails:
 ### Context Budget Awareness
 If context is running low (responses truncating, tool calls slowing): STOP and write a summary of done vs remaining to session artifacts. Partial summary > half-finished step.
 
+### Step 4.7: Self-Critique (Reflexion)
+
+Before writing build-result.json, re-read ALL your changes against the requirements:
+
+1. Read `requirements.md` from the spec directory
+2. For each acceptance criterion, verify your implementation satisfies it
+3. Write `self-critique.md` to your artifacts directory:
+   ```markdown
+   # Self-Critique: opnet-backend-dev
+
+   ## Spec Compliance
+   - [x/space] [acceptance criterion 1] -- [status/notes]
+   - [x/space] [acceptance criterion 2] -- [status/notes]
+
+   ## Issues Found and Fixed
+   - [what was caught and fixed, or "None"]
+
+   ## Remaining Concerns
+   - [anything uncertain -- reviewer should check]
+   ```
+4. If any criterion is NOT met: fix it now, re-run verify, update self-critique.md
+5. Only proceed to export artifacts after all criteria are checked
+
 ### Step 5: Export Artifacts
 After successful build:
 - Write `build-result.json` with: `{ "status": "success", "buildDir": "dist/", "port": 3000 }`

diff --git a/agents/opnet-contract-dev.md b/agents/opnet-contract-dev.md
@@ -134,6 +134,29 @@ If any step fails:
 ### Context Budget Awareness
 If context is running low (responses truncating, tool calls slowing): STOP and write a summary of done vs remaining to session artifacts. Partial summary > half-finished step.
 
+### Step 5.7: Self-Critique (Reflexion)
+
+Before writing build-result.json, re-read ALL your changes against the requirements:
+
+1. Read `requirements.md` from the spec directory
+2. For each acceptance criterion, verify your implementation satisfies it
+3. Write `self-critique.md` to your artifacts directory:
+   ```markdown
+   # Self-Critique: opnet-contract-dev
+
+   ## Spec Compliance
+   - [x/space] [acceptance criterion 1] -- [status/notes]
+   - [x/space] [acceptance criterion 2] -- [status/notes]
+
+   ## Issues Found and Fixed
+   - [what was caught and fixed, or "None"]
+
+   ## Remaining Concerns
+   - [anything uncertain -- reviewer should check]
+   ```
+4. If any criterion is NOT met: fix it now, re-run verify, update self-critique.md
+5. Only proceed to export artifacts after all criteria are checked
+
 ### Step 6: Export Artifacts
 After successful build:
 - ABI JSON is generated by the compiler -- copy to the artifacts directory

diff --git a/agents/opnet-frontend-dev.md b/agents/opnet-frontend-dev.md
@@ -269,6 +269,29 @@ If any FAIL item is found: fix it, re-run build, re-run smoke check, re-run pre-
 ### Context Budget Awareness
 If context is running low (responses truncating, tool calls slowing): STOP and write a summary of done vs remaining to session artifacts. Partial summary > half-finished step.
 
+### Step 6.9: Self-Critique (Reflexion)
+
+Before writing build-result.json, re-read ALL your changes against the requirements:
+
+1. Read `requirements.md` from the spec directory
+2. For each acceptance criterion, verify your implementation satisfies it
+3. Write `self-critique.md` to your artifacts directory:
+   ```markdown
+   # Self-Critique: opnet-frontend-dev
+
+   ## Spec Compliance
+   - [x/space] [acceptance criterion 1] -- [status/notes]
+   - [x/space] [acceptance criterion 2] -- [status/notes]
+
+   ## Issues Found and Fixed
+   - [what was caught and fixed, or "None"]
+
+   ## Remaining Concerns
+   - [anything uncertain -- reviewer should check]
+   ```
+4. If any criterion is NOT met: fix it now, re-run verify, update self-critique.md
+5. Only proceed to export artifacts after all criteria are checked
+
 ### Step 7: Export Artifacts
 After successful build:
 - Write `build-result.json` with: `{ "status": "success", "buildDir": "dist/", "devPort": 5173 }`

diff --git a/commands/buidl-trace.md b/commands/buidl-trace.md
@@ -0,0 +1,41 @@
+---
+description: "Show agent execution trace for the current loop session"
+allowed-tools: ["Bash(bash:*)"]
+---
+
+# The Loop -- Trace
+
+Show the execution trace for the current or most recent loop session.
+
+## Steps
+
+1. Check for state files in order:
+   - `.claude/loop/state.yaml` (preferred)
+   - `.claude/loop/state.local.md` (legacy fallback)
+2. If neither exists, say "No loop is currently running."
+3. Read `session_name` from the state file.
+4. Check for trace file at `.claude/loop/sessions/<name>/artifacts/trace.jsonl`.
+5. If trace file does not exist, say "No trace events recorded for this session."
+6. If trace file exists, parse each JSONL line and render:
+
+```
+Agent Execution Trace: <session-name>
+======================================
+
+Timestamp            Event      Agent                Phase      Cycle  Details
+-------------------  ---------  -------------------  ---------  -----  -------
+2026-03-13T10:00:00Z dispatch   opnet-contract-dev   build      1      Starting contract development
+2026-03-13T10:05:30Z complete   opnet-contract-dev   build      1      Build passed, ABI exported
+2026-03-13T10:05:31Z dispatch   opnet-frontend-dev   build      1      Starting frontend development
+...
+
+Summary
+-------
+Total events: [N]
+Agents dispatched: [list]
+Errors: [count or "none"]
+```
+
+7. If `--tokens` data is present on events, include a token usage column.
+8. If `--category` data is present on events, include it in the details.
+9. Group events by cycle if multiple cycles exist, with a separator between cycles.
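
The rendering steps above can be sketched as follows. The JSON field names (`timestamp`, `event`, `agent`, `phase`, `cycle`, `details`) are assumptions inferred from the column headers in the sample output; the actual keys written by `scripts/trace-event.sh` are not shown in this diff. Token and category columns are omitted for brevity.

```python
import json


def parse_trace(jsonl_text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def render_trace(session_name, events):
    """Render a timeline; add cycle separators only when several cycles exist."""
    title = "Agent Execution Trace: " + session_name
    out = [title, "=" * len(title), ""]
    multi_cycle = len({ev.get("cycle") for ev in events}) > 1
    current = object()  # sentinel that never equals a real cycle value
    for ev in events:
        if multi_cycle and ev.get("cycle") != current:
            current = ev.get("cycle")
            out.append("--- cycle {} ---".format(current))
        out.append("{:<21}{:<11}{:<21}{:<11}{}".format(
            str(ev.get("timestamp", "")), str(ev.get("event", "")),
            str(ev.get("agent", "")), str(ev.get("phase", "")),
            str(ev.get("details", ""))))
    agents = sorted({ev["agent"] for ev in events
                     if ev.get("event") == "dispatch" and "agent" in ev})
    errors = sum(1 for ev in events if ev.get("event") == "error")
    out += ["", "Summary", "-------",
            "Total events: {}".format(len(events)),
            "Agents dispatched: {}".format(", ".join(agents) if agents else "none"),
            "Errors: {}".format(errors if errors else "none")]
    return "\n".join(out)


# Hypothetical trace content, for illustration only.
EXAMPLE_TRACE = "\n".join([
    '{"timestamp": "2026-03-13T10:00:00Z", "event": "dispatch", "agent": "opnet-contract-dev", "phase": "build", "cycle": 1, "details": "Starting contract development"}',
    '{"timestamp": "2026-03-13T10:05:30Z", "event": "complete", "agent": "opnet-contract-dev", "phase": "build", "cycle": 1, "details": "Build passed, ABI exported"}',
    '{"timestamp": "2026-03-13T10:20:00Z", "event": "error", "agent": "opnet-contract-dev", "phase": "build", "cycle": 2, "details": "Build failed"}',
])

if __name__ == "__main__":
    print(render_trace("example-session", parse_trace(EXAMPLE_TRACE)))
```

Because each event is a self-contained JSON line, a reader like this can process the file incrementally and does not need the writer to close any enclosing array.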