Merged
4 changes: 2 additions & 2 deletions .claude-plugin/plugin.json
@@ -1,7 +1,7 @@
{
"name": "buidl",
-"version": "3.6.0",
-"description": "Full dev lifecycle for OP_NET Bitcoin L1 projects: idea → challenge → spec → build → review → ship. Self-learning across sessions with pattern extraction, agent performance scoring, score-based finding routing, project-type profiles, cross-layer validation, and starter templates. Includes shell-enforced E2E testing gates, frontend runtime smoke checks, PUA problem-solving methodology, and the OP_NET Bible (2000+ lines). Agents get smarter with every project.",
+"version": "4.0.0",
+"description": "Full dev lifecycle for OP_NET Bitcoin L1 projects: idea → challenge → spec → build → review → ship. Self-learning across sessions with pattern extraction, agent performance scoring, score-based finding routing, project-type profiles, cross-layer validation, and starter templates. Includes shell-enforced E2E testing gates, frontend runtime smoke checks, PUA problem-solving methodology, the OP_NET Bible (2000+ lines), agent self-critique, incremental audits, dry-run mode, execution tracing, and dynamic re-planning from learned patterns. Agents get smarter with every project.",
"author": {
"name": "dannyplainview + bob"
}
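
The README test table later in this diff lists a check that the `plugin.json` version matches the first CHANGELOG entry. The project's own tests are bash (`tests/plugin-tests.sh`); the following is only a minimal Python sketch of the same idea. The function names are hypothetical, and it assumes the changelog uses `## [x.y.z]` headings as shown in this PR.

```python
import json
import re


def first_changelog_version(changelog_text):
    """Return the version in the first '## [x.y.z]' heading, or None."""
    match = re.search(r"^## \[(\d+\.\d+\.\d+)\]", changelog_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def versions_match(plugin_json_text, changelog_text):
    """True when the manifest version equals the newest changelog version."""
    manifest_version = json.loads(plugin_json_text).get("version")
    newest = first_changelog_version(changelog_text)
    return manifest_version is not None and manifest_version == newest


# Illustrative inputs modelled on this PR; not read from the real files.
manifest = '{"name": "buidl", "version": "4.0.0"}'
changelog = "# Changelog\n\n## [4.0.0] - 2026-03-13\n\n## [3.6.0] - 2026-03-13\n"
print(versions_match(manifest, changelog))
```

A mismatch (for example, bumping the manifest without adding a changelog entry) makes `versions_match` return `False`.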
20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,25 @@
# Changelog

## [4.0.0] - 2026-03-13

### Added
- **Agent self-critique (Reflexion)**: All 4 builder agents (opnet-contract-dev, opnet-frontend-dev, opnet-backend-dev, loop-builder) now re-read their changes against `requirements.md` before declaring done. Each writes a `self-critique.md` artifact with spec compliance checklist, issues found and fixed, and remaining concerns. Any unmet criterion blocks completion until fixed.
- **Incremental audit mode**: On cycle 2+, the auditor receives a `git diff` of changes since the last audit plus previous findings, instead of re-scanning the entire codebase. Focuses on the diff, blast radius, and verifying previous findings are resolved.
- **Dry-run mode** (`--dry-run` flag): Challenge, Specify, and Explore phases run normally. Phase 4 prints the full execution plan (agents, knowledge, tasks, max_turns) without dispatching any agents, then stops.
- **Agent execution tracing** (`scripts/trace-event.sh`): Appends structured JSONL events (dispatch, complete, route, finding, error, replan, checkpoint, state) to `artifacts/trace.jsonl`. New `/buidl-trace` command renders the trace as a formatted timeline.
- **Dynamic re-planning** (`scripts/query-pattern.sh`): When an agent fails after retry, queries `learning/patterns.yaml` for known fix patterns matching the failure category. If found, presents a 5th option ("Apply known fix") alongside the existing 4 error-handling options.
- **Trace command** (`commands/buidl-trace.md`): New `/buidl-trace` slash command that reads `trace.jsonl` and renders agent dispatch timeline, grouped by cycle.
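
The tracing entry above describes appending structured JSONL events to `artifacts/trace.jsonl`. The real implementation is the shell script `scripts/trace-event.sh`, whose internals are not shown in this diff. The sketch below only illustrates the append-one-JSON-object-per-line idea in Python; the function name and the key names (`ts`, `event`, `agent`, `phase`, `cycle`, `details`) are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Event types listed in the changelog entry for trace-event.sh.
EVENT_TYPES = {"dispatch", "complete", "route", "finding",
               "error", "replan", "checkpoint", "state"}


def append_trace_event(trace_path, event, agent, phase, cycle, details=""):
    """Append one JSON object as a single line to the trace file.

    Key names are hypothetical; the real script may use another schema.
    """
    if event not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event}")
    record = {
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "event": event,
        "agent": agent,
        "phase": phase,
        "cycle": cycle,
        "details": details,
    }
    path = Path(trace_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier events intact; one line per event is JSONL.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record
```

Because each event is a self-contained line, a reader can process the file line by line without loading the whole trace.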

### Changed
- **Orchestrator error handling** (`commands/buidl.md`): Agent failure flow now queries `query-pattern.sh` before presenting options. If a matching pattern exists, 5 options are shown (apply known fix, retry differently, skip, amend spec, cancel). Otherwise the existing 4 options are shown.
- **Orchestrator Phase 4 Step 2** (`commands/buidl.md`): Each agent dispatch and completion now logs trace events via `trace-event.sh`. Phase transitions log checkpoint trace events. Review findings and routing decisions are traced.
- **Auditor Step 2c** (`commands/buidl.md`): Cycle 2+ audits now pass `git diff` and previous findings to the auditor with incremental audit instructions.
- **Auditor agent** (`agents/opnet-auditor.md`): New "Incremental Audit Mode" section documents the diff-based review process for cycle 2+.
- **Plugin version**: 3.6.0 -> 4.0.0
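
The error-handling change above (five options when a learned pattern matches, otherwise the existing four) can be sketched as follows. This is an illustration only: the real lookup is done by `scripts/query-pattern.sh` against `learning/patterns.yaml`, whose schema is not shown here, so the in-memory pattern list and its `category` and `fix` keys are assumptions.

```python
# The four existing options, in the order given in the changelog entry.
BASE_OPTIONS = ["retry differently", "skip", "amend spec", "cancel"]


def find_known_fix(patterns, failure_category):
    """Return the first stored pattern matching the category, else None.

    `patterns` is a list of dicts with hypothetical keys "category" and "fix".
    """
    for pattern in patterns:
        if pattern.get("category") == failure_category and pattern.get("fix"):
            return pattern
    return None


def error_options(patterns, failure_category):
    """Offer "apply known fix" only when a matching pattern exists."""
    if find_known_fix(patterns, failure_category) is None:
        return list(BASE_OPTIONS)
    return ["apply known fix"] + BASE_OPTIONS


# Illustrative pattern store; category and fix text are made up.
known = [{"category": "build-failure", "fix": "pin compiler version"}]
print(error_options(known, "build-failure"))
print(error_options(known, "timeout"))
```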

### Why
Five features that close gaps in the agent intelligence loop. Self-critique catches spec drift before the reviewer does, saving entire review cycles. Incremental audits avoid re-scanning unchanged code, cutting audit time on fix cycles. Dry-run mode lets users preview the execution plan before committing to a full build. Execution tracing provides observability into agent dispatch ordering and timing. Dynamic re-planning applies lessons from past failures automatically instead of requiring manual intervention.

## [3.6.0] - 2026-03-13

### Added
23 changes: 18 additions & 5 deletions README.md
@@ -67,6 +67,7 @@ alias claudeyproj="claude --dangerously-skip-permissions --plugin-dir /path/to/b
| `/buidl-cancel` | Cancel a running loop (preserves worktree for manual work) |
| `/buidl-resume` | Resume an interrupted loop from last checkpoint |
| `/buidl-clean` | Cancel + remove worktree and branch |
| `/buidl-trace` | Show agent execution trace timeline for the current session |

### Flags

@@ -78,6 +79,7 @@ alias claudeyproj="claude --dangerously-skip-permissions --plugin-dir /path/to/b
| `--builder-model opus\|sonnet` | inherit | Override model for builder agents |
| `--reviewer-model opus\|sonnet` | inherit | Override model for reviewer agent |
| `--max-tokens N` | unlimited | Token budget with advisory enforcement |
| `--dry-run` | off | Run Challenge + Specify + Explore, print execution plan, stop |
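
The `--dry-run` behaviour in the table above (run Challenge, Specify, and Explore, print the plan, stop) amounts to a gate before agent dispatch. In the plugin this is handled by the orchestrator command file rather than by Python; the sketch below is only a model of the control flow, and `run_loop`, the plan structure, and the log strings are hypothetical.

```python
import argparse


def parse_flags(argv):
    """Parse only the --dry-run flag; other flags are omitted for brevity."""
    parser = argparse.ArgumentParser(prog="buidl")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the execution plan and stop")
    return parser.parse_args(argv)


def run_loop(argv, plan):
    """Return a log of what would happen for the given flags and plan."""
    flags = parse_flags(argv)
    # The first three phases run normally in both modes.
    log = ["challenge", "specify", "explore"]
    if flags.dry_run:
        # Print the plan instead of dispatching, then stop.
        for task in plan:
            log.append(f"plan: {task['agent']} (max_turns={task['max_turns']})")
        return log
    for task in plan:
        log.append(f"dispatch: {task['agent']}")
    return log


example_plan = [{"agent": "opnet-contract-dev", "max_turns": 30}]
print(run_loop(["--dry-run"], example_plan))
```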

## Agents

@@ -306,15 +308,20 @@ If the loop is interrupted (context exhaustion, wall-clock timeout, manual cance
| E2E hard gate | v3.4 | Shell-level enforcement: loop cannot exit until on-chain tests pass. |
| Frontend smoke check | v3.4 | Playwright runtime verification before declaring frontend success. |
| Pre-flight scan | v3.4 | 10 anti-pattern grep checks block completion on known bad patterns. |
| Agent self-critique | v4.0 | Builder agents re-check output against spec before declaring done. Writes self-critique.md artifact. |
| Incremental audit | v4.0 | Cycle 2+ audits focus on git diff + blast radius instead of full codebase re-scan. |
| Dry-run mode | v4.0 | Preview execution plan without dispatching agents. |
| Execution tracing | v4.0 | JSONL trace log of all agent dispatches, completions, routing, and errors. |
| Dynamic re-planning | v4.0 | Queries learned patterns for known fixes when agents fail. |

## Project Structure

```
buidl/
+-- .claude-plugin/
-| +-- plugin.json # Plugin manifest (v3.6.0)
+| +-- plugin.json # Plugin manifest (v4.0.0)
+-- agents/ # 12 agent definitions (incl. cross-layer-validator)
-+-- commands/ # 7 slash commands
++-- commands/ # 8 slash commands (incl. buidl-trace)
+-- hooks/ # Stop hook + state guards
| +-- scripts/
+-- knowledge/ # OPNet reference + domain slices
@@ -323,14 +330,14 @@ buidl/
| +-- patterns.yaml # Structured pattern store (auto-updated)
| +-- agent-scores.yaml # Agent performance metrics (auto-updated)
| +-- profiles/ # Auto-generated project-type profiles
-+-- scripts/ # Setup + state writer + learning + routing scripts
++-- scripts/ # Setup + state writer + learning + routing + tracing scripts
+-- skills/ # 3 triggerable skills
| +-- audit-from-bugs/
| +-- loop-guide/
| +-- pua/
+-- templates/ # Domain agent, knowledge slice, starter templates
| +-- starters/ # Project scaffolds (op20-token, more planned)
-+-- tests/ # 303 structural + functional + integration tests
++-- tests/ # 330+ structural + functional + integration tests
```

## Testing
@@ -339,7 +346,7 @@ buidl/
bash tests/plugin-tests.sh
```

-303 tests across 28 categories:
+330+ tests across 34 categories:

| Category | What it checks |
|----------|----------------|
@@ -371,6 +378,12 @@ bash tests/plugin-tests.sh
| Starter templates | Template manifest, contract template, frontend template, hook files |
| Score-based routing | Taxonomy, keyword matching, candidate validation, functional routing tests |
| Project-type profiles | Schema, threshold generation, profile YAML validation, functional profile tests |
| Self-critique | Self-Critique step in all 4 builder agents, self-critique.md artifact reference |
| Incremental audit | Incremental Audit Mode in auditor, git diff in buidl.md cycle 2 section |
| Dry-run mode | --dry-run flag parsing, execution plan output |
| Agent tracing | trace-event.sh exists, syntax, executable, functional JSON append test |
| Dynamic re-planning | query-pattern.sh exists, syntax, executable, functional pattern query test |
| Version 4.0.0 | plugin.json version matches CHANGELOG first entry |

Tests run automatically on every push and PR via GitHub Actions.

23 changes: 23 additions & 0 deletions agents/loop-builder.md
@@ -218,6 +218,29 @@ If you detect that you've used most of your context window (responses getting tr
- Write a clear summary of what's done and what remains to the session artifacts
- A partial summary that enables clean resumption is more valuable than one more half-finished step

### Step 3.7: Self-Critique (Reflexion)

Before declaring your build complete, re-read ALL your changes against the requirements:

1. Read `requirements.md` from the spec directory
2. For each acceptance criterion, verify your implementation satisfies it
3. Write `self-critique.md` to your artifacts directory:
```markdown
# Self-Critique: loop-builder

## Spec Compliance
- [x/space] [acceptance criterion 1] -- [status/notes]
- [x/space] [acceptance criterion 2] -- [status/notes]

## Issues Found and Fixed
- [what was caught and fixed, or "None"]

## Remaining Concerns
- [anything uncertain -- reviewer should check]
```
4. If any criterion is NOT met: fix it now, re-run verify, update self-critique.md
5. Only proceed to export/completion after all criteria are checked
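
The completion gate in steps 4 and 5 above can also be checked mechanically from the written artifact. The following is a minimal sketch, assuming `self-critique.md` follows the template shown and that an unchecked box is written as `- [ ]`; the function name is hypothetical and nothing in this PR says such a checker exists.

```python
import re

CHECKBOX = re.compile(r"^- \[([ xX])\] (.*)$")


def unmet_criteria(self_critique_text):
    """Return the unchecked items under the '## Spec Compliance' heading."""
    unmet = []
    in_section = False
    for raw_line in self_critique_text.splitlines():
        line = raw_line.strip()
        if line.startswith("## "):
            # Only the Spec Compliance section counts toward the gate.
            in_section = line == "## Spec Compliance"
            continue
        if not in_section:
            continue
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            unmet.append(match.group(2))
    return unmet


sample_critique = (
    "# Self-Critique: loop-builder\n\n"
    "## Spec Compliance\n"
    "- [x] Token mints on deploy -- verified\n"
    "- [ ] Transfer emits event -- missing\n\n"
    "## Remaining Concerns\n"
    "- [ ] not a criterion\n"
)
print(unmet_criteria(sample_critique))
```

An empty result means every listed criterion is checked; any returned item corresponds to a criterion that should block completion.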

### Step 4: Addressing Reviewer Findings (cycles 2+)
When you receive findings from a previous review cycle:
- Address EVERY critical and major finding explicitly.
17 changes: 16 additions & 1 deletion agents/opnet-auditor.md
@@ -55,9 +55,24 @@ Before auditing ANY code:
- For every finding: verify it by reading the actual code. No false positives.
- After the 27-pattern scan, proactively check: are there patterns NOT in the checklist that this specific codebase is vulnerable to?

## Incremental Audit Mode (Cycle 2+)

When you are dispatched on cycle 2 or later, the orchestrator provides:
1. A `git diff` of changes since the last audit
2. Previous audit findings from `artifacts/audit/findings.md`

In incremental mode:
- **Focus on the diff + blast radius.** Prioritize reviewing changed lines and any code they interact with.
- **Verify previous findings resolved.** For each CRITICAL/HIGH finding from the previous audit, confirm the fix is correct and complete.
- **Check for regressions.** Fixes sometimes introduce new issues -- scan the blast radius of each change.
- **Still run the full 27-pattern scan** on changed files only (not the entire codebase).
- **Output format is the same** as a full audit -- VERDICT, findings by severity, audit summary.

If the diff is empty or trivial, state that and issue a PASS verdict with a note that no material changes were found.
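
One way to picture the scoping rules above: derive the changed files from the diff, and carry forward only the CRITICAL/HIGH findings for re-verification. This is a sketch under assumptions, not the orchestrator's actual logic; the function names and the finding dictionaries (`id`, `severity`) are hypothetical, and the file extraction relies on the `+++ b/<path>` lines of standard `git diff` output.

```python
def changed_files(diff_text):
    """Collect paths from '+++ b/<path>' lines of unified git diff output.

    Deleted files appear as '+++ /dev/null' and are skipped. A sketch only:
    running `git diff --name-only` would be a more robust source.
    """
    files = []
    for line in diff_text.splitlines():
        if line.startswith("+++ b/"):
            path = line[len("+++ b/"):]
            if path not in files:
                files.append(path)
    return files


def plan_audit(cycle, diff_text, previous_findings):
    """Decide audit scope: full on cycle 1, diff-scoped on cycle 2+."""
    if cycle < 2:
        return {"mode": "full", "scan_files": None, "reverify": []}
    reverify = [finding for finding in previous_findings
                if finding.get("severity") in ("CRITICAL", "HIGH")]
    return {
        "mode": "incremental",
        "scan_files": changed_files(diff_text),  # pattern scan runs here only
        "reverify": reverify,                    # earlier findings to confirm
    }


sample_diff = (
    "diff --git a/src/token.ts b/src/token.ts\n"
    "--- a/src/token.ts\n"
    "+++ b/src/token.ts\n"
    "@@ -1 +1 @@\n-old\n+new\n"
)
sample_findings = [{"id": "F1", "severity": "CRITICAL"},
                   {"id": "F2", "severity": "LOW"}]
print(plan_audit(2, sample_diff, sample_findings))
```

An empty `scan_files` list on cycle 2+ corresponds to the empty-diff case described above.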

## Process

-### Step 1: Real-Bug Pattern Scan (MANDATORY 27 Checks)
+### Step 1: Real-Bug Pattern Scan (MANDATORY -- 27 Checks)

Before any domain-specific audit, systematically scan ALL code against these 27 confirmed vulnerability patterns from real OPNet bugs. For each finding, cite the pattern ID and the original bug PR.

23 changes: 23 additions & 0 deletions agents/opnet-backend-dev.md
@@ -120,6 +120,29 @@ If any step fails:
### Context Budget Awareness
If context is running low (responses truncating, tool calls slowing): STOP and write a summary of done vs remaining to session artifacts. Partial summary > half-finished step.

### Step 4.7: Self-Critique (Reflexion)

Before writing build-result.json, re-read ALL your changes against the requirements:

1. Read `requirements.md` from the spec directory
2. For each acceptance criterion, verify your implementation satisfies it
3. Write `self-critique.md` to your artifacts directory:
```markdown
# Self-Critique: opnet-backend-dev

## Spec Compliance
- [x/space] [acceptance criterion 1] -- [status/notes]
- [x/space] [acceptance criterion 2] -- [status/notes]

## Issues Found and Fixed
- [what was caught and fixed, or "None"]

## Remaining Concerns
- [anything uncertain -- reviewer should check]
```
4. If any criterion is NOT met: fix it now, re-run verify, update self-critique.md
5. Only proceed to export artifacts after all criteria are checked

### Step 5: Export Artifacts
After successful build:
- Write `build-result.json` with: `{ "status": "success", "buildDir": "dist/", "port": 3000 }`
23 changes: 23 additions & 0 deletions agents/opnet-contract-dev.md
@@ -134,6 +134,29 @@ If any step fails:
### Context Budget Awareness
If context is running low (responses truncating, tool calls slowing): STOP and write a summary of done vs remaining to session artifacts. Partial summary > half-finished step.

### Step 5.7: Self-Critique (Reflexion)

Before writing build-result.json, re-read ALL your changes against the requirements:

1. Read `requirements.md` from the spec directory
2. For each acceptance criterion, verify your implementation satisfies it
3. Write `self-critique.md` to your artifacts directory:
```markdown
# Self-Critique: opnet-contract-dev

## Spec Compliance
- [x/space] [acceptance criterion 1] -- [status/notes]
- [x/space] [acceptance criterion 2] -- [status/notes]

## Issues Found and Fixed
- [what was caught and fixed, or "None"]

## Remaining Concerns
- [anything uncertain -- reviewer should check]
```
4. If any criterion is NOT met: fix it now, re-run verify, update self-critique.md
5. Only proceed to export artifacts after all criteria are checked

### Step 6: Export Artifacts
After successful build:
- ABI JSON is generated by the compiler -- copy to the artifacts directory
23 changes: 23 additions & 0 deletions agents/opnet-frontend-dev.md
@@ -269,6 +269,29 @@ If any FAIL item is found: fix it, re-run build, re-run smoke check, re-run pre-
### Context Budget Awareness
If context is running low (responses truncating, tool calls slowing): STOP and write a summary of done vs remaining to session artifacts. Partial summary > half-finished step.

### Step 6.9: Self-Critique (Reflexion)

Before writing build-result.json, re-read ALL your changes against the requirements:

1. Read `requirements.md` from the spec directory
2. For each acceptance criterion, verify your implementation satisfies it
3. Write `self-critique.md` to your artifacts directory:
```markdown
# Self-Critique: opnet-frontend-dev

## Spec Compliance
- [x/space] [acceptance criterion 1] -- [status/notes]
- [x/space] [acceptance criterion 2] -- [status/notes]

## Issues Found and Fixed
- [what was caught and fixed, or "None"]

## Remaining Concerns
- [anything uncertain -- reviewer should check]
```
4. If any criterion is NOT met: fix it now, re-run verify, update self-critique.md
5. Only proceed to export artifacts after all criteria are checked

### Step 7: Export Artifacts
After successful build:
- Write `build-result.json` with: `{ "status": "success", "buildDir": "dist/", "devPort": 5173 }`
41 changes: 41 additions & 0 deletions commands/buidl-trace.md
@@ -0,0 +1,41 @@
---
description: "Show agent execution trace for the current loop session"
allowed-tools: ["Bash(bash:*)"]
---

# The Loop -- Trace

Show the execution trace for the current or most recent loop session.

## Steps

1. Check for state files in order:
- `.claude/loop/state.yaml` (preferred)
- `.claude/loop/state.local.md` (legacy fallback)
2. If neither exists, say "No loop is currently running."
3. Read `session_name` from the state file.
4. Check for trace file at `.claude/loop/sessions/<name>/artifacts/trace.jsonl`.
5. If trace file does not exist, say "No trace events recorded for this session."
6. If trace file exists, parse each JSONL line and render:

```
Agent Execution Trace: <session-name>
======================================

Timestamp             Event      Agent                Phase      Cycle  Details
--------------------  ---------  -------------------  ---------  -----  -------
2026-03-13T10:00:00Z  dispatch   opnet-contract-dev   build      1      Starting contract development
2026-03-13T10:05:30Z  complete   opnet-contract-dev   build      1      Build passed, ABI exported
2026-03-13T10:05:31Z  dispatch   opnet-frontend-dev   build      1      Starting frontend development
...

Summary
-------
Total events: [N]
Agents dispatched: [list]
Errors: [count or "none"]
```

7. If `--tokens` data is present on events, include a token usage column.
8. If `--category` data is present on events, include it in the details.
9. Group events by cycle if multiple cycles exist, with a separator between cycles.
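
The rendering steps above (parse each JSONL line, print a row per event, separate cycles, then summarize) could look roughly like this. The command itself is executed by the agent following these instructions, not by this code; the function names, the JSON key names (`ts`, `event`, `agent`, `phase`, `cycle`, `details`), and the cycle separator text are assumptions.

```python
import json

# (key, column width) pairs; key names are hypothetical.
COLUMNS = [("ts", 20), ("event", 9), ("agent", 19), ("phase", 9), ("cycle", 5)]


def format_row(event):
    """Left-align each known field to its column width, then add details."""
    cells = [str(event.get(key, "")).ljust(width) for key, width in COLUMNS]
    return "  ".join(cells + [str(event.get("details", ""))])


def render_trace(jsonl_text, session_name):
    """Render JSONL trace text as a timeline with a summary block."""
    events = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    title = f"Agent Execution Trace: {session_name}"
    out = [title, "=" * len(title), ""]
    previous_cycle = None
    for event in events:
        cycle = event.get("cycle")
        # Insert a separator whenever the cycle number changes.
        if previous_cycle is not None and cycle != previous_cycle:
            out.append(f"--- cycle {cycle} ---")
        previous_cycle = cycle
        out.append(format_row(event))
    dispatched = sorted({e["agent"] for e in events
                         if e.get("event") == "dispatch" and e.get("agent")})
    agents_text = ", ".join(dispatched) if dispatched else "none"
    error_count = sum(1 for e in events if e.get("event") == "error")
    errors_text = str(error_count) if error_count else "none"
    out += ["", "Summary", "-------",
            f"Total events: {len(events)}",
            f"Agents dispatched: {agents_text}",
            f"Errors: {errors_text}"]
    return "\n".join(out)


sample_trace = "\n".join(json.dumps(e) for e in [
    {"ts": "2026-03-13T10:00:00Z", "event": "dispatch",
     "agent": "opnet-contract-dev", "phase": "build", "cycle": 1,
     "details": "Starting contract development"},
    {"ts": "2026-03-13T10:05:30Z", "event": "error",
     "agent": "opnet-contract-dev", "phase": "build", "cycle": 1,
     "details": "Build failed"},
    {"ts": "2026-03-13T10:09:00Z", "event": "dispatch",
     "agent": "opnet-contract-dev", "phase": "build", "cycle": 2,
     "details": "Retry"},
])
print(render_trace(sample_trace, "demo"))
```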