Feat/self critic coaching by yhyu13 · Pull Request #2996 · charmbracelet/crush

yhyu13 · 2026-05-24T10:17:39Z

Summary

This PR introduces three built-in agent skills — Critic, Replacer, and Toolcoach — that work together as a middleware stack around the primary SessionAgent. The goal is to improve output quality, prevent regressions, and coach the agent toward better tool usage patterns without adding perceivable latency.

What Changed

1. Critic Skill (`internal/skills/critic/`)

A secondary-LLM review layer that inspects the agent's diff after each turn and returns one of:

approve — changes look good, turn completes normally
revise — rollback changes, inject feedback, re-drive the primary agent
halt — rollback changes, return error to user

Key features:

SQLite-backed review persistence with sqlc-generated queries
SHA-256 keyed LRU cache for deduplicated reviews
Circuit breaker + timeout for resilient LLM calls
LSP diagnostic fetching for richer context
Configurable via crush.json (options.critic)
CLI commands: crush critic list, crush critic show, crush critic stats
Per-session disable via ReplacerEnabled flag

2. Replacer Skill (`internal/skills/replacer/`)

A conversation coach that evaluates whether the primary agent's response is complete or needs a follow-up. Uses a small/fast model to decide stop vs continue.

Key features:

Configurable max iterations (default 3)
Duplicate-prompt guard to avoid repetitive follow-ups
Timeout handling (deadline exceeded = treat as stop)
Coach spinner indicators ("Coach is ready" → "Coach is evaluating")
/skipcoach command to interrupt current evaluation one-time
Commands dialog integration ("Skip Coach")
Auto-interrupt on user typing or Enter press

3. Toolcoach Skill (`internal/skills/toolcoach/`)

A zero-LLM, heuristic-based skill that detects anti-patterns in real-time tool usage and injects coaching tips into tool results.

Key features:

~2.7µs overhead per tool call (benchmarked)
Detects patterns: destructive_bash, write_over_existing, edit_without_view, broad_grep, repeated_view, missing_multiedit
SQLite effectiveness tracking with adaptive severity
Progressive coaching (hint → warning → critical) based on repetition
Guided retry mechanism for destructive operations
Configurable via crush.json (options.toolcoach)

4. Middleware Stack Order

outermost: ToolcoachMiddleware
            ↓ wraps
           ReplacerMiddleware
            ↓ wraps
           CriticMiddleware
            ↓ wraps
inner-most: SessionAgent (primary)

SkipCoach propagates through the chain via interface type-assertion at each layer.

5. UI & Integration

New commands dialog items: "Skip Coach"
/skipcoach (or skipcoach) intercepted in textarea
AgentSkipCoach(sessionID) wired through workspace → coordinator → middleware chain
Coach evaluation no longer blocks user commands (typing, new session, model selection, etc.)
Visual busy indicators preserved (placeholder, progress bar, cancel key, todo spinner)

6. Database & Schema

New migrations:

critic_reviews table (verdict, confidence, concerns, diff snapshot)
written_files table (tracks files written per session for diff baseline)
toolcoach_effectiveness table (pattern hit/miss tracking)

Testing

# Critic tests
go test ./internal/skills/critic/... -v -race

# Replacer tests
go test ./internal/skills/replacer/... -v -race

# Toolcoach tests
go test ./internal/skills/toolcoach/... -v -race

# Standalone critic + replacer demo (no API keys)
go run ./cmd/critic-demo

Configuration Example

{
  "options": {
    "critic": {
      "enabled": true,
      "model": "anthropic/claude-sonnet-4",
      "max_iterations": 3,
      "auto_approve": false,
      "threshold": 0.85
    },
    "replacer": {
      "enabled": true,
      "max_iterations": 3
    },
    "toolcoach": {
      "enabled": true,
      "max_patterns": 100
    }
  }
}

Checklist

I have read CONTRIBUTING.md.
I have created a discussion that was approved by a maintainer (for new features).

charmcli · 2026-05-24T10:17:49Z

All contributors have signed the CLA ✍️ ✅
_{Posted by the CLA Assistant Lite bot.}

yhyu13 · 2026-05-24T10:19:27Z

I have read the Contributor License Agreement (CLA) and hereby sign the CLA.

… conversation coach This squash commit brings in the self-critic improvement feature: - Add CriticConfig and ReplacerConfig to crush.json options with auto-enable - Add self-critic skill core: checkpoints, snapshots, diff computation, revision loop - Add critic CLI commands (critic list, show, stats) and database persistence - Add message-level checkpoint review in addition to file-edit checkpoints - Add replacement agent (conversation coach) for conversation continuation - Add animated spinners and cancellation support for replacer evaluation - Track file writes via filetracker (RecordWrite / ListWrittenFiles) - Wire write tracking into edit, multiedit, and write tools - Fix critic first-run blind spot by re-querying read/written files post-run - Add written_files DB table, migration, and sqlc queries - Add critic/replacer event telemetry and pub/sub integration - Add project-local template overrides for critic and replacer prompts - Fix template whitespace to avoid breaking VCR cassette matching

…f-critic Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ner fixes Toolcoach skill: - Telemetry & measurement (coachMetrics) - Fast-path pattern matching, jsonpeek zero-allocation extractor, regex pool - Semantic edit validation with cached view content and time thresholds - Adaptive severity based on SQLite effectiveness tracking - Progressive coaching intensity (tutor/balanced/minimal) with auto-switch - Enhanced success tracking with per-pattern Validate functions - Guided retry with AutoRetry (1/turn cap) - Coach context fed to critic via CoachSummaryProvider Replacer fixes: - Delete orphaned eval spinners on primary error, nil result, context cancel - Add deleteEvalIndicator helper with background context and retry UI: - /skipcoach command to interrupt current coach evaluation one time - Skip Coach command in commands dialog

…tale references - Fix HOOKS.md path to docs/hooks/README.md - Document all toolcoach config fields (adaptive_severity, intensity, auto_retry, auto_retry_sessions) - Add supervisor-impl builtin skill to list with note that it fails to parse (no YAML frontmatter)

Includes: - replacer: /skipcoach command, orphaned spinner cleanup, one-time skip - critic+toolcoach: SkipCoach delegation through wrapper chain - AGENTS.md: updated toolcoach config, wrapper order, skip docs

Any keystroke or Enter press while the agent is busy now signals AgentSkipCoach so the user can interrupt an ongoing or pending coach evaluation without needing the explicit /skipcoach command.

Includes AGENTS.md updates for UI interaction patterns and toolcoach phase history, plus minor formatting in toolcoach middleware.

Commands that previously showed 'Agent is busy, please wait' during coach evaluation now call AgentSkipCoach and proceed. If the primary agent is still running after the skip attempt, the warning is shown as before. This applies to: - New Session, Summarize, External Editor, Initialize Project - Select Model, Select Reasoning Effort - Suspend, Ctrl+N new session, Ctrl+E open editor

trySkipCoach() now returns void and always lets the caller proceed. The previous implementation checked isAgentBusy() again after signaling SkipCoach, but the coach cancels asynchronously so the second check always returned true and the warning still appeared. All commands now proceed immediately while signaling the coach to skip: New Session, Summarize, External Editor, Initialize Project, Select Model, Select Reasoning Effort, Suspend, Ctrl+N, Ctrl+E.

yhyu13 and others added 12 commits May 29, 2026 23:39

feat: add critic and replacer skills

b778fc4

update go openai

9186f71

feat(replacer): enhance middleware with conversation coaching and sel…

7f98ca1

…f-critic Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat: squash merge self-critic-improve3 — SkipCoach propagation + docs

c4c560d

Includes: - replacer: /skipcoach command, orphaned spinner cleanup, one-time skip - critic+toolcoach: SkipCoach delegation through wrapper chain - AGENTS.md: updated toolcoach config, wrapper order, skip docs

fix(ui): user input interrupts coach evaluation

4c905d1

Any keystroke or Enter press while the agent is busy now signals AgentSkipCoach so the user can interrupt an ongoing or pending coach evaluation without needing the explicit /skipcoach command.

feat: squash merge self-critic-improve3 docs + formatting

65f9c4d

Includes AGENTS.md updates for UI interaction patterns and toolcoach phase history, plus minor formatting in toolcoach middleware.

fix: resolve rebase conflicts and update test stubs/schema

7d08dd8

yhyu13 force-pushed the feat/self-critic-squashed branch from d3eaf09 to 7d08dd8 Compare May 29, 2026 15:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feat/self critic coaching#2996

Feat/self critic coaching#2996
yhyu13 wants to merge 12 commits into
charmbracelet:mainfrom
yhyu13:feat/self-critic-squashed

yhyu13 commented May 24, 2026

Uh oh!

charmcli commented May 24, 2026 •

edited

Loading

Uh oh!

yhyu13 commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

yhyu13 commented May 24, 2026

Summary

What Changed

1. Critic Skill (internal/skills/critic/)

2. Replacer Skill (internal/skills/replacer/)

3. Toolcoach Skill (internal/skills/toolcoach/)

4. Middleware Stack Order

5. UI & Integration

6. Database & Schema

Testing

Configuration Example

Checklist

Uh oh!

charmcli commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yhyu13 commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. Critic Skill (`internal/skills/critic/`)

2. Replacer Skill (`internal/skills/replacer/`)

3. Toolcoach Skill (`internal/skills/toolcoach/`)

charmcli commented May 24, 2026 •

edited

Loading