Feat/self critic coaching #2996

Open
yhyu13 wants to merge 12 commits into charmbracelet:main from yhyu13:feat/self-critic-squashed

yhyu13 commented May 24, 2026

Summary

This PR introduces three built-in agent skills (Critic, Replacer, and Toolcoach) that work together as a middleware stack around the primary SessionAgent. The goal is to improve output quality, prevent regressions, and coach the agent toward better tool usage without adding perceptible latency.

What Changed

1. Critic Skill (internal/skills/critic/)

A secondary-LLM review layer that inspects the agent's diff after each turn and returns one of:

  • approve: changes look good; the turn completes normally
  • revise: roll back changes, inject feedback, and re-drive the primary agent
  • halt: roll back changes and return an error to the user
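
The three verdicts can be read as a small revision loop. The sketch below is illustrative only: the `Verdict`, `Review`, and `Turn` types, the `RunWithCritic` function, and the iteration cap are assumptions made for this example and are not taken from the PR's code.

```go
package main

import (
	"errors"
	"fmt"
)

// Verdict is a hypothetical type for the critic's three outcomes.
type Verdict string

const (
	Approve Verdict = "approve"
	Revise  Verdict = "revise"
	Halt    Verdict = "halt"
)

// Review is an assumed shape for one critic result.
type Review struct {
	Verdict  Verdict
	Feedback string
}

// Turn bundles hypothetical hooks for the primary agent and the workspace.
type Turn struct {
	Run      func(feedback string) error // drives the primary agent once
	Critique func() Review               // reviews the resulting diff
	Rollback func()                      // restores the pre-turn state
}

// ErrHalted is returned when the critic halts the turn.
var ErrHalted = errors.New("critic halted the turn")

// RunWithCritic re-drives the agent until approval, halt, or maxIter attempts.
func RunWithCritic(t Turn, maxIter int) error {
	feedback := ""
	for i := 0; i < maxIter; i++ {
		if err := t.Run(feedback); err != nil {
			return err
		}
		r := t.Critique()
		switch r.Verdict {
		case Approve:
			return nil
		case Revise:
			t.Rollback()
			feedback = r.Feedback // injected into the next attempt
		case Halt:
			t.Rollback()
			return ErrHalted
		default:
			return fmt.Errorf("unknown verdict %q", r.Verdict)
		}
	}
	return fmt.Errorf("no approval after %d attempts", maxIter)
}

func main() {
	verdicts := []Verdict{Revise, Approve}
	n := 0
	t := Turn{
		Run: func(string) error { return nil },
		Critique: func() Review {
			v := verdicts[n]
			n++
			return Review{Verdict: v, Feedback: "tighten error handling"}
		},
		Rollback: func() { fmt.Println("rolled back") },
	}
	fmt.Println("result:", RunWithCritic(t, 3))
}
```

Both revise and halt roll back first, so the workspace never keeps changes the critic rejected.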

Key features:

  • SQLite-backed review persistence with sqlc-generated queries
  • SHA-256-keyed LRU cache that deduplicates reviews
  • Circuit breaker + timeout for resilient LLM calls
  • LSP diagnostic fetching for richer context
  • Configurable via crush.json (options.critic)
  • CLI commands: crush critic list, crush critic show, crush critic stats
  • Per-session disable via ReplacerEnabled flag
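
As an illustration of the SHA-256-keyed LRU idea, here is a minimal sketch built on the standard library. The `ReviewCache` type and its methods are hypothetical names, the cached value is reduced to a verdict string, and the sketch is not safe for concurrent use; the PR's actual cache may differ on all of these points.

```go
package main

import (
	"container/list"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

type entry struct {
	key     string
	verdict string
}

// ReviewCache is a hypothetical fixed-size LRU keyed by the SHA-256 of a diff.
type ReviewCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // hash -> element in order
}

func NewReviewCache(capacity int) *ReviewCache {
	return &ReviewCache{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Key hashes the diff so identical diffs map to the same cache slot.
func Key(diff string) string {
	sum := sha256.Sum256([]byte(diff))
	return hex.EncodeToString(sum[:])
}

// Get returns the cached verdict and marks the entry as recently used.
func (c *ReviewCache) Get(diff string) (string, bool) {
	el, ok := c.items[Key(diff)]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).verdict, true
}

// Put stores a verdict and evicts the least recently used entry when full.
func (c *ReviewCache) Put(diff, verdict string) {
	k := Key(diff)
	if el, ok := c.items[k]; ok {
		el.Value.(*entry).verdict = verdict
		c.order.MoveToFront(el)
		return
	}
	c.items[k] = c.order.PushFront(&entry{key: k, verdict: verdict})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	cache := NewReviewCache(2)
	cache.Put("diff A", "approve")
	if v, ok := cache.Get("diff A"); ok {
		fmt.Println("cache hit:", v) // the second review of an identical diff is skipped
	}
}
```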

2. Replacer Skill (internal/skills/replacer/)

A conversation coach that evaluates whether the primary agent's response is complete or needs a follow-up. Uses a small/fast model to decide stop vs continue.

Key features:

  • Configurable max iterations (default 3)
  • Duplicate-prompt guard to avoid repetitive follow-ups
  • Timeout handling (deadline exceeded = treat as stop)
  • Coach spinner indicators ("Coach is ready" → "Coach is evaluating")
  • /skipcoach command to skip the current evaluation once
  • Commands dialog integration ("Skip Coach")
  • Auto-interrupt on user typing or Enter press
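
The iteration cap, duplicate-prompt guard, and timeout rule combine into one loop. The following is a sketch under stated assumptions: `Decision`, `Coach`, `Agent`, `RunWithCoach`, and the helper coaches are hypothetical names, and treating every coach error (not only a deadline) as "stop" is a simplification made here.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Decision is an assumed shape for one coach verdict.
type Decision struct {
	Continue bool   // false means the response is complete
	Prompt   string // follow-up to send when Continue is true
}

// Coach and Agent are hypothetical stand-ins for the small/fast model
// and the primary agent.
type Coach func(ctx context.Context, lastResponse string) (Decision, error)
type Agent func(prompt string) string

// RunWithCoach drives the agent, then lets the coach request follow-ups.
// It returns the final response and the number of follow-ups issued.
func RunWithCoach(ctx context.Context, agent Agent, coach Coach, prompt string, maxIter int, timeout time.Duration) (string, int) {
	resp := agent(prompt)
	seen := map[string]bool{prompt: true}
	n := 0
	for n < maxIter {
		evalCtx, cancel := context.WithTimeout(ctx, timeout)
		d, err := coach(evalCtx, resp)
		cancel()
		// Any error, including context.DeadlineExceeded, is treated as "stop".
		if err != nil || !d.Continue {
			break
		}
		// Duplicate-prompt guard: never send the same follow-up twice.
		if seen[d.Prompt] {
			break
		}
		seen[d.Prompt] = true
		resp = agent(d.Prompt)
		n++
	}
	return resp, n
}

// scripted returns a coach that asks for each prompt in turn, then stops.
func scripted(prompts ...string) Coach {
	i := 0
	return func(context.Context, string) (Decision, error) {
		if i >= len(prompts) {
			return Decision{}, nil
		}
		p := prompts[i]
		i++
		return Decision{Continue: true, Prompt: p}, nil
	}
}

// stalled returns a coach that never answers before its deadline.
func stalled() Coach {
	return func(ctx context.Context, _ string) (Decision, error) {
		<-ctx.Done()
		return Decision{}, ctx.Err()
	}
}

// followUps runs a coach against an echo agent with a 20ms evaluation timeout.
func followUps(c Coach, maxIter int) int {
	echo := func(p string) string { return "done: " + p }
	_, n := RunWithCoach(context.Background(), echo, c, "start", maxIter, 20*time.Millisecond)
	return n
}

func main() {
	fmt.Println(followUps(scripted("add tests", "add tests"), 3)) // duplicate guard
	fmt.Println(followUps(scripted("a", "b", "c", "d"), 3))       // iteration cap
	fmt.Println(followUps(stalled(), 3))                          // timeout means stop
}
```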

3. Toolcoach Skill (internal/skills/toolcoach/)

A zero-LLM, heuristic-based skill that detects anti-patterns in real-time tool usage and injects coaching tips into tool results.

Key features:

  • ~2.7µs overhead per tool call (benchmarked)
  • Detects patterns: destructive_bash, write_over_existing, edit_without_view, broad_grep, repeated_view, missing_multiedit
  • SQLite effectiveness tracking with adaptive severity
  • Progressive coaching (hint → warning → critical) based on repetition
  • Guided retry mechanism for destructive operations
  • Configurable via crush.json (options.toolcoach)
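
To make the heuristic approach concrete, the sketch below detects one of the listed patterns, edit_without_view, and escalates the tip from hint to warning to critical as it repeats. The `Coach` type, the escalation thresholds, and the tip wording are assumptions for illustration; the PR's detector also draws on SQLite effectiveness data, which is omitted here.

```go
package main

import "fmt"

// Severity mirrors the hint -> warning -> critical progression.
type Severity int

const (
	Hint Severity = iota
	Warning
	Critical
)

func (s Severity) String() string {
	return [...]string{"hint", "warning", "critical"}[s]
}

// Coach is a hypothetical per-session tracker: which files were viewed
// and how often each pattern has fired.
type Coach struct {
	viewed map[string]bool
	hits   map[string]int
}

func NewCoach() *Coach {
	return &Coach{viewed: map[string]bool{}, hits: map[string]int{}}
}

// severityFor escalates with repetition; the thresholds are assumptions.
func severityFor(n int) Severity {
	switch {
	case n >= 3:
		return Critical
	case n == 2:
		return Warning
	default:
		return Hint
	}
}

// Observe inspects one tool call and returns a coaching tip,
// or "" when no pattern matched. No LLM call is involved.
func (c *Coach) Observe(tool, path string) string {
	switch tool {
	case "view":
		c.viewed[path] = true
	case "edit":
		if !c.viewed[path] {
			c.hits["edit_without_view"]++
			sev := severityFor(c.hits["edit_without_view"])
			return fmt.Sprintf("[%s] edit_without_view: view %s before editing it", sev, path)
		}
	}
	return ""
}

func main() {
	c := NewCoach()
	fmt.Println(c.Observe("edit", "main.go"))
	fmt.Println(c.Observe("edit", "util.go"))
	c.Observe("view", "main.go")
	fmt.Printf("%q\n", c.Observe("edit", "main.go")) // no tip once the file was viewed
}
```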

4. Middleware Stack Order

outermost: ToolcoachMiddleware
             ↓ wraps
           ReplacerMiddleware
             ↓ wraps
           CriticMiddleware
             ↓ wraps
innermost: SessionAgent (primary)

SkipCoach propagates through the chain via interface type-assertion at each layer.
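
A rough sketch of how such a chain might forward SkipCoach using an optional interface. Only the middleware names and `SkipCoach` come from the description above; the `Agent` and `CoachSkipper` interfaces, the `forward` helper, and the `skipped` map are hypothetical.

```go
package main

import "fmt"

// Agent is an assumed minimal interface shared by every layer.
type Agent interface {
	Run(prompt string) string
}

// CoachSkipper is the optional interface discovered by type assertion.
type CoachSkipper interface {
	SkipCoach(sessionID string)
}

// forward passes the skip to the next layer only if it supports it.
func forward(next Agent, sessionID string) {
	if s, ok := next.(CoachSkipper); ok {
		s.SkipCoach(sessionID)
	}
}

// SessionAgent is the primary agent; it does not implement CoachSkipper,
// so propagation stops here.
type SessionAgent struct{}

func (SessionAgent) Run(p string) string { return "primary(" + p + ")" }

type CriticMiddleware struct{ next Agent }

func (m *CriticMiddleware) Run(p string) string { return m.next.Run(p) }
func (m *CriticMiddleware) SkipCoach(id string) { forward(m.next, id) }

type ReplacerMiddleware struct {
	next    Agent
	skipped map[string]bool // sessions whose current evaluation is skipped
}

func (m *ReplacerMiddleware) Run(p string) string { return m.next.Run(p) }
func (m *ReplacerMiddleware) SkipCoach(id string) {
	m.skipped[id] = true
	forward(m.next, id)
}

type ToolcoachMiddleware struct{ next Agent }

func (m *ToolcoachMiddleware) Run(p string) string { return m.next.Run(p) }
func (m *ToolcoachMiddleware) SkipCoach(id string) { forward(m.next, id) }

// NewStack builds the chain in the documented order and also returns
// the replacer so callers can inspect its state.
func NewStack() (Agent, *ReplacerMiddleware) {
	replacer := &ReplacerMiddleware{
		next:    &CriticMiddleware{next: SessionAgent{}},
		skipped: map[string]bool{},
	}
	return &ToolcoachMiddleware{next: replacer}, replacer
}

func main() {
	top, replacer := NewStack()
	forward(top, "session-1") // enters at the outermost layer
	fmt.Println("replacer skipped:", replacer.skipped["session-1"])
	fmt.Println(top.Run("hello"))
}
```

Using a type assertion at each hop keeps `SkipCoach` out of the core agent interface, so layers that have nothing to skip can simply delegate.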

5. UI & Integration

  • New commands dialog items: "Skip Coach"
  • /skipcoach (or skipcoach) intercepted in textarea
  • AgentSkipCoach(sessionID) wired through workspace → coordinator → middleware chain
  • Coach evaluation no longer blocks user commands (typing, new session, model selection, etc.)
  • Visual busy indicators preserved (placeholder, progress bar, cancel key, todo spinner)

6. Database & Schema

New migrations:

  • critic_reviews table (verdict, confidence, concerns, diff snapshot)
  • written_files table (tracks files written per session for diff baseline)
  • toolcoach_effectiveness table (pattern hit/miss tracking)

Testing

# Critic tests
go test ./internal/skills/critic/... -v -race

# Replacer tests
go test ./internal/skills/replacer/... -v -race

# Toolcoach tests
go test ./internal/skills/toolcoach/... -v -race

# Standalone critic + replacer demo (no API keys)
go run ./cmd/critic-demo

Configuration Example

{
  "options": {
    "critic": {
      "enabled": true,
      "model": "anthropic/claude-sonnet-4",
      "max_iterations": 3,
      "auto_approve": false,
      "threshold": 0.85
    },
    "replacer": {
      "enabled": true,
      "max_iterations": 3
    },
    "toolcoach": {
      "enabled": true,
      "max_patterns": 100
    }
  }
}
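
For readers who want to see how these options could map onto Go types, here is a hedged sketch. The struct names and the `Parse` function are hypothetical; the JSON keys come from the example above, and the fallback of 3 for the replacer's max_iterations follows the default stated in the Replacer section. The PR's actual config types may carry more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CriticOptions mirrors the "critic" keys in the example above.
type CriticOptions struct {
	Enabled       bool    `json:"enabled"`
	Model         string  `json:"model"`
	MaxIterations int     `json:"max_iterations"`
	AutoApprove   bool    `json:"auto_approve"`
	Threshold     float64 `json:"threshold"`
}

// ReplacerOptions mirrors the "replacer" keys.
type ReplacerOptions struct {
	Enabled       bool `json:"enabled"`
	MaxIterations int  `json:"max_iterations"`
}

// ToolcoachOptions mirrors the "toolcoach" keys.
type ToolcoachOptions struct {
	Enabled     bool `json:"enabled"`
	MaxPatterns int  `json:"max_patterns"`
}

// Options and Config are hypothetical wrappers for this sketch.
type Options struct {
	Critic    CriticOptions    `json:"critic"`
	Replacer  ReplacerOptions  `json:"replacer"`
	Toolcoach ToolcoachOptions `json:"toolcoach"`
}

type Config struct {
	Options Options `json:"options"`
}

// Parse decodes the config and applies the replacer's default iteration cap.
func Parse(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, fmt.Errorf("parse config: %w", err)
	}
	if cfg.Options.Replacer.MaxIterations == 0 {
		cfg.Options.Replacer.MaxIterations = 3 // default stated in the PR description
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{"options": {"critic": {"enabled": true, "threshold": 0.85}, "replacer": {"enabled": true}}}`)
	cfg, err := Parse(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.Options.Critic.Threshold, cfg.Options.Replacer.MaxIterations)
}
```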

Checklist

  • I have read CONTRIBUTING.md.
  • I have created a discussion that was approved by a maintainer (for new features).

charmcli commented May 24, 2026

Contributor

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

yhyu13 commented May 24, 2026

Author

I have read the Contributor License Agreement (CLA) and hereby sign the CLA.

yhyu13 and others added 12 commits May 29, 2026 23:39
… conversation coach

This squash commit brings in the self-critic improvement feature:

- Add CriticConfig and ReplacerConfig to crush.json options with auto-enable
- Add self-critic skill core: checkpoints, snapshots, diff computation, revision loop
- Add critic CLI commands (critic list, show, stats) and database persistence
- Add message-level checkpoint review in addition to file-edit checkpoints
- Add replacement agent (conversation coach) for conversation continuation
- Add animated spinners and cancellation support for replacer evaluation
- Track file writes via filetracker (RecordWrite / ListWrittenFiles)
- Wire write tracking into edit, multiedit, and write tools
- Fix critic first-run blind spot by re-querying read/written files post-run
- Add written_files DB table, migration, and sqlc queries
- Add critic/replacer event telemetry and pub/sub integration
- Add project-local template overrides for critic and replacer prompts
- Fix template whitespace to avoid breaking VCR cassette matching
…f-critic

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ner fixes

Toolcoach skill:
- Telemetry & measurement (coachMetrics)
- Fast-path pattern matching, jsonpeek zero-allocation extractor, regex pool
- Semantic edit validation with cached view content and time thresholds
- Adaptive severity based on SQLite effectiveness tracking
- Progressive coaching intensity (tutor/balanced/minimal) with auto-switch
- Enhanced success tracking with per-pattern Validate functions
- Guided retry with AutoRetry (1/turn cap)
- Coach context fed to critic via CoachSummaryProvider

Replacer fixes:
- Delete orphaned eval spinners on primary error, nil result, context cancel
- Add deleteEvalIndicator helper with background context and retry

UI:
- /skipcoach command to interrupt current coach evaluation one time
- Skip Coach command in commands dialog
…tale references

- Fix HOOKS.md path to docs/hooks/README.md
- Document all toolcoach config fields (adaptive_severity, intensity, auto_retry, auto_retry_sessions)
- Add supervisor-impl builtin skill to list with note that it fails to parse (no YAML frontmatter)

Includes:
- replacer: /skipcoach command, orphaned spinner cleanup, one-time skip
- critic+toolcoach: SkipCoach delegation through wrapper chain
- AGENTS.md: updated toolcoach config, wrapper order, skip docs

Any keystroke or Enter press while the agent is busy now signals
AgentSkipCoach so the user can interrupt an ongoing or pending
coach evaluation without needing the explicit /skipcoach command.
Includes AGENTS.md updates for UI interaction patterns and toolcoach
phase history, plus minor formatting in toolcoach middleware.

Commands that previously showed 'Agent is busy, please wait' during
coach evaluation now call AgentSkipCoach and proceed. If the primary
agent is still running after the skip attempt, the warning is shown
as before. This applies to:

- New Session, Summarize, External Editor, Initialize Project
- Select Model, Select Reasoning Effort
- Suspend, Ctrl+N new session, Ctrl+E open editor

trySkipCoach() now returns nothing and always lets the caller proceed.
The previous implementation checked isAgentBusy() again after signaling
SkipCoach, but the coach cancels asynchronously, so the second check
always returned true and the warning still appeared.

All commands now proceed immediately while signaling the coach to skip:
New Session, Summarize, External Editor, Initialize Project, Select
Model, Select Reasoning Effort, Suspend, Ctrl+N, Ctrl+E.
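
The asynchronous cancellation described above can be sketched as a fire-and-forget signal. This is an illustration only: the `coach` type and its fields are hypothetical, and `trySkipCoach` here models just the signaling, not the real UI wiring.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// coach is a hypothetical stand-in for an evaluation running in the middleware.
type coach struct {
	busy atomic.Bool
	skip chan struct{} // buffered so signaling never blocks the UI
	done chan struct{} // closed when the evaluation goroutine exits
}

func newCoach() *coach {
	return &coach{skip: make(chan struct{}, 1), done: make(chan struct{})}
}

// evaluate simulates a slow coach call that stops early when skipped.
func (c *coach) evaluate(d time.Duration) {
	c.busy.Store(true)
	go func() {
		defer close(c.done)       // runs second: signals that the goroutine exited
		defer c.busy.Store(false) // runs first: clears the busy flag
		select {
		case <-c.skip:
		case <-time.After(d):
		}
	}()
}

// trySkipCoach signals the skip and returns immediately. It deliberately does
// not re-check busy: the goroutine may not have observed the signal yet, so an
// immediate check could still report true.
func (c *coach) trySkipCoach() {
	select {
	case c.skip <- struct{}{}:
	default: // a skip is already pending
	}
}

func main() {
	c := newCoach()
	c.evaluate(time.Minute)
	c.trySkipCoach() // the caller proceeds right away
	<-c.done         // only here is the coach known to have stopped
	fmt.Println("busy after coach exits:", c.busy.Load())
}
```
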
@yhyu13 yhyu13 force-pushed the feat/self-critic-squashed branch from d3eaf09 to 7d08dd8 Compare May 29, 2026 15:51