Feat/self critic coaching #2996
Open
yhyu13 wants to merge 12 commits into
Contributor: All contributors have signed the CLA ✍️ ✅
Author: I have read the Contributor License Agreement (CLA) and hereby sign the CLA.
… conversation coach

This squash commit brings in the self-critic improvement feature:
- Add CriticConfig and ReplacerConfig to crush.json options with auto-enable
- Add self-critic skill core: checkpoints, snapshots, diff computation, revision loop
- Add critic CLI commands (critic list, show, stats) and database persistence
- Add message-level checkpoint review in addition to file-edit checkpoints
- Add replacement agent (conversation coach) for conversation continuation
- Add animated spinners and cancellation support for replacer evaluation
- Track file writes via filetracker (RecordWrite / ListWrittenFiles)
- Wire write tracking into edit, multiedit, and write tools
- Fix critic first-run blind spot by re-querying read/written files post-run
- Add written_files DB table, migration, and sqlc queries
- Add critic/replacer event telemetry and pub/sub integration
- Add project-local template overrides for critic and replacer prompts
- Fix template whitespace to avoid breaking VCR cassette matching
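The revision loop named in this commit could be sketched roughly as follows. This is a minimal illustration, not the code in the PR: it assumes the three verdicts from the PR description (approve, revise, halt), and the names `Review`, `runWithCritic`, `ErrHalted`, and the behavior when the iteration budget runs out are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Verdict mirrors the three outcomes named in the PR description.
type Verdict string

const (
	Approve Verdict = "approve"
	Revise  Verdict = "revise"
	Halt    Verdict = "halt"
)

// Review is a hypothetical critic result; the field names are assumptions.
type Review struct {
	Verdict  Verdict
	Feedback string
}

// ErrHalted is a hypothetical error returned when the critic halts the turn.
var ErrHalted = errors.New("critic halted the turn")

// runWithCritic drives one turn. run executes the primary agent with optional
// feedback and returns a diff, review inspects that diff, and rollback undoes
// the changes. Giving up after maxIterations is an assumption of this sketch.
func runWithCritic(
	run func(feedback string) string,
	review func(diff string) Review,
	rollback func(),
	maxIterations int,
) (string, error) {
	feedback := ""
	for i := 0; i < maxIterations; i++ {
		diff := run(feedback)
		r := review(diff)
		switch r.Verdict {
		case Approve:
			return diff, nil
		case Halt:
			rollback()
			return "", ErrHalted
		case Revise:
			rollback()
			feedback = r.Feedback // injected into the next attempt
		default:
			return "", fmt.Errorf("unknown verdict %q", r.Verdict)
		}
	}
	return "", fmt.Errorf("no approval after %d iterations", maxIterations)
}

func main() {
	attempts, rollbacks := 0, 0
	run := func(feedback string) string {
		attempts++
		if feedback == "" {
			return "diff-v1"
		}
		return "diff-v2"
	}
	review := func(diff string) Review {
		if diff == "diff-v1" {
			return Review{Verdict: Revise, Feedback: "handle the error path"}
		}
		return Review{Verdict: Approve}
	}
	out, err := runWithCritic(run, review, func() { rollbacks++ }, 3)
	fmt.Println(out, err, attempts, rollbacks)
}
```

In this toy run the first diff is rejected with feedback, rolled back once, and the second attempt is approved.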
…f-critic Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ner fixes

Toolcoach skill:
- Telemetry & measurement (coachMetrics)
- Fast-path pattern matching, jsonpeek zero-allocation extractor, regex pool
- Semantic edit validation with cached view content and time thresholds
- Adaptive severity based on SQLite effectiveness tracking
- Progressive coaching intensity (tutor/balanced/minimal) with auto-switch
- Enhanced success tracking with per-pattern Validate functions
- Guided retry with AutoRetry (1/turn cap)
- Coach context fed to critic via CoachSummaryProvider

Replacer fixes:
- Delete orphaned eval spinners on primary error, nil result, context cancel
- Add deleteEvalIndicator helper with background context and retry

UI:
- /skipcoach command to interrupt current coach evaluation one time
- Skip Coach command in commands dialog
…tale references

- Fix HOOKS.md path to docs/hooks/README.md
- Document all toolcoach config fields (adaptive_severity, intensity, auto_retry, auto_retry_sessions)
- Add supervisor-impl builtin skill to list with note that it fails to parse (no YAML frontmatter)
Includes:
- replacer: /skipcoach command, orphaned spinner cleanup, one-time skip
- critic+toolcoach: SkipCoach delegation through wrapper chain
- AGENTS.md: updated toolcoach config, wrapper order, skip docs
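The SkipCoach delegation through the wrapper chain could look roughly like the sketch below, which uses an optional interface and a type assertion at each layer. The `Agent` method set, the `coachSkipper` interface name, the `passthrough` wrapper, and the nesting order are assumptions made for illustration; only the idea of forwarding SkipCoach via type assertion comes from the PR.

```go
package main

import "fmt"

// Agent is a stand-in for the primary agent interface (method set assumed).
type Agent interface {
	Run(prompt string) string
}

// coachSkipper is a hypothetical optional capability a layer may implement.
type coachSkipper interface {
	SkipCoach(sessionID string)
}

// primary has no coach, so it does not implement coachSkipper.
type primary struct{}

func (primary) Run(prompt string) string { return "done: " + prompt }

// replacer owns the coach evaluation, so it handles the skip itself.
type replacer struct {
	inner   Agent
	skipped map[string]bool
}

func (r *replacer) Run(p string) string        { return r.inner.Run(p) }
func (r *replacer) SkipCoach(sessionID string) { r.skipped[sessionID] = true }

// passthrough models a wrapper with nothing of its own to skip: it forwards
// the call only if the wrapped layer supports it.
type passthrough struct{ inner Agent }

func (w *passthrough) Run(p string) string { return w.inner.Run(p) }
func (w *passthrough) SkipCoach(sessionID string) {
	if s, ok := w.inner.(coachSkipper); ok {
		s.SkipCoach(sessionID)
	}
}

func main() {
	rep := &replacer{inner: primary{}, skipped: map[string]bool{}}
	// Two forwarding layers around the replacer; the order is illustrative.
	var top Agent = &passthrough{inner: &passthrough{inner: rep}}
	if s, ok := top.(coachSkipper); ok {
		s.SkipCoach("session-1")
	}
	fmt.Println(rep.skipped["session-1"])
}
```

Because each layer checks the capability with a type assertion, a wrapper can sit above an agent that has no SkipCoach method and the call simply stops there.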
Any keystroke or Enter press while the agent is busy now signals AgentSkipCoach so the user can interrupt an ongoing or pending coach evaluation without needing the explicit /skipcoach command.
Includes AGENTS.md updates for UI interaction patterns and toolcoach phase history, plus minor formatting in toolcoach middleware.
Commands that previously showed 'Agent is busy, please wait' during coach evaluation now call AgentSkipCoach and proceed. If the primary agent is still running after the skip attempt, the warning is shown as before.

This applies to:
- New Session, Summarize, External Editor, Initialize Project
- Select Model, Select Reasoning Effort
- Suspend, Ctrl+N new session, Ctrl+E open editor
trySkipCoach() now returns void and always lets the caller proceed. The previous implementation checked isAgentBusy() again after signaling SkipCoach, but the coach cancels asynchronously so the second check always returned true and the warning still appeared. All commands now proceed immediately while signaling the coach to skip: New Session, Summarize, External Editor, Initialize Project, Select Model, Select Reasoning Effort, Suspend, Ctrl+N, Ctrl+E.
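The timing problem described in this commit can be reproduced with a small deterministic model. In the sketch below the `coach` type and its `Tick` method are hypothetical: `Tick` stands in for the moment the coach's own goroutine notices the skip request, which in the real code happens asynchronously. The function names trySkipCoach and isAgentBusy are taken from the commit message; their bodies here are assumptions.

```go
package main

import "fmt"

// coach is a hypothetical model of an in-flight coach evaluation.
type coach struct {
	busy          bool
	skipRequested bool
}

// SignalSkip only records the request; it does not cancel anything itself.
func (c *coach) SignalSkip() { c.skipRequested = true }

// Tick models the coach noticing the request at some later point.
func (c *coach) Tick() {
	if c.skipRequested {
		c.busy = false
		c.skipRequested = false
	}
}

func isAgentBusy(c *coach) bool { return c.busy }

// trySkipCoachOld mirrors the previous behavior: signal, then re-check.
// The re-check runs before the coach has reacted, so it still sees busy.
func trySkipCoachOld(c *coach) (proceed bool) {
	c.SignalSkip()
	return !isAgentBusy(c)
}

// trySkipCoach mirrors the new behavior: signal and let the caller proceed.
func trySkipCoach(c *coach) {
	c.SignalSkip()
}

func main() {
	old := &coach{busy: true}
	fmt.Println("old proceeds:", trySkipCoachOld(old))

	cur := &coach{busy: true}
	trySkipCoach(cur) // caller continues immediately
	cur.Tick()        // the coach cancels later
	fmt.Println("busy after cancel:", isAgentBusy(cur))
}
```

In this model the old variant never lets the caller proceed, which matches the warning that kept appearing, while the new variant proceeds and the coach clears its busy state on its own schedule.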
d3eaf09 to 7d08dd8
Summary
This PR introduces three built-in agent skills (Critic, Replacer, and Toolcoach) that work together as a middleware stack around the primary SessionAgent. The goal is to improve output quality, prevent regressions, and coach the agent toward better tool usage patterns without adding perceivable latency.

What Changed

1. Critic Skill (internal/skills/critic/)

A secondary-LLM review layer that inspects the agent's diff after each turn and returns one of:
- approve: changes look good, turn completes normally
- revise: roll back changes, inject feedback, re-drive the primary agent
- halt: roll back changes, return error to user

Key features:
- crush.json (options.critic)
- crush critic list, crush critic show, crush critic stats
- ReplacerEnabled flag

2. Replacer Skill (internal/skills/replacer/)

A conversation coach that evaluates whether the primary agent's response is complete or needs a follow-up. Uses a small/fast model to decide stop vs continue.

Key features:
- … stop)
- /skipcoach command to interrupt the current evaluation one time

3. Toolcoach Skill (internal/skills/toolcoach/)

A zero-LLM, heuristic-based skill that detects anti-patterns in real-time tool usage and injects coaching tips into tool results.
Key features:
- destructive_bash, write_over_existing, edit_without_view, broad_grep, repeated_view, missing_multiedit
- crush.json (options.toolcoach)

4. Middleware Stack Order

SkipCoach propagates through the chain via interface type-assertion at each layer.

5. UI & Integration

- /skipcoach (or skipcoach) intercepted in textarea
- AgentSkipCoach(sessionID) wired through workspace → coordinator → middleware chain

6. Database & Schema

New migrations:
- critic_reviews table (verdict, confidence, concerns, diff snapshot)
- written_files table (tracks files written per session for diff baseline)
- toolcoach_effectiveness table (pattern hit/miss tracking)

Testing
Configuration Example
{
  "options": {
    "critic": {
      "enabled": true,
      "model": "anthropic/claude-sonnet-4",
      "max_iterations": 3,
      "auto_approve": false,
      "threshold": 0.85
    },
    "replacer": {
      "enabled": true,
      "max_iterations": 3
    },
    "toolcoach": {
      "enabled": true,
      "max_patterns": 100
    }
  }
}

Checklist
CONTRIBUTING.md.