Skip to content

feat(local-review): add fixed-claude CLI engine via --engine#309

Closed
anagnorisis2peripeteia wants to merge 5 commits into
openclaw:mainfrom
anagnorisis2peripeteia:feat/local-review-engines
Closed

feat(local-review): add fixed-claude CLI engine via --engine#309
anagnorisis2peripeteia wants to merge 5 commits into
openclaw:mainfrom
anagnorisis2peripeteia:feat/local-review-engines

Conversation

@anagnorisis2peripeteia

@anagnorisis2peripeteia anagnorisis2peripeteia commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

What

local-review --engine claude — drive a fixed claude CLI through the same bounded runner as codex. runCodexProcess gains an optional command param (the worker already takes one internally), so a second fixed review CLI reuses the existing process/retry/capture path. codex stays the default. Provider-neutral, zero new dependencies (package.json unchanged).

pnpm local-review -- --base main                 # codex (default)
pnpm local-review -- --base main --engine claude # fixed `claude` CLI

Addresses the #274 close, point by point

  • fixed trusted executable boundary → fixed claude/codex; no --claude-bin / no operator-configurable binary
  • minimal allowlisted environmentcodexEnv() (scrubs GitHub/OpenAI/app creds, keeps the OS env + the engine's own auth)
  • shared process capture/retry handling → the existing runCodexProcess worker, now command-agnostic
  • identical final schema validation → same failureReport / stripMarkdownFence / timestamp path as codex
  • provider-neutral committed-range review + unique temporary state → inherited from the merged local-review foundation
  • no SDK dependency → CLI subprocess, not the Agent SDK

Auth

The claude engine uses the caller's existing Claude auth. For a subscription, set CLAUDE_CODE_OAUTH_TOKEN (from claude setup-token) and leave ANTHROPIC_API_KEY unset.

Verification

  • pnpm run build, lint (type-aware), oxfmt, and the local-review test suite (8 tests, incl. a new unknown---engine guard) all pass.
  • codex path is fully covered by the existing tests — the runner change is backward-compatible (default still resolves codex).
  • The claude execution path is now run-verified via a meta self-review: I ran local-review --engine claude on this very branch (subscription auth, ANTHROPIC_API_KEY unset). It executed end-to-end through the shared runner, read the changed files in full, and produced a schema-valid report each iteration. The loop also caught — and I fixed — four real bugs in the engine itself before converging clean (see below), which doubles as evidence the lane actually reviews rather than rubber-stamps.

Proof — claude engine reviewing its own PR

local-review --engine claude --base origin/main run repeatedly against this branch as it hardened. Each run produced a valid report (front matter + result/confidence/highest_severity); the findings drove the fix commits:

run result severity finding → fix
1 findings (3) claude lane couldn't see the diff / 64 KiB stdout tail truncated front matter / spawn-error gave no detail → b4ea2701 (diff in prompt, full-stdout capture, error fallback)
2 findings medium --allowedTools "Read Grep Glob" was one space-joined token → no tools enabled in -p mode → d0d62765 (comma-separate Read,Grep,Glob)
3 findings low whole-branch git diff buffered host-side could exceed maxBuffer and crash before any report → 36239522 (try/catch → failureReport)
4 nothing_found none clean, high confidence — reviewer read 9 files in full and traced every spawn path

The final clean run (36239522, result: nothing_found, confidence: high) verifies the fixed binary, the env scrub, full-stdout capture, and engine-validation ordering by name. Run via ambient subscription credentials; no API key, no token minted.

Not included

  • The Agent-SDK approach (it pulls many deps — wrong fit for this zero-dep tool).
  • Any configurable binary or second-provider scope beyond the engine switch.

Generalizes runCodexProcess with an optional command param so the same bounded runner drives a fixed `claude` CLI; adds --engine codex|claude (codex default). claude engine: fixed binary (no operator override), codexEnv() scrub + read-only tools, subscription auth via CLAUDE_CODE_OAUTH_TOKEN. Provider-neutral, zero new deps. Inherits openclaw#298's committed-range/clean-checkout/unique-dir/reject-unsupported. Addresses the openclaw#274 close: fixed boundary, shared runner, scrubbed env, identical schema, no SDK deps, no configurable binary.
@anagnorisis2peripeteia

Copy link
Copy Markdown
Contributor Author

@clawsweeper review

@clawsweeper

clawsweeper Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper

clawsweeper Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed June 17, 2026, 3:15 AM ET / 07:15 UTC.

Summary
The PR adds local-review --engine codex|claude, routes --engine claude through the shared bounded subprocess runner with a fixed claude command, documents Claude auth, and adds an invalid-engine test.

Reproducibility: not applicable. this is a new local-review provider feature, not a broken existing behavior. The verification path is an authenticated local-review --engine claude run, and the PR body now reports that path running end to end on this branch.

Review metrics: 2 noteworthy metrics.

  • Changed surface: 4 files, +162/-6. The diff is compact but spans runtime subprocess selection, CLI behavior, docs, and tests.
  • New engine path: 1 local-review engine added. The provider executable path is the main maintainer decision surface, not the raw line count.

Root-cause cluster
Relationship: canonical
Canonical: #309
Summary: This PR is the current open canonical candidate for the Claude local-review engine; the older Claude attempt is closed unmerged and the merged local-review foundation is adjacent prerequisite work.

Members:

Proposal only: this assessment does not dispatch repair, suppress jobs, mutate sibling items, close, or merge anything.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Maintainers should explicitly accept or reject supporting claude as a fixed local-review provider before merge.

Risk before merge

  • [P1] Merging intentionally adds a second trusted local review provider executable that receives repository review context and the caller's local Claude authentication.
  • [P1] The new engine path depends on an installed, authenticated claude CLI, and green repository CI does not exercise that external provider runtime path.
  • [P1] Maintainers still need to decide whether supporting Claude as a first-class local-review engine fits the ClawSweeper automation boundary.

Maintainer options:

  1. Accept The Fixed Claude Boundary
    Maintainers can merge after explicitly accepting claude as a supported local-review provider that runs through the shared bounded subprocess path.
  2. Keep Codex-Only For Now
    If maintainers are not ready to own a second local provider, pause or close this PR while preserving the proof and design notes for a future provider decision.

Next step before merge

  • [P2] Manual review is appropriate because the remaining blocker is maintainer acceptance of a new provider CLI and security boundary, not a narrow automated code repair.

Security
Needs attention: The diff is intentionally security-sensitive because it adds a new trusted local provider executable that receives repository context and local Claude auth.

Review details

Best possible solution:

Land the fixed-binary Claude engine only if maintainers accept it as a supported local-review provider while preserving Codex as the default and keeping the offline local-review credential boundaries intact.

Do we have a high-confidence way to reproduce the issue?

Not applicable: this is a new local-review provider feature, not a broken existing behavior. The verification path is an authenticated local-review --engine claude run, and the PR body now reports that path running end to end on this branch.

Is this the best way to solve the issue?

Yes for the technical shape, pending product acceptance: the fixed executable and shared runner are narrower than the closed earlier Claude PR. The remaining question is whether maintainers want to support this second local provider boundary in core.

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against c88270b81889.

Label changes

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body now includes after-fix real run proof for local-review --engine claude, including repeated self-review runs and a final schema-valid nothing_found result at the current head.
  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
  • add status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Sufficient (live_output): The PR body now includes after-fix real run proof for local-review --engine claude, including repeated self-review runs and a final schema-valid nothing_found result at the current head.
  • remove rating: 🧂 unranked krab: Current PR rating is rating: 🐚 platinum hermit, so this older rating label is no longer current.
  • remove status: 📣 needs proof: Current PR status label is status: ⏳ waiting on author.

Label justifications:

  • P2: This is a normal-priority local-review feature with bounded user impact but meaningful automation and provider-boundary review needs.
  • merge-risk: 🚨 automation: The PR changes local-review subprocess selection and adds an external provider runtime path that CI cannot fully prove.
  • merge-risk: 🚨 security-boundary: The fixed claude executable would receive repository review context and local Claude authentication as a newly supported trusted provider boundary.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Sufficient (live_output): The PR body now includes after-fix real run proof for local-review --engine claude, including repeated self-review runs and a final schema-valid nothing_found result at the current head.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body now includes after-fix real run proof for local-review --engine claude, including repeated self-review runs and a final schema-valid nothing_found result at the current head.
Evidence reviewed

Security concerns:

  • [medium] Accept the trusted Claude CLI boundary — src/commit-sweeper.ts:420
    runClaudeReview invokes a fixed claude executable in the target repository with the full review prompt and caller Claude auth; codexEnv() and read-only tool args reduce risk, but maintainers still need to accept this new provider trust boundary.
    Confidence: 0.86

What I checked:

  • AGENTS.md policy read: Repository policy was read fully and applied because this PR changes ClawSweeper automation and subprocess behavior. (AGENTS.md:1, c88270b81889)
  • Current main lacks the requested engine: Current main's local-review path still calls the Codex runner directly and has no --engine or claude matches in src/commit-sweeper.ts. (src/commit-sweeper.ts:487, c88270b81889)
  • Latest release lacks the requested engine: The v0.3.0 release tag has the local-review command but no --engine or claude implementation, so the requested capability has not shipped. (src/commit-sweeper.ts:487, dc824915bb6c)
  • PR implementation: PR head adds runClaudeReview, embeds the range diff for the read-only Claude lane, invokes fixed command claude, passes Read,Grep,Glob, and uses codexEnv(). (src/commit-sweeper.ts:360, 36239522d6a1)
  • Shared runner change: PR head adds an optional command override to runCodexProcess while preserving Codex as the default command. (src/codex-process.ts:82, 36239522d6a1)
  • Docs and validation coverage: PR head documents the fixed Claude engine and adds a guard test for unknown --engine values. (docs/commit-sweeper.md:276, 36239522d6a1)

Likely related people:

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 security-boundary 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data. merge-risk: 🚨 automation 🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation. labels Jun 16, 2026
Move the --engine codex|claude validation to the top of localReviewCommand so an invalid engine fails fast before any run state (run dir, GH_CONFIG_DIR) is created. Addresses the P3 review finding on openclaw#309.
@anagnorisis2peripeteia

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

…full stdout, error fallback)

Found by local-review --engine claude reviewing its own PR. (1) Embed the full committed-range diff in the claude prompt — the read-only reviewer can't run git to see the change (codex's sandbox can). (2) Read the full captured stdout file instead of the 64 KiB tail, so a large report's YAML front matter isn't truncated. (3) Fall back to stdout in failure reports when stderr is empty (mirrors codex).
"Read Grep Glob" was passed as a single space-joined token (one bogus rule → no tools enabled in -p mode). Comma-separate so Read/Grep/Glob parse as three rules and the reviewer can actually open repo files. Found by the claude meta self-review.
…d branches

run() buffers the whole diff host-side; a >maxBuffer whole-branch diff threw before the 256KiB cap applied, aborting the process with no report. Wrap in try/catch -> failureReport, matching the lane's graceful-degradation pattern. Found by the claude meta self-review.
@anagnorisis2peripeteia

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@anagnorisis2peripeteia anagnorisis2peripeteia marked this pull request as ready for review June 17, 2026 06:48
@anagnorisis2peripeteia

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 17, 2026
@steipete

Copy link
Copy Markdown
Contributor

Closing for now because we are not adding a Claude execution provider yet. The implementation was thoughtful and can be revisited if provider expansion becomes explicit product direction. Thank you for the work and validation.

@steipete steipete closed this Jun 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 automation 🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation. merge-risk: 🚨 security-boundary 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data. P2 Normal priority bug or improvement with limited blast radius. proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants