feat(local-review): add claude CLI (-p) review engine (alternative to codex)#274
Conversation
Adds a 'local-review' CLI command that reviews the current branch diff against a base branch using Codex, without requiring a GitHub PR. Includes all prerequisites for local runs: - CODEX_BIN env var support (Windows .cmd wrapper compatibility) - shell: true on win32 for all Codex spawn sites - CLAWSWEEPER_CODEX_LOGIN_METHOD for ChatGPT Pro OAuth login - Applied to all 4 Codex subprocess launchers - Clears stale output before each review run Merge order: this PR subsumes openclaw#250 (Windows spawn) and openclaw#251 (OAuth login). If this lands first, close those as superseded. If either lands first, this PR needs a trivial rebase (identical changes).
Add `--engine claude` as an alternative to codex for the local-review command, shelling out to the Claude CLI in headless `-p` mode. The binary is configurable via `--claude-bin` / CLAUDE_BIN so any `claude -p`-compatible CLI can be dropped in, and `--claude-model` sets the model. Codex remains the default engine (no behaviour change for existing runs). Reuses the existing review prompt and the Decision schema validation; the schema travels in the prompt because the CLI's `--json-schema` is inline-only and the full schema exceeds the platform argv limit. The terminal result record is recognised the same way as a claude-stream-json stream (handles both the --output-format json array and the stream-json line forms), preferring the native structured_output and falling back to the result text. Read-only is enforced via the prompt contract, disabled edit tools, and a dirty-checkout guard.
|
Codex review: needs maintainer review before merge. Reviewed June 11, 2026, 4:59 AM ET / 08:59 UTC. Summary Reproducibility: not applicable. This PR adds a new optional review engine rather than fixing an existing documented failure; the relevant verification is its supplied end-to-end behavior proof. Review metrics: 3 noteworthy metrics.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Resolve #253 first, rebase this PR while preserving newer current-main Codex and review behavior, and land the Claude engine only with clear documentation that the configured executable is trusted code. Do we have a high-confidence way to reproduce the issue? Not applicable: this PR adds a new optional review engine rather than fixing an existing documented failure; the relevant verification is its supplied end-to-end behavior proof. Is this the best way to solve the issue? Unclear: the hardened engine design is technically plausible, but the final rebased implementation and long-term decision to support a second model-specific runtime require maintainer judgment. AGENTS.md: found and applied where relevant. Codex review notes: model internal, reasoning high; reviewed against 429a73be116e. Label changesLabel justifications:
Evidence reviewedSecurity concerns:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
Address review feedback on the claude engine: - Pass a scrubbed environment (claudeReviewEnv) to the spawned binary so the operator-configurable --claude-bin never receives GitHub/OpenAI/ClawSweeper App credentials. - Replace --permission-mode bypassPermissions with an explicit read-only --allowedTools allow-list (Read/Grep/Glob; no edit/write/bash). - Extract argv/env construction and result parsing into src/claude-engine.ts with unit tests (test/local-review-claude.test.ts).
|
@clawsweeper re-review — addressed the P1s: scrubbed subprocess env (no GitHub/OpenAI/app creds reach the configurable binary), replaced |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
…ivergent output `local-review --engine claude` failed at the final validation step. The Claude engine only gets the decision schema as prompt guidance (Codex constrains generation via --output-schema; Claude's --json-schema is validate-or-drop and the full schema exceeds the Windows cmd argv limit through the claude.cmd shim). So the model intermittently (a) adds keys the schema does not define (e.g. `kind` on a review finding) that strict parseDecision rejects outright, and (b) omits a required field such as mantisRecommendation.maintainerComment. - pruneToSchema(value, schema) in claude-engine.ts: a pure, schema-driven walk that drops undefined keys at every nesting level (object/array nodes are recognised by `properties`/`items`, so union- or untyped nodes are handled), reporting the dropped paths. Absorbs the extra-key case with no model round-trip. - Bounded correction retry in runClaude: feed the exact validation error back to Claude (<=2 corrections) so it self-corrects omissions, without blind-defaulting critical fields. Hard errors (spawn/dirty-checkout) still propagate immediately. - Memoize the parsed decision schema. Tested: build + lint clean; new pruneToSchema unit tests (incl. union types) pass; live `local-review --engine claude` over a real diff returns valid, on-topic decisions, exercising both the prune and the retry. The 28 pre-existing apply-decisions suite failures are environmental on Windows and reproduce identically without this change.
…aude omission in the engine path Addresses ClawSweeper's openclaw#274 review (P2, conf 0.96). The prior commit defaulted mantisRecommendation.maintainerComment to "" in the SHARED parseDecision, which relaxed the contract for every engine and let a status:"recommended" result with no command pass validation despite being non-dispatchable downstream. - Restore strict requireString(maintainerComment) in parseMantisRecommendation for all engines, preserving the "recommended => needs a command" invariant. - Move the Claude-specific tolerance into the engine path: normalizeClaudeDecisionShape defaults maintainerComment to "" ONLY when status !== "recommended" (the case models systematically omit). A recommended result missing the comment is left to fail strict validation so the bounded correction retry has Claude supply it. - Tests: strict-required regression for both statuses; normalize-only-when-not-recommended. - Document the --claude-bin operator trust boundary.
|
@clawsweeper re-review — addressed the P2 correctness finding (conf 0.96): reverted the shared-parser |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
Review note from local harness testing: Validated the #274 branch locally on Linux: pnpm install --frozen-lockfile
pnpm run build
node --test test/local-review-claude.test.ts
pnpm run test:unitResults:
I did not live-test Recommended merge order: land #253 first, then rebase/merge #274. |
Summary
Adds a Claude CLI (
-p) review engine tolocal-review, selected with--engine claude, as an alternative to the Codex engine. Codex stays the default, so existing runs are unchanged.Unlike the Codex engine — which authenticates via Codex OAuth /
CLAWSWEEPER_CODEX_LOGIN_METHOD(#251) and needs the Windows spawn handling in #250 — the Claude engine shells out toclaude -pand relies on the CLI's own local auth, so it needs none of that login plumbing. The binary is configurable via--claude-bin/CLAUDE_BIN, so anyclaude -p-compatible CLI can be dropped in;--claude-modelsets the model.This builds directly on #253 (
feat/local-review-command) — thelocal-reviewcommand this extends — and is branched from it. Until #253 merges, this PR's diff includes #253's commit. Merge #253 first; the new content here is theclaude CLI (-p) review engine+harden(...)commits.How it works
buildReviewPrompt) and theDecisionschema validation (parseDecision) unchanged — the engine only swaps the subprocess and the output recognition.claude's--json-schemais inline-only and the full schema (~36 KB) exceeds the platform argv limit, so it's appended to the prompt (stdin, unbounded) with an explicit output contract.--output-format jsonarray form and thestream-jsonline form), preferring the CLI's nativestructured_outputand falling back to the result text.Security hardening (addressing review feedback)
The security-relevant subprocess construction is extracted into
src/claude-engine.tsand unit-tested (test/local-review-claude.test.ts, 7 tests):claudeReviewEnv()that withholds GitHub/OpenAI/ClawSweeper-App credentials (GH_TOKEN,GITHUB_TOKEN,COMMIT_SWEEPER_TARGET_GH_TOKEN,CLAWSWEEPER_PROOF_INSPECTION_TOKEN,CLAWSWEEPER_APP_ID,CLAWSWEEPER_APP_PRIVATE_KEY,OPENAI_API_KEY,CODEX_API_KEY) — important because--claude-binis operator-configurable. The Claude CLI authenticates from its own local credentials.--permission-mode bypassPermissionswith an explicit--allowedTools Read Grep Globallow-list — noEdit/Write/Bash/bypass. The dirty-checkout guard remains as defense-in-depth.Real behavior proof
Ran the hardened engine end-to-end against a one-line planted bug (
add()rewritten to subtract). It produced a validDecisionthat flagged the contract break and passed strictparseDecisionvalidation (paths redacted):Testing
tsgobuild clean;node --test test/local-review-claude.test.ts→ 7/7 pass (read-only argv, scrubbed env, result-record parsing in both array and NDJSON forms, JSON extraction).local-review --engine claudeproof above.