Skip to content

feat: add Codex-owned review workflow#276

Open
munsunouk wants to merge 19 commits into
alibaba:mainfrom
munsunouk:codex/fix-codex-owned-review
Open

feat: add Codex-owned review workflow#276
munsunouk wants to merge 19 commits into
alibaba:mainfrom
munsunouk:codex/fix-codex-owned-review

Conversation

@munsunouk

@munsunouk munsunouk commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Summary

This re-submits the Codex-owned review flow from #245 and addresses the OCR bot feedback from that PR.

The change keeps OCR as the deterministic data plane for host-agent-led reviews:

  • ocr agent prepare builds review bundles/manifests without invoking OCR's LLM backend
  • ocr agent context exposes target-aware read/diff/search helpers
  • ocr agent validate-comments validates agent findings against immutable bundle evidence
  • ocr agent report renders validated findings

This is intended to address the Codex-native workflow requested in #275 without requiring OCR to call Codex CLI as a chat-completions provider.

Maintainer follow-up

  • Preserved existing ocr review / ocr scan include-filter behavior; strict include allowlist semantics are intentionally left out of this PR.
  • Added coverage that symbolic refs still work for code search after ref pre-resolution.
  • Landed the neutral ocr agent ... CLI surface in this PR and removed the public ocr codex command path.
  • Session/report observability is now agent-neutral, and viewer support for legacy codex_event / codex-owned records has also been removed.

OCR bot feedback addressed

  • add top-level help hint for ocr codex
  • finalize Codex report sessions after report rendering succeeds
  • improve scan manifest diagnostics
  • clamp context reads to the documented 500-line maximum
  • use cleaned search patterns for context search
  • add cancellation checks and cheaper partition sizing
  • normalize bundle schema $defs naming
  • fix Markdown code span/fence escaping and text warnings
  • make scan target hashes cover reviewed files and make partial budget-driven
  • avoid redundant scan batch parsing
  • stop validation after excluded paths and cache target file reads
  • preserve native include-filter behavior for diff and scan targets
  • resolve range/commit refs to immutable SHAs before later diff loading
  • stream untracked file hashing instead of reading whole files into memory
  • add viewer fallbacks for empty Codex fields
  • harden Codex session record serialization and avoid redundant chmod
  • propagate code-search ref resolution timeouts
  • escape codegraph script path interpolation
  • remove the public ocr codex CLI surface in favor of ocr agent
  • reject legacy codex_event / codex-owned viewer records

Tests

  • GOTOOLCHAIN=auto go test ./...
  • GOTOOLCHAIN=auto go build -o dist/opencodereview ./cmd/opencodereview

@CLAassistant

CLAassistant commented Jul 2, 2026

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ munsunouk
❌ Alonoril
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 OpenCodeReview found 23 issue(s) in this PR.

  • ✅ 23 posted as inline comment(s)
  • 📝 0 posted as summary

Comment thread cmd/opencodereview/codex_report_cmd.go Outdated
Comment on lines +175 to +180
func printCodexReportUsage(writer io.Writer) {
fmt.Fprintln(writer, `Usage:
ocr codex report --bundle FILE --comments FILE
[--validation FILE] [--format markdown|text|json]
[--output FILE]`)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage text omits the --repo, --session-id, and --validation flags that are registered in parseCodexReportFlags. Users invoking --help will not see these options. Consider documenting all supported flags, similar to how printCodexPrepareUsage documents its flags.

Suggestion:

Suggested change
func printCodexReportUsage(writer io.Writer) {
fmt.Fprintln(writer, `Usage:
ocr codex report --bundle FILE --comments FILE
[--validation FILE] [--format markdown|text|json]
[--output FILE]`)
}
func printCodexReportUsage(writer io.Writer) {
fmt.Fprintln(writer, `Usage:
ocr codex report --bundle FILE --comments FILE
[--validation FILE] [--format markdown|text|json]
[--output FILE] [--repo PATH] [--session-id ID]`)

Comment on lines +44 to +46
if comments.Comments == nil {
return nil, fmt.Errorf("invalid comments schema: comments must be an array")
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: The error message "comments must be an array" is slightly misleading. This nil check triggers both when the comments field is entirely absent from the JSON and when it is explicitly set to null. In both cases the decoded slice is nil. Consider a more accurate message like "comments field is required and must be an array" to help callers diagnose the issue faster.

Suggestion:

Suggested change
if comments.Comments == nil {
return nil, fmt.Errorf("invalid comments schema: comments must be an array")
}
if comments.Comments == nil {
return nil, fmt.Errorf("invalid comments schema: comments field is required and must be an array")
}

Comment on lines +156 to +163
patternArguments := make([]any, 0, len(patterns))
for _, pattern := range patterns {
cleaned, safe := cleanProtocolPath(pattern)
if !safe {
return ContextResult{}, &ProtocolError{Code: "path_escape", Message: "file pattern must stay inside the repository"}
}
patternArguments = append(patternArguments, cleaned)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: cleanProtocolPath applies filepath.Clean() to file patterns, which will mangle git pathspec syntax. For example, a pattern like :(exclude)*_test.go would be transformed into something invalid by filepath.Clean, and glob patterns like *.go may also be affected since filepath.Clean can alter special characters.

The underlying code_search.go tool passes these patterns directly as git pathspec arguments (cmdArgs = append(cmdArgs, pathspec...)), so they need to preserve their original form.

Consider using a lighter validation for search patterns that only checks for path traversal (e.g., rejecting absolute paths and .. components) without running filepath.Clean, or skip sanitization entirely for patterns since they are used as git pathspecs, not filesystem paths.

Comment on lines +176 to +189
func (service *ContextService) ready(ctx context.Context) error {
if service.bundle == nil {
return fmt.Errorf("bundle is required")
}
if service.repoDir == "" {
return fmt.Errorf("repository directory is required")
}
result := ValidationResult{Errors: make([]ValidationNotice, 0)}
validateFreshTarget(ctx, &result, service.bundle, service.repoDir, service.runner)
if len(result.Errors) > 0 {
return &ProtocolError{Code: "stale_bundle", Message: result.Errors[0].Message}
}
return nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance concern: ready() calls validateFreshTarget() on every single context operation (Read, Find, Diff, Search). For scan-mode bundles, this reads and hashes every file in the bundle. For other modes, it resolves the target via git commands. When an agent performs multiple sequential context operations during a review session, this repeated validation adds significant overhead.

Consider caching the validation result (e.g., validate once at construction time or use a sync.Once), or providing a way to skip re-validation when the caller knows the bundle is fresh.

Comment thread internal/reviewbundle/partition.go Outdated
empty, encoded, err := buildDiffPartition(full, nil, maxBundleSize)
estimated := int64(4096)
if err == nil {
estimated = int64(len(encoded)) + empty.Contract.BundleSizeBytes

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The base size estimate is doubled. After marshalWithStableSize, empty.Contract.BundleSizeBytes equals len(encoded) (they converge to the same value). So this line computes estimated = len(encoded) + len(encoded) = 2 * len(encoded), which overestimates the empty-partition overhead by 2×.

This causes partitions to be under-filled (fewer files per partition than necessary), leading to more bundles than needed and wasted capacity. If the intent was to use just the encoded size as the base, remove the + empty.Contract.BundleSizeBytes term.

Suggestion:

Suggested change
estimated = int64(len(encoded)) + empty.Contract.BundleSizeBytes
estimated = int64(len(encoded))

Comment thread scripts/codex-codegraph Outdated
Comment on lines +5 to +6
ROOT_JSON="${ROOT//\\/\\\\}"
ROOT_JSON="${ROOT_JSON//\"/\\\"}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JSON escaping here only handles backslashes and double quotes, but JSON strings also require escaping of control characters (e.g., newlines \n, tabs \t, carriage returns \r, etc.). While file paths rarely contain these characters, if $ROOT ever contains such characters (e.g., on unusual systems or misconfigured environments), the resulting JSON will be malformed.

Consider using a proper JSON encoding tool like jq for robustness:

ROOT_JSON=$(printf '%s' "$ROOT" | jq -Rs '.')

This produces a fully escaped JSON string (including the surrounding quotes), so you'd adjust the args line accordingly.

Comment thread internal/viewer/store.go Outdated
defer f.Close()

var summary SessionSummary
summary := SessionSummary{ControlPlane: "ocr-llm", TokenUsageAvailable: true}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default ControlPlane value "ocr-llm" and TokenUsageAvailable: true are duplicated in both peekSession (line 132) and LoadSession (line 285). If this default ever needs to change, both locations must be updated in sync, which is error-prone. Consider extracting these into a package-level constant or a constructor function, e.g.:

const defaultControlPlane = "ocr-llm"

func newDefaultSessionSummary() SessionSummary {
    return SessionSummary{ControlPlane: defaultControlPlane, TokenUsageAvailable: true}
}

This ensures consistency between the session list view (peekSession) and the session detail view (LoadSession).

Comment thread internal/viewer/templates/session.html Outdated
<tr>
<td>{{.Event}}</td>
<td>{{if .BundleID}}<code>{{.BundleID}}</code>{{else}}-{{end}}</td>
<td>{{.DurationMS}} ms</td>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor UX issue: When duration_ms is absent from the event record, DurationMS defaults to 0 and the template displays "0 ms", which may be misleading. Consider adding a conditional check similar to how BundleID and Error are handled, e.g.:

<td>{{if .DurationMS}}{{.DurationMS}} ms{{else}}-{{end}}</td>

This would make it clearer that the duration was not recorded rather than being zero.

Suggestion:

Suggested change
<td>{{.DurationMS}} ms</td>
<td>{{if .DurationMS}}{{.DurationMS}} ms{{else}}-{{end}}</td>

Comment on lines 77 to 79
if ref := p.FileReader.Ref; ref != "" {
cmdArgs = append(cmdArgs, "--end-of-options")
cmdArgs = append(cmdArgs, ref)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing --end-of-options from the grep command eliminates a defense-in-depth layer. While resolveGrepRef now validates and resolves the ref to a commit SHA before it reaches buildGrepArgs, keeping --end-of-options would provide an additional safety net against future regressions (e.g., if someone adds a new call site for buildGrepArgs that doesn't go through resolveGrepRef). Since the resolved value is always a hex commit SHA, re-adding --end-of-options here has no functional downside.

Suggestion:

Suggested change
if ref := p.FileReader.Ref; ref != "" {
cmdArgs = append(cmdArgs, "--end-of-options")
cmdArgs = append(cmdArgs, ref)
}
if ref := p.FileReader.Ref; ref != "" {
cmdArgs = append(cmdArgs, "--end-of-options", ref)
}

Comment thread internal/scan/classify.go Outdated
Comment on lines +47 to +50
// extFromPath is retained for native package compatibility.
func extFromPath(path string) string {
return scanExtension(path)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code: extFromPath has no callers within the scan package. The only previous caller was whyExcluded in agent.go, which was refactored in this same changeset to call ExcludeReason directly. Since extFromPath is unexported, it cannot be called from outside the package either. The comment says "retained for native package compatibility" but there is nothing to be compatible with. Consider removing it to avoid unnecessary API surface and confusion.

@lizhengfeng101

Copy link
Copy Markdown
Collaborator

Thanks a lot for picking this back up and carefully working through all the earlier bot feedback — this is a substantial, well-structured contribution, and the two-layer split (OCR as the deterministic data plane, the host agent as the reasoning layer) is exactly the right direction. Splitting prepare / context / validate-comments / report into pure, schema-bound CLI commands with no LLM dependency is a clean design, and the parity matrix + JSON schemas make the contract easy to reason about. Really nice work.

Two directions we'd love to see this evolve toward:

1. An agent-agnostic ("host-agnostic") tooling model.
The deterministic layer here isn't really Codex-specific — bundles, validation, and reports are identical regardless of who drives them. Any agent that can run a shell command (Claude Code, Cursor, Qoder, …) could reuse it as-is. Right now the naming and command space are bound to Codex (ocr codex ..., codex-review-bundle-v1, controlPlane values, session records tagged codex-owned). It would be great to generalize the surface — e.g. a neutral ocr agent ... namespace with Codex as the first adopter — so this becomes a reusable capability rather than a Codex-only path. The host-specific glue (plugin/skill manifests, or exposing these commands as MCP tools) can then stay a thin adapter on top of one shared core. Given we already ship an MCP server, exposing this layer over MCP could let MCP-capable hosts reuse it with zero per-host adaptation.

2. Zero impact on the existing ocr review / ocr scan core.
A hard requirement for us is that this new path must not change the behavior of the existing native review/scan agents. Most of the PR meets that bar (pure additions + behavior-preserving refactors), but there's one spot that does change existing behavior:

The refactor that extracts the inline file-filter logic from internal/agent/preview.go and internal/scan/agent.go into the shared internal/reviewfilter package also changes the --include semantics. The new reviewfilter.Filter.ExcludeReason adds:

if f.FileFilter != nil && f.FileFilter.HasInclude() {
    if !f.FileFilter.IsUserIncluded(path) {
        return model.ExcludeUserRule
    }
}

Neither the old review path nor the old scan path had this branch. Previously, a file not matching the --include list was not short-circuited — it fell through to the extension/path checks and was often still reviewed. The new code turns --include into a hard allowlist (anything outside the list is excluded). The new test case docs/readme.md + Include:["src/**"] → ExcludeUserRule confirms the change (this used to be ExcludeNone).

The new semantics are arguably more intuitive, but they do change what native ocr review / ocr scan return for existing users. Our preference would be to keep this refactor strictly behavior-preserving for the native paths — i.e. have reviewfilter faithfully reproduce the existing behavior — and if the --include = strict-allowlist change is desired, land it as a separate, clearly-documented change rather than folding it into this feature.

(Minor, same theme: internal/tool/code_search.go now pre-resolves a ref via rev-parse before grepping. Equivalent for resolvable refs, but the error path is now stricter — worth a quick check that it doesn't alter existing review/scan search behavior.)

Overall: love the design and the discipline here. If we can (1) generalize it beyond Codex and (2) guarantee the native kernel behavior is untouched, this becomes a really elegant and safe foundation. Thanks again for the great work!

@munsunouk

Copy link
Copy Markdown
Contributor Author

Thanks for the detailed review — agreed on both points.

I’ll first address the hard compatibility requirement in this PR: the reviewfilter extraction should be behavior-preserving for the existing ocr review / ocr scan paths. I’ll remove the strict --include allowlist behavior from this PR and add/adjust tests so the native paths keep their previous semantics. If strict include semantics are still desirable, I’ll treat that as a separate, explicitly documented behavior-change PR.

On the host-agnostic direction, I agree as well. The deterministic bundle/context/validate-comments/report layer is not inherently Codex-specific. To avoid expanding this already-large PR too much, I’d prefer to keep this PR focused on the Codex-first adapter plus compatibility fixes, then follow up with a separate generalization PR for a neutral surface such as ocr agent ... and/or MCP exposure, with Codex as one thin host adapter on top.

I’ll push the behavior-preserving include fix first.

@munsunouk

Copy link
Copy Markdown
Contributor Author

Pushed the behavior-preserving include fix in e4276ad.

What changed:

  • Removed the strict --include allowlist behavior from the shared review filter.
  • Preserved the native diff-review include ordering: user exclude first, include match can pass, then extension/default-path checks.
  • Preserved the native full-scan ordering separately: user exclude, extension check, include match, then default-path checks.
  • Added/updated tests to lock both native paths so non-matching include patterns no longer exclude otherwise-reviewable files.

Verification:

  • GOTOOLCHAIN=auto go test ./internal/reviewfilter ./internal/agent ./internal/scan
  • GOTOOLCHAIN=auto go test ./...
  • pre-push hook passed.

@munsunouk munsunouk force-pushed the codex/fix-codex-owned-review branch from 24c91bb to a24764d Compare July 2, 2026 12:43
@lizhengfeng101

Copy link
Copy Markdown
Collaborator

Thanks for the quick turnaround — I verified e4276ad, and the include fix is exactly right. Splitting the two native orderings into ExcludeReason (review: include short-circuits before ext) and ExcludeScanReason (scan: ext before include), and pointing scan/classify.go at the scan variant, faithfully preserves both native paths while still removing the duplication. The restored test expectations confirm it. And thanks for the extra rev-parse symbolic-ref test coverage too. 👍

Fully agree on keeping this PR focused and landing the host-agnostic generalization separately — that's the right scope call.

One thing I'd like to raise before that follow-up, specifically about the CLI surface we commit to here:

I'd suggest not shipping ocr codex as a public top-level command. The prepare / context / validate-comments / report layer is a deterministic data plane that's identical for every host — which host drives it (Codex, Claude Code, Cursor, …) belongs to the adapter layer (plugin/skill/MCP), not to the CLI command tree. Naming a top-level command after one host effectively promotes "who drives it" into a command dimension, and it means we'd be publishing an interface we already know we'll want to rename.

The concern is both debt and newcomer UX:

  • If we later add ocr agent, we either keep both (top-level becomes review / scan / codex / agent / ..., and a newcomer immediately wonders which of codex/agent to use and how they differ), or we deprecate codex and migrate docs/plugins/user scripts. Neither is great.
  • What actually confuses newcomers isn't an extra advanced command — it's two overlapping commands doing the same thing.

Since I don't think this needs to expand the PR, two low-cost options:

  1. Land the neutral name now (e.g. ocr agent ...) with Codex as the first adapter on top of it — same code, just a host-agnostic surface from day one; or
  2. Keep these commands experimental / hidden in this PR (not listed in top-level -h), so we don't commit to a public contract until the neutral design settles.

Either way, the guiding principle would be: host differences (codex/claude/cursor) live entirely in the adapter layer (plugin/skill/MCP), and the CLI top level stays limited to capability commands like review / scan. That keeps the surface clean for newcomers and avoids a future deprecation cycle.

Not a blocker for the compatibility work — just want to settle the public surface before ocr codex becomes something people depend on. Curious what you think.

@munsunouk

Copy link
Copy Markdown
Contributor Author

Thanks, that makes sense to me.

I agree that prepare, context, validate-comments, and report are deterministic data-plane primitives, not Codex-specific capabilities. Making Codex a top-level public command would bake the first adapter into the CLI shape, which gets awkward once Claude Code, Cursor, MCP, or other adapters use the same layer.

I’ll update this PR so ocr agent ... is the canonical public surface, and I’ll update the user-facing Codex plugin/skill instructions to use that path. I won’t keep ocr codex as a parallel public command, since that would create a confusing “which one should I use?” split.

To keep this PR scoped, I’ll limit the change to the CLI, docs, plugin, and skill instructions. I won’t rename broad internal package/type/schema identifiers here unless they are user-facing or required for tests. A deeper protocol/internal rename can be handled separately if we decide it is worth doing.

@munsunouk

munsunouk commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

Updated this PR to make ocr agent ... the only CLI surface for these deterministic primitives.

I removed the ocr codex dispatch entirely, including the hidden compatibility path, and updated the user-facing plugin/skill/docs instructions to use ocr agent.

I also made the user-facing observability path agent-neutral: --session-id now records controlPlane: "agent", model: "host-agent", reviewMode: "agent", and agent_event; the viewer/report labels now say Agent rather than Codex. In db947d1, I also removed the remaining viewer read-side compatibility for legacy codex_event / codex-owned records, so new and supported viewer history is agent-only as well.

I intentionally kept broader internal/schema names unchanged to avoid expanding this PR beyond the public CLI/docs/plugin surface.

Validation: GOTOOLCHAIN=auto go test ./... and the pre-push hook pass.

munsunouk added 3 commits July 2, 2026 16:35
Remove sync.Once caching in ContextService so cancelled contexts do not
poison later reads, validate bundle IDs when loading plain bundles, omit
content hashes for deleted files, and drop redundant chmod in private
output writes. Add cmd and unit tests for path escape, bundle ID mismatch,
and cancelled-context recovery.
Document --preview, --output, scan tuning, and manifest --bundle-index usage
in CODEX_USAGE and host-agent SKILL files. Align CODEX_SKILL_FEASIBILITY with
ocr agent commands and expand bundle schema docs for validation codes, manifest
identity hashing, and partial semantics.
@munsunouk munsunouk force-pushed the codex/fix-codex-owned-review branch 2 times, most recently from 3ab8e10 to e0d607a Compare July 3, 2026 12:56
@munsunouk munsunouk force-pushed the codex/fix-codex-owned-review branch 7 times, most recently from 95d0778 to f8d30a5 Compare July 3, 2026 19:07
@munsunouk munsunouk force-pushed the codex/fix-codex-owned-review branch 3 times, most recently from 00dd088 to 7c2bf4a Compare July 4, 2026 07:53
@munsunouk

munsunouk commented Jul 4, 2026

Copy link
Copy Markdown
Contributor Author

Thanks again for the detailed feedback here — especially the push toward a host-agnostic surface and the requirement to keep the existing ocr review / ocr scan behavior unchanged.

I reworked the landing plan to make the review scope smaller and cleaner. Instead of continuing this as one large PR, I’m splitting it into a stacked series and opening the first PR first:

  1. host-agent review bundle runtime
  2. host-agent surface / alias / docs
  3. Cursor adapter and CI bridge

I’m starting with PR1 only so the core runtime/protocol/session/reporting layer can be reviewed independently before I layer the surface rename and the Cursor adapter and CI bridge on top.

New PR1: #301

I’ll use that as the canonical review path going forward, and then follow with PR2/PR3 after PR1 direction is confirmed. This PR can then be closed as superseded once the split path is in place.

Thanks again — the discussion on this PR directly shaped the split.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants