Skip to content

feat: add host-agent review bundle runtime#301

Closed
munsunouk wants to merge 2 commits into
alibaba:mainfrom
munsunouk:codex/pr1-host-agent-review-bundle-runtime
Closed

feat: add host-agent review bundle runtime#301
munsunouk wants to merge 2 commits into
alibaba:mainfrom
munsunouk:codex/pr1-host-agent-review-bundle-runtime

Conversation

@munsunouk

@munsunouk munsunouk commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR extracts and lands the host-agent review bundle runtime as a standalone, reviewable core.

It keeps OCR as the deterministic data plane for host-agent-led reviews:

  • ocr agent prepare builds immutable review bundles/manifests
  • ocr agent context exposes target-aware read/find/diff/search helpers
  • ocr agent validate-comments validates agent findings against bundle evidence
  • ocr agent report renders validated findings
  • session / viewer support records and renders host-agent workflow events

The goal of this PR is to land the adapter-agnostic runtime first, without mixing in the larger surface/docs migration or the Cursor adapter and CI bridge.

What Is In Scope

  • review bundle protocol loading, validation, partitioning, reporting, and scan support
  • target-aware context helpers
  • session/event recording for host-agent workflows
  • viewer/runtime support needed to inspect those records
  • the minimum ocr agent ... CLI/runtime plumbing required to expose and test the flow
  • generic hardening needed for schema/session/prepare/report/context correctness

What Is Explicitly Out Of Scope

These will follow in separate PRs:

  • broad surface/docs/README/plugin migration work
  • Cursor-specific plugin / skill / marketplace integration
  • Cursor CI bridge and installer workflow
  • host-specific adapter glue beyond the shared runtime

Design Notes

This follows the direction discussed in #276:

  • keep the deterministic layer host-agnostic
  • avoid committing to a host-specific top-level CLI shape
  • preserve the existing native ocr review / ocr scan behavior
  • keep host-specific integration as a thin layer on top of one shared core

Validation

  • GOTOOLCHAIN=auto go test ./...
  • GOTOOLCHAIN=auto go build -o dist/opencodereview ./cmd/opencodereview

Follow-ups

Planned follow-up PRs after this lands or direction is confirmed:

  1. host-agent surface / alias / docs cleanup
  2. Cursor adapter and CI bridge

@CLAassistant

CLAassistant commented Jul 4, 2026

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ munsunouk
❌ Test


Test seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment on lines +87 to +104
if options.sessionID != "" {
if err := recordAgentEvent(
repoDir,
options.sessionID,
bundle.BundleID,
"validate",
session.AgentEvent{
Files: comments.Summary.FilesReviewed,
Findings: len(comments.Comments),
Warnings: len(result.Warnings),
DurationMS: time.Since(started).Milliseconds(),
ValidationValid: &result.Valid,
},
false,
); err != nil {
fmt.Fprintf(os.Stderr, "Warning: agent session not recorded: %v\n", err)
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor consistency issue: The helper function recordAgentEventBestEffort in agent_cmd.go already encapsulates this exact pattern (call recordAgentEvent, print warning to stderr on error). Using it here would reduce duplication and ensure consistent behavior if the error-handling strategy changes in the future.

recordAgentEventBestEffort(
    repoDir,
    options.sessionID,
    bundle.BundleID,
    "validate",
    session.AgentEvent{...},
    false,
)

Comment on lines +168 to +171
manifest, manifestErr := reviewbundle.LoadScanManifest(bytes.NewReader(content))
if manifestErr != nil {
return nil, nil, bundleErr
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the input file fails to parse as both a Bundle and a ScanManifest, only bundleErr is returned here, discarding manifestErr. This makes debugging difficult because the user won't know why the manifest interpretation also failed. Consider wrapping both errors to provide full diagnostic context.

Similarly, on line 177, when the bundleID is not found in a successfully parsed manifest, the error message doesn't mention that the file was initially tried as a standalone bundle (and failed), which could confuse users who expected it to be a bundle.

Suggestion:

Suggested change
manifest, manifestErr := reviewbundle.LoadScanManifest(bytes.NewReader(content))
if manifestErr != nil {
return nil, nil, bundleErr
}
manifest, manifestErr := reviewbundle.LoadScanManifest(bytes.NewReader(content))
if manifestErr != nil {
return nil, nil, fmt.Errorf("input is neither a valid bundle (%v) nor a valid scan manifest (%v)", bundleErr, manifestErr)
}

Comment on lines +112 to +123
fileLevel := false
if rawValue, ok := comment["file_level_comment"]; ok {
_ = json.Unmarshal(rawValue, &fileLevel)
}
if fileLevel {
var startLine, endLine int
if rawValue, ok := comment["start_line"]; ok {
_ = json.Unmarshal(rawValue, &startLine)
}
if rawValue, ok := comment["end_line"]; ok {
_ = json.Unmarshal(rawValue, &endLine)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Errors from json.Unmarshal are silently discarded with _ =. If file_level_comment contains a non-boolean value (e.g., a string "true" or a number), the unmarshal will fail and fileLevel will remain false, causing the file-level constraint check to be skipped entirely. Similarly, if start_line/end_line contain non-integer values, they'll silently default to 0, potentially passing the startLine != 0 || endLine != 0 check incorrectly.

Since LoadComments aims to be a strict decoder (it uses decodeStrict and validates shape before struct decoding), these silent failures undermine the validation contract. At minimum, unmarshal errors should be propagated. Alternatively, since decodeStrict with DisallowUnknownFields will catch type mismatches during the subsequent struct decode, you could add a comment explaining that these unmarshal errors are intentionally ignored because the strict decode below will reject malformed types. However, even in that case, there's a subtle gap: if file_level_comment fails to unmarshal as bool, the file-level line-number constraint is never enforced, and the strict decode might still accept the document if the field happens to be a valid JSON type that Go coerces.

Suggestion:

Suggested change
fileLevel := false
if rawValue, ok := comment["file_level_comment"]; ok {
_ = json.Unmarshal(rawValue, &fileLevel)
}
if fileLevel {
var startLine, endLine int
if rawValue, ok := comment["start_line"]; ok {
_ = json.Unmarshal(rawValue, &startLine)
}
if rawValue, ok := comment["end_line"]; ok {
_ = json.Unmarshal(rawValue, &endLine)
}
fileLevel := false
if rawValue, ok := comment["file_level_comment"]; ok {
if err := json.Unmarshal(rawValue, &fileLevel); err != nil {
return fmt.Errorf("invalid comments schema: comments[%d].file_level_comment must be a boolean", index)
}
}
if fileLevel {
var startLine, endLine int
if rawValue, ok := comment["start_line"]; ok {
if err := json.Unmarshal(rawValue, &startLine); err != nil {
return fmt.Errorf("invalid comments schema: comments[%d].start_line must be an integer", index)
}
}
if rawValue, ok := comment["end_line"]; ok {
if err := json.Unmarshal(rawValue, &endLine); err != nil {
return fmt.Errorf("invalid comments schema: comments[%d].end_line must be an integer", index)
}
}

Comment thread internal/reviewbundle/context.go Outdated
Comment on lines +233 to +235
if startLine-1 > totalLines || (totalLines > 0 && startLine-1 >= totalLines) {
return "", fmt.Errorf("file %q has only %d lines, requested range starting at %d", path, totalLines, startLine)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: When the file content is empty (totalLines == 0), this boundary check allows execution to proceed (both conditions evaluate to false), resulting in output with LINE_RANGE: 1-0 and zero content lines. This is misleading to the consumer.

Additionally, the compound condition is redundant — the first disjunct startLine-1 > totalLines is strictly weaker than the second startLine-1 >= totalLines (when totalLines > 0), making the logic harder to reason about.

Consider simplifying and handling the empty-file case explicitly:

Suggestion:

Suggested change
if startLine-1 > totalLines || (totalLines > 0 && startLine-1 >= totalLines) {
return "", fmt.Errorf("file %q has only %d lines, requested range starting at %d", path, totalLines, startLine)
}
if totalLines == 0 {
return "", fmt.Errorf("file %q is empty", path)
}
if startLine > totalLines {
return "", fmt.Errorf("file %q has only %d lines, requested range starting at %d", path, totalLines, startLine)
}

Comment on lines +384 to +387
// ponytail: minimal git-pathspec compatibility; expand if scan search needs full pathspec semantics.
if strings.HasPrefix(pattern, "*.") && strings.HasSuffix(path, strings.TrimPrefix(pattern, "*")) {
return true
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavioral inconsistency: This function does not handle git-pathspec negation/exclude patterns such as :(exclude)*_test.go or :!vendor/. The non-scan search path uses git grep which natively supports these patterns. In scan mode, exclude patterns will silently fail to match any file (they won't match via filepath.Match, prefix check, or the *. suffix check), meaning all files will be included rather than excluded. This produces incorrect search results without any warning.

Consider either adding basic exclude-pattern support or returning an error/warning when an unsupported pattern syntax is detected.

Comment on lines +68 to +73
if int64(len(data)) > MaxProtocolDocumentBytes {
return nil, fmt.Errorf(
"document exceeds %d byte limit",
MaxProtocolDocumentBytes,
)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent error type: readLimited returns a generic fmt.Errorf when the document exceeds the byte limit, while validateProtocolDocumentSize and protocolDocumentWriter.Write return a structured *ProtocolError with a machine-readable Code field. Since readLimited is used by LoadBundle, LoadComments, and LoadScanManifest (in load.go), callers that use errors.As(err, &ProtocolError{}) to detect size violations will fail to match when the oversized document is read from an io.Reader but will succeed when validating an already-loaded byte slice or streaming write.

Consider returning a *ProtocolError here for consistency, e.g.:

return nil, &ProtocolError{
    Code: "document_too_large",
    Message: fmt.Sprintf(
        "document exceeds %d byte protocol limit (%d bytes); reduce scope or bundle count",
        MaxProtocolDocumentBytes,
        len(data),
    ),
}

Also note the error code should likely be document_too_large rather than manifest_too_large, since this function is used for bundles and comments as well.

Comment thread internal/reviewbundle/partition.go Outdated
Comment on lines +112 to +119
if len(files) == 1 {
path = files[0].Path
if files[0].Reviewable {
manifest.Summary.ReviewableFiles--
manifest.Summary.ExcludedFiles++
manifest.Summary.Insertions -= files[0].Insertions
manifest.Summary.Deletions -= files[0].Deletions
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Summary counters not adjusted for non-reviewable skipped files.

When a file is skipped due to bundle_too_large, the summary adjustment (decrementing ReviewableFiles/Insertions/Deletions and incrementing ExcludedFiles) only happens when files[0].Reviewable == true. However, a non-reviewable file can also end up being skipped here (e.g., if it has a very large patch). In that case, the file is added to SkippedFiles but the manifest's Summary remains unchanged — meaning TotalFiles still counts it, yet it won't appear in any bundle. This creates an inconsistency between the manifest summary and the actual contents of the bundles + skipped list.

Consider adjusting the summary for non-reviewable files as well, or at minimum decrementing TotalFiles / ExcludedFiles appropriately so the aggregate stays consistent.

Comment on lines +151 to +155
_, encoded, err := buildDiffPartition(full, nil, maxBundleSize)
estimated := int64(4096)
if err == nil {
estimated = int64(len(encoded))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential issue: Silent fallback on empty-bundle estimation error.

If buildDiffPartition fails when called with nil files (which shouldn't normally happen, but could occur due to marshal errors or other unexpected conditions), the error is silently discarded and a hardcoded 4096-byte baseline is used. If the actual empty bundle serialization is significantly different from 4096 bytes, all subsequent packing decisions will be based on an incorrect baseline, potentially causing either excessive partition splits or repeated overflow-and-re-split cycles.

Consider logging the error or using a more defensive approach (e.g., returning the error rather than silently falling back).

Comment on lines +104 to +108
bundle, encoded, err := buildDiffPartition(base, files, maxBundleSize)
if err == nil && int64(len(encoded)) <= maxBundleSize {
manifest.Bundles = append(manifest.Bundles, *bundle)
return nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance concern: Each call to buildDiffPartition performs full JSON marshaling via marshalWithStableSize, which itself iterates up to 8 times.

During the initial packing loop, estimateAddition uses lightweight JSON marshaling of individual files/rules. But when a flush occurs (flushDiffPartitionappendDiffPartitionbuildDiffPartition), the entire partition is serialized with marshalWithStableSize (up to 8 marshal iterations). If the estimate is inaccurate and many flushes trigger recursive bisection, each bisection step also calls buildDiffPartition with its own marshalWithStableSize. For large diffs with many files, this could result in significant redundant serialization work.

Consider caching or short-circuiting the size check (e.g., skip marshalWithStableSize during bisection and use plain json.Marshal since the stable-size invariant is only needed for the final accepted bundle).

Comment on lines +31 to +39
warnings = append(warnings, ProtocolNotice{
Code: "oversized_diff",
Message: fmt.Sprintf(
"%s (~%d tokens) exceeds 80%% of max review tokens (%d)",
path,
tokens,
maxTokens,
),
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ProtocolNotice struct has a dedicated Path field (see bundle.go line 131), but it's not being populated here. The path is only embedded in the human-readable Message string, which makes it harder for programmatic consumers to extract the file path from the warning. Consider populating the Path field as well.

Additionally, when both diff.NewPath and diff.OldPath are empty strings, the warning message will display an empty path, making it difficult to identify which file was skipped. Consider providing a fallback like "<unknown>".

Suggestion:

Suggested change
warnings = append(warnings, ProtocolNotice{
Code: "oversized_diff",
Message: fmt.Sprintf(
"%s (~%d tokens) exceeds 80%% of max review tokens (%d)",
path,
tokens,
maxTokens,
),
})
if path == "" {
path = "<unknown>"
}
warnings = append(warnings, ProtocolNotice{
Code: "oversized_diff",
Path: path,
Message: fmt.Sprintf(
"%s (~%d tokens) exceeds 80%% of max review tokens (%d)",
path,
tokens,
maxTokens,
),
})

Comment on lines +155 to +158
contentSHA256 := hashFields([]byte(change.NewFileContent))
if change.IsDeleted {
contentSHA256 = ""
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor inefficiency: hashFields computes a SHA256 digest of change.NewFileContent even when the file is deleted, only to immediately discard the result. For deleted files, NewFileContent is typically empty, so the cost is trivial, but the code reads as if the hash is meaningful before being overridden. Consider guarding the computation to make intent clearer and avoid unnecessary work.

Suggestion:

Suggested change
contentSHA256 := hashFields([]byte(change.NewFileContent))
if change.IsDeleted {
contentSHA256 = ""
}
var contentSHA256 string
if !change.IsDeleted {
contentSHA256 = hashFields([]byte(change.NewFileContent))
}

Comment on lines +281 to +287
func writeMarkdownBody(output *bytes.Buffer, value string) {
if strings.Contains(value, "```") {
writeFencedCode(output, value)
return
}
fmt.Fprintln(output, escapeMarkdownBody(value))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: When comment.Content or comment.Recommendation contains triple backticks (e.g., a code example within a prose explanation, which is very common in code reviews), the entire value is wrapped in a fenced code block. This causes all surrounding prose text to be rendered as literal code, making the review comment unreadable.

For example, if Content is:

This function has a race condition. Consider using a mutex:
```go
mu.Lock()
defer mu.Unlock()

This ensures thread safety.

The current logic would wrap everything (including "This function has..." and "This ensures...") inside a single code fence.

Consider instead splitting the content on code fences and rendering each segment appropriately (prose vs. code), or simply always rendering as escaped markdown body without the special-casing for triple backticks.

Comment on lines +169 to +171
for _, notice := range validation.Errors {
fmt.Fprintf(output, "\n- `%s`: %s\n", notice.Code, notice.Message)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistency: Validation error and warning messages (notice.Message) are interpolated directly into markdown output without escaping, while writeMarkdownNotices (line 189) correctly applies escapeMarkdownBody. If a validation message contains markdown-sensitive characters (e.g., lines starting with #), it could produce unintended formatting in the rendered report. Apply escapeMarkdownBody here for consistency.

Suggestion:

Suggested change
for _, notice := range validation.Errors {
fmt.Fprintf(output, "\n- `%s`: %s\n", notice.Code, notice.Message)
}
for _, notice := range validation.Errors {
fmt.Fprintf(output, "\n- `%s`: %s\n", notice.Code, escapeMarkdownBody(notice.Message))
}

Comment on lines +18 to +20
if err := json.Unmarshal(document, &instance); err != nil {
return fmt.Errorf("invalid %s schema: %w", label, err)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message "invalid %s schema" is misleading here. When json.Unmarshal(document, &instance) fails, it means the input document is not valid JSON — not that the schema is invalid. This makes debugging harder because the caller sees "invalid bundle schema: ..." when the actual problem is malformed input.

Consider distinguishing between document parse errors and schema validation errors, e.g.:

  • Line 19: "invalid %s document: %w" (document is not valid JSON)
  • Line 26: "%s document failed schema validation: %w" (document doesn't conform to schema)

Suggestion:

Suggested change
if err := json.Unmarshal(document, &instance); err != nil {
return fmt.Errorf("invalid %s schema: %w", label, err)
}
if err := json.Unmarshal(document, &instance); err != nil {
return fmt.Errorf("invalid %s document: %w", label, err)
}

Comment on lines +33 to +34
resolvedSchemaMu.Lock()
defer resolvedSchemaMu.Unlock()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mutex is held during the potentially expensive json.Unmarshal(schemaBytes, &schema) and schema.Resolve(nil) calls below. This blocks all other goroutines from accessing the cache, including those that could get an immediate cache hit.

While in practice there are only 3 fixed embedded schemas (limiting real-world impact), a common improvement is to release the lock before doing the expensive work, then re-acquire it to store the result. Alternatively, use sync.Once per schema key or a single-check-then-compute pattern:

resolvedSchemaMu.Lock()
if resolved, ok := resolvedSchemas[cacheKey]; ok {
    resolvedSchemaMu.Unlock()
    return resolved, nil
}
resolvedSchemaMu.Unlock()
// ... do expensive unmarshal/resolve without holding the lock ...
resolvedSchemaMu.Lock()
defer resolvedSchemaMu.Unlock()
if existing, ok := resolvedSchemas[cacheKey]; ok {
    return existing, nil // another goroutine beat us
}
resolvedSchemas[cacheKey] = resolved
return resolved, nil

Comment on lines +182 to +191
if options.MaxTokenBudget > 0 {
sort.SliceStable(items, func(i, j int) bool {
left := scan.EstimateItemTokens(items[i], true)
right := scan.EstimateItemTokens(items[j], true)
if left != right {
return left < right
}
return items[i].Path < items[j].Path
})
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance: EstimateItemTokens calls llm.CountTokens(item.Content) which processes the full file content string each time. Inside a sort comparator, this is invoked O(n log n) times, meaning each file's token count is redundantly recomputed many times during sorting. For large scans with many files, this can be a significant performance bottleneck.

Pre-compute the token estimates once before sorting and use the cached values in the comparator.

Suggestion:

Suggested change
if options.MaxTokenBudget > 0 {
sort.SliceStable(items, func(i, j int) bool {
left := scan.EstimateItemTokens(items[i], true)
right := scan.EstimateItemTokens(items[j], true)
if left != right {
return left < right
}
return items[i].Path < items[j].Path
})
}
if options.MaxTokenBudget > 0 {
// Pre-compute token estimates to avoid O(n log n) redundant CountTokens calls.
tokenCache := make(map[string]int64, len(items))
for _, item := range items {
tokenCache[item.Path] = scan.EstimateItemTokens(item, true)
}
sort.SliceStable(items, func(i, j int) bool {
left := tokenCache[items[i].Path]
right := tokenCache[items[j].Path]
if left != right {
return left < right
}
return items[i].Path < items[j].Path
})
}

Comment thread internal/reviewbundle/scan.go Outdated
Comment on lines +257 to +261
midpoint := len(items) / 2
if err := appendScanBundles(ctx, manifest, items[:midpoint], batchIndex, targetHash, resolver, maxBundleSize); err != nil {
return err
}
return appendScanBundles(ctx, manifest, items[midpoint:], batchIndex, targetHash, resolver, maxBundleSize)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness / Partial state mutation on error: When appendScanBundles recursively splits a batch due to bundle_too_large, it mutates manifest.Bundles, manifest.SkippedFiles, and manifest.Summary counters incrementally. If the first recursive half succeeds (appending bundles and updating summary) but the second half fails with a non-recoverable error, the manifest is left in an inconsistent state — some bundles are appended and summary counters partially updated, yet the error propagates up to PrepareScan which returns nil, nil, err.

While the caller discards the manifest on error, this pattern is fragile. If any future caller inspects the manifest after an error return, or if error recovery/retry logic is added, the partial mutations will cause bugs. Consider either: (1) building bundles into a local slice and only appending to the manifest on full success, or (2) documenting explicitly that the manifest is invalid after any error return from this function.

Comment on lines +242 to +256
var protocolError *ProtocolError
if !errors.As(err, &protocolError) || protocolError.Code != "bundle_too_large" || len(items) <= 1 {
if errors.As(err, &protocolError) && protocolError.Code == "bundle_too_large" && len(items) == 1 {
manifest.SkippedFiles = append(manifest.SkippedFiles, ScanSkippedFile{
Path: items[0].Path,
Reason: "bundle_too_large",
})
manifest.Summary.ReviewableFiles--
manifest.Summary.ExcludedFiles++
manifest.Summary.Insertions -= int64(items[0].LineCount)
manifest.Partial = true
return nil
}
return err
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: The condition logic here is difficult to follow and verify. The outer if checks !errors.As(...) || protocolError.Code != "bundle_too_large" || len(items) <= 1, and then the inner if re-checks errors.As(...) && protocolError.Code == "bundle_too_large" && len(items) == 1. This double-check pattern is confusing because:

  1. When the outer condition is true AND the inner condition is also true (single oversized item), the file is skipped gracefully.
  2. When the outer condition is true AND the inner condition is false (non-bundle_too_large error, or bundle_too_large with len(items) == 0), the error is returned.
  3. When the outer condition is false (bundle_too_large with len(items) > 1), execution falls through to the split logic.

Consider restructuring for clarity, e.g., using explicit early returns or a switch statement to make each case self-documenting.

Comment thread internal/reviewbundle/target_io.go Outdated
if _, err := hasher.Write(length[:]); err != nil {
return err
}
written, err := io.Copy(hasher, reader)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential issue: Unbounded read beyond expected size. io.Copy will read the entire reader to EOF, which means if the file grows between the os.Stat call (which determined size) and during reading, all extra bytes will be consumed into the hash before the mismatch is detected at line 67. While the error prevents returning an incorrect hash, this could cause unnecessary I/O and memory usage for large files that grow significantly.

Consider using io.LimitReader to cap the read at exactly size bytes, then verify the actual bytes read match:

limited := io.LimitReader(reader, size)
written, err := io.Copy(hasher, limited)

This ensures no more than size bytes are ever read from the file, making the function more defensive against concurrent modifications.

Comment thread internal/session/file_lock_windows.go Outdated
Comment on lines +8 to +9
func lockSessionFile(*os.File) error { return nil }
func unlockSessionFile(*os.File) error { return nil }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The no-op implementation means there is no cross-process file locking on Windows. While the sync.Mutex in AgentRecorder protects concurrent writes within a single process, multiple processes writing to the same JSONL session file (e.g., parallel agent runs sharing a session) can corrupt the file with interleaved writes.

Windows provides LockFileEx / UnlockFile via the syscall or golang.org/x/sys/windows package that can provide equivalent exclusive file locking. Consider implementing actual file locking instead of silently returning nil, or at minimum logging a warning when locking is skipped so users are aware of the concurrency risk.

Comment thread internal/scan/provider.go Outdated
Comment on lines +332 to +334
if _, err := io.Copy(&body, file); err != nil {
return false, nil, 0, err
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOCTOU race: The caller checks info.Size() > p.maxFileSizeBytes via Lstat before calling this function, but the file could grow (or be replaced with a larger file) between the stat and the read. Since readRegularFile reads the entire file into memory without any size cap, this defeats the memory-protection purpose of maxFileSizeBytes. Consider adding an io.LimitReader inside readRegularFile (e.g., capped at maxFileSizeBytes + 1) and returning an error if the limit is exceeded, similar to how readLimited works in internal/reviewbundle/read_limit.go.

Comment thread internal/viewer/server.go Outdated
return
}
if err := ValidateSessionID(sid); err != nil {
http.Error(w, "invalid session path", http.StatusBadRequest)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: The error message says "invalid session path" but this is validating a session ID, not a path. Consider using "invalid session ID" for clarity and consistency with the ValidateSessionID function name.

Suggestion:

Suggested change
http.Error(w, "invalid session path", http.StatusBadRequest)
http.Error(w, "invalid session ID", http.StatusBadRequest)

Comment on lines +74 to +77
contentCache := make(map[string][]byte)
for index := range comments.Comments {
validateOneComment(ctx, &result, bundle, files, contentCache, comments.Comments[index], index, repoDir, runner)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: When readTargetFile fails (e.g., file not found or git error), the function returns without caching anything. However, when the read succeeds but the hash check at line 264 fails (stale content), the stale content remains cached. On subsequent comments referencing the same path, the stale content is reused from cache and the hash mismatch is re-detected — this is correct behavior.

However, there's a subtle correctness issue: if readTargetFile fails for a transient reason (e.g., temporary I/O error), the next comment on the same file will retry the read. But if the first read succeeds and caches content, then a different process modifies the file before the second comment is validated, the cached (now-stale) content is used instead of re-reading. Since validation is supposed to check against the current target state, this caching strategy can mask concurrent modifications within a single validation run.

Consider either not caching when staleness is detected, or documenting that the cache assumes no concurrent file modifications during validation.

Comment on lines +75 to +77
for index := range comments.Comments {
validateOneComment(ctx, &result, bundle, files, contentCache, comments.Comments[index], index, repoDir, runner)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing context cancellation check: The validation loop iterates over all comments without checking ctx.Done(). For bundles with many comments (especially when each comment triggers a git show call via readTargetFile), a cancelled context won't stop the loop promptly. Consider adding a context check inside the loop.

Suggestion:

Suggested change
for index := range comments.Comments {
validateOneComment(ctx, &result, bundle, files, contentCache, comments.Comments[index], index, repoDir, runner)
}
for index := range comments.Comments {
select {
case <-ctx.Done():
addValidationError(&result, "context_cancelled", "", nil, "validation cancelled")
result.Valid = false
return result
default:
}
validateOneComment(ctx, &result, bundle, files, contentCache, comments.Comments[index], index, repoDir, runner)
}

Comment thread internal/reviewbundle/validate.go Outdated
Comment on lines +305 to +306
normalizedContent := "\n" + strings.TrimSuffix(string(content), "\n") + "\n"
normalizedSnippet := "\n" + strings.Trim(snippet, "\n") + "\n"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential inconsistency: countStandaloneSnippet normalizes the content with strings.TrimSuffix(string(content), "\n") (removes only ONE trailing newline) but normalizes the snippet with strings.Trim(snippet, "\n") (removes ALL leading and trailing newlines). This asymmetry means that if the file content ends with multiple newlines (e.g., "code\n\n") while the snippet also has trailing newlines, the boundary matching could produce unexpected results. Consider using consistent normalization for both content and snippet, e.g., strings.TrimRight(string(content), "\n") for the content side as well.

Suggestion:

Suggested change
normalizedContent := "\n" + strings.TrimSuffix(string(content), "\n") + "\n"
normalizedSnippet := "\n" + strings.Trim(snippet, "\n") + "\n"
normalizedContent := "\n" + strings.TrimRight(string(content), "\n") + "\n"
normalizedSnippet := "\n" + strings.Trim(snippet, "\n") + "\n"

Comment thread internal/session/agent.go Outdated
Comment on lines +186 to +202
scanner := bufio.NewScanner(file)
for scanner.Scan() {
var record map[string]any
if err := json.Unmarshal(scanner.Bytes(), &record); err != nil {
continue
}
if recordType, _ := record["type"].(string); recordType != "session_start" {
continue
}
timestamp, _ := record["timestamp"].(string)
started, err := time.Parse(time.RFC3339, timestamp)
if err != nil {
return fallback
}
return started
}
return fallback

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scanner buffer size inconsistency: The bufio.Scanner here (and in readAgentSessionBundleID, agentSessionHasEnd, and agentSessionFileHasEnd) uses the default max token size of 64KB. However, the viewer's peekSession and LoadSession in internal/viewer/store.go explicitly set a 10MB buffer (scanner.Buffer(buf, 10*1024*1024)). If any JSONL record exceeds 64KB (e.g., an event with a large files_reviewed list), these scanners will silently fail — scanner.Scan() returns false and scanner.Err() may return nil, causing session resumption to miss the start record, lose bundle IDs, or incorrectly determine that no session_end exists.

Recommend setting an explicit larger buffer on all scanners in this file to match the viewer's approach.

Suggestion:

Suggested change
scanner := bufio.NewScanner(file)
for scanner.Scan() {
var record map[string]any
if err := json.Unmarshal(scanner.Bytes(), &record); err != nil {
continue
}
if recordType, _ := record["type"].(string); recordType != "session_start" {
continue
}
timestamp, _ := record["timestamp"].(string)
started, err := time.Parse(time.RFC3339, timestamp)
if err != nil {
return fallback
}
return started
}
return fallback
scanner := bufio.NewScanner(file)
buf := make([]byte, 0, 1024*1024)
scanner.Buffer(buf, 10*1024*1024)
for scanner.Scan() {
var record map[string]any
if err := json.Unmarshal(scanner.Bytes(), &record); err != nil {
continue
}
if recordType, _ := record["type"].(string); recordType != "session_start" {
continue
}
timestamp, _ := record["timestamp"].(string)
started, err := time.Parse(time.RFC3339, timestamp)
if err != nil {
return fallback
}
return started
}
return fallback

Comment thread internal/session/agent.go Outdated
Comment on lines +271 to +272
unlockSessionFile(file)
return file.Close()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent unlock error handling: In writeExclusiveStart, the return value of unlockSessionFile(file) on line 271 is silently discarded, while in the write() method (lines 314-320), unlock errors are properly checked and returned. Although on Unix flock(LOCK_UN) rarely fails, this inconsistency could mask issues on other platforms or in edge cases. Consider handling the unlock error consistently.

Suggestion:

Suggested change
unlockSessionFile(file)
return file.Close()
if unlockErr := unlockSessionFile(file); unlockErr != nil {
_ = file.Close()
return fmt.Errorf("unlock agent session: %w", unlockErr)
}
return file.Close()

Comment thread internal/session/agent.go Outdated
Partial bool `json:"partial,omitempty"`
DurationMS int64 `json:"duration_ms,omitempty"`
ValidationValid *bool `json:"validation_valid,omitempty"`
FilesReviewed []string `json:"files_reviewed,omitempty"`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential issue with FilesReviewed being dropped when empty: The AgentEvent.FilesReviewed field has omitempty, so when it is nil or empty, it will be omitted during the marshal→unmarshal round-trip in agentEventRecord. This means the files_reviewed key won't appear in the JSONL record. While this may be intentional for most events, note that in Finalize, record["files_reviewed"] is explicitly set afterward (line 137), which correctly overrides this. However, if Record() is ever called with a non-empty FilesReviewed, the omitempty behavior would still drop it if the slice happens to be empty. Verify this is the intended behavior.

Comment thread internal/viewer/templates/session.html Outdated
Comment on lines +111 to +114
<td>{{if .Files}}{{.Files}}{{else}}-{{end}}</td>
<td>{{if .Findings}}{{.Findings}}{{else}}-{{end}}</td>
<td>{{if .Warnings}}{{.Warnings}}{{else}}-{{end}}</td>
<td>{{if .ContextCalls}}{{.ContextCalls}}{{else}}-{{end}}</td>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: In Go's html/template, {{if .Files}} treats integer zero as falsy. When an agent event legitimately has Files: 0, Findings: 0, Warnings: 0, ContextCalls: 0, or DurationMS: 0, the template will display - instead of 0. This loses meaningful information — for example, a "report" event with 0 findings is different from one where findings were not tracked.

Consider using pointer types (e.g., *int) for these fields in AgentEvent so that nil means "not provided" and 0 means "explicitly zero". Alternatively, you could add separate boolean flags like HasFiles, HasFindings, etc., or compare against a sentinel value.

Comment thread internal/viewer/store.go Outdated
Comment on lines 178 to 183
var rec map[string]any
if err := json.Unmarshal(line, &rec); err == nil {
if typ, _ := rec["type"].(string); typ == "session_end" {
lastSessionEnd = append([]byte(nil), line...)
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance: Every line is JSON-unmarshalled twice in peekSession.

When summary.Timestamp.IsZero() is true (i.e., for the first few lines until a timestamp is found), the line is parsed once at line 149 and then parsed again at line 179. After the timestamp is set, every remaining line is still parsed at line 179 just to check for session_end.

Since peekSession is called for every session during listing (ListSessions), this double-parsing adds unnecessary overhead — especially for large session files where only the last session_end record matters.

Consider restructuring the loop to avoid re-parsing. For example, you could track whether the current line was already parsed and reuse the result, or only attempt the session_end check inside an else branch / after the timestamp block using the already-parsed rec.

Comment thread internal/viewer/templates/session.html Outdated
<td>{{if .ContextCalls}}{{.ContextCalls}}{{else}}-{{end}}</td>
<td>{{if .Partial}}yes{{else}}-{{end}}</td>
<td>{{.ValidationLabel}}</td>
<td>{{if .DurationMS}}{{.DurationMS}} ms{{else}}-{{end}}</td>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same zero-value issue as above: {{if .DurationMS}} will display - when the duration is legitimately 0 ms (e.g., a very fast operation). Since DurationMS is an int64, consider using a pointer type *int64 or a separate boolean to distinguish "not recorded" from "zero duration".

Comment thread internal/tool/code_search.go Outdated
Comment on lines +37 to +39
if strings.Contains(s, "..") {
return "Error: file_patterns must not contain ..", nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The strings.Contains(s, "..") check is overly broad and will produce false positives. Legitimate git pathspec patterns like file..go, my..package/, or glob patterns containing .. as a substring (but not as a path traversal component) would be incorrectly rejected.

A more precise approach would check for .. as an actual path component, e.g.:

if s == ".." || strings.HasPrefix(s, "../") || strings.HasSuffix(s, "/..") || strings.Contains(s, "/../") {
    return "Error: file_patterns must not contain ..", nil
}

Alternatively, you could use filepath.Clean and check if the cleaned path starts with ...

@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

🔍 OpenCodeReview found 33 issue(s) in this PR.

  • ✅ 32 posted as inline comment(s)
  • 📝 1 posted as summary

⚠️ 1 warning(s) occurred during review.


📊 Posting Statistics:

  • ✅ Successfully posted: 32 comment(s)
  • ❌ Failed to post: 1 comment(s)

⚠️ Inline comments shown in summary


📄 internal/scan/provider.go (L339-L353)

⚠️ GitHub could not post this as an inline comment: Unprocessable Entity: "Line could not be resolved"

Dead code: isBinaryFile is no longer called anywhere after this refactor — its only call site was replaced by readRegularFile. Keeping it around creates a maintenance hazard (future developers might use the less efficient two-pass version by mistake). Consider removing it entirely.

@munsunouk munsunouk closed this Jul 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants