Skip to content

fix(review-filter): add submit_filter_result func for review filter stage#295

Open
ScarletCarpet wants to merge 1 commit into
alibaba:mainfrom
ScarletCarpet:fix/review-filter-fix
Open

fix(review-filter): add submit_filter_result func for review filter stage#295
ScarletCarpet wants to merge 1 commit into
alibaba:mainfrom
ScarletCarpet:fix/review-filter-fix

Conversation

@ScarletCarpet

@ScarletCarpet ScarletCarpet commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

fix(review-filter): add submit_filter_result func for review filter stage

At review filter stage, LLM(test GLM5.2) may return natural language content such as:
"""
Looking at each comment...

  • commit 1: ...
  • commit 2: ...

Summary:
json [...]
"""

add function calling for LLM will get more stable result for some model of providers.
and this tool only appears at review filter stage, so it does no harm for review quality.

Description

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Refactoring (no functional changes)
  • Documentation update
  • CI / Build / Tooling

How Has This Been Tested?

  • make test passes locally
  • Manual testing (describe below)

Checklist

  • My code follows the project's coding style (go fmt, go vet)
  • I have performed a self-review of my code
  • I have added tests that prove my fix is effective or my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated the documentation accordingly (if applicable)
  • I have signed the CLA

Related Issues

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 OpenCodeReview found 3 issue(s) in this PR.

  • ✅ 3 posted as inline comment(s)
  • 📝 0 posted as summary

Comment thread internal/llmloop/compression.go Outdated
Comment thread internal/llmloop/compression.go Outdated
Comment thread internal/agent/agent.go Outdated
@ScarletCarpet ScarletCarpet force-pushed the fix/review-filter-fix branch from 2ffdf0e to 55bcd09 Compare July 3, 2026 12:57
@ScarletCarpet

ScarletCarpet commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author

@lizhengfeng101 这里咨询下,为什么reviewfiletr不用tool_call的方式限制LLM的输出呢?保证LLM的注意力么?

修改:
已将文本解析改为functioncalling:

image

@ScarletCarpet ScarletCarpet force-pushed the fix/review-filter-fix branch from 55bcd09 to 90c919a Compare July 3, 2026 16:31
@ScarletCarpet ScarletCarpet changed the title fix(review-filter): recover JSON from natural language response fix(review-filter): add submit_filter_result func for review filter stage Jul 3, 2026
… stage

At review filter stage, LLM(test GLM5.2) may return natural language content such as:
"""
Looking at each comment...
- commit 1: ...
- commit 2: ...

Summary:
```json [...]```
"""

add function calling for LLM will get more stable result for some model of providers.
and this tool only appears at review filter stage, so it does no harm for review quality.
@ScarletCarpet ScarletCarpet force-pushed the fix/review-filter-fix branch from 90c919a to 8032c52 Compare July 3, 2026 16:46
@revenant20

Copy link
Copy Markdown
Contributor

+1 — independent repro with the exact same model, and some data from the session store showing this bug has real output-affecting impact.

Repro. v1.7.1 built from 10f5875, glm-5.2 via an OpenAI-protocol provider, range review of an external Kotlin repo (10 files, 10 comments). The filter ran 6 times (once per commented file); 4 of 6 calls failed to parse with invalid character 'L' — the model reasoned step by step in prose despite the "without any explanation" instruction.

The verdicts were computed and lost. The session store keeps the full filter responses, and all 4 unparseable ones have the same shape — reasoning prose with a correct fenced JSON array as the last line:

... The fundamental mechanism the reviewer describes — synchronous throwing from `launch` — does
not exist in the Kotlin coroutines API. This is verifiable from the diff.

```json
["c-0"]
```

Three of the four lost verdicts were [] (harmless), but the one above was ["c-0"]: the filter had correctly decided to remove a refuted comment, the parse failure discarded that verdict, and the comment shipped in the final output. So this PR fixes real damage, not cosmetics — thanks for it.

Two residual gaps you may want to cover here or in a follow-up (happy to send a PR for either):

  1. The text fallback stays brittle. When a model ignores the tool or a provider lacks function calling, parseFilterResponse still requires the whole response to be a bare array — and the failure shape observed above (prose + trailing fenced array) loses the verdict again. A small salvage step on unmarshal failure — take the last balanced [...] fragment that unmarshals into []string — would have recovered all 4 responses in my run.

  2. Parse failure is still indistinguishable from "checked and kept everything". The new review_filter.completed event reports removed=0 in both cases (and telemetry is off by default), and the raw-response preview in the log is truncated to 200 chars — which is exactly what hides the trailing array. A parse-failure counter surfaced in the run summary would make the fail-open visible.

@ScarletCarpet

Copy link
Copy Markdown
Contributor Author

+1 — 独立重现,使用完全相同的模型,并且一些来自会话存储的数据表明这个错误确实对输出产生了影响。

Repro. v1.7.1 由 10f5875, glm-5.2 构建,通过 OpenAI 协议提供商,范围审查外部 Kotlin 代码库(10 个文件,10 条注释)。过滤器运行了 6 次(每个注释的文件运行一次);6 次调用中有 4 次未能解析invalid character 'L' — 模型尽管有“不作任何解释”的指令,还是逐步用散文进行推理。

判决结果已计算并丢失。 会话存储保持完整的过滤响应,所有 4 个无法解析的响应具有相同的形状 —— 以 正确括起来的 JSON 数组作为最后一行的推理文字:

... The fundamental mechanism the reviewer describes — synchronous throwing from `launch` — does
not exist in the Kotlin coroutines API. This is verifiable from the diff.

```json
["c-0"]

四个败诉中有三个是`[]`(无害的),但上面的那个不是`["c-0"]`:过滤器正确地决定删除一个被证伪的评论,解析失败丢弃了该败诉,而该评论被最终输出。所以这个PR修复了真正的损害,而不是表面问题——感谢你的努力。

两个剩余的差距,您可能希望在这里或后续阶段解决(很乐意为此发送一个PR):

1. **文本回退仍然很脆弱。** 当模型忽略工具或提供者缺乏函数调用时,`parseFilterResponse` 仍然需要整个响应是一个裸数组——而上面观察到的失败形态( prose + 末尾的围栏数组)再次失去了判断。在反序列化失败时进行一个小的恢复步骤——取**最后一个**平衡的`[...]`片段,该片段可以反序列化为`[]string`——将恢复我在运行中的所有4个响应。
2. **解析失败仍然无法与“已检查并保留所有内容”区分开来。** 新的 `review_filter.completed` 事件报告 `removed=0` 在两种情况下(默认情况下遥测数据是关闭的),日志中的原始响应预览被截断为200个字符——这正是隐藏数组尾巴的地方。运行总结中出现的解析失败计数器将使失败打开可见。

thanks a lot for reveiwing!

i'll try to do impromve the logic which you mentioned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants