Skip to content

docs(tokenless): add stash tests and reversible-compression docs#1271

Open
shiloong wants to merge 1 commit into
alibaba:mainfrom
shiloong:feature/tokenless/ccr-stash-docs
Open

docs(tokenless): add stash tests and reversible-compression docs#1271
shiloong wants to merge 1 commit into
alibaba:mainfrom
shiloong:feature/tokenless/ccr-stash-docs

Conversation

@shiloong

@shiloong shiloong commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Description

Round out the stash feature with adversarial test coverage and user-facing docs. Stacked on top of #1270 (the compressor integration + retrieve CLI); rebase to a single commit once that lands.

Tests (response_compressor.rs):

  • CJK array round-trip: multi-byte dropped items recover byte-for-byte (char-vs-byte semantics).
  • Object array round-trip: dropped objects are stashed raw (pre-compression), so fields the compressor would strip (debug/trace) survive retrieval — the "100 normal + 2 error" scenario.
  • No-stash-when-within-limit: arrays under the truncation threshold write no stash entry and emit no marker.

Docs:

  • docs/stash-reversible-compression.md: mechanism, marker format, backends, TTL/capacity, CLI usage, security model, fail-open policy, and mapping to Headroom CCR.
  • README.md: retrieve subsection documenting hash/marker input and --no-stash/--stash-db; architecture tree notes retrieve in the CLI's command list.

Related Issue

no-issue: internal tokenless stash feature, part 3 of 3 (part 1 merged in #1255; part 2 = #1270).

Type of Change

  • Documentation update
  • New feature (non-breaking change that adds functionality) — tests + docs for the stash integration

Scope

  • tokenless (tokenless)

Checklist

  • My code follows the project's code style
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the documentation accordingly
  • For tokenless: cargo clippy --workspace --all-targets -- -D warnings and cargo fmt --all --check pass

Testing

cd src/tokenless
cargo fmt --all -- --check
cargo clippy --workspace --all-targets -- -D warnings
cargo test --workspace   # schema 27 -> 30 (+3 adversarial); all suites green

Additional Notes

⚠️ Stacked on #1270 — this branch contains that PR's commit so the new tests compile. Merge #1270 first, then I'll rebase this to a single commit (dropping the dependency) before merge.

🤖 Generated with Claude Code

@Forrest-ly Forrest-ly left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


Code Review: docs(tokenless): add stash tests and reversible-compression docs

This PR (part 3 of 3, stacked on #1270) adds 6 adversarial tests for the stash integration and comprehensive user-facing documentation (stash-reversible-compression.md, README updates). The tests cover the round-trip happy path, fail-open degradation, CJK multi-byte
content, object arrays with stripped fields, and the no-truncation case. Documentation covers mechanism, marker format, backends, TTL/capacity, CLI usage, security model, and fail-open policy.

Overall the test coverage and documentation are thorough. A few issues below.

Findings

  1. 🟡 test_stash_round_trip_with_object_array does not prove kept items are compressed

src/tokenless/crates/tokenless-schema/src/response_compressor.rs — new test test_stash_round_trip_with_object_array

The test asserts recovered[0]["debug"] == "trace data" to prove stashed items are raw (pre-compression). But the sole kept item is {"id": 1, "status": "ok"} which has no debug field, so the test cannot detect a bug where kept items also skip compression. If
compress_object were accidentally bypassed for kept items, this test would still pass.

Suggested fix: add a "debug" field to the first element and assert it is stripped from arr_result[0]:
let arr = json!([
{"id": 1, "status": "ok", "debug": "should be stripped"},
{"id": 2, "status": "error", "debug": "trace data"},
{"id": 3, "status": "ok"}
]);
// ... after compress:
assert!(arr_result[0].get("debug").is_none()); // kept items ARE compressed
assert_eq!(recovered[0]["debug"], json!("trace data")); // stashed items are raw


  1. 🟡 README retrieve example uses c30c… (Unicode ellipsis) — would fail as literal input

src/tokenless/README.md:124

tokenless retrieve "<... 195 items truncated, retrieve with <tokenless:c30c…>"

The … (U+2026) is not a hex character, and the resulting string is not 24 hex chars. extract_hash requires exactly 24 ASCII hex digits via validate_hash. A user who copies this example verbatim gets a confusing "no stashed payload" error instead of a format error.

Suggested fix: either use a full 24-char hex hash in the example, or add a comment clarifying the ellipsis is editorial shorthand (e.g., # (c30c… is shorthand — use the full hash from your output)).


  1. 🟡 Doc claims "Redis backend is cfg-gated and not yet implemented" — no Redis scaffolding exists

src/tokenless/docs/stash-reversible-compression.md:133

▎ * Redis backend is cfg-gated and not yet implemented; it is tracked for the multi-worker case.

There is no Redis feature flag, cfg gate, placeholder module, or any Redis-related code in tokenless-ccr. Cargo.toml lists only sqlite as a feature. Saying it's "cfg-gated" implies scaffolding exists when it doesn't.

Suggested fix: change to "Redis backend is not yet implemented; it is tracked for the multi-worker case."


  1. 🔵 Doc implies TTL eviction is automatic — it is manual-only

src/tokenless/docs/stash-reversible-compression.md:70-74

▎ TTL: entries expire after a fixed lifetime … evict_expired() drops expired entries and returns the count.

This implies expired entries are automatically purged. In reality, evict_expired() is never called outside of tests. SqliteStore::retrieve filters expired rows via SQL (WHERE expires_at > ?now), so expired entries are not returned, but the rows remain in the database
until either capacity-based FIFO eviction (triggered by stash()) or a manual evict_expired() call. This means the SQLite file can grow larger than the documented capacity suggests.

Suggested fix: clarify that TTL filtering is applied on read, and evict_expired() is available for explicit bulk cleanup but is not called automatically.


  1. 🔵 Several tests verify only the stashed tail, not the kept items

src/tokenless/crates/tokenless-schema/src/response_compressor.rs — tests test_array_truncation_with_stash_round_trip, test_stash_round_trip_with_cjk_items

These tests assert the stashed/recovered payload is correct but never check that arr_result[0..N] contains the expected kept values. If the implementation kept wrong items (e.g., off-by-one in the slice), the tests would still pass as long as stashing worked. Not a
bug, but a false-positive risk.

Suggested fix: add assertions on the kept items, e.g.:
assert_eq!(arr_result[0], json!(1));
assert_eq!(arr_result[1], json!(2));
assert_eq!(arr_result[2], json!(3));


  1. 🔵 test_array_truncation_without_stash_is_lossy uses loose assertions

src/tokenless/crates/tokenless-schema/src/response_compressor.rs — test_array_truncation_without_stash_is_lossy

The test only asserts contains("more items truncated") and !contains("tokenless:"). It does not check the array length (should be 4: 3 kept + 1 marker) or the truncation count in the marker string (should be "7"). Compare with the existing test_array_truncation which
does assert len == 4.


Notes
  • Code changes in this PR overlap with #1270 (the parent). Findings on the shared code (orphaned stash writes, asymmetric error handling, etc.) are covered in the #1270 review.
  • The AlwaysFail struct and Retrieve CLI subcommand are introduced by #1270; they appear here because this branch is stacked on top of it.
  • Documentation factual claims checked and verified correct: default truncate_arrays_at = 32, 24-hex BLAKE3 key, InMemory 5min/1000, SQLite 1h/10000, parse_marker/extract_hash semantics.

@shiloong

shiloong commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks @Forrest-ly — all six findings addressed in 8920f6ee (force-pushed; rebased on the amended #1270). CI re-running.

1. 🟡 object_array test didn't prove kept items are compressed — Fixed. The first element now carries a debug field; the test asserts the kept item has debug stripped (arr_result[0].get("debug").is_none()) while the stashed item preserves it (recovered[0]["debug"] == "trace data"). A bug that bypassed compress_object for kept items would now fail.

2. 🟡 README c30c… ellipsis — Fixed. The example uses the full 24-hex hash and a comment noting shorthand, so a verbatim copy works.

3. 🟡 Doc "Redis backend is cfg-gated" — Fixed to "not yet implemented" (no cfg scaffolding exists; Cargo.toml lists only sqlite).

4. 🔵 Doc implies TTL eviction is automatic — Fixed. The TTL section now states expiry is enforced on read (retrieve/len filter expired rows), and evict_expired() is available for explicit bulk cleanup but is not called automatically — rows remain until FIFO capacity eviction (triggered by stash()) or a manual call, so the SQLite file can grow beyond capacity before a trim.

5. 🔵 Tests verify only the stashed tail — Fixed. test_array_truncation_with_stash_round_trip and test_stash_round_trip_with_cjk_items now assert the kept items (arr_result[0..N]), catching off-by-one slice bugs.

6. 🔵 lossy test loose assertions — Fixed. test_array_truncation_without_stash_is_lossy now asserts len == 4 (3 kept + 1 marker), kept-item values, and the dropped-count "7" in the marker.

cc @samchu-zsl

@shiloong shiloong force-pushed the feature/tokenless/ccr-stash-docs branch from 8920f6e to 18b8405 Compare July 3, 2026 04:47
Forrest-ly
Forrest-ly previously approved these changes Jul 3, 2026

@Forrest-ly Forrest-ly left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Round out the stash feature with adversarial coverage and user-facing docs.

Tests (response_compressor):
- CJK array round-trip: multi-byte dropped items recover byte-for-byte.
- Object array round-trip: dropped objects are stashed raw (pre-compression),
  so fields the compressor would strip (debug/trace) survive retrieval — the
  "100 normal + 2 error" scenario.
- No-stash-when-within-limit: arrays under the truncation threshold write no
  stash entry and emit no marker.

Docs:
- docs/stash-reversible-compression.md: mechanism, marker format, backends,
  TTL/capacity, CLI usage, security model, fail-open policy, and mapping to
  Headroom CCR.
- README: feature row, tokenless-ccr in the architecture tree, and a
  retrieve subsection documenting hash/marker input and --no-stash/--stash-db.

fmt/clippy/test green across the workspace.

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
@shiloong shiloong force-pushed the feature/tokenless/ccr-stash-docs branch from 18b8405 to 9ac040e Compare July 3, 2026 06:21

@Forrest-ly Forrest-ly left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


Review: PR #1271 — docs(tokenless): add stash tests and reversible-compression docs

PR 为 stash 功能补充对抗性测试(CJK round-trip、对象数组 round-trip、limit 内无 stash)和用户文档(142 行机制/安全/CLI 说明 + README 更新)。254 additions / 5 deletions / 3 files。Stacked on #1270

整体质量很好:测试覆盖了关键边界场景,文档完整准确。以下 5 个发现按严重程度排序。


  1. 🟡 test_stash_round_trip_with_object_array 未验证第二个 recovered item

crates/tokenless-schema/src/response_compressor.rs

测试 stash 了 2 个 dropped item({id:2} 和 {id:3}),但只断言了 recovered[0]["debug"],从未检查 recovered[1]。同时缺少 arr_result.len() 断言——如果 bug 导致 kept item 重复产生 [obj, obj, marker],现有断言仍然通过。

对比同文件中 test_array_truncation_with_stash_round_trip 有 assert_eq!(arr_result.len(), 4),这个测试漏了。

建议:

assert_eq!(arr_result.len(), 2); // 1 kept + 1 marker
assert_eq!(recovered[0]["id"], json!(2));
assert_eq!(recovered[1]["id"], json!(3));
assert_eq!(recovered[1]["status"], json!("ok"));


  1. 🟡 test_stash_not_engaged_when_array_within_limit 在空数组上 vacuously true

crates/tokenless-schema/src/response_compressor.rs

assert!(result.as_array().unwrap().iter().all(|v| v.is_number()));
assert_eq!(store.len(), 0);

Rust 的 Iterator::all 对空迭代器返回 true。如果 compress 意外返回 [],两个断言都通过。缺少长度断言。

建议加一行:

assert_eq!(result.as_array().unwrap().len(), 5);


  1. 🟡 test_stash_round_trip_with_cjk_items 缺少 arr_result.len() 断言

crates/tokenless-schema/src/response_compressor.rs

测试检查了 arr_result[0]、arr_result[1] 和 .last(),但未断言 arr_result.len() == 3(2 kept + 1 marker)。如果 compress 产生额外幻影元素,测试不会捕获。同文件其他 round-trip 测试都有长度断言。


  1. 🟡 README 中 tokenless-cli 描述不一致

README.md

PR 更新了 Architecture tree(line 44)加入 retrieve:

├── crates/tokenless-cli/ # CLI binary: tokenless command (env-check, compress, retrieve, stats)

但同一文件 line 348 的 Project Structure 表未更新,仍然是:

| crates/tokenless-cli/ | CLI binary — tokenless command (compress, stats, env-check) |

合并后同一文件内两处描述矛盾。


  1. 🟢 文档中 MCP tokenless_retrieve 标记为 "not yet implemented",但 PR #1285 已在添加

docs/stash-reversible-compression.md

文档三处声明 MCP retrieve 不存在:

  • "MCP tokenless_retrieve: not yet implemented"
  • "the future MCP tokenless_retrieve tool"
  • 映射表 "MCP tool pending"

PR #1285(同一 PR 栈)已添加完整的 MCP stdio server 和 tokenless_retrieve tool(当前状态 OPEN)。如果 #1285 先合并或同时合并,文档落地即过时。

建议: 改为 "introduced in PR #1285" 或合并时同步更新。


Review 方法

使用 4 个独立 finder 角度(逐行扫描、跨文件一致性、测试覆盖审计、文档准确性)并行搜索候选发现。过滤掉 stacked PR 导致的假阳性(#1270 添加的 with_stash_store、retrieve 子命令等在 main 上不存在但属于正常依赖),对剩余候选逐一独立验证(CONFIRMED / PLAUSIBLE /
REFUTED),最终保留通过验证的 5 个发现。

@shiloong

shiloong commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator Author

@Forrest-ly The findings in this (re-posted) review are the same six from the earlier round, and all six were already addressed in 18b84050 (12:38 CST patrol) — the branch has since been rebased onto main (now that #1270 merged) as 9ac040ef, carrying the fixes forward. Verifying they're in the current code:

Looks like the review was re-run against a pre-fix snapshot after the force-push dismissed the prior approval. No further changes needed here — please re-review against 9ac040ef.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component:tokenless src/tokenless/ scope:documentation ./docs/|./*.md|./NOTICE

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants