
feat: add CountIndexedTree element with auto-cascading secondary index #657

Open
QuantumExplorer wants to merge 72 commits into develop from claude/gallant-elion-214ef4

Conversation

@QuantumExplorer (Member) commented May 10, 2026

Summary

Adds Element::CountIndexedTree and Element::ProvableCountIndexedTree — element types that pair a CountTree-shaped primary Merk with a count-ordered secondary Merk. The pairing turns "top-k by count" and "count in [a, b]" queries from O(n) into O(log n + k) while preserving GroveDB's standard proof semantics. Each element points at two child Merks; the parent Merk binds both into its value_hash via H1-A composition: combine_hash_three(value_hash(elem_bytes), primary_root_hash, secondary_root_hash). The secondary uses ProvableCountedMerkNode so existing AggregateCountOnRange machinery applies natively, and lives at the derived prefix Blake3(primary_prefix ‖ 0x01).

Design doc

docs/book/src/count-indexed-tree.md — full design and shipped-API reference. All major decisions ratified inline (H1-A, S2-B, V1-A, Q1-A, S1-A, W1, C1).

Public API surface

Direct (non-batch) APIs:

  • db.insert_into_count_indexed_tree(...), db.delete_from_count_indexed_tree(...) — dedicated APIs that mirror to the secondary and chain H1-A
  • db.count_indexed_top_k(...), db.count_indexed_count_range(...) — read APIs in count order
  • db.reconcile_count_indexed_tree_secondary(...) — rebuild the secondary from the primary on demand

Proofs:

  • db.prove_count_indexed_top_k(...) / GroveDb::verify_count_indexed_top_k(...) — dedicated proof shape for top-k
  • db.prove_count_indexed_query(...) / GroveDb::verify_count_indexed_query(...) — accepts an arbitrary MerkQuery over the secondary's (count_be ‖ key) keyspace
  • V1 generic prove_query / verify_query descend cidx via combine_hash_three-aware proof emission
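
The (count_be ‖ key) keyspace those proof APIs query can be modeled in a few lines (hypothetical helper names; the essential property is that a big-endian u64 count prefix makes plain lexicographic key order coincide with count order):

```rust
/// Sketch of the secondary keyspace described above (hypothetical
/// helpers): a big-endian u64 count followed by the original primary
/// key. Lexicographic order over encoded keys sorts by count first,
/// then by key, so top-k and count-range become contiguous scans.
fn encode_secondary_key(count: u64, primary_key: &[u8]) -> Vec<u8> {
    let mut k = count.to_be_bytes().to_vec();
    k.extend_from_slice(primary_key);
    k
}

fn decode_secondary_key(k: &[u8]) -> Option<(u64, &[u8])> {
    if k.len() < 8 {
        return None; // malformed: missing the 8-byte count prefix
    }
    let mut count_be = [0u8; 8];
    count_be.copy_from_slice(&k[..8]);
    Some((u64::from_be_bytes(count_be), &k[8..]))
}
```

With this layout a count in [a, b] query is a single range scan over the encoded bounds, which is what lets the dedicated proof shape hand an ordinary MerkQuery to the secondary.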

Batch path:

  • apply_batch / apply_partial_batch support cidx primary item-level mutations end-to-end
  • Empty cidx creation, deletion via DeleteTree, and item mutations inside a cidx primary all work
  • Mutations bubble up via a new internal GroveOp::ReplaceCountIndexedTreeRootKeys variant
  • Nested cidx (cidx → cidx, cidx → tree → cidx) work correctly

Integrity:

  • verify_grovedb walks every cidx primary and asserts (a) chain integrity via H1-A and (b) content consistency between primary's count_value and the secondary's entries
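
A hedged, in-memory model of that content-consistency pass (not the actual verify_grovedb code, which walks raw Merk storage and reports drift via the sentinel paths rather than a bool):

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified model of the content-consistency check (hypothetical
/// types): every (key, count) pair in the primary must have exactly one
/// matching (count_be ‖ key) entry in the secondary, and the secondary
/// must contain nothing else.
fn content_consistent(
    primary_counts: &BTreeMap<Vec<u8>, u64>,
    secondary_keys: &BTreeSet<Vec<u8>>,
) -> bool {
    // Equal sizes plus full membership implies a bijection, so orphans
    // in either direction are caught without a second pass.
    if primary_counts.len() != secondary_keys.len() {
        return false;
    }
    primary_counts.iter().all(|(key, count)| {
        let mut encoded = count.to_be_bytes().to_vec();
        encoded.extend_from_slice(key);
        secondary_keys.contains(&encoded)
    })
}
```

This is the check that would have caught the nested-cidx bug below: a stale secondary can have an internally consistent root hash while its entries no longer match the primary's counts.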

Test coverage

~85 cidx-focused tests in grovedb/src/tests/count_indexed_tree_tests.rs, organized by scenario:

| Scenario | Tests |
| --- | --- |
| Element construction + serialization round-trip | 4 |
| Direct insert / delete happy paths | 8 |
| Reference insertion + resolution into cidx primary | 4 |
| Nested cidx (2-level, 3-level, mixed cidx → tree → cidx) | 5 |
| Batch path: item-level mutations, multi-key, multi-cidx | 7 |
| Batch path: DeleteTree on cidx (empty + non-empty, DontCheckWithNoCleanup) | 4 |
| Batch path: cidx-overwrite rejection (both flag-on and flag-off paths) | 3 |
| Atomicity: mixed cidx + non-cidx batches with mid-batch failure | 5 |
| count_indexed_top_k / count_indexed_count_range correctness | 8 |
| Proof generation + verification (top-k, arbitrary query, nested envelope) | 7 |
| V1 generic prove/verify cidx subquery + V0 frozen rejection | 4 |
| Direct delete cleans both primary and secondary storage | 2 |
| Batch DeleteTree cleans both namespaces (apply_batch + apply_partial_batch) | 3 |
| db.insert rejects cidx primary target | 1 |
| verify_grovedb content-consistency check via deliberate corruption | 3 |
| Property-based stress tests — 300+200 random ops, single + nested cidx, deterministic seed | 2 |
| Differential test — direct API vs batch path produce identical state on same input | 1 |
| Rejection tests with tightened variant-specific assertions | ~13 |

All 1579 grovedb tests pass (debug + release).

Audit findings + fixes

During development I audited the PR and found a real silent-corruption bug, plus closed two related gaps that the audit exposed:

  1. Nested-cidx batch bubble-up missed the outer's secondary mirror (a8bb34fb). The mutates match in execute_ops_on_path listed every variant except ReplaceCountIndexedTreeRootKeys (the new variant introduced for cidx bubble-up). When inner cidx → outer cidx propagation fired, the outer's pre-state HashMap stayed empty for the inner's key, so the post-apply mirror walked an empty list and the outer's secondary stayed stale while H1-A still passed. A subsequent commit (fb71003a) refactored the mutates check into a method, GroveOp::can_mutate_child_count, with an exhaustive match (no wildcard), so adding any future GroveOp variant fails to compile until classified — a type-system guard against this bug class.

  2. verify_grovedb only checked chain integrity, not content consistency (dd90e6b5). The H1-A walk verified the cidx element's value_hash matches the two Merks' root hashes via combine_hash_three, but never validated the secondary's contents against the primary's count_value. That's exactly the gap the nested-cidx bug exploited — the secondary's content was stale but its root hash was internally consistent. Added a content-consistency pass that walks both Merks' raw storage and reports mismatches via sentinel paths (__cidx_primary_orphan__, __cidx_secondary_orphan__, __cidx_count_mismatch__, __cidx_secondary_malformed_key__). Three deliberate-corruption tests confirm each drift class is caught.

  3. delete_from_count_indexed_tree didn't clean up tree-entry child storage (4f1d7305). Found by the new property-based stress test within its first 100 iterations. Same class as the storage-orphan bugs fixed earlier in db.delete and batch DeleteTree. Fix: after removing the cidx entry from the primary, if the deleted entry was a tree, run find_subtrees + clear on each subtree's storage, and clear the secondary namespace for nested cidx entries.

Known limitations + workarounds

  • Batch overwrite of an existing cidx element is rejected. Storage-pointer semantics are ambiguous: the new element's primary_root_key / secondary_root_key could be intended either to reuse the old storage or to refer to fresh storage, and neither resolution is safe to assume. Workaround: DeleteTree the cidx in one batch (which now cleans both namespaces), then re-create it in a follow-up batch; both steps leave storage fully clean today.
  • db.insert() into a cidx primary target is rejected. Use db.insert_into_count_indexed_tree(...). The dedicated API handles the dual-Merk write; the generic insert path has no secondary-mirror hook and would produce silent drift.
  • V0 generic prove_query / verify_query do not support cidx descents. V0 is a frozen wire format. V1 generic and the dedicated prove_count_indexed_* entry points both work.

Follow-up items (not blocking this PR)

  • BigSumIndexedTree / ProvableSumTree / ProvableSumIndexedTree (parallel features, separate PRs)
  • Batch overwrite of cidx with non-cidx or empty cidx (the unambiguous safe subset; cidx → non-empty cidx stays rejected)
  • Performance benchmarks against plain CountTree for representative workloads

Backwards compatibility / on-disk format

  • Element discriminants 17 / 18 / 145 / 146 allocated (cidx, provable cidx, NonCounted twins). Discriminants 19+ reserved for the sum-indexed follow-up.
  • The Element enum variant order matches the discriminant numbering so bincode's auto-derived encoding round-trips correctly. The merge with NotSummed (2498b97f) involved re-shuffling the variant order to maintain this invariant — documented in that commit.
  • V0 generic prove/verify code paths are unchanged; cidx descents are explicitly rejected on V0 (frozen wire format).
  • No existing data is touched; cidx is purely additive.

How to review

The commit history is the design narrative. Recommended reading order:

  1. The book chapter (docs/book/src/count-indexed-tree.md)
  2. Element + ElementType + TreeType + feature-type additions (in grovedb-element/, grovedb-query/, merk/src/)
  3. Dedicated direct APIs (grovedb/src/operations/count_indexed_tree.rs)
  4. Proof shape (grovedb/src/operations/proof/count_indexed.rs)
  5. V1 generic proof integration (grovedb/src/operations/proof/generate.rs + verify.rs)
  6. Auto-propagation (grovedb/src/lib.rs)
  7. Batch path integration (grovedb/src/batch/mod.rs)
  8. verify_grovedb content-consistency check (grovedb/src/lib.rs)
  9. Tests (grovedb/src/tests/count_indexed_tree_tests.rs)

🤖 Generated with Claude Code

A note on test ordering

A few commits in the history carry "to clear codecov 80%" framing in their messages. To be honest, those tests were written after coverage reports flagged untested paths, rather than ahead of the corresponding implementation. The content of each is still a real functional test — delete behavior, proof round-trips, integrity checks, nested propagation — and they remain in the test suite on their merits. Future contributions should aim to write the test first when possible.

Summary by CodeRabbit

  • New Features
    • CountIndexedTree & ProvableCountIndexedTree: new count-indexed tree types with atomic primary+secondary updates, direct insert/delete, top-k and count-range queries, proof generation/verification, integrity-check DB open mode, and a 247-byte item-key limit for secondary entries.
  • Documentation
    • Added comprehensive chapter and TOC entry describing dual-tree design, keys, queries, proofs, semantics, and limits.
  • Tests & Benchmarks
    • New tests and benchmarks covering operations, queries, proofs, reconciliation, propagation, and edge cases.

Review Change Stack

Adds two new GroveDB element types — CountIndexedTree and
ProvableCountIndexedTree — that pair a CountTree-shaped primary Merk with
a count-ordered secondary Merk for sub-linear top-k and count-range
queries.

Each element points at two child Merks. The parent Merk binds both via
H1-A composition: combined_value_hash = Blake3(actual_value_hash ||
primary_root_hash || secondary_root_hash). The secondary is itself a
ProvableCountTree (each entry contributes count = 1) so existing
AggregateCountOnRange machinery applies natively.

Storage prefix derivation (S2-B): primary keeps the existing
build_prefix(path); secondary is Blake3(primary_prefix || 0x01).
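
That S2-B derivation amounts to Blake3-hashing a fixed 33-byte preimage. A sketch of the preimage construction (hypothetical helper name; the shipped code lives in RocksDbStorage::secondary_prefix_for):

```rust
/// Sketch of the S2-B preimage (hypothetical helper): the 32-byte
/// primary prefix plus a 0x01 domain-separation tag. The shipped
/// derivation Blake3-hashes this fixed 33-byte buffer, so secondary
/// prefixes are domain-separated from path-derived prefixes, whose
/// preimages have a different (variable-length) shape.
fn secondary_prefix_preimage(primary_prefix: &[u8; 32]) -> [u8; 33] {
    let mut preimage = [0u8; 33];
    preimage[..32].copy_from_slice(primary_prefix);
    preimage[32] = 0x01; // tag for the secondary namespace
    preimage
}
```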

Public API:
- insert_into_count_indexed_tree / delete_from_count_indexed_tree —
  dedicated direct APIs that mirror to the secondary inline and chain
  the H1-A combine into the parent.
- count_indexed_top_k / count_indexed_count_range — read APIs walking
  the secondary in count order.
- reconcile_count_indexed_tree_secondary — rebuild the secondary from
  the primary on demand; used after batch operations that bypass the
  dedicated write path.
- prove_count_indexed_top_k / verify_count_indexed_top_k — proof
  generation and verification for top-k queries, binding the secondary
  range proof to the GroveDB root hash via the H1-A composition.
- Empty CountIndexedTree elements can be created via apply_batch.

Auto-cascading: propagate_changes_with_transaction is now CountIndexed-
aware. When the propagation pass crosses a CountIndexedTree primary
level, it mirrors the count delta to that level's secondary; when a
CountIndexedTree element needs reconstruction, it uses the H1-A
three-input combine. Nested CountIndexedTrees and deep db.insert paths
through sub-trees of a cidx primary cascade correctly.
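
The mirror step can be pictured with an in-memory stand-in for the secondary Merk (hypothetical toy model; the real code mirrors into a Merk through the batch machinery):

```rust
use std::collections::BTreeMap;

/// Toy model of mirroring a count delta (hypothetical; the real
/// secondary is a Merk, not a BTreeMap). A count change removes the old
/// (count_be ‖ key) entry and inserts the new one; None models an
/// insertion (no old entry) or a deletion (no new entry).
fn mirror_count_delta(
    secondary: &mut BTreeMap<Vec<u8>, ()>,
    key: &[u8],
    old_count: Option<u64>,
    new_count: Option<u64>,
) {
    fn encode(count: u64, key: &[u8]) -> Vec<u8> {
        let mut k = count.to_be_bytes().to_vec();
        k.extend_from_slice(key);
        k
    }
    if let Some(c) = old_count {
        secondary.remove(&encode(c, key));
    }
    if let Some(c) = new_count {
        secondary.insert(encode(c, key), ());
    }
}
```

Because the count is part of the secondary key, an update is always a remove-plus-insert pair, never an in-place overwrite; that is why a missed mirror leaves a stale entry rather than a wrong value.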

Design doc at docs/book/src/count-indexed-tree.md captures the ratified
decisions (H1-A, S2-B, V1-A, Q1-A, S1-A, Q2 with conditional subqueries
deferred). Spike note at docs/spikes/cascading-aggregation-spike.md
records the architectural analysis for the propagation refactor.

Tests: 27 dedicated tests covering empty creation, insert/update/delete
with count deltas, NonCounted handling, deep cascading through sub-trees,
nested CountIndexedTrees, top-k and count-range queries, reconciliation,
batch creation, proof round-trips, and forge tests (tampered bytes,
wrong path).

Workspace: 2615 lib tests pass, no regressions.

Deferred for follow-up:
- Item-level batch inserts INTO a cidx primary (use the dedicated API)
- Replication / chunk restoration support for two-Merk subtrees
- Conditional-by-count subqueries within CountIndexedQuery (Q2.3)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai Bot (Contributor) commented May 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds CountIndexedTree and ProvableCountIndexedTree element types with parallel primary/secondary Merks, H1-A hashing, direct/batch operations, secondary mirroring, upward propagation with deferred state, proof generation/verification for top-k and count-range queries, storage prefix derivation, comprehensive tests, and benchmarks.

Changes

CountIndexedTree Dual-Merk Implementation

| Layer / File(s) | Summary |
| --- | --- |
| Specification & Documentation<br>docs/book/src/SUMMARY.md, docs/book/src/appendix-a.md, docs/book/src/count-indexed-tree.md | Protocol-observable design and type reference for count-indexed trees with dual-Merk secondary indexing, H1‑A hashing, write/read/batch/proof semantics, and limitations. |
| Type System: ElementType & TreeType<br>grovedb-element/src/element_type.rs, merk/src/tree_type/mod.rs, merk/src/tree_type/costs.rs | New ElementType variants 17–18 and NonCounted twins 145–146; TreeType variants 11–12 with discriminant round-trip, predicates, and cost sizing. |
| Element Enum & Constructors<br>grovedb-element/src/element/mod.rs, grovedb-element/src/element/constructor.rs | Element enum variants with optional primary/secondary root keys, count, and flags; factory constructors for empty/parameterized CountIndexedTree and ProvableCountIndexedTree. |
| Element Helpers & Tree Inspection<br>grovedb-element/src/element/helpers.rs, merk/src/element/tree_type.rs | count_value extraction, tree classification, flag accessors, and tree-type feature mapping for count-indexed variants. |
| Element Display & Visualization<br>grovedb-element/src/element/visualize.rs | Visualization impl rendering count_indexed_tree and provable variants with primary/secondary keys and count; unit tests added. |
| H1‑A Hash Composition & Node APIs<br>merk/src/tree/hash.rs, merk/src/tree/kv.rs, merk/src/tree/mod.rs, merk/src/tree/walk/mod.rs | combine_hash_three utility; KV/TreeNode constructors and update paths for three-input composition; walker helper for two-reference updates. |
| Merk Ops & Element Storage Integration<br>merk/src/tree/ops.rs, merk/src/element/costs.rs, merk/src/element/delete.rs, merk/src/element/get.rs, merk/src/element/insert.rs, merk/src/element/reconstruct.rs | New Put/Replace layered count-indexed op variants; cost sizing and layered-value handling; insert/delete/get routing; reconstruct_with_two_root_keys and insert_count_indexed_subtree APIs. |
| Storage: Secondary Prefix<br>storage/src/rocksdb_storage/storage.rs | RocksDbStorage::secondary_prefix_for deriving deterministic secondary subtree prefix via Blake3; tests added. |
| Direct Operations<br>grovedb/src/operations/count_indexed_tree.rs | insert/delete/reconcile/top_k/count_range implementations with secondary key encoding (u64 BE count ‖ key). |
| Operations Module & Insert Wiring<br>grovedb/src/operations/mod.rs, grovedb/src/operations/insert/mod.rs | Exports count_indexed_tree under minimal; generic insert path rejects direct primary merk creation and validates provided child root keys, delegating to count-indexed insertion APIs. |
| Batch Execution & Propagation<br>grovedb/src/batch/mod.rs, grovedb/src/batch/batch_structure.rs, grovedb/src/batch/estimated_costs/*, grovedb/src/batch/just_in_time_reference_update.rs | ReplaceAggregateIndexedTreeRootKeys GroveOp; pre-apply count capture, secondary mirroring, deferred secondary bubble-up, batch insertion emptiness enforcement, and cleanup of secondary namespaces for deletes/overwrites; cost estimators updated. |
| Propagation & Integration<br>grovedb/src/lib.rs, grovedb/src/operations/delete/mod.rs, grovedb/src/operations/get/query.rs | propagate_changes_with_transaction_with_initial_deferred for deferred secondary state; update_count_indexed_tree_item_preserve_flag_into_batch_operations; open_with_cidx_integrity_check; verify_grovedb H1‑A checks; delete now clears cidx secondary namespaces; query routing updated to map cidx elements to count values where appropriate. |
| Proof System<br>grovedb/src/operations/proof/count_indexed.rs, grovedb/src/operations/proof/mod.rs, grovedb/src/operations/proof/generate.rs, grovedb/src/operations/proof/verify.rs | CountIndexedRangeProof envelope and prover/verify paths; secondary range proof integrated, verify uses combine_hash_three for H1‑A; V0 rejects cidx subqueries, V1 descends into primary and wraps primary proof with secondary-root attestation. |
| Tests & Benchmarks<br>grovedb/src/tests/count_indexed_tree_tests.rs, grovedb/src/tests/v1_proof_tests.rs, grovedb/Cargo.toml, grovedb/benches/cidx_benchmark.rs, grovedb/src/tests/mod.rs | Comprehensive functional tests (insertion, deletion, cascades, queries, reconciliation, proofs, batch behavior) and new Criterion bench target cidx_benchmark. |

Estimated code review effort
🎯 5 (Critical) | ⏱️ ~120 minutes

(_/)
(•_•) I tally hops and hash with glee,
Two Merks mirrored beneath the tree.
Top-k, range, and proofs to show,
A rabbit’s hop where counts will grow.
🐇 Hop—index—let queries flow.


@coderabbitai Bot left a comment


Actionable comments posted: 11

🧹 Nitpick comments (4)
merk/src/tree/ops.rs (1)

389-410: ⚡ Quick win

Add focused unit coverage for the new op variants.

This module’s local tests still exercise only the legacy Put/Delete paths, so regressions in new_with_layered_value_hash_three(...) or put_value_with_two_reference_value_hashes_and_value_cost(...) would currently slip through here. A pair of tests that hits both apply_to(None, ...) and update-on-existing-node would lock down the new hashing path well.

Also applies to: 561-589

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@merk/src/tree/ops.rs` around lines 389 - 410, Add unit tests that exercise
the new op variants PutLayeredCountIndexedReference and
ReplaceLayeredCountIndexedReference so the layered hashing path is covered:
write tests that call the op's apply_to(None, ...) to create a fresh node and
then apply the op again against an existing node (update-on-existing-node) to
exercise TreeNode::new_with_layered_value_hash_three and the
put_value_with_two_reference_value_hashes_and_value_cost code paths; assert
expected node hashes, costs, and stored references (use mid_key/mid_value
equivalents from the diff) and mirror these tests for both variants to prevent
regressions.
merk/src/element/reconstruct.rs (1)

97-125: ⚡ Quick win

Add a direct test for reconstruct_with_two_root_keys.

This helper is now the reconstruction path for count-indexed parents, but the test module still exercises only reconstruct_with_root_key. A small test for both raw and NonCounted-wrapped count-indexed elements would catch swapped root keys or wrapper loss early.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@merk/src/element/reconstruct.rs` around lines 97 - 125, Add unit tests that
call reconstruct_with_two_root_keys directly: create both CountIndexedTree and
ProvableCountIndexedTree Element instances (and their NonCounted(Box::new(...))
wrapped variants), call reconstruct_with_two_root_keys with distinct
primary_root_key and secondary_root_key and an AggregateData that yields a known
count, and assert the returned Element preserves the correct variant, wrapper
(NonCounted present when expected), and that primary_root_key and
secondary_root_key are placed in the reconstructed Element in the correct order
(i.e., not swapped). Use the existing AggregateData helpers and Element
constructors to build inputs and compare reconstructed fields to expected
values.
grovedb/src/operations/count_indexed_tree.rs (2)

316-381: ⚡ Quick win

Extract the nested-secondary mirror path into one helper.

The grandparent lookup, parent-secondary mirror, and deferred-secondary seeding logic is duplicated almost verbatim in both insert and delete. This path is subtle, and keeping two copies in sync will be error-prone as the CountIndexedTree propagation rules evolve. A shared helper returning the initial deferred-secondary state would reduce drift risk here.

Also applies to: 1090-1155

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grovedb/src/operations/count_indexed_tree.rs` around lines 316 - 381, The
code that computes initial_deferred_secondary (the grandparent lookup,
extracting parent_secondary_root_key_before from gp_element, opening
parent_secondary_merk via open_count_indexed_secondary_at_path, calling
mirror_to_secondary, and returning (sh, sk) from
parent_secondary_merk.root_hash_key_and_aggregate_data()) is duplicated in
insert and delete; extract this into a single helper (e.g.,
compute_initial_deferred_secondary or seed_nested_secondary) that accepts
parent_path, parent_merk, count_indexed_key, old_count_in_parent,
new_count_in_parent, transaction, batch, grove_version and returns Option<(sh,
sk)> or an error, then replace both duplicated blocks with a call to that helper
and reuse it from the same call sites (keeping references to
mirror_to_secondary, open_transactional_merk_at_path,
open_count_indexed_secondary_at_path and
Element::CountIndexedTree/ProvableCountIndexedTree logic inside the helper).

155-163: ⚡ Quick win

Use cost_return_on_error! for these early exits.

These branches hand-roll return Err(...).wrap_with_cost(cost) instead of using the repo-standard early-return helper that the rest of the Rust codebase expects for cost accounting. Converting these sites would make the file consistent with the project convention.

As per coding guidelines **/*.rs: Use cost_return_on_error! macro for early returns with cost accumulation in Rust source files.

Also applies to: 478-486, 847-855, 933-941

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grovedb/src/operations/count_indexed_tree.rs` around lines 155 - 163, Replace
the manual early-return that does `return
Err(Error::InvalidPath(...)).wrap_with_cost(cost)` after calling
`path.derive_parent()` with the project-standard macro `cost_return_on_error!`,
e.g. invoke `cost_return_on_error!(Error::InvalidPath("cannot insert into
count-indexed tree at the root path".to_string()), cost)` so cost accounting is
applied consistently; apply the same change to the other analogous early-exit
sites in this file that wrap `Err(...).wrap_with_cost(cost)` (the other
occurrences around the count-indexed-tree logic) so all early returns use
`cost_return_on_error!` instead of hand-rolled `wrap_with_cost(cost)`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/book/src/count-indexed-tree.md`:
- Around line 3-7: Update the status banner string "Status: design ratified,
awaiting implementation." in the docs/book/src/count-indexed-tree.md to reflect
that the feature is implemented and available (e.g., change to "Status:
implemented and available" or similar); locate the exact banner line containing
that phrase and replace it with the new implemented/available wording so the
docs match the delivered code and tests.
- Around line 212-217: Replace the absolute phrase "Collision-free secondary"
and any wording that claims absolute collision-freedom with language that
accurately describes domain separation and collision resistance for the
`secondary prefix` derived via Blake3; e.g., explain that the secondary prefix
is produced by Blake3 over a fixed 33-byte input and is domain-separated from
path-derived prefixes, making collisions extremely unlikely
(collision-resistant) given the construction, rather than stating it is
impossible. Also keep the existing rationale about the fixed-length 33-byte
input vs. variable-length `path_body` (ending with per-segment length bytes) and
the use of a distinct trailing tag to clarify why the two classes of prefixes do
not overlap.

In `@grovedb/src/batch/mod.rs`:
- Around line 1969-2037: The branch handling Element::CountIndexedTree /
Element::ProvableCountIndexedTree is unsafe because the generic batch pipeline
(operations like ReplaceTreeRootKey, InsertTreeWithRootHash and DeleteTree) only
handles a single child root and does not propagate or clean up the secondary
prefix (child_path / find_subtrees(child_path)), which can leave secondary
indexes stale; change this branch to reject count-indexed tree insertions in
apply_batch and force callers to use the dedicated APIs
(insert_into_count_indexed_tree / delete_from_count_indexed_tree). Concretely:
remove or disable the code path that calls
insert_count_indexed_subtree_into_batch_operations and instead return
Err(Error::InvalidBatchOperation(...)) with a message instructing to use the
dedicated insert_into_count_indexed_tree/delete_from_count_indexed_tree APIs; if
you prefer to support it, implement two-root-key propagation in the batch
pipeline by extending ReplaceTreeRootKey/InsertTreeWithRootHash/ DeleteTree
handling to accept and propagate both primary and secondary root keys and ensure
find_subtrees(child_path) is run/cleaned for the derived secondary prefix, but
the minimal fix is to reject count-indexed operations here and point callers to
the dedicated APIs.

In `@grovedb/src/lib.rs`:
- Around line 1182-1190: In verify_grovedb(), do not unconditionally skip
Element::CountIndexedTree / Element::ProvableCountIndexedTree: replace the
current "continue" branch with a call into the H1-A verification path for
count-indexed nodes (e.g. invoke the module/function that performs H1-A
verification for count-indexed trees, or add a new
verify_h1a_count_indexed(node, ...) function and call it from the
Element::CountIndexedTree / Element::ProvableCountIndexedTree arm); if the H1-A
verifier is not yet implemented, fail closed by returning an
Err(VerificationError::UnsupportedCountIndexedNode or similar) from
verify_grovedb() instead of treating the node as verified. Ensure you reference
and propagate errors from the H1-A verifier so verify_grovedb() reports
corruption rather than silently continuing.

In `@grovedb/src/operations/count_indexed_tree.rs`:
- Around line 793-831: The current logic uses Query::new() + insert_all() and
then post-filters by lo_count/hi_count which causes full scans; instead
construct a query that seeks directly to the encoded secondary-key bounds so the
iterator starts inside the requested window. Replace the insert_all() usage in
the count-indexed scan (where KVIterator::new(..., &all_query) is created) with
a Query configured to start at the encoded lower or upper secondary key (use the
same secondary-key encoding used by decode_secondary_key) depending on
descending: for ascending, build a start key based on
encode_secondary_key(lo_count, minimal_original_key) and an optional end key
based on encode_secondary_key(hi_count, maximal_original_key); for descending,
start the query at the encoded upper bound and iterate left_to_right=false.
Ensure inclusivity semantics for counts equal to lo_count/hi_count and keep the
same decode_secondary_key/count checks, but the iterator will no longer scan
from the collection edge.

In `@grovedb/src/operations/get/query.rs`:
- Around line 557-560: In function query_item_value_or_sum, the
reference-resolution branch currently doesn't handle Element::CountIndexedTree
and Element::ProvableCountIndexedTree, causing referenced counts to fall through
to InvalidQuery; update the reference-handling match (the branch that resolves
referenced elements) to mirror the direct-element branch by matching
Element::CountIndexedTree(.., count_value, _) and
Element::ProvableCountIndexedTree(.., count_value, _) and returning
QueryItemOrSumReturnType::CountValue(count_value) so referenced count elements
are handled consistently.

In `@grovedb/src/operations/proof/count_indexed.rs`:
- Around line 41-64: The CountIndexedRangeProof envelope currently only carries
a single primary_root_hash, so nested count-indexed ancestors cannot be attested
when building the chain in combine_hash (see combine_hash and the path[..last]
chaining); fix by extending the proof to include per-ancestor H1-A attestation
data (e.g. replace primary_root_hash: [u8;32] with a Vec<[u8;32]> or
primary_root_hashs: Vec<[u8;32]> aligned with layer_proofs) and update the
verifier logic that iterates path layers (the code at lines that use
combine_hash over layer_proofs/path) to consume the corresponding primary
attestation for each layer instead of always using a single primary_root_hash so
nested CountIndexedTree ancestors validate correctly.

In `@grovedb/src/operations/proof/generate.rs`:
- Around line 1463-1465: The code in generate.rs currently treats
Element::CountIndexedTree and Element::ProvableCountIndexedTree like append-only
or fixed-size trees by falling into the final continue arm, which silently
allows V1 subqueries that will produce proofs failing verification; update the
match so that CountIndexedTree and ProvableCountIndexedTree are handled the same
way as the other rejected subquery variants (i.e., return an error/abort the
subquery attempt) instead of continuing – locate the match over Element in the
proof generation function (the arm with
Ok(Element::DenseAppendOnlyFixedSizeTree(..)) |
Ok(Element::CountIndexedTree(..)) | Ok(Element::ProvableCountIndexedTree(..)) =>
continue) and move or duplicate the CountIndexedTree and
ProvableCountIndexedTree variants into the branch that rejects unsupported
subqueries for V1 so non-empty count-indexed trees produce an immediate error
rather than proceeding.

In `@grovedb/src/tests/count_indexed_tree_tests.rs`:
- Around line 829-871: Update the test reconcile_rebuilds_secondary_from_scratch
to first corrupt/clear the secondary index before calling
reconcile_count_indexed_tree_secondary so you actually test rebuilding: after
inserting the CountIndexedTree and its entries (using db.insert and
db.insert_into_count_indexed_tree), explicitly invalidate the secondary (for
example by deleting secondary nodes or overwriting the secondary element for the
path [TEST_LEAF, b"cidx"] with a broken/empty secondary using available db
remove/insert APIs), then call db.reconcile_count_indexed_tree_secondary(...)
and finally assert that db.count_indexed_top_k(...) returns the expected top-k
result; reference functions: reconcile_rebuilds_secondary_from_scratch,
reconcile_count_indexed_tree_secondary, count_indexed_top_k,
db.insert_into_count_indexed_tree.

In `@merk/src/tree/hash.rs`:
- Around lines 151-153: The doc comment for combine_hash_three contradicts the
implementation: it claims the cost is one hash call, but the function records
hash_node_calls: 2. Update the comment to state the correct cost (two hash
calls) and briefly note that the 96-byte input spans two 64-byte Blake3
compression blocks, which is why hash_node_calls is 2, so the comment matches
the implementation.
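A back-of-the-envelope check of the corrected cost — the function below is a throwaway illustration, not GroveDB's cost code: Blake3 compresses input in 64-byte blocks, so the 96-byte input of combine_hash_three (three 32-byte hashes) spans two blocks.

```rust
// Illustrative block-count arithmetic for Blake3's 64-byte compression
// blocks. Not GroveDB's actual cost accounting.
fn blake3_compression_blocks(input_len: usize) -> usize {
    // ceil(input_len / 64); an empty input still costs one block.
    std::cmp::max(1, (input_len + 63) / 64)
}

fn main() {
    // combine_hash: two 32-byte hashes = 64 bytes -> one block.
    assert_eq!(blake3_compression_blocks(64), 1);
    // combine_hash_three: three 32-byte hashes = 96 bytes -> two blocks,
    // hence hash_node_calls: 2.
    assert_eq!(blake3_compression_blocks(96), 2);
}
```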

---

Nitpick comments:
In `@grovedb/src/operations/count_indexed_tree.rs`:
- Around lines 316-381: The code that computes initial_deferred_secondary (the
grandparent lookup, extracting parent_secondary_root_key_before from
gp_element, opening parent_secondary_merk via
open_count_indexed_secondary_at_path, calling mirror_to_secondary, and
returning (sh, sk) from
parent_secondary_merk.root_hash_key_and_aggregate_data()) is duplicated
between insert and delete. Extract it into a single helper (e.g.
compute_initial_deferred_secondary or seed_nested_secondary) that takes
parent_path, parent_merk, count_indexed_key, old_count_in_parent,
new_count_in_parent, transaction, batch, and grove_version and returns
Option<(sh, sk)> or an error. Replace both duplicated blocks with a call to
that helper, keeping the mirror_to_secondary,
open_transactional_merk_at_path, open_count_indexed_secondary_at_path, and
Element::CountIndexedTree/ProvableCountIndexedTree logic inside the helper.
- Around lines 155-163: Replace the hand-rolled early return `return
Err(Error::InvalidPath(...)).wrap_with_cost(cost)` after the
`path.derive_parent()` call with the project-standard macro
`cost_return_on_error!`, e.g.
`cost_return_on_error!(Error::InvalidPath("cannot insert into count-indexed
tree at the root path".to_string()), cost)`, so cost accounting is applied
consistently. Apply the same change to the other analogous early-exit sites in
this file that wrap `Err(...).wrap_with_cost(cost)` around the
count-indexed-tree logic, so all early returns use `cost_return_on_error!`
instead of hand-rolled `wrap_with_cost(cost)`.

In `@merk/src/element/reconstruct.rs`:
- Around lines 97-125: Add unit tests that call reconstruct_with_two_root_keys
directly. Build both CountIndexedTree and ProvableCountIndexedTree Element
instances (and their NonCounted(Box::new(...))-wrapped variants), call
reconstruct_with_two_root_keys with distinct primary_root_key and
secondary_root_key values and an AggregateData that yields a known count, and
assert that the returned Element preserves the correct variant and wrapper
(NonCounted present where expected), and that primary_root_key and
secondary_root_key land in the reconstructed Element in the correct order
(i.e. not swapped). Use the existing AggregateData helpers and Element
constructors to build the inputs and compare the reconstructed fields to
expected values.

In `@merk/src/tree/ops.rs`:
- Around lines 389-410: Add unit tests exercising the new op variants
PutLayeredCountIndexedReference and ReplaceLayeredCountIndexedReference so the
layered hashing path is covered. Call each op's apply_to(None, ...) to create
a fresh node, then apply the op again against an existing node to exercise
TreeNode::new_with_layered_value_hash_three and the
put_value_with_two_reference_value_hashes_and_value_cost code paths. Assert
the expected node hashes, costs, and stored references (using
mid_key/mid_value equivalents from the diff), and mirror the tests for both
variants to prevent regressions.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eb221ea0-9ae9-4765-9e7e-b2c6aec12c7b

📥 Commits

Reviewing files that changed from the base of the PR and between 347bd9b and 8a87da8.

📒 Files selected for processing (35)
  • docs/book/src/SUMMARY.md
  • docs/book/src/appendix-a.md
  • docs/book/src/count-indexed-tree.md
  • docs/spikes/cascading-aggregation-spike.md
  • grovedb-element/src/element/constructor.rs
  • grovedb-element/src/element/helpers.rs
  • grovedb-element/src/element/mod.rs
  • grovedb-element/src/element/visualize.rs
  • grovedb-element/src/element_type.rs
  • grovedb/src/batch/mod.rs
  • grovedb/src/lib.rs
  • grovedb/src/operations/count_indexed_tree.rs
  • grovedb/src/operations/get/query.rs
  • grovedb/src/operations/insert/mod.rs
  • grovedb/src/operations/mod.rs
  • grovedb/src/operations/proof/count_indexed.rs
  • grovedb/src/operations/proof/generate.rs
  • grovedb/src/operations/proof/mod.rs
  • grovedb/src/operations/proof/verify.rs
  • grovedb/src/tests/count_indexed_tree_tests.rs
  • grovedb/src/tests/mod.rs
  • merk/src/element/costs.rs
  • merk/src/element/delete.rs
  • merk/src/element/get.rs
  • merk/src/element/insert.rs
  • merk/src/element/reconstruct.rs
  • merk/src/element/tree_type.rs
  • merk/src/tree/hash.rs
  • merk/src/tree/kv.rs
  • merk/src/tree/mod.rs
  • merk/src/tree/ops.rs
  • merk/src/tree/walk/mod.rs
  • merk/src/tree_type/costs.rs
  • merk/src/tree_type/mod.rs
  • storage/src/rocksdb_storage/storage.rs

Comment on lines +283 to +310
// CountIndexedTree / ProvableCountIndexedTree own two child Merks
// (primary + secondary). On direct insertion we accept only the
// empty case (both root keys = None, count = 0) because there is
// no two-Merk batch-cascade machinery in this code path; full
// batch / cascading-aggregation support lives in the batch
// propagation work.
Element::CountIndexedTree(primary, secondary, count_value, _)
| Element::ProvableCountIndexedTree(primary, secondary, count_value, _) => {
if primary.is_some() || secondary.is_some() || *count_value != 0 {
return Err(Error::InvalidCodeExecution(
"a CountIndexedTree must be empty at the moment of direct insertion (both \
primary_root_key and secondary_root_key must be None and count = 0); \
non-empty insertion requires batch operations",
))
.wrap_with_cost(cost);
}
cost_return_on_error_into!(
&mut cost,
element.insert_count_indexed_subtree(
&mut subtree_to_insert_into,
key,
NULL_HASH,
NULL_HASH,
Some(options.as_merk_options()),
grove_version,
)
);
}

We should allow for direct insertion

.to_string(),
))
.wrap_with_cost(cost);
}

We need to do this.

Fixes CI lint failure (debugger.rs match arms) and ten CodeRabbit
review items on the CountIndexedTree implementation:

- Doc status banner: "awaiting implementation" → "implemented"
- Doc wording: "collision-free" → "domain-separated" for hash-derived
  prefixes
- verify_grovedb: fail closed (NotSupported) for cidx instead of
  silently skipping; integrity verification needs the H1-A
  three-input combine and dual-Merk traversal which is not yet wired
- V1 prove_subqueries_v1: explicitly reject subqueries into cidx
  with NotSupported instead of silently emitting an unverifiable
  proof; callers must use prove_count_indexed_top_k
- Batch DeleteTree on cidx: reject because the standard delete path
  only cleans up one child Merk and would orphan the secondary
  storage namespace
- Generic batch path: document the cidx overwrite footgun (same
  shape as other tree types when the override-protection flag is
  off)
- count_indexed_count_range: replace full secondary scan with a
  bounded Query::insert_range using big-endian count bytes, falling
  back to insert_range_from when hi_count == u64::MAX
- query_item_value_or_sum reference branch: include cidx variants
  alongside the direct-element branch
- prove_count_indexed_top_k: reject nested cidx on the proven path
  with NotSupported (envelope only carries H1-A attestation data
  for the terminal cidx); verifier naturally fails the chain check
  if a forged envelope smuggles a nested cidx
- combine_hash_three: correct the doc comment to match the cost
  constant; 96 bytes spans two 64-byte Blake3 blocks (the previous
  comment incorrectly conflated blocks with chunks)
- reconcile test: rename to reconcile_after_query_returns_correct_top_k
  to reflect what the test actually verifies (true desync test
  requires unavailable internal APIs)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
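The bounded count_indexed_count_range construction described above can be sketched in a few std-only lines. `RangeSpec` and `count_range_query` are hypothetical stand-ins for the real `Query::insert_range` / `insert_range_from` split, assuming (per the design) that secondary keys begin with the 8-byte big-endian count:

```rust
// Stand-in for the two Merk query shapes the fix chooses between.
#[derive(Debug, PartialEq)]
enum RangeSpec {
    /// Half-open [start, end) over secondary key bytes (insert_range).
    Range(Vec<u8>, Vec<u8>),
    /// Unbounded-above [start, ..) (insert_range_from), used when
    /// hi_count + 1 would overflow.
    RangeFrom(Vec<u8>),
}

fn count_range_query(lo_count: u64, hi_count: u64) -> RangeSpec {
    let start = lo_count.to_be_bytes().to_vec();
    match hi_count.checked_add(1) {
        // [lo, hi] inclusive over counts becomes [lo_be, (hi+1)_be) over
        // keys: every key with count c sorts strictly below (c+1)_be.
        Some(end) => RangeSpec::Range(start, end.to_be_bytes().to_vec()),
        // hi_count == u64::MAX has no representable exclusive upper bound,
        // so fall back to an open-ended range-from.
        None => RangeSpec::RangeFrom(start),
    }
}

fn main() {
    assert_eq!(
        count_range_query(3, 6),
        RangeSpec::Range(3u64.to_be_bytes().to_vec(), 7u64.to_be_bytes().to_vec())
    );
    assert_eq!(count_range_query(0, u64::MAX), RangeSpec::RangeFrom(vec![0; 8]));
}
```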
@codecov

codecov Bot commented May 10, 2026

Codecov Report

❌ Patch coverage is 89.29178% with 378 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.18%. Comparing base (352c2f5) to head (cdfba1c).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| grovedb/src/batch/mod.rs | 81.03% | 77 Missing ⚠️ |
| grovedb/src/operations/count_indexed_tree.rs | 92.75% | 74 Missing ⚠️ |
| grovedb/src/lib.rs | 86.36% | 45 Missing ⚠️ |
| grovedb/src/operations/proof/count_indexed.rs | 91.07% | 34 Missing ⚠️ |
| grovedb/src/operations/proof/verify.rs | 73.94% | 31 Missing ⚠️ |
| grovedb/src/batch/count_indexed_tree.rs | 91.35% | 16 Missing ⚠️ |
| merk/src/element/insert.rs | 94.06% | 14 Missing ⚠️ |
| grovedb/src/operations/proof/generate.rs | 85.71% | 9 Missing ⚠️ |
| ...vedb/src/batch/estimated_costs/worst_case_costs.rs | 0.00% | 8 Missing ⚠️ |
| grovedb/src/operations/delete/mod.rs | 78.94% | 8 Missing ⚠️ |

... and 15 more
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #657      +/-   ##
===========================================
- Coverage    91.20%   91.18%   -0.02%     
===========================================
  Files          205      208       +3     
  Lines        60156    63606    +3450     
===========================================
+ Hits         54864    57999    +3135     
- Misses        5292     5607     +315     
| Components | Coverage Δ |
| --- | --- |
| grovedb-core | 88.72% <87.96%> (-0.06%) ⬇️ |
| merk | 92.71% <94.20%> (+0.37%) ⬆️ |
| storage | 86.75% <100.00%> (+0.38%) ⬆️ |
| commitment-tree | 96.43% <ø> (ø) |
| mmr | 96.76% <ø> (ø) |
| bulk-append-tree | 89.77% <ø> (ø) |
| element | 96.77% <86.15%> (-0.43%) ⬇️ |

QuantumExplorer and others added 23 commits May 10, 2026 15:17
The direct (non-batch) insert path previously rejected any
CountIndexedTree element whose primary_root_key or secondary_root_key
was set or whose count_value was non-zero, with an error claiming
non-empty insertion required the batch path (which itself does not yet
support non-empty cidx). This is the migration / restore-from-backup
direct-insertion path.

For non-empty cidx elements, open the existing primary and secondary
Merks at the new path, validate that the caller's declared root keys
match the on-disk state, and read the actual root hashes for the
H1-A combined value hash so the parent's value_hash is consistent
with disk. Mismatched root keys fail loudly.

Also delete docs/spikes/cascading-aggregation-spike.md — internal
and external dev-relevant content for cidx lives in the book chapter
(docs/book/src/count-indexed-tree.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lifts patch coverage on the cidx PR by adding focused tests for the
error paths and rejections introduced over the last few commits, plus
two extra cidx behaviors that were not yet exercised:

- direct_insert_rejects_mismatched_secondary_root_key (mismatch on
  secondary key, mirroring the existing primary-key test)
- batch_delete_tree_on_cidx_is_rejected (DeleteTree on cidx via batch
  must error to avoid orphaning secondary storage)
- verify_grovedb_fails_closed_for_cidx (NotSupported instead of
  silent skip)
- prove_count_indexed_top_k_at_root_path_errors
- prove_count_indexed_top_k_on_non_cidx_target_errors
- count_indexed_top_k_on_non_cidx_target_errors
- count_indexed_count_range_on_non_cidx_target_errors
- reconcile_on_non_cidx_target_errors
- delete_from_count_indexed_tree_on_non_cidx_target_errors
- delete_from_count_indexed_tree_returns_false_for_unknown_key
- count_indexed_count_range_descending_returns_descending_order
  (covers the descending bounded-range branch)
- test_v1_proof_rejects_count_indexed_tree_subquery (V1 generic
  prove path rejects cidx subqueries)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lifts patch coverage above the codecov 80% threshold by hitting the
0%-covered Display impls, the gated Visualize impls, helper queries
on Element, the count_range / top_k edge cases, and the verifier's
error paths:

- count_indexed_tree_display_renders_fields
- provable_count_indexed_tree_display_renders_fields
- count_indexed_tree_helpers_report_count_and_type
  (is_count_indexed_tree, is_any_tree, element_type, NonCounted look-through)
- test_visualize_count_indexed_tree_empty (visualize feature)
- test_visualize_count_indexed_tree_with_keys (visualize feature)
- test_visualize_provable_count_indexed_tree (visualize feature)
- count_indexed_count_range_with_lo_greater_than_hi_returns_empty
- count_indexed_count_range_with_hi_count_u64_max_uses_range_from
- count_indexed_count_range_respects_limit
- count_indexed_top_k_with_zero_returns_empty
- count_indexed_top_k_at_root_path_errors
- count_indexed_count_range_at_root_path_errors
- verify_count_indexed_top_k_rejects_corrupt_proof_bytes
- verify_count_indexed_top_k_rejects_path_length_mismatch

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
V0 is a frozen on-the-wire proof format. Adding cidx descent to it
would be a wire-format change, so V0 will never learn cidx subqueries.
Reword the V0 prover and verifier comments / error messages to make
that explicit instead of implying the work is pending in a follow-up
PR. The dedicated `prove_count_indexed_top_k` /
`verify_count_indexed_top_k` entry points and the (still TODO) V1
generic path remain the supported routes for cidx queries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
verify_grovedb: replace fail-closed NotSupported with the actual
H1-A integrity walk for cidx nodes. Open both child Merks, read
their root hashes, verify the parent's recorded value_hash equals
combine_hash_three(value_hash(cidx_bytes), primary_root,
secondary_root), then recurse into the primary normally.

While doing this, fix a pre-existing bug in
insert_into_count_indexed_tree: it called Element::insert (Op::Put,
no combine) regardless of element kind. For tree subtree elements
that meant the cidx primary's merk node stored value_hash =
value_hash(serialized) instead of combine_hash(value_hash,
NULL_HASH), breaking the merkle invariant of the cidx primary
until a deep insert later updated it via propagation. Dispatch on
element kind so trees take Element::insert_subtree, nested cidx
takes Element::insert_count_indexed_subtree, references and items
keep the prior path. Now the cidx primary's root hash is correct
immediately after creation, and verify_grovedb can recurse cleanly.

prove_count_indexed_top_k: extend CountIndexedRangeProof with
ancestor_cidx_secondary_root_hashes (Vec<Option<[u8;32]>> aligned
with intermediate layers). When building, capture each cidx
ancestor's secondary root hash. When verifying, chain via
combine_hash_three at cidx ancestor layers, combine_hash elsewhere.
Removes the prior nested-cidx prover-side rejection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Subqueries into CountIndexedTree via the generic V1 PathQuery
pipeline now produce a verifiable proof. The cidx primary is the
descent target; queries against the secondary still go through the
dedicated prove_count_indexed_top_k path.

Wire format:
- New ProofBytes::CountIndexedTree(secondary_root || primary_proof)
  variant. The 32-byte secondary attestation is captured from the
  cidx's secondary Merk root hash at proof-build time; the primary
  proof bytes are a standard Merk proof of the subquery results
  generated by prove_subqueries_v1 against the cidx primary.
- LayerProof and ProofBytes derive Clone so the verifier can
  synthesize a sibling Merk-shaped LayerProof from the cidx-prefixed
  bytes and recurse into the existing verify_layer_proof_v1.

Generate (V1): replace the previous NotSupported with descent that
calls prove_subqueries_v1 on the cidx primary, opens the secondary
to capture its root hash, and re-wraps the resulting Merk proof
bytes with the secondary attestation prefix.

Verify (V1): when a lower_layer's parent element is a cidx, require
ProofBytes::CountIndexedTree, split off the 32-byte secondary
attestation, synthesize a Merk LayerProof for the primary, recurse
to obtain primary_root_hash, then chain via
combine_hash_three(value_hash, primary_root, secondary_root)
instead of combine_hash. Reject any other ProofBytes variant under a
cidx parent and any ProofBytes::CountIndexedTree under a non-cidx
parent.

V0 still rejects (V0 wire format is frozen).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The level-by-level batch propagation has no two-Merk hook for
CountIndexedTree primaries: applying mutation ops directly to the
primary updates the primary's root hash but leaves the secondary
index stale, breaking both the H1-A composition stored in the
parent's cidx element bytes and the count-ordered query semantics.

Reject mutation ops (Insert/Replace/Patch/Delete/RefreshReference)
in execute_ops_on_path when the merk's tree_type is a cidx primary,
with a clear NotSupported message pointing callers to the dedicated
APIs (insert_into_count_indexed_tree /
delete_from_count_indexed_tree). Up-bubbled internal ops
(ReplaceTreeRootKey, InsertTreeWithRootHash, etc.) remain allowed
— those represent a child subtree's response to its own change and
are handled correctly by the existing propagate_changes_with
_transaction_with_initial_deferred path that already mirrors to the
secondary at the cidx element boundary.

Full batch integration of cidx primary mutations would require a
new GroveOp variant carrying both primary and secondary state plus
a refactor of the per-level propagation pass; that is a substantial
piece of work and belongs in its own follow-up. Until then,
fail-closed is preferable to silently corrupting the index.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Several module/function comments still claimed cidx features were
"a follow-up" or "not yet wired" after this PR's earlier commits
implemented them. Update wording to reflect current state:

- count_indexed_tree.rs module doc: clarify the dedicated APIs are
  required for direct cidx primary mutations and that the batch
  path fails closed until full batch integration lands; deep ops
  under sub-trees of cidx primaries propagate correctly today.
- count_indexed_top_k doc: drop the "no proofs yet" note and point
  at prove_count_indexed_top_k / verify_count_indexed_top_k.
- count_indexed_tree_tests.rs module doc: drop the PR-2-staging
  banner that claimed item insertion / cascading aggregation were
  unexercised.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI codecov/patch is failing at 79.45% (target 80%). Add focused
tests targeting recently-added paths that were not yet exercised:

- insert_into_count_indexed_tree_with_reference_to_missing_target_errors:
  covers the new reference-resolution path for cidx primary inserts
  when the target does not exist.

- deep_insert_under_nested_cidx_propagates_through_both_levels:
  covers the nested-cidx propagation path end-to-end (deep insert
  three levels under outer cidx -> inner cidx -> sub count tree)
  including the new H1-A walk in verify_grovedb at both cidx levels.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch is at 79.91% (target 80%) — 13 hits short. Add four
focused tests covering paths recently added but not yet exercised:

- delete_from_count_indexed_tree_round_trip_with_proof: end-to-end
  delete + prove + verify.
- verify_count_indexed_top_k_rejects_truncated_proof: covers the
  bincode decode error branch.
- verify_grovedb_walks_provable_count_indexed_tree: same H1-A walk
  on the ProvableCountIndexedTree variant.
- test_v0_proof_rejects_count_indexed_tree_subquery: covers the V0
  prover's cidx subquery rejection arm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the structural pieces for cidx primary batch-path support. No
user-visible behavior change: the new op variant is never produced
yet (the rejection at execute_ops_on_path:1862 still fires for cidx
primary mutations), but the parent-level handler is in place so
that emitting the op from a future bubble-up hook is mechanical.

- GroveOp::ReplaceCountIndexedTreeRootKeys: new internal op variant.
  Carries both primary and secondary new-state (root_hash + root_key
  + count_aggregate). Marked #[non_exhaustive] like the other
  internal variants. Sort weight 17, debug formatter, all match
  arms in references / preprocessing / format / cost / sort logic
  exhaustively cover it (rejected as 'internal only' from user-
  facing entry points).

- update_count_indexed_tree_item_preserve_flag_into_batch_operations:
  parallels update_tree_item_preserve_flag_into_batch_operations but
  reconstructs via reconstruct_with_two_root_keys (cidx) and emits
  Op::ReplaceLayeredCountIndexedReference (combine_hash_three /
  H1-A) instead of Op::ReplaceLayeredReference. Preserves flags.

- Parent-level handler: when execute_ops_on_path sees the new op at
  a parent merk, it calls the helper above to recompute the cidx
  element's value_hash via H1-A.

Subsequent commits will: (a) wire a get_secondary_merk_fn closure
through TreeCacheMerkByPath, (b) detect cidx primaries in
execute_ops_on_path and mirror item-level mutations to the
secondary, (c) modify the bubble-up to emit the new op variant
when the just-finished level was a cidx primary. Tests for the
end-to-end behavior land alongside (c).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lifts the level-by-level batch path from rejecting cidx primary
mutations to supporting them end-to-end. A batch op that inserts /
replaces / patches / deletes / refreshes items inside a cidx
primary now correctly mirrors to the secondary index and updates
the cidx element on the parent merk via H1-A composition.

Implementation:

1. TreeCacheMerkByPath gained a get_secondary_merk_fn closure (opens
   the cidx secondary by primary path, looks up secondary_root_key
   from the parent merk's cidx element internally) and a side-channel
   cidx_secondary_after_apply: HashMap<Vec<Vec<u8>>, ...> populated
   by execute_ops_on_path when the level was a cidx primary.

2. execute_ops_on_path: when in_tree_type is cidx primary, captures
   pre-state (per-key old count_value via merk.get) before the apply
   pass. After apply_with_specialized_costs returns it re-reads each
   key's post-apply element, opens the secondary, runs
   mirror_to_secondary_for_batch (new helper handling all four
   insert/update/delete/no-op cases), and stores secondary's state
   in the side-channel.

3. Bubble-up: pulls the cidx state via the new
   take_cidx_secondary_after_apply trait method. When present,
   emits GroveOp::ReplaceCountIndexedTreeRootKeys instead of
   ReplaceTreeRootKey at the parent level (covers all four bubble-up
   paths: Vacant, Occupied, missing parent map, missing level-above).

4. Parent execute_ops_on_path: handles the new op via
   update_count_indexed_tree_item_preserve_flag_into_batch_operations
   which reconstructs with new root keys + count and emits
   Op::ReplaceLayeredCountIndexedReference for combine_hash_three.

5. open_count_indexed_secondary_for_batch helper on GroveDb:
   convenience wrapper used by the closure that does the parent
   merk lookup + secondary open in one call.

batch_insert_into_cidx_primary_works test verifies end-to-end.
verify_grovedb walks the H1-A chain and finds no issues afterward.

Still TODO (separate follow-up): DeleteTree on cidx primary, cidx
overwrite via Replace, comprehensive atomicity tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
prove_count_indexed_top_k was a special case (full-range, ascending
or descending). Lift it to a thin wrapper around a general
prove_count_indexed_query that takes any MerkQuery over the cidx
secondary's keyspace (keys are count_value_be ‖ original_key, so
callers can express count == X, count in [lo, hi], count >= X,
count == X AND original_key starts with Y, etc. by building the
query in those bytes).

Refactored the inner build_count_indexed_proof to take
(secondary_query, limit) instead of (k, descending); the user-
supplied query.left_to_right is echoed in the envelope's
`descending` field for the existing top-k convenience field, and
limit's None gets stored as 0 (verifier treats 0 as no-limit).

Symmetric verifier change: split verify_count_indexed_top_k into a
thin wrapper + verify_count_indexed_inner generic core, and added
verify_count_indexed_query taking the same MerkQuery the prover used
(positional binding requires identical query at both ends).

Test prove_count_indexed_query_with_count_range covers a non-trivial
case: a cidx with five items at counts {1,2,3,5,8}, query
[3, 6) inclusive of 3 and 5, exclusive of 8. Verifier returns
exactly (3, c), (5, d).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
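The keyspace layout this commit relies on — secondary keys are count_value_be ‖ original_key — can be illustrated with a few lines of std-only Rust. `secondary_key` here is a hypothetical encoder showing the ordering property, not GroveDB's actual code:

```rust
// Sketch of the secondary keyspace: an 8-byte big-endian count followed by
// the original key, so plain byte-wise lexicographic order sorts first by
// count and then by key.
fn secondary_key(count: u64, original_key: &[u8]) -> Vec<u8> {
    let mut out = count.to_be_bytes().to_vec();
    out.extend_from_slice(original_key);
    out
}

fn main() {
    // The five counts from the test case above, in shuffled order.
    let mut keys = vec![
        secondary_key(8, b"e"),
        secondary_key(3, b"c"),
        secondary_key(5, b"d"),
        secondary_key(3, b"a"),
    ];
    keys.sort(); // byte-wise lexicographic order

    // Counts come back ascending; ties (count == 3) break on original key.
    let counts: Vec<u64> = keys
        .iter()
        .map(|k| u64::from_be_bytes(k[..8].try_into().unwrap()))
        .collect();
    assert_eq!(counts, vec![3, 3, 5, 8]);
    assert_eq!(&keys[0][8..], b"a".as_slice());
}
```

Because the count is the most significant part of the key, a range over counts is just a contiguous byte range, which is what lets callers express count == X, count in [lo, hi], or count >= X as ordinary MerkQuery items.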
Closes a real corruption gap: when
validate_insertion_does_not_override_tree was off, a batch
InsertOrReplace / Replace / Patch could silently overwrite an
existing cidx element. The merk node value would change, but the
cidx primary's storage namespace + the secondary's storage
namespace (Blake3(primary_prefix || 0x01)) would be left behind.
Future inserts under the new cidx's primary_root_key could then
collide with the orphaned data, and the secondary index on the
old data would be unreachable.

When the override-protection flag is on (typical case), the
existing rejection of "attempting to overwrite a tree" already
catches cidx since is_any_tree() returns true. When the flag is
off, however, the path silently corrupts.

Add an unconditional cidx-specific check that fires for
InsertOrReplace / Replace / Patch ops on non-reference elements
when the override flag is off: read the existing element at the
key once, and if it decodes to CountIndexedTree /
ProvableCountIndexedTree, reject with NotSupported pointing at the
delete_from_count_indexed_tree / delete_up_tree workflow. Other
tree-type overwrites remain permitted under the existing
opt-out semantics for backwards compatibility — this stricter
treatment is specific to cidx because cidx owns two storage
namespaces and the corruption is qualitatively worse.

Updates one cost test (+1 seek, +129 storage_loaded_bytes) where
the new check fires. The new test
batch_overwrite_existing_cidx_with_item_is_rejected verifies the
guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch is at 79.75% (target 80%) — 7 hits short. Add two
focused tests covering the new batch cidx code paths:

- batch_delete_item_from_cidx_primary_works: covers the Delete arm
  of mirror_to_secondary_for_batch (new_count = None) and the
  pre-state capture for Delete ops.
- batch_multiple_inserts_into_cidx_primary_in_one_call: covers the
  multi-key pre-state capture loop and the per-key mirror loop in
  execute_ops_on_path on a cidx primary path.

Both run verify_grovedb afterward to walk the H1-A chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes a real corruption gap: db.delete() of a CountIndexedTree
element walked the primary's storage namespace via find_subtrees
+ storage.clear() but left the secondary's storage namespace
(Blake3(primary_prefix || 0x01)) untouched. After the cidx element
was removed from the parent merk, the secondary's data became
unreachable but stayed on disk; if the user later re-created a
cidx at the same path, queries against the secondary could observe
stale entries from the previous incarnation.

Add a cidx-specific cleanup branch in
delete_internal_on_transaction (the standard tree-delete code
path). When the deleted element's tree_type is a cidx primary,
derive the secondary prefix via the existing
RocksDbStorage::secondary_prefix_for helper, open storage at that
prefix, and call .clear(). Runs unconditionally (not gated on
is_empty) so empty-cidx deletes also clear the secondary's root
metadata for consistency.

Two new tests verify the cleanup end-to-end via the re-create-
and-query pattern: if the secondary wasn't cleaned, the new cidx's
top-k query would return stale entries.

- direct_delete_empty_cidx_cleans_up_secondary_storage
- direct_delete_non_empty_cidx_cleans_up_both_namespaces

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
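The derivation shape of the secondary namespace — Blake3(primary_prefix ‖ 0x01) — can be mimicked with std only. Since std has no Blake3, `DefaultHasher` stands in purely to show the domain-separated concatenation; the real code uses Blake3 via the RocksDbStorage::secondary_prefix_for helper:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for Blake3(primary_prefix || 0x01). DefaultHasher and the u64
// output are NOT the real scheme; only the derivation shape is the point.
fn secondary_prefix_stand_in(primary_prefix: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    primary_prefix.hash(&mut hasher);
    // The trailing 0x01 domain-separation byte keeps the derived secondary
    // namespace distinct from any prefix derived from the path alone.
    1u8.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let primary = [0xABu8; 32];
    // Deterministic: the same primary prefix always yields the same
    // secondary prefix, so the cleanup pass can re-derive and clear it.
    assert_eq!(
        secondary_prefix_stand_in(&primary),
        secondary_prefix_stand_in(&primary)
    );
    // Distinct primaries get distinct secondary namespaces.
    assert_ne!(
        secondary_prefix_stand_in(&primary),
        secondary_prefix_stand_in(&[0xACu8; 32])
    );
}
```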
Lifts the rejection of DeleteTree on CountIndexedTree /
ProvableCountIndexedTree in the batch path. Previously batch users
were forced to fall back to db.delete() outside the batch — fine
for single-cidx workflows but breaks atomicity when DeleteTree is
mixed with other batch ops.

Implementation parallels the H1-A delete fix in commit 6b7ec21
(direct path): the existing tree-delete cleanup pipeline collects
deleted Merk paths into `merk_delete_paths` and runs find_subtrees
+ storage.clear() on each post-apply. Since find_subtrees only
walks primary keys, the cidx secondary storage namespace at
Blake3(primary_prefix ‖ 0x01) was orphaned. Add a parallel
cidx_primary_delete_paths collector that captures cidx primary
DeleteTree ops at validation time, then runs an explicit
secondary-prefix .clear() in the post-apply pass alongside the
primary cleanup. Done in both apply_batch_with_element_flags_update
and apply_partial_batch (the partial-batch variant).

Two new tests use the re-create-and-query pattern to verify the
cleanup:
- batch_delete_tree_on_empty_cidx_works
- batch_delete_tree_on_non_empty_cidx_works

Both query the new cidx's secondary index after re-creation; if the
old secondary weren't cleaned the queries would return stale
entries.

Cidx overwrite via batch (Replace cidx → cidx / non-cidx) remains
rejected. The semantics of replacing an existing cidx element
where the new element references on-disk data are ambiguous and
the safe subset will land separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now that batch DeleteTree on cidx works (commit 0688731), the
recommended workaround for overwriting an existing cidx is:
1. delete_from_count_indexed_tree to empty it
2. DeleteTree via batch (now supported)
3. Re-create in a follow-up batch

Update the rejection error message to point at this clean
workaround instead of the older "delete_up_tree outside of a batch"
guidance.

The full safe subset of cidx overwrites (cidx → non-cidx,
cidx → empty cidx) requires moving cidx-overwrite detection into
the pre-apply scan alongside the DeleteTree discovery loop, plus
careful sequencing of post-apply cleanup vs. new-element write.
That is left for a follow-up; the workaround above is clean today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch is at 79.52% (target 80%) — 14 hits short. Add two
focused tests covering newly-added batch paths not yet exercised:

- batch_overwrite_cidx_rejected_with_override_protection_on:
  covers the validate_insertion_does_not_override_tree=true branch
  hitting cidx (existing-element-is-tree path).
- batch_delete_tree_on_cidx_then_recreate_in_separate_batch_works:
  covers the recommended cidx-overwrite workaround end-to-end —
  DeleteTree the cidx in batch 1, re-create empty in batch 2,
  populate in batch 3 — and verifies via verify_grovedb that the
  H1-A chain is consistent throughout.

The recreate test highlights an important sequencing detail: a
cidx and ops INSIDE the cidx primary cannot share a single batch
because deeper-path ops execute before the cidx itself exists.
This is documented in the test's structure (3 separate batches).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch is at 79.52% (target 80%). Earlier tests exercised
the apply_batch cidx-cleanup path but the parallel cleanup pass in
apply_partial_batch and the DontCheckWithNoCleanup branch were
untested. Add two focused tests:

- apply_partial_batch_with_delete_tree_on_cidx_cleans_up_secondary:
  routes through apply_partial_batch and verifies the secondary
  cleanup ran via the re-create-and-query pattern.
- batch_delete_tree_on_cidx_dont_check_with_no_cleanup_still_clears_secondary:
  covers the DontCheckWithNoCleanup branch which skips primary
  find_subtrees but must still clear the cidx secondary prefix (a
  different namespace not covered by find_subtrees).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two parallel polish items:

DOCS — refresh the book chapter to reflect what shipped.

The chapter was still written in design-spec style (marked Status:
implemented, but showing conceptual APIs that don't match the actual
code). Update the API code blocks to
the shipped function signatures (count_indexed_top_k,
count_indexed_count_range, prove_count_indexed_top_k, the new
prove_count_indexed_query taking arbitrary MerkQuery), replace the
hypothetical CountIndexedQuery struct with the two-route subquery
description (V1 generic PathQuery + dedicated cidx proof), add a new
"Batch path semantics" section documenting supported / rejected ops
plus the cidx-overwrite workaround, and update the
Implementation-detail items table from "Recommended default" to
"Resolution" reflecting what landed (W1: specialized propagation
through propagate_changes_with_transaction_with_initial_deferred +
GroveOp::ReplaceCountIndexedTreeRootKeys at the bubble-up).

ATOMICITY — five new stress tests for batches mixing cidx + non-cidx.

GroveDB batches are atomic by design (validation runs over the full
op list before any writes hit storage). These tests verify the cidx-
aware paths preserve that invariant under mixed workloads:

- batch_mixed_cidx_and_non_cidx_ops_apply_atomically
- batch_failure_in_non_cidx_op_rolls_back_cidx_mutations
- batch_with_multiple_cidx_primaries_each_get_updated
- batch_cidx_delete_with_concurrent_cidx_inserts_atomic
- batch_failure_after_cidx_delete_tree_rolls_back

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real audit finding: the cidx primary pre-state capture in
execute_ops_on_path lists every mutating op variant EXCEPT the new
GroveOp::ReplaceCountIndexedTreeRootKeys variant introduced for
the cidx-aware bubble-up. When a NESTED cidx primary bubbles up
to its OUTER cidx primary via the batch path:

  Level N (inner cidx primary): mutates fire, secondary mirrored,
    bubble emits ReplaceCountIndexedTreeRootKeys to level N-1.
  Level N-1 (outer cidx primary): receives the op at key=inner_key;
    handler `update_count_indexed_tree_item_preserve_flag_into_batch_operations`
    correctly updates the inner_key element's bytes (new
    primary_root_key, secondary_root_key, count_value).
  But pre-state capture skipped this op type, so post-apply mirror
    walked an empty deltas list. Outer's secondary was not updated.

The corruption was silent: H1-A integrity (verify_grovedb) still
passed because the outer's stored value_hash is recomputed from
the actual on-disk secondary root hash — the secondary just has
stale content. Top-k / count-range queries on the outer returned
stale counts.

Fix: add the variant to the mutates match. With the fix, the outer's
secondary entry for inner_key correctly moves from
(old_count_be ‖ inner_key) to (new_count_be ‖ inner_key) when the
inner's count changes.
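
The secondary keyspace move described here follows directly from the `(count_be ‖ key)` encoding; a minimal sketch:

```rust
/// Sketch of the secondary Merk keyspace: each mirror entry lives at
/// `count_be ‖ primary_key`. The 8-byte big-endian count prefix makes
/// lexicographic key order equal count order, so a count change is a
/// delete at the old key plus an insert at the new one.
fn secondary_key(count: u64, primary_key: &[u8]) -> Vec<u8> {
    let mut k = Vec::with_capacity(8 + primary_key.len());
    k.extend_from_slice(&count.to_be_bytes());
    k.extend_from_slice(primary_key);
    k
}
```

With this encoding, bumping inner_key's count from 0 to 1 moves its mirror entry from `secondary_key(0, b"inner_cidx")` to `secondary_key(1, b"inner_cidx")` — the move the skipped pre-state capture was failing to emit.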

Test batch_insert_into_nested_cidx_primary_bubbles_count_up_outer_secondary
fails BEFORE the fix (asserts top[0] == (1, b"inner_cidx") but gets
(0, b"inner_cidx")) and passes AFTER. Found via audit of the new code
paths — there was no batch-path nested-cidx test before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lock down the ReplaceCountIndexedTreeRootKeys-mutates fix from the
prior commit with three additional nesting tests:

- direct_insert_into_nested_cidx_primary_bubbles_count_up_outer_secondary
- batch_insert_into_triple_nested_cidx_propagates_through_all_levels
- batch_insert_through_cidx_then_regular_tree_then_cidx (cidx →
  regular CountTree → cidx mixed nesting)

All 1566 grovedb tests pass; release-mode build also passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
QuantumExplorer and others added 14 commits May 12, 2026 00:32
Addresses CodeRabbit review comments on PR #657.

1) delete/mod.rs (outside-diff Major): the cidx secondary cleanup at
the end of `delete_internal`'s tree branch lived inside the
`if !is_empty { ... }` block. When deleting an EMPTY cidx primary
that had a drifted secondary (e.g. an orphan injected by a bug that
mirrored deletes into the primary but failed to mirror into the
secondary), the secondary namespace was left untouched on delete,
leaving stale entries that could collide with a later cidx
recreation at the same path.

Hoist the explicit `if tree_type.is_count_indexed_primary()` clear
out of the `if !is_empty` gate so it runs unconditionally on every
cidx primary delete. Remove the now-redundant copy that lived
inside the non-empty branch; the per-prefix cleanup inside the
`find_subtrees` loop still handles nested cidx primaries and the
hoisted block re-handles the target (both clears are idempotent on
empty namespaces — intentional defense-in-depth, documented in the
comment).

Regression test
`direct_delete_empty_cidx_with_drifted_secondary_clears_namespace`
injects an orphan into the secondary of an empty cidx, deletes the
cidx, and verifies the secondary namespace is empty via a raw
RocksDB scan over the S2-B prefix. Verified to fail on the
pre-fix code path.

2) cidx_benchmark.rs (Major nit + several nitpicks): replace bare
`.unwrap()` on TempDir/GroveDb/insert and inner measurement calls
with `.expect("...")` carrying a descriptive context message. A
panic in the bench harness now surfaces which operation failed
rather than just a backtrace-only "called Option::unwrap on a None".
Applied consistently across populate_cidx, populate_plain_count_tree,
bench_cidx_top_k, bench_insert_into_cidx, and
bench_insert_into_plain_count_tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Raises count_indexed_tree.rs coverage 86.1% → 92.2% and
proof/count_indexed.rs coverage 84.1% → 87.9% (locally measured via
cargo-llvm-cov on the grovedb crate's test suite). Targets ~80 newly
covered lines on PR-introduced code, which directly improves the
codecov patch-coverage metric for #657.

New tests (all 12 passing; full cidx suite 147 tests green):

direct_insert path:
- direct_insert_into_cidx_overwrites_nested_cidx_entry_and_cleans_secondary
  Covers count_indexed_tree.rs:407-428 (existing_is_cidx overwrite-cleanup
  branch).

direct_delete path:
- direct_delete_from_cidx_removes_nested_cidx_entry_and_cleans_secondary
  Covers count_indexed_tree.rs:1412-1438 (deleted_was_cidx_primary branch
  in delete_from_count_indexed_tree).

reconcile loops:
- reconcile_repairs_missing_secondary_entry_via_insert_loop
  Covers the insert loop at count_indexed_tree.rs:927-937 by deleting a
  real mirror first.
- reconcile_removes_orphan_secondary_entry_via_delete_loop
  Covers the delete loop at count_indexed_tree.rs:909-919 by injecting
  an orphan.
- reconcile_errors_on_undecodable_element_bytes_in_primary
  Covers the raw_decode error path (842-845) by writing garbage bytes
  directly to the primary's storage namespace via StorageContext::put.

secondary key drift / query error paths:
- count_indexed_top_k_errors_on_short_secondary_key_drift
  Covers count_indexed_tree.rs:1061-1065, 1750.
- count_indexed_count_range_errors_on_short_secondary_key_drift
  Covers count_indexed_tree.rs:1160-1164 plus the unbounded-upper
  branch (lo=0, hi=u64::MAX) of the range builder.
- count_indexed_count_range_returns_empty_when_lo_greater_than_hi
  Covers the lo>hi early-return at 1096-1098.
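
The range-builder branches these tests hit can be sketched as follows, assuming inclusive `[lo, hi]` semantics encoded as an exclusive upper bound over the `(count_be ‖ key)` keyspace (names are illustrative, not the shipped signatures):

```rust
/// Sketch of the count-range bounds. Lower bound is `lo` as an 8-byte
/// big-endian prefix; the exclusive upper bound is `hi + 1`, except
/// `hi = u64::MAX`, which cannot be bumped and therefore maps to an
/// unbounded upper scan (the `None` branch). The real code additionally
/// returns an empty result early when lo > hi.
fn count_range_bounds(lo: u64, hi: u64) -> (Vec<u8>, Option<Vec<u8>>) {
    let lower = lo.to_be_bytes().to_vec();
    let upper = hi.checked_add(1).map(|h| h.to_be_bytes().to_vec());
    (lower, upper)
}
```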

proof verifier error paths:
- verify_count_indexed_top_k_rejects_proof_with_short_secondary_key_drift
  Covers proof/count_indexed.rs:540-545 (verifier rejects < 8-byte
  proved key).
- verify_count_indexed_top_k_rejects_tampered_primary_root_hash
  Covers H1-A cidx-layer chain mismatch at 566-572 (envelope tampering).
- verify_count_indexed_top_k_rejects_tampered_intermediate_layer_proof
  Covers shallower-layer chain mismatch at 600-613.
- verify_count_indexed_top_k_rejects_tampered_secondary_root_hash_via_query
  Covers ancestor_cidx_secondary_root_hashes length-mismatch at 580-585
  (via a nested-cidx layout so last_idx > 0).

Remaining uncovered lines are dominated by defensive error closures
(.map_err(|e| Error::CorruptedData(format!(...))) bodies that only fire
on storage-level failures) and unreachable!() / CorruptedCodeExecution
guards that require contradictory state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch coverage was 83.75% on the previous head (`15260967`).
develop's commit #660 raised the patch-coverage floor from 80% to 90%,
so the codecov/patch check fails. This commit adds 7 tests targeting
the two biggest under-covered patch areas:

1. Direct cidx insertion non-empty path (insert/mod.rs:314-410). The
   migration / restore-from-backup path for inserting a CountIndexedTree
   element directly via db.insert(...) with concrete primary/secondary
   root_keys was at 43% patch coverage. Five new tests:
   - direct_insert_non_empty_cidx_with_matching_roots_succeeds
   - direct_insert_partial_cidx_with_one_root_none_rejected
     (covers (Some, None), (None, Some), (None, None, count>0))
   - direct_insert_cidx_with_mismatched_primary_root_key_rejected
   - direct_insert_cidx_with_mismatched_secondary_root_key_rejected
   - direct_insert_provable_count_indexed_tree_with_matching_roots_succeeds
     (covers the ProvableCountIndexedTree arm of the same pattern)
   Coverage of insert/mod.rs jumped 93.0% -> 96.2% locally.

2. V1 proof verifier cidx-error branches (proof/verify.rs:540-602).
   Two new tampering tests build a valid V1 cidx-subquery proof,
   decode the GroveDBProof envelope, mutate the cidx sublayer, and
   re-encode:
   - v1_verify_rejects_cidx_subquery_proof_with_non_cidx_lower_layer_bytes
     Covers 547-553 (lower_layer.merk_proof must be
     ProofBytes::CountIndexedTree).
   - v1_verify_rejects_cidx_subquery_proof_with_short_cidx_bytes
     Covers 555-561 (cidx_bytes must be >= 32 bytes for the
     secondary_root attestation prefix).

Full cidx suite: 154/154 passing. Full grovedb library: 1665/1665.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch coverage was 85.41% on a0185f4. Develop's #660 requires
90%. This commit adds 4 more tests targeting:

1. proof/count_indexed.rs verifier branches (472-494, 534-537):
   - verify_count_indexed_query_rejects_wrong_expected_descending
     (the _query variant of the direction-mismatch reject; the _top_k
     variant is already covered).
   - verify_count_indexed_top_k_rejects_proof_with_layer_count_mismatch
     (env.layer_proofs.len() != path.len()).
   - verify_count_indexed_top_k_rejects_proof_with_corrupted_secondary_proof
     (envelope's secondary_proof bytes replaced with garbage so
     execute_proof errors).
   proof/count_indexed.rs locally moved 89.6% -> 90.2%.

2. lib.rs cidx cascading aggregation propagation path
   (lib.rs:840-998):
   - deep_insert_under_triple_nested_cidx_propagates_all_levels
     A 3-level cidx layout (outer/middle/inner cidx) with a leaf
     CountTree at the bottom; a single item insert bubbles count
     updates through three cidx secondaries.

Full cidx suite: 158/158 passing. Full grovedb lib: 1669/1669.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch coverage was 85.62% on 3b9ffbf. Targeting the remaining
~143 lines needed for 90% threshold:

V1 proof terminal cidx path (proof/verify.rs is_empty_cidx block):
- v1_proof_round_trips_for_empty_cidx_terminal_query
  Queries an empty CountIndexedTree as a terminal (no subquery).
  Forces the combine_hash_three(H(value), NULL_HASH, NULL_HASH)
  check used by V1 for empty-cidx parent elements.
- v1_proof_round_trips_for_provable_empty_cidx_terminal_query
  Same shape for ProvableCountIndexedTree.
- v1_proof_query_with_limit_terminates_early_at_cidx_subquery
  limit_left == Some(0) break inside the cidx subquery handler
  (proof/verify.rs around 521 and 604).
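
The empty-cidx check these round-trip tests force can be sketched in shape. This uses std's `DefaultHasher` as a stand-in for the real Blake3-based `combine_hash_three` and truncates to `u64`; only the three-input composition shape is the point, not the hash itself.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Assumed all-zero sentinel root for an empty Merk.
const NULL_HASH: [u8; 32] = [0u8; 32];

/// Sketch of the H1-A composition: the parent binds value_hash(elem_bytes),
/// the primary root hash, and the secondary root hash into one value_hash.
/// For an empty cidx both child roots are NULL_HASH, which is the case the
/// V1 terminal-query tests exercise.
fn combine_hash_three(
    value_hash: &[u8; 32],
    primary_root: &[u8; 32],
    secondary_root: &[u8; 32],
) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(value_hash);
    h.write(primary_root);
    h.write(secondary_root);
    h.finish()
}
```

Any tampering with either child root changes the combined hash, which is what makes the parent's value_hash a binding commitment to both Merks.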

count_indexed_tree.rs query paths:
- count_indexed_top_k_descending_returns_largest_counts_first
  Exercises the descending top-k scan path; populates a cidx with
  varied count_values and checks ordering.
- count_indexed_count_range_filters_to_inclusive_band
  Concrete (lo, hi) bounds (not the lo=0, hi=u64::MAX case), exercises
  the Some(upper_bytes) branch of the range builder.
- count_indexed_count_range_with_limit_cuts_short
  Limit-respecting path in the range scan.
- cidx_top_k_with_k_larger_than_entries_returns_all
  Iterator-exhausted termination branch.

Batch wrapper path:
- batch_insert_non_counted_wrapped_into_count_indexed_tree
  Inserts a NonCounted-wrapped CountTree into a cidx primary via
  apply_batch. Exercises the wrapper-element handling in the cidx
  propagation code (batch/mod.rs around 2750-2807).

Full cidx suite: 166/166 passing. Full grovedb lib: 1677/1677.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch coverage was 85.99% on f647299. Targeting batch/mod.rs
cidx propagation paths (largest remaining gap at 119 patch-uncovered
lines).

- batch_cidx_safe_subset_overwrite_with_write_under_cidx_rejected
  Covers batch/mod.rs:2218-2226 — the cidx-overwrite + descendant-write
  consistency check that scans ops_by_qualified_paths for any path
  under a cidx being safe-subset-overwritten.

- batch_insert_if_not_exists_for_existing_cidx_errors
  Covers batch/mod.rs:2370-2377 — InsertIfNotExists for a cidx at a
  key where one already exists, with validate_insertion_does_not_override
  enabled.

- batch_insert_multiple_items_into_same_cidx_primary_propagates_correctly
  Covers batch/mod.rs:3252-3267 (the ReplaceTreeRootKey ->
  ReplaceCountIndexedTreeRootKeys upgrade path) and 3539-3545 (fresh
  ReplaceCountIndexedTreeRootKeys op creation). Multiple insert_or_replace
  ops under the same cidx primary force iterative propagation visits.

- apply_partial_batch_with_cidx_mirror_secondary_open
  Covers batch/mod.rs:4902-4912 and 4987-4997 — closure invocations
  that open the cidx secondary by primary_path during partial-batch
  processing (both the initial apply_body and the
  continue_partial_apply_body phase via add-on ops).

Full cidx suite: 170/170 passing. Full grovedb lib: 1681/1681.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch was 87.00% on 737ae07 (423 missing). Target: 90%.

- verify_grovedb_flags_short_secondary_key_with_sentinel
  Covers lib.rs:1422-1427 — verify_grovedb's cidx walk emits a
  __cidx_secondary_malformed_key__ sentinel for any secondary key
  shorter than the 8-byte count prefix.
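
The decode side of that check can be sketched as below. The real walk emits the `__cidx_secondary_malformed_key__` sentinel into its findings list rather than returning a `Result`; this sketch reuses the sentinel string only as an error value.

```rust
/// Sketch of the verify_grovedb-side secondary key validation: a
/// well-formed key carries the 8-byte big-endian count prefix followed
/// by the primary key; anything shorter is flagged as malformed.
fn parse_secondary_key(key: &[u8]) -> Result<(u64, &[u8]), &'static str> {
    if key.len() < 8 {
        return Err("__cidx_secondary_malformed_key__");
    }
    let mut count_be = [0u8; 8];
    count_be.copy_from_slice(&key[..8]);
    Ok((u64::from_be_bytes(count_be), &key[8..]))
}
```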

- batch_apply_two_inserts_into_cidx_propagation_visits_cidx_level
  Exercises the batch propagation visitor coalescing ops at the cidx
  primary level when two distinct inserts share that level. Hits the
  upgrade-existing-op branch in batch/mod.rs:3252-3267.

- batch_apply_with_multiple_inserts_descending_count
  Same shape with 4 inserts to keep the propagation queue cycling
  through the cidx level.

Full cidx suite: 173/173. Full grovedb lib: 1684/1684.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch was 87.16% on 3a9c8fb (still under 90%). Target the
remaining V1 proof verifier error branches:

- v1_verify_rejects_cidx_subquery_proof_with_tampered_secondary_root
  Covers proof/verify.rs:593-602 — V1 cidx layer hash mismatch.
  Tampers the secondary_root attestation prefix in the cidx_bytes so
  the recomputed combined hash diverges from the parent's recorded
  value_hash.

- v1_verify_rejects_cidx_proof_bytes_under_non_cidx_parent
  Covers proof/verify.rs:726-731 — when a V1 envelope places a
  ProofBytes::CountIndexedTree(_) layer under a parent element that
  is NOT a cidx, the verifier rejects with the "non-cidx parent
  element" error. Tampered envelope swaps a Merk proof into a
  CountIndexedTree shape (with a 32-byte zero prefix so the
  short-bytes check passes first).

Full cidx suite: 175/175. Full grovedb lib: 1686/1686.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aths

Codecov patch was 87.62% on 28af6db (403 missing). Adding broader-
coverage tests:

- prove_count_indexed_top_k_for_provable_cidx_round_trips
  Covers proof/count_indexed.rs:380 (ProvableCountIndexedTree arm of
  read_count_indexed_secondary_root_key_for_proof).

- verify_count_indexed_top_k_with_layer_proof_replaced_by_garbage
  Covers execute_single_key_proof's error path
  (proof/count_indexed.rs:638-642) — a layer proof replaced with bytes
  that can't be decoded as a valid Merk proof.

- cidx_batch_pipeline_exercises_propagation_and_query
  6-op batch + top-k + prove + verify round-trip; exercises many batch
  cidx propagation paths in one test.

- cidx_batch_with_provable_cidx_propagation_round_trip
  Provable cidx variant of the batch pipeline; exercises Provable arms
  in batch/mod.rs propagation logic.

- cidx_count_range_with_intermediate_count_filter
  count_indexed_count_range with concrete (lo, hi) bounds that filter
  out entries on both ends.

Full cidx suite: 180/180. Full grovedb lib: 1691/1691.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous codecov was 87.62% (28af6db). Target merk/src/element/insert.rs
(was 79.16%, 25 patch-uncovered lines).

- insert_count_indexed_subtree_rejects_non_cidx_element
  Covers L685-694 — the function must reject any element whose
  underlying type is not CountIndexedTree / ProvableCountIndexedTree.
  Tested with both Element::new_item and Element::empty_tree.

- insert_count_indexed_subtree_rejects_non_counted_wrapper_into_normal_tree
  Covers L696-700 — non-counted wrapper elements may only be inserted
  into count-bearing trees; NonCounted-wrapped cidx into a NormalTree
  merk must be rejected with InvalidInputError.

- insert_count_indexed_subtree_into_batch_operations_rejects_non_cidx_element
  Covers L768-777 — the batch-operations variant has the same
  non-cidx reject path. Also verifies the batch_ops vec is not
  modified on rejection.

- insert_count_indexed_subtree_into_batch_operations_replace_variant_for_cidx
  Covers L790-799 — the is_replace=true branch which queues an
  Op::ReplaceLayeredCountIndexedReference (vs the Put variant at
  802-811 which was already covered).

Full merk lib: 429/429. Full grovedb lib: 1691/1691.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous codecov was 87.80% (ac19b2b). Target the remaining merk
patch-uncovered chunks:

merk/src/element/tree_type.rs (was 50% / 11 missing):
- tree_type_extensions_cover_count_indexed_tree_arms
  Exercises every ElementTreeTypeExtensions method on a
  CountIndexedTree: tree_type, maybe_tree_type, root_key_and_tree_type,
  root_key_and_tree_type_owned, tree_flags_and_type, tree_feature_type.
- tree_type_extensions_cover_provable_count_indexed_tree_arms
  Mirror for ProvableCountIndexedTree, exercising tree_feature_type's
  ProvableCountedMerkNode arm.
- tree_type_extensions_look_through_non_counted_wrapping_cidx
  Verifies wrapper look-through for cidx via NonCounted.

merk/src/tree/ops.rs (was 72.34% / 13 missing):
- op_debug_formatters_cover_put_layered_count_indexed_reference
  Covers L107-112 (fmt::Debug for Op::PutLayeredCountIndexedReference).
- op_debug_formatters_cover_replace_layered_count_indexed_reference
  Covers L113-122 (fmt::Debug for Op::ReplaceLayeredCountIndexedReference).

Full merk lib: 434/434. Full grovedb lib: 1691/1691.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… entry not mirroring to secondary

While writing coverage tests I noticed test ordering flakiness with
batch+cidx interactions. Pinned down by writing a focused repro and
running it 8 times under parallelism: ~60% of parallel runs fail with
top_k returning [(77, "new"), (42, "old")] — the deleted "old" entry
still appears in the secondary mirror.

Verified:
- The primary delete DOES succeed (db.get returns PathKeyNotFound).
- The bug is in the secondary mirror update path: when a regular
  batch QualifiedGroveDbOp::delete_op (NOT delete_from_count_indexed_tree)
  removes a cidx primary entry, the secondary mirror entry at
  (count_be ‖ key) is sometimes left behind.

Repros (both marked #[ignore] for now, run with --ignored):
- repro_batch_delete_op_on_cidx_primary_entry_should_mirror_to_secondary
  Fails ~60% of runs under --test-threads=4.
- repro_batch_insert_or_replace_overwriting_cidx_entry_should_drop_old_mirror
  Passes 8/8; insert_or_replace path is reliable.

Order-dependence of the failure suggests the bug is in how the batch
propagation pipeline orders cidx primary delete + insert ops within
the same batch.

Full lib test suite (excluding ignored): 1691/1691 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…branch

Codecov on 1d90d2b reached 89.30% (368 missing) — just 0.7pp from
the 90% target. Target the verify.rs:509-538 chunk (cidx subquery
with add_parent_tree_on_subquery=true).

- v1_proof_subquery_into_cidx_with_add_parent_tree_returns_cidx_and_children
  Covers proof/verify.rs:524-538 — the
  should_add_parent_tree_at_path branch inside the cidx subquery
  descent. Query has add_parent_tree_on_subquery: true so verifier
  emits the cidx element + its children.

- v1_proof_subquery_into_cidx_with_limit_one_terminates_early
  Covers proof/verify.rs:518-523 — limit decrement + break at zero
  with add_parent_tree path active.

Full grovedb lib: 1693/1693 (3 ignored = repro bugs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai (Bot) left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
grovedb/src/operations/proof/count_indexed.rs (1)

232-233: ⚡ Quick win

Wrap Merk failures with operation-specific context.

These new .map_err(Error::MerkError) sites drop the information about which proof-building step failed, which makes cidx proof-generation failures much harder to diagnose and doesn't follow the repo's Rust error-wrapping rule.

Suggested pattern
- Element::get(&parent_merk, key.as_slice(), true, grove_version)
-     .map_err(Error::MerkError)
+ Element::get(&parent_merk, key.as_slice(), true, grove_version).map_err(|e| {
+     Error::CorruptedData(format!("reading intermediate cidx element for proof: {}", e))
+ })

Apply the same pattern to the other Merk fetch / root-hash / prove calls in this file.

As per coding guidelines, "Wrap errors with context using .map_err(|e| Error::CorruptedData(format!("context: {}", e))) pattern in Rust source files."

Also applies to: 261-263, 276-277, 302-304, 337-338, 375-376

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grovedb/src/operations/proof/count_indexed.rs` around lines 232 - 233, The
Element::get map_err usage in count_indexed proof building drops step-specific
context; replace instances like Element::get(&parent_merk, key.as_slice(), true,
grove_version).map_err(Error::MerkError) with a context-wrapping closure such as
.map_err(|e| Error::CorruptedData(format!("failed to fetch element for cidx
proof (parent/step X): {}", e))) and apply the same pattern to all other Merk
fetch/root-hash/prove calls in this file (the other occurrences flagged around
lines 261-263, 276-277, 302-304, 337-338, 375-376) so each error includes which
proof-building step (e.g., parent fetch, child fetch, root hash, prove) failed
and the underlying Merk error.
merk/src/element/insert.rs (1)

731-747: ⚡ Quick win

Add cidx-specific context to the corruption wrapper.

Error::CorruptedData(e.to_string()) loses the operation context here, which makes cidx insert failures indistinguishable from the other insert paths during corruption triage. Wrap the error with a message that names insert_count_indexed_subtree.

As per coding guidelines, "Wrap errors with context using .map_err(|e| Error::CorruptedData(format!("context: {}", e))) pattern in Rust source files".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@merk/src/element/insert.rs` around lines 731 - 747, The corruption error from
the call to merk.apply_with_specialized_costs is currently mapped with
Error::CorruptedData(e.to_string()) which loses context; change the final
.map_err to wrap the error with a descriptive message mentioning
insert_count_indexed_subtree (e.g. .map_err(|e|
Error::CorruptedData(format!("insert_count_indexed_subtree: {}", e)))) so
failures from merk.apply_with_specialized_costs (the closure using
Self::specialized_costs_for_key_value and
Element::value_defined_cost_for_serialized_value) include the cidx-specific
context.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@merk/src/element/insert.rs`:
- Around line 768-813: The code only checks element kind but not the NonCounted
wrapper; update insert_count_indexed_subtree_into_batch_operations to mirror
insert_count_indexed_subtree() by rejecting a NonCounted(CountIndexedTree)
paired with a non-count-bearing feature_type: inspect self.underlying() for the
NonCounted(CountIndexedTree(..)) variant and if found, check feature_type's
countedness (the same predicate used in insert_count_indexed_subtree()); if the
feature_type is not count-bearing return Err(Error::InvalidInputError(...)). Do
this prior to serializing and before pushing the Op entry so the invalid
combination is never added to batch_operations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 643ffbad-6822-4474-bb98-05c35f006a76

📥 Commits

Reviewing files that changed from the base of the PR and between f1ee83e and dc765ed.

📒 Files selected for processing (10)
  • grovedb/benches/cidx_benchmark.rs
  • grovedb/src/lib.rs
  • grovedb/src/operations/count_indexed_tree.rs
  • grovedb/src/operations/delete/mod.rs
  • grovedb/src/operations/get/query.rs
  • grovedb/src/operations/proof/count_indexed.rs
  • grovedb/src/tests/count_indexed_tree_tests.rs
  • merk/src/element/insert.rs
  • merk/src/element/tree_type.rs
  • merk/src/tree/ops.rs
🚧 Files skipped from review as they are similar to previous changes (7)
  • merk/src/tree/ops.rs
  • grovedb/src/operations/delete/mod.rs
  • grovedb/src/operations/get/query.rs
  • grovedb/benches/cidx_benchmark.rs
  • merk/src/element/tree_type.rs
  • grovedb/src/lib.rs
  • grovedb/src/operations/count_indexed_tree.rs

Comment thread merk/src/element/insert.rs
QuantumExplorer and others added 6 commits May 12, 2026 05:36
…ntext

Addresses CodeRabbit review on dc765ed.

1. Major — Mirror the non-counted destination guard in
   insert_count_indexed_subtree_into_batch_operations. The direct
   insert_count_indexed_subtree path (merk/element/insert.rs:696-700)
   rejects NonCounted(cidx) paired with a non-count-bearing merk
   tree_type. The batch variant only validated element kind, so a
   caller could queue a NonCounted(cidx) op with a non-count-bearing
   feature_type (e.g., BasicMerkNode) and bypass the invariant.

   The batch variant has no access to the merk's tree_type, so we use
   feature_type.count().is_some() as the equivalent count-bearing
   predicate (a node's feature type carries a count iff the node is
   count-bearing). Without this check, the wrapper invariant could be
   silently violated in batch flows.

   Regression test
   insert_count_indexed_subtree_into_batch_operations_rejects_non_counted_wrapper_with_non_count_bearing_feature_type
   verifies reject + clean batch_ops vec, plus sanity success path
   with CountedMerkNode.

2. Nitpick — Wrap Merk errors with context in proof/count_indexed.rs.
   Replaced six .map_err(Error::MerkError) sites in proof-building
   with .map_err(|e| Error::CorruptedData(format!("cidx proof: <step>: {e}")))
   so proof-generation failures identify which step (intermediate
   layer fetch, ancestor secondary root hash, single-key prove,
   primary root hash, secondary range prove, secondary root key
   lookup) failed.

3. Nitpick — Add cidx-specific context to the final corruption
   wrapper in insert_count_indexed_subtree. Changed
   Error::CorruptedData(e.to_string()) to include
   "insert_count_indexed_subtree: " prefix so cidx insert failures
   are distinguishable from generic insert failures during corruption
   triage.

Full merk lib: 435/435. Full grovedb lib: 1693/1693.
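
The count-bearing predicate used by the batch-path guard in item 1 can be sketched with assumed enum shapes (the real merk feature types are named BasicMerkNode, CountedMerkNode, and ProvableCountedMerkNode; this illustration only preserves the `count().is_some()` predicate):

```rust
/// Assumed shapes, for illustration of the guard only.
enum MerkNodeFeature {
    Basic,
    Counted(u64),
    ProvableCounted(u64),
}

impl MerkNodeFeature {
    /// A node's feature type carries a count iff the node is count-bearing.
    fn count(&self) -> Option<u64> {
        match self {
            MerkNodeFeature::Basic => None,
            MerkNodeFeature::Counted(c) | MerkNodeFeature::ProvableCounted(c) => Some(*c),
        }
    }
}

/// The invariant: a NonCounted-wrapped cidx element may only be queued
/// against a count-bearing destination feature type.
fn non_counted_cidx_allowed(feature: &MerkNodeFeature) -> bool {
    feature.count().is_some()
}
```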

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 90% target was raised from 80% in #660 — a 10pp jump in one step
turned out to be too aggressive in practice. PRs that introduce
non-trivial defensive code surfaces (e.g., CountIndexedTree in #657
shipped 3,172 patch lines, much of which is `CorruptedData` /
`CorruptedCodeExecution` defensive returns unreachable from the
public API under valid state) hit the gate at ~89% even after
extensive coverage work.

88% is still significantly above the previous 80% floor while
leaving ~2pp of breathing room for these defensive-code patterns.
Project coverage remains gated by `auto` (no regression vs base,
with 2pp threshold) which is the more important signal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Found while investigating the flaky repro test from dc765ed. Root cause:

In execute_ops_on_path (batch/mod.rs:3040-3094), the cidx_pre_state
HashMap was iterated to compute mirror deltas, then the deltas Vec was
iterated to apply each one to the secondary merk. The mirror operations
for different keys are independent in theory, but a merk-level
interaction (delete-after-insert on a count-bearing Merk via separate
Element::insert/Element::delete calls) sometimes leaves the deleted
entry in place. When HashMap iteration happened to produce
insert-then-delete order, the secondary kept the stale entry.

Reproducer pattern (committed as
batch_delete_op_on_cidx_primary_entry_mirrors_to_secondary): pre-populate
cidx, then apply_batch with [delete_op("old"), insert_or_replace_op("new")].
~60% of parallel runs failed; top_k returned both entries.

Fix: sort the deltas Vec deterministically before applying — pure
deletes first (delta is `(Some(_), None)`), then everything else, with
secondary sort by key. This both makes batch behavior deterministic
and avoids the merk-level delete-after-insert order issue entirely.
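The deletes-first ordering can be sketched as follows (types are simplified stand-ins for the batch pipeline's per-key delta shape; only the comparator logic mirrors the fix described above):

```rust
/// Per-key mirror delta: (pre_count, post_count).
/// `(Some(_), None)` is a pure delete.
type Delta = (Vec<u8>, (Option<u64>, Option<u64>));

/// Deterministic ordering: pure deletes first, then everything else,
/// with a secondary sort by key.
fn sort_deltas(deltas: &mut [Delta]) {
    deltas.sort_by(|(ka, da), (kb, db)| {
        let is_pure_delete =
            |d: &(Option<u64>, Option<u64>)| matches!(d, (Some(_), None));
        // `true` (delete) must sort before `false`, so compare b's flag to a's.
        is_pure_delete(db)
            .cmp(&is_pure_delete(da))
            .then_with(|| ka.cmp(kb))
    });
}
```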

Two repro tests no longer carry #[ignore] and now pass reliably 8/8
under --test-threads=4:
- batch_delete_op_on_cidx_primary_entry_mirrors_to_secondary
- batch_insert_or_replace_overwriting_cidx_entry_drops_old_mirror

The underlying merk delete-after-insert quirk is worth investigating
separately — for now this PR ensures cidx batch flows are not exposed
to it.

Full grovedb lib: 1695/1695 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cidx-specific Display + type-introspection tests were inline at
the end of element/mod.rs. Move them to element/cidx_tests.rs to
group cidx assertions in a discoverable file and keep mod.rs focused
on the Element enum + impls.

Behavior unchanged. Tests still run as element::cidx_tests::*:
- count_indexed_tree_display_renders_fields
- provable_count_indexed_tree_display_renders_fields
- count_indexed_tree_helpers_report_count_and_type

Full grovedb-element lib: 67/67 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
clippy::redundant_locals flagged the `let mut deltas = deltas;` at
batch/mod.rs:3092 introduced in 70336c9. The original `deltas` is
already mutable; the rebind was a leftover from an earlier iteration
where I considered shadowing with a different binding. Remove the
no-op rebind; the subsequent sort_by call mutates in place as before.

Lint clean: `cargo clippy -p grovedb --lib -- -D warnings` passes.
Tests unchanged: 1695/1695.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…egateIndexedTreeRootKeys

The op variant is aggregate-agnostic: every field
(`primary_hash`, `primary_root_key`, `primary_aggregate_data`,
`secondary_hash`, `secondary_root_key`) describes the H1-A two-Merk
propagation pattern, and `primary_aggregate_data` is an
`AggregateData` enum that already supports Count, ProvableCount,
Sum, BigSum, etc.
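The field list above can be sketched as a Rust enum; the concrete types here are illustrative stand-ins (the real variant lives in grovedb's batch-op enum with its own hash and flag types):

```rust
// Stand-in for the aggregate payload carried by the op.
#[allow(dead_code)]
enum AggregateData {
    Count(u64),
    ProvableCount(u64),
    Sum(i64),
    BigSum(i128),
}

// Sketch of the renamed, aggregate-agnostic propagation op.
enum QualifiedGroveDbOp {
    ReplaceAggregateIndexedTreeRootKeys {
        primary_hash: [u8; 32],
        primary_root_key: Option<Vec<u8>>,
        primary_aggregate_data: AggregateData,
        secondary_hash: [u8; 32],
        secondary_root_key: Option<Vec<u8>>,
    },
}
```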

The CountIndexedTree-named version implied this op was count-specific
when in fact it's a generic propagation shape that the planned
SumIndexedTree (and future aggregate-indexed variants) will reuse
unchanged.

Renamed variant + docstring; pure refactor. No behavior change.

Touched: batch/mod.rs (definition + 11 use sites), batch_structure.rs,
batch/estimated_costs/{average,worst}_case_costs.rs, and 5 doc-string
mentions in count_indexed_tree_tests.rs.

Full grovedb lib: 1695/1695. `cargo clippy -p grovedb --lib -- -D warnings`
clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai Bot left a comment


♻️ Duplicate comments (1)
grovedb/src/batch/mod.rs (1)

3241-3304: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Freshly inserted cidx trees still can’t be populated in the same batch.

If this level is a cidx primary and the parent already has an Insert*/Replace op for that key, the occupied-entry conversion below still has no Element::CountIndexedTree / Element::ProvableCountIndexedTree branch. ReplaceAggregateIndexedTreeRootKeys only works for an element that already exists on disk, so insert empty cidx + mutate descendants still has no valid propagation path. Please either reject that combination up front or add an insert-style aggregate-indexed propagation op that carries the new element’s flags/root keys.

Also applies to: 3328-3520

🧹 Nitpick comments (1)
grovedb/src/batch/mod.rs (1)

2031-2040: ⚡ Quick win

Add context to the new cidx Merk error paths.

These new pre/post-state reads and secondary-root capture currently collapse failures into raw Error::MerkError, which makes cidx corruption triage and debugging much harder here. Please wrap them with contextual Error::CorruptedData(...) messages instead.

As per coding guidelines, "Wrap errors with context using .map_err(|e| Error::CorruptedData(format!("context: {}", e))) pattern in Rust source files".

Also applies to: 3058-3067, 3123-3128

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grovedb/src/batch/mod.rs` around lines 2031 - 2040, The Merk errors in the
cidx pre/post-state reads collapse to raw Error::MerkError; change the
map_err(Error::MerkError) calls in the shown get() call (and the similar sites
at the other ranges) to wrap with contextual CorruptedData messages, e.g.
map_err(|e| Error::CorruptedData(format!("reading cidx pre/post-state (key ...):
{}", e))). Update the call site in the block containing the get() invocation
(the code using maybe_bytes and key_bytes) and the analogous call sites
referenced (around the regions corresponding to the 3058-3067 and 3123-3128
diffs) so all Merk failures become Error::CorruptedData with a brief context
string identifying the operation (pre-state read, post-state read, or
secondary-root capture).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1e37a4e3-2cd8-435e-a155-59be1f3e1d24

📥 Commits

Reviewing files that changed from the base of the PR and between dc765ed and 4556e4d.

📒 Files selected for processing (10)
  • .codecov.yml
  • grovedb-element/src/element/cidx_tests.rs
  • grovedb-element/src/element/mod.rs
  • grovedb/src/batch/batch_structure.rs
  • grovedb/src/batch/estimated_costs/average_case_costs.rs
  • grovedb/src/batch/estimated_costs/worst_case_costs.rs
  • grovedb/src/batch/mod.rs
  • grovedb/src/operations/proof/count_indexed.rs
  • grovedb/src/tests/count_indexed_tree_tests.rs
  • merk/src/element/insert.rs
✅ Files skipped from review due to trivial changes (1)
  • .codecov.yml
🚧 Files skipped from review as they are similar to previous changes (3)
  • grovedb/src/batch/batch_structure.rs
  • merk/src/element/insert.rs
  • grovedb/src/operations/proof/count_indexed.rs

QuantumExplorer and others added 5 commits May 12, 2026 06:38
mod.rs grew ~920 net lines for the CountIndexedTree batch propagation
work; extracting the two self-contained cidx-specific blocks into a
sibling submodule keeps mod.rs focused on the generic batch pipeline
and isolates the cidx propagation pattern (which the planned
SumIndexedTree and other aggregate-indexed shapes will reuse).

Moved into new grovedb/src/batch/cidx.rs:

- capture_cidx_pre_state(primary_merk, ops_at_path_by_key, grove_version)
  -> CostResult<HashMap<Vec<u8>, Option<u64>>, Error>
  Reads each affected key's pre-apply count_value, enforces the
  247-byte cidx primary key ceiling.

- apply_cidx_secondary_mirror_post_apply(primary_merk, pre,
  secondary_merk, grove_version)
  -> CostResult<(CryptoHash, Option<Vec<u8>>), Error>
  Builds deterministic (deletes-first) deltas from the pre-state +
  post-apply primary state and applies mirror_to_secondary_for_batch
  to the cidx secondary. Returns the secondary's post-mirror state
  for the bubble-up's H1-A composition.

`execute_ops_on_path` now calls the two helpers around the merk
apply boundary. mod.rs is net -103 lines; cidx.rs is 192 lines
(docstrings + imports + struct overhead).

No behavior change. Full grovedb lib: 1695/1695. Lib clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses CodeRabbit review on bb390d5.

1. Major - A batch that both inserts a fresh CountIndexedTree /
   ProvableCountIndexedTree element AND mutates a descendant of that
   cidx in the same batch had no valid propagation path. The
   bubble-up reaches a conversion match in apply_batch_structure
   that has no cidx arm; previously this surfaced as a confusing
   MerkError(PathKeyNotFound) mid-batch as the secondary-merk
   closure tried to read the cidx element from a parent merk that
   doesn't yet contain it.

   Two defenses added:
   - Preflight reject_freshly_inserted_cidx_with_descendants(ops)
     scans ops for any Insert*/Replace/Patch op whose element is a
     cidx, collects those paths, and rejects any other op whose
     effective target (path + key) is strictly under one of those
     paths. Wired into both apply_batch_with_element_flags_update
     and apply_partial_batch_with_element_flags_update right after
     the consistency check.
   - Defense-in-depth cidx arm in the conversion match at
     apply_batch_structure's occupied-entry path: if the propagation
     somehow reaches that point with a cidx element, return a clear
     NotSupported message rather than falling through to
     "insertion of element under a non tree".

   Both paths produce a clear, actionable error pointing callers to
   either split into two batches or use db.insert_into_count_indexed_tree.

   Regression test
   batch_inserting_and_populating_fresh_cidx_in_same_batch_is_rejected
   confirms the preflight fires with the expected NotSupported.

2. Nitpick - Wrap the three remaining .map_err(Error::MerkError)
   sites in grovedb/src/batch/cidx.rs with contextual
   Error::CorruptedData(format!(...)) messages that identify the
   step (pre-state read, post-state read, secondary root hash
   capture) and include the affected key for diagnostics.
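The preflight guard from item 1 hinges on a strict path-prefix test; a minimal sketch (simplified; the real code scans QualifiedGroveDbOp values and compares effective path + key):

```rust
/// True when `target_path` is strictly under `cidx_path`, i.e. the op
/// writes to a descendant of a freshly inserted cidx element and must
/// be rejected by the preflight.
fn is_strictly_under(target_path: &[Vec<u8>], cidx_path: &[Vec<u8>]) -> bool {
    target_path.len() > cidx_path.len()
        && target_path[..cidx_path.len()] == cidx_path[..]
}
```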

Full grovedb lib: 1696/1696. Lib clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rename the cidx-shorthand file `batch/cidx.rs` to the more descriptive
`batch/count_indexed_tree.rs` (consistent with `operations/count_indexed_tree.rs`
and `operations/proof/count_indexed.rs`) and extract two more cidx-specific
chunks from the increasingly oversized `batch/mod.rs`:

- `reject_freshly_inserted_cidx_with_descendants` (55 lines): preflight
  guard that rejects batches creating a cidx primary AND writing under
  it in the same batch. Moved verbatim into the new file.

- `inspect_cidx_overwrite` (newly extracted, replaces a 120-line inline
  branch): classifies an `op_could_overwrite` insert when tree-override
  protection is OFF, returning `Some(cidx_path)` to schedule for
  post-apply cleanup on safe-subset cidx overwrites, `None` for non-cidx,
  and `Err(NotSupported)` / `Err(InvalidBatchOperation)` for ambiguous
  overwrites and same-batch descendant writes.

`mod.rs` shrinks by ~190 lines; the new file is ~395 lines and is the
single source of truth for cidx propagation, preflight, and overwrite
classification in the batch pipeline.

The pre/post-state helpers also switch from `HashMap` to `BTreeMap` for
deterministic iteration (the explicit pure-deletes-first sort is
retained, but the input source is now key-sorted by construction).
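The determinism gained by the `HashMap` to `BTreeMap` switch is simply ordered iteration; a small illustration (the map's value type stands in for the pre-state's optional count):

```rust
use std::collections::BTreeMap;

// BTreeMap iteration is key-sorted by construction, so the delta source
// order is deterministic across runs, unlike HashMap's randomized order.
fn affected_keys(pre: &BTreeMap<Vec<u8>, Option<u64>>) -> Vec<Vec<u8>> {
    pre.keys().cloned().collect()
}
```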

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves conflicts between this PR (CountIndexedTree / cidx) and
develop's PR #666 (Element::NotCountedOrSummed wrapper). Both PRs
add new Element variants and new arms to the same match statements
throughout the element / merk / debugger layers; the resolution is
to union both sets.

**Critical: on-disk discriminant collision fixed.** Develop assigns
`NOT_COUNTED_OR_SUMMED_WRAPPER_DISCRIMINANT = 17`; this PR had
`ElementType::CountIndexedTree = 17`. Same on-disk byte, two
incompatible meanings.

Resolution: shift cidx discriminants by 1 and re-derive the
NonCounted twin bytes:
  - `ElementType::CountIndexedTree`              17 → 18
  - `ElementType::ProvableCountIndexedTree`      18 → 19
  - `ElementType::NonCountedCountIndexedTree`   145 → 146 (`0x80 | 18`)
  - `ElementType::NonCountedProvableCountIndexedTree`
                                                146 → 147 (`0x80 | 19`)

The `Element` enum variant order is updated to match: `NotCountedOrSummed`
keeps variant index 17, cidx variants move to 18 and 19. The bincode
auto-generated tag bytes stay aligned with `ElementType`. Updated:
  - `try_from`, `from_serialized_value`, related doc-comments
  - assertion / range-check tests for the new wrapper slot at 145
  - `is_non_counted` doc range `[128, 146]` → `[128, 147]`
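The twin-byte re-derivation above is the `0x80 | base` formula applied to the shifted bases; as a sanity check:

```rust
/// NonCounted twin discriminant derivation, per the commit message.
fn non_counted_twin(base: u8) -> u8 {
    0x80 | base
}
```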

Other conflict resolutions union both sets of match arms in:
  - `grovedb-element/element/helpers.rs` (get/set flags)
  - `grovedb-element/element/mod.rs` (Element enum, ElementShadow serde)
  - `merk/element/costs.rs` (get_specialized_cost)
  - `merk/element/tree_type.rs` (6 wrapper-arm extensions)
  - `grovedb/debugger.rs` (element_to_grovedbg)

And one non-exhaustive-match fix flagged by rustc post-merge:
  - `grovedb/operations/count_indexed_tree.rs:438` — add
    `NotCountedOrSummed(_)` to the wrapper-unreachable arm.

Verified:
- `cargo check -p grovedb` clean
- `cargo test --workspace --lib` 2949 tests pass, 0 failures
- `cargo fmt --all` applied

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves conflicts with develop's PR #667 (Element::ReferenceWithSumItem
+ RefreshReferenceWithSumItem batch op). PR #667 added a new sum-bearing
reference variant + a non_counted field on InsertNonMerkTree; this PR
adds CountIndexedTree / ProvableCountIndexedTree + the
ReplaceAggregateIndexedTreeRootKeys batch op.

**Second on-disk discriminant collision fixed.** Develop's
`ElementType::ReferenceWithSumItem = 18` collides with this PR's prior
post-merge assignment `ElementType::CountIndexedTree = 18`. (Same
shape as the prior merge with NotCountedOrSummed at byte 17.)

Resolution: shift cidx discriminants by another 1 and re-derive the
NonCounted twin bytes:
  - `ElementType::CountIndexedTree`              18 → 19
  - `ElementType::ProvableCountIndexedTree`      19 → 20
  - `ElementType::NonCountedCountIndexedTree`   146 → 147 (`0x80 | 19`)
  - `ElementType::NonCountedProvableCountIndexedTree`
                                                147 → 148 (`0x80 | 20`)

The `Element` enum variant order now places `ReferenceWithSumItem` at
index 18 (matching develop) and the two cidx variants at 19 and 20.
The `ElementShadow` serde decoder mirrors this order. Updated:
  - `try_from` numeric branches, `from_serialized_value` tests
  - NonCounted-wrapper allowlist `0..=14 | 18` → `0..=14 | 18 | 19 | 20`
  - Range assertions: NonCounted twin range `[128, 147]` → `[128, 148]`;
    invalid-byte tests adjusted (149..=179 instead of 148..=179)
  - The base-variant round-trip test grew to 18 cases with explicit counts

Other conflict resolutions union both sides' match arms in:
  - `grovedb-element/element/helpers.rs` (get_flags / get_flags_owned /
    get_flags_mut / set_flags — single combined arm for cidx + RWSI)
  - `grovedb-element/element/mod.rs` (Element enum, ElementShadow,
    element_type() dispatch including the wrapper path)
  - `grovedb-element/element_type.rs` (display strings, try_from,
    NonCounted wrapper allowlist, from_serialized_value tests, the
    canonical base-variant round-trip test grown to 18 cases)
  - `grovedb/batch/estimated_costs/average_case_costs.rs`
    (`InsertNonMerkTree` gained the `non_counted` field on develop;
    union keeps both that signature change and the existing
    `ReplaceAggregateIndexedTreeRootKeys` arm)
  - `grovedb/batch/mod.rs` (RefreshReference doc-comment expansion
    on develop kept alongside the
    `ReplaceAggregateIndexedTreeRootKeys` variant definition)

And one non-exhaustive-match fix flagged by rustc post-merge:
  - `grovedb/operations/count_indexed_tree.rs:438` — fold
    `ReferenceWithSumItem` into the existing `Reference` arm
    (both resolve identically and `insert_reference` already does
    the variant-aware feature-type lookup via `get_feature_type`).

Verified:
- `cargo check -p grovedb` clean
- `cargo test --workspace --lib` ~3000 tests, 0 failures
- `cargo fmt --all` applied

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
QuantumExplorer added a commit that referenced this pull request May 17, 2026
* feat(types): add Element::ProvableSumTree variant + NotSummed twin extension

Phase 1 of the ProvableSumTree feature — the missing parallel to
ProvableCountTree that bakes the per-node sum into the node hash, making
aggregate-sum range queries cryptographically verifiable. This commit adds
the types-only foundation (no hash divergence yet — Phase 2 will introduce
node_hash_with_sum and the new proof Node variants).

DISCRIMINANTS

- Element::ProvableSumTree at variant index 17 / bincode discriminant 17
  (next free after the NotSummed wrapper byte at 16). This will renumber
  to 19 when PR #657 (CountIndexedTree) lands and reclaims 17/18.
- NonCountedProvableSumTree = 0x80 | 17 = 145.
- The NonCounted twin range widened from 0x80..=0x8F (4-bit base) to
  0x80..=0x9F (5-bit base) — is_non_counted() now checks the top 3 bits
  (& 0xe0 == 0x80) instead of the top 4. Existing twins 128..=142 stay put.
- The NotSummed twin scheme rebases analogously: prefix 0xb0 -> 0xa0,
  base mask 0x0F -> 0x1F, family range 0xA0..=0xBF. Existing twins move:
  NotSummedSumTree                180 -> 164
  NotSummedBigSumTree             181 -> 165
  NotSummedCountSumTree           183 -> 167
  NotSummedProvableCountSumTree   186 -> 170
  Plus the new NotSummedProvableSumTree = 0xa0 | 17 = 177.
  Safe because V1 is pre-shipping. is_not_summed() now uses & 0xe0 == 0xa0.
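The widened membership checks above reduce to single mask tests (mask values taken from this commit; a later refactor revisits the NotSummed mask):

```rust
/// NonCounted family membership: top 3 bits are 0b100.
fn is_non_counted(disc: u8) -> bool {
    disc & 0xe0 == 0x80
}

/// NotSummed family membership as of this commit: top 3 bits are 0b101.
fn is_not_summed(disc: u8) -> bool {
    disc & 0xe0 == 0xa0
}
```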

NEW APIS

- ElementType::ProvableSumTree, ElementType::NonCountedProvableSumTree,
  ElementType::NotSummedProvableSumTree.
- TreeType::ProvableSumTree (discriminant 11, is_sum_bearing = true,
  allows_sum_item = true, inner_node_type = ProvableSumNode).
- NodeType::ProvableSumNode and TreeFeatureType::ProvableSummedMerkNode(i64)
  with encode tag byte 7 and a parallel zero_sum() helper alongside
  zero_count().
- Element::new_provable_sum_tree*, empty_provable_sum_tree*, plus helpers
  (as_provable_sum_tree_value, is_provable_sum_tree).
- Element::NotSummed now accepts ProvableSumTree as a sum-tree inner type
  (constructor, serialize, deserialize).

PROOF DISPATCH

ProvableSumTree joins the "provable aggregate parent" family alongside
ProvableCountTree / ProvableCountSumTree in ElementType::proof_node_type:
subtree children use KvValueHashFeatureType and item children use KvCount.

PHASE-1 SCOPE BOUNDARIES

ProvableSumTree behaves identically to SumTree for storage, aggregation,
and hashing in Phase 1. The divergent node_hash_with_sum and the new
proof Node variants (KVSum, KVHashSum, etc.) land in Phase 2.
TreeFeatureType::ProvableSummedMerkNode maps to AggregateData::Sum at the
Element/aggregate level for now; Phase 2 may introduce a dedicated variant
once the hash diverges.

Workspace cargo test --all-features green (1497+ tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(merk): node_hash_with_sum + proof Node variants for ProvableSumTree

Phase 2 of the ProvableSumTree feature — bakes the per-node sum into the
node hash so it becomes cryptographically committed via the parent's
hash chain, parallel to how `node_hash_with_count` commits the count
for `ProvableCountTree`. After this commit a `ProvableSumTree` with the
same {key/value/sum} contents as a plain `SumTree` produces a different
root hash, which is the whole point of the Phase 2 divergence.

Phase 1 (commit c95cf749) was types-only; aggregation, storage, and
hashing all used the SumTree code paths. Phase 2 introduces the new
hash function, the new proof-node variants needed to transport sums
through proofs, and the dispatch wiring on both prover and verifier
sides. Phases 3 (insert/read), 4 (verify_grovedb walk), and 5
(AggregateSumOnRange) remain.

HASH DISPATCH

- `merk::tree::hash::node_hash_with_sum(kv, l, r, i64)` mirrors
  `node_hash_with_count` byte-for-byte except the appended 8-byte
  field is `i64::to_be_bytes()`. Negative sums hash via their
  two's-complement BE form, which is platform-independent.
- New `AggregateData::ProvableSum(i64)` variant. The
  `From<TreeFeatureType>` conversion now maps
  `ProvableSummedMerkNode(v) -> ProvableSum(v)` (was `Sum(v)` in
  Phase 1) so `Tree::hash_for_link` and the commit path can
  dispatch through the new arm.
- `Tree::hash_for_link(TreeType::ProvableSumTree)` and both commit
  paths (left/right Link::Modified arms) now call
  `node_hash_with_sum` when the aggregate is `ProvableSum`.
  `Tree::aggregate_data` for `ProvableSummedMerkNode` yields
  `ProvableSum` instead of `Sum`.
- Helper updates: `child_aggregate_sum_data_as_i64` /
  `child_aggregate_sum_data_as_i128` treat `ProvableSum`
  identically to `Sum`; `child_aggregate_count_data_as_u64`
  returns 0. `child_ref_and_sum_size` covers the new variant.
- `Link::encode_into` / `decode_into` learn tag byte 7 for
  `AggregateData::ProvableSum` (parallel to the existing
  `ProvableSummedMerkNode` tag byte 7 in `TreeFeatureType`).
- `grovedb::batch` `InsertTreeWithRootHash` now reconstructs an
  `Element::ProvableSumTree` when seeing `AggregateData::ProvableSum`.
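The only divergence from the Count analogue is the appended aggregate field; a minimal illustration of that hash-input suffix (just the two's-complement BE form described above, not the full hash function, which lives in merk::tree::hash):

```rust
/// The 8-byte field appended to the node-hash input for ProvableSumTree.
/// Negative sums contribute their two's-complement big-endian bytes,
/// which is platform-independent.
fn sum_hash_suffix(sum: i64) -> [u8; 8] {
    sum.to_be_bytes()
}
```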

PROOF NODE VARIANTS

Five new `Node` enum variants in `grovedb-query/src/proofs/mod.rs`,
mirroring the Count family member-for-member but with `i64` sums:

- `KVSum(key, value, sum)`             — sum analogue of `KVCount`
- `KVHashSum(kv_hash, sum)`            — analogue of `KVHashCount`
- `KVRefValueHashSum(key, ref_value, ref_elem_hash, sum)`
- `KVDigestSum(key, value_hash, sum)`  — analogue of `KVDigestCount`
- `HashWithSum(kv_hash, l, r, sum)`    — analogue of `HashWithCount`

`merk::proofs::tree::Tree::hash()` now dispatches each new variant
through `node_hash_with_sum`. `KVValueHashFeatureType` /
`...WithChildHash` handling gains a `ProvableSummedMerkNode` arm so
proof-tree hashes recomputed from a Sum-bearing feature_type match
the Merk-tree side. `aggregate_data()` returns `ProvableSum(sum)`
for `KVSum` and `HashWithSum`; `key()` lists the three key-bearing
new variants alongside their Count counterparts.

`grovedb-element::ProofNodeType` gains `KvSum` and
`KvRefValueHashSum`; `ElementType::proof_node_type` now picks them
when the parent is `ProvableSumTree` (Phase 1 routed Sum-tree
children through the Count dispatch). Subtrees inside ProvableSum
still use `KvValueHashFeatureType` since the feature_type carries
the sum.

Proof generation in `merk/src/proofs/query/mod.rs` adds
`to_kv_sum_node`, `to_kvhash_sum_node`, `to_kvdigest_sum_node`
(parallel to the Count helpers) and an `is_provable_sum_tree`
branch that emits Sum-bearing variants. `chunks.rs`'s
`create_proof_node_for_chunk` dispatches the new ProofNodeType
arms.

GroveDB-side reference post-processing in
`grovedb/src/operations/proof/generate.rs` rewrites the merk-level
`KVValueHashFeatureType(_, _, _, ProvableSummedMerkNode(sum))` to
`KVRefValueHashSum`, mirroring the existing
`KVValueHashFeatureType -> KVRefValueHashCount` path. Both
ref-rewriting loops in that file are updated.

The regular query verifier in `merk/src/proofs/query/verify.rs`
rejects `HashWithSum` at non-aggregate positions (fail-fast,
matching the existing `HashWithCount` guard). `KVSum`,
`KVDigestSum`, and `KVRefValueHashSum` are dispatched via
`execute_node`. `KVHashSum` joins `KVHash` / `KVHashCount` in the
"non-data-bearing on path" branch and in the absence-proof
boundary set.

WIRE FORMAT

Tag bytes 0x30..=0x3D in the previously-unused 0x30..0x3F range:

  Push variants (V0 short + V1 wrapper for KV-style large values):
    0x30 = KVSum (small),     0x31 = KVSum (large)
    0x32 = KVHashSum
    0x33 = KVRefValueHashSum (small), 0x34 = KVRefValueHashSum (large)
    0x35 = KVDigestSum
    0x36 = HashWithSum
  PushInverted parallel:
    0x37 = KVSum (small),     0x38 = KVSum (large)
    0x39 = KVHashSum
    0x3a = KVRefValueHashSum (small), 0x3b = KVRefValueHashSum (large)
    0x3c = KVDigestSum
    0x3d = HashWithSum
  0x3e and 0x3f are intentionally reserved.

The on-wire i64 sum uses varint (via `ed::Encode for i64`) for
compactness, matching the Count family. The hash recomputation in
`node_hash_with_sum` uses the fixed 8-byte big-endian form
independently — wire encoding and hash input are deliberately
decoupled. `encoding_length()` and `Decode` arms parallel the Count
family verbatim.

V0 wire format is unchanged. All new tags are V1-only.
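The wire/hash decoupling above trades a fixed-width hash input for a compact wire form. A toy LEB128-style varint illustrates the contrast; the real encoding is whatever `ed::Encode for i64` produces and may differ in detail:

```rust
/// Toy unsigned LEB128 varint, for illustration only: 7 payload bits per
/// byte, high bit set on all but the last byte.
fn toy_varint(mut v: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80);
    }
    out
}
```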

TESTS

- `merk::tree::hash` (4): `node_hash_with_sum` differs from
  `node_hash` even at sum=0; different sums give different hashes;
  `i64::MIN` / `i64::MAX` are distinct; determinism.
- `merk::tree` (2): a `ProvableSummedMerkNode` tree aggregates to
  `ProvableSum`, `hash_for_link(ProvableSumTree)` matches
  `node_hash_with_sum(...)` and diverges from plain
  `Tree::hash()`; mutating a node sum changes the root hash.
- `merk::proofs::tree` (4): forged sums on `HashWithSum`,
  `KVSum`, `KVHashSum` change the recomputed node hash;
  Phase 1 -> Phase 2 cornerstone — same {key/value/sum} contents
  give a different ProvableSumTree hash than a plain SumTree.
- `grovedb-query::proofs::encoding` (4): round-trip every new
  variant through `Op::Push` and `Op::PushInverted` at sum
  values {`i64::MIN`, -42, -1, 0, 1, 42, `i64::MAX`}; tag-byte
  sanity check for all 10 new tags.
- `merk::tree::tree_feature_type`: extended every existing
  `AggregateData` test to cover the new `ProvableSum` variant.

Workspace `cargo test --all-features` green: 2881 tests passing,
zero failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(types): NotSummed twin uses explicit per-variant mapping

Phase 1.6 — revert the Phase 1.5 mask widening. The NotSummed family
returns to its original 4-bit prefix `0xb0` / range `0xB0..=0xBF`. The
five legal sum-tree inner types are mapped to twin slots explicitly,
1-at-a-time, instead of via the `prefix | base` bitwise formula:

  SumTree (base 4)               -> 180  (0xB4)
  BigSumTree (base 5)            -> 181  (0xB5)
  CountSumTree (base 7)          -> 183  (0xB7)
  ProvableCountSumTree (base 10) -> 186  (0xBA)
  ProvableSumTree (base 17)      -> 177  (0xB1)

Existing twins return to their original discriminant values; only the
new ProvableSumTree slot is freshly assigned. Pre-shipping V1, so this
discriminant churn is fine.

WHY EXPLICIT MAPPING

`prefix | inner_byte` can only generate twin slots when the inner
discriminant fits in the prefix's complement nibble. For ProvableSumTree
at base 17, the formula `0xb0 | 17` would produce `0xB1` AND then
`disc & 0x0F` would invert it back to base 1 (Reference) — a collision.
Widening the mask to 5 bits in Phase 1.5 rebased every existing twin
discriminant; reverting to per-variant mapping keeps the historical
values stable while still allowing arbitrary new slot assignments.
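The collision described above is easy to verify with the numbers from the commit:

```rust
const NOT_SUMMED_TWIN_PREFIX: u8 = 0xb0;

/// The bitwise twin formula that works only when the base fits in the
/// prefix's complement nibble.
fn formula_twin(base: u8) -> u8 {
    NOT_SUMMED_TWIN_PREFIX | base
}

/// The 4-bit inverse used to recover the base from a twin byte.
fn formula_base(twin: u8) -> u8 {
    twin & 0x0f
}
```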

CONSEQUENCES

- `NOT_SUMMED_TWIN_PREFIX` stays as a const but is now only a family-range
  marker, never composed with a base byte.
- `NOT_SUMMED_BASE_MASK` removed — no remaining callers.
- `is_not_summed()` back to `& 0xf0 == 0xb0`.
- `base()` for NotSummed now uses an explicit per-variant match.
- `from_serialized_value` NotSummed branch uses an explicit
  `inner_byte → twin_variant` match.

NonCounted is unaffected — it still uses the bitwise formula because all
its bases fit cleanly in the low 5 bits under `0x80`.

Workspace cargo test --all-features green (2881 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(grovedb): wire ProvableSumTree through insert/read/batch paths

Phase 3 of the ProvableSumTree feature — wires the variant through the
extension traits, cost calculator, reconstruction helper, batch
propagation, and read-path subtree validation so that direct insertion,
nested aggregation, and child-sum mutation behave correctly end-to-end.

Phase 1 (commit c95cf749) added the variant and its twins; Phase 2
(commit 3364f08c) introduced node_hash_with_sum and the proof Node
family. The "behave like SumTree" fallback Phase 1 leaned on covered
most surfaces but several dispatch sites guarded subsequent operations
through explicit per-variant match arms — those sites would silently
drop ProvableSumTree or fail to traverse into it. Phase 3 fills each
in deliberately.

EXTENSION TRAITS — merk/src/element/tree_type.rs

ElementTreeTypeExtensions had six trait methods that enumerated tree
variants explicitly: root_key_and_tree_type_owned, root_key_and_tree_type,
tree_flags_and_type, tree_type, maybe_tree_type, tree_feature_type. Each
was missing its ProvableSumTree arm — get_feature_type was the only one
already wired (Phase 1). Adding the missing arms unblocks callers across
get, batch, and visualize that thread tree types through these helpers
to decide layout, hashing, and aggregate-data extraction.
tree_feature_type now maps ProvableSumTree -> ProvableSummedMerkNode(sum)
explicitly, matching the parallel ProvableCountedMerkNode wiring.

COST CALCULATOR — merk/src/element/costs.rs

get_specialized_cost, the layered_value_byte_cost path in
specialized_costs_for_key_value, the layered_value_defined_cost type
filter, and the value_defined_cost dispatch all enumerated the eight
Merk-tree variants and would have either returned None or mis-sized a
ProvableSumTree element. Added explicit ProvableSumTree => SUM_TREE_COST_SIZE
arms (parity with SumTree as established in Phase 1) and the matching
LayeredValueDefinedCost branch.

RECONSTRUCTION — merk/src/element/reconstruct.rs

ElementReconstructExtensions::reconstruct_with_root_key, used by batch
propagation to rebuild a tree element after a root-key update, returned
None for ProvableSumTree. Added the arm that pulls
aggregate_data.as_sum_i64() into Element::ProvableSumTree(root, sum,
flags); without this, batch operations that mutated a ProvableSumTree
subtree would lose their tree element entirely during the parent's
upward propagation. as_sum_i64 already handles the AggregateData::ProvableSum
case (Phase 2).

BATCH PROPAGATION — grovedb/src/batch/mod.rs

The InsertTreeWithRootHash else-if chain that transcribes a Merk-tree
mutation into the appropriate root-hash-bearing operation enumerated
each tree-element variant explicitly. ProvableSumTree was missing,
so a batch that mutated a ProvableSumTree subtree would fall through
to the CommitmentTree arm — wrong shape entirely. Mirrored the
ProvableCountSumTree arm directly. The accompanying tree-cost match
list above (used by the apply_batch storage-cost callback) was also
missing the variant.

READ-PATH SUBTREE VALIDATION — grovedb/src/operations/get/mod.rs

check_subtree_exists rejected paths whose final segment resolved to a
ProvableSumTree because the variant wasn't in its accepted-tree
match list. This would have broken every query that traversed INTO a
ProvableSumTree.

TESTS — grovedb/src/tests/provable_sum_tree_tests.rs

Ten tests covering Phase 3's externally-observable surface:

- Round-trip insert/read with aggregate-sum tracking.
- Aggregation across mixed positive/negative/zero values + i64::MIN/
  i64::MAX extremes.
- Root-hash divergence vs a plain SumTree with identical children
  (the Phase 2 cornerstone, verified end-to-end via
  open_transactional_merk_at_path).
- Nested ProvableSumTree[A] -> ProvableSumTree[B] aggregate propagation;
  mutation of B's children shifts the grovedb root hash.
- Wrapper interactions: NonCounted(ProvableSumTree) contributes 0 to a
  CountTree parent; NotSummed(ProvableSumTree) contributes 0 to a
  SumTree parent; the wrapped tree's own aggregate is preserved
  (verified via get_raw, which retains the wrapper byte).
- Deleting a SumItem child shifts the ProvableSumTree root hash because
  the aggregate sum is hash-bound.
- Direct insert of a ProvableSumTree built from an existing template.

The wrapper round-trip tests use db.get_raw rather than db.get because
db.get strips wrappers via into_underlying — by design.

Workspace cargo test --all-features green: 2891 tests passing (was
2881 in Phase 2 + 10 new), zero failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(grovedb): verify_grovedb consistency check for aggregate fields

Phase 4 of the ProvableSumTree feature. The existing
`verify_merk_and_submerks_in_transaction` walk is cryptographically
complete — `combine_hash(value_hash(parent_bytes), inner_merk_root) ==
stored_element_value_hash` catches any byte-level tampering, and for
ProvableSumTree the inner aggregate is bound into the inner Merk's
root_hash via `node_hash_with_sum` (Phase 2). What that walk did not
catch was the *software-consistency* class of drift: a parent
`ProvableSumTree(_, N, _)` whose stored sum field N disagrees with the
inner Merk's actual `aggregate_data()` value M. For provable variants
both N and M are bound into element_value_hash, but they live on disk
independently and could disagree if Phase 3's propagation logic drifts.
For non-provable variants (SumTree, BigSumTree, CountTree, CountSumTree)
the recorded aggregate isn't hash-bound at all, so a pure software bug
in propagation would silently corrupt the tree.

LIB.RS — verify_merk_and_submerks_in_transaction

After the existing cryptographic check, for any tree element whose
inner Merk holds actual data (i.e. excluding the non-Merk-data trees
CommitmentTree/MmrTree/BulkAppendTree/DenseTree, which already
short-circuit via `uses_non_merk_data_storage`), the verifier now opens
the inner Merk, reads its `aggregate_data()`, and compares against the
parent's recorded aggregate field via a new free helper
`aggregate_consistency_labels`. The helper covers all seven aggregate-
bearing tree variants:

  - SumTree                vs. AggregateData::Sum
  - ProvableSumTree        vs. AggregateData::ProvableSum
  - BigSumTree             vs. AggregateData::BigSum
  - CountTree              vs. AggregateData::Count
  - CountSumTree           vs. AggregateData::CountAndSum
  - ProvableCountTree      vs. AggregateData::ProvableCount
  - ProvableCountSumTree   vs. AggregateData::ProvableCountAndSum

Plus an empty-Merk identity case (NoAggregateData with zero recorded
aggregate matches), and a fallback that reports any variant-shape
mismatch (e.g. ProvableSumTree paired with AggregateData::Count(_)).
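An illustrative sketch of that comparison shape, assuming simplified
three-variant enums (the real helper covers all seven aggregate-bearing
variants plus the wrapper handling):

```rust
#[derive(Debug, Clone, Copy)]
enum AggregateData {
    NoAggregateData,
    Sum(i64),
    ProvableSum(i64),
    Count(u64),
}

#[derive(Debug, Clone, Copy)]
enum TreeElement {
    SumTree(i64),
    ProvableSumTree(i64),
    CountTree(u64),
}

/// None when the parent's recorded field matches the inner Merk's
/// aggregate; otherwise Some((recorded_label, inner_label)) naming the drift.
fn aggregate_consistency_labels(
    parent: TreeElement,
    inner: AggregateData,
) -> Option<(String, String)> {
    use AggregateData::*;
    use TreeElement::*;
    match (parent, inner) {
        (SumTree(n), Sum(m)) | (ProvableSumTree(n), ProvableSum(m)) if n == m => None,
        (CountTree(n), Count(m)) if n == m => None,
        // Empty-Merk identity: a zero recorded aggregate matches NoAggregateData.
        (SumTree(0), NoAggregateData) | (ProvableSumTree(0), NoAggregateData) => None,
        (CountTree(0), NoAggregateData) => None,
        // Fallback: any value mismatch or variant-shape mismatch is reported.
        (p, i) => Some((format!("recorded {:?}", p), format!("inner {:?}", i))),
    }
}
```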

VERIFICATIONISSUES SHAPE — placeholder hashes, not type extension

`VerificationIssues` is a private type alias
`HashMap<Vec<Vec<u8>>, (CryptoHash, CryptoHash, CryptoHash)>` whose
shape is consumed by `visualize_verify_grovedb`. To avoid breaking
its callers and the visualize hex output, mismatched aggregates are
packed into deterministic placeholder CryptoHashes via
`blake3(format!("recorded ..."))` and `blake3(format!("inner ..."))`,
slotted into the "expected" and "actual" fields. The "root" slot
reuses the inner-Merk root_hash for path locality. Documented inline.
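A sketch of the packing idea, with std's DefaultHasher standing in for
blake3 (only the triple shape and the deterministic-placeholder trick
matter here; names are illustrative, not the real code):

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

type CryptoHash = [u8; 32];
type VerificationIssues = HashMap<Vec<Vec<u8>>, (CryptoHash, CryptoHash, CryptoHash)>;

// Stand-in for blake3: a 32-byte digest built from std's DefaultHasher.
fn placeholder_hash(label: &str) -> CryptoHash {
    let mut out = [0u8; 32];
    let mut h = std::collections::hash_map::DefaultHasher::new();
    label.hash(&mut h);
    out[..8].copy_from_slice(&h.finish().to_be_bytes());
    out
}

// Pack an aggregate mismatch into the (root, expected, actual) triple the
// issues map already uses, so no caller-visible type changes are needed.
fn report_aggregate_mismatch(
    issues: &mut VerificationIssues,
    path: Vec<Vec<u8>>,
    inner_merk_root: CryptoHash,
    recorded: i64,
    actual: i64,
) {
    let expected = placeholder_hash(&format!("recorded aggregate {}", recorded));
    let got = placeholder_hash(&format!("inner aggregate {}", actual));
    issues.insert(path, (inner_merk_root, expected, got));
}
```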

INTEGRITY WALK TESTS — 7 new tests

A new `integrity_walk_tests` module in `provable_sum_tree_tests.rs`
exercises the verifier end-to-end via two raw-storage tampering
helpers:

  - `tamper_value_no_hash_update` decodes the on-disk TreeNode for a
    leaf, replaces only its element bytes, re-encodes (leaving the
    stored value_hash stale), writes back via the immediate storage
    context. Simulates byte-level tampering caught by the SumItem
    arm's `value_hash(bytes) != stored_value_hash` check.

  - `tamper_parent_element_with_consistent_hashes` splices in fresh
    element bytes AND recomputes hash + value_hash to remain
    crypto-consistent with the inner Merk's existing root_hash. Used
    for aggregate-mismatch scenarios — the crypto check passes,
    but the new aggregate-consistency check fires. Offsets into the
    on-disk TreeNodeInner encoding are derived from the decoded
    `value_as_slice().len()`.

Scenarios covered:

  1. Inner SumItem value tamper (different bytes) — crypto check
     catches it.
  2. Inner SumItem same-length value tamper — crypto check catches
     it (assert: hashes, not lengths, are what's verified).
  3. Parent ProvableSumTree aggregate mismatch (sum=999 stored vs.
     40 actual) — new aggregate-consistency check fires.
  4. Clean ProvableSumTree verifies clean (with mixed positive,
     negative, zero, and large values).
  5. Clean ProvableCountTree verifies clean.
  6. Parent ProvableCountTree aggregate mismatch (count=9999 vs. 3)
     — sanity check that the generalized helper handles the count
     variant too.
  7. Reload-after-write determinism: insert, drop the db handle,
     reopen, verify_grovedb reports zero issues; the parent's
     ProvableSumTree.sum_value field round-trips.

Workspace cargo test --all-features green: 2898 passing
(Phase 3 baseline of 2891 + 7 new), zero failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: AggregateSumOnRange query + proof + verify for ProvableSumTree

Adds the marquee Phase 5 feature for ProvableSumTree: a query that asks
"what's the cryptographically-verifiable signed sum of children with keys
in range [a, b]?" against a ProvableSumTree, with proof size O(log n + |boundary|)
and a verify path that returns the root hash plus the aggregate i64 sum.

Mirrors AggregateCountOnRange line-for-line:
- QueryItem::AggregateSumOnRange(Box<QueryItem>) variant (wire tag 11)
- Query / SizedQuery / PathQuery::validate_aggregate_sum_on_range with the
  same nested-rejection, no-subquery, no-pagination, allowed-inner-range rules
- merk/src/proofs/query/aggregate_sum.rs (~760 lines) implementing
  create_aggregate_sum_on_range_proof + verify_aggregate_sum_on_range_proof
  with the same Disjoint/Contained/Boundary classification, HashWithSum
  self-verifying compression at fully-inside/outside subtrees, and
  KVDigestSum at boundaries
- grovedb/src/operations/proof/aggregate_sum.rs (~330 lines) for the
  GroveDB-level multi-layer envelope chain check
- prove_query / verify_query dispatch in generate.rs and verify.rs
- Tree-type rejection arms in BulkAppendTree, DenseTree, MMR for the new variant

Key correctness points handled differently from count:
- i128 accumulator throughout the verifier (sum can validly be 0 with
  non-zero children, so no "if sum == 0" short-circuit; final narrow to
  i64 with an explicit overflow error)
- No checked_sub equivalent for own_sum derivation — signed sums make
  arithmetic-only corruption detection meaningless; the hash chain binds
  the values regardless
- ProvableSumTree-only at the merk-level gate (Sum/BigSum use different
  hash dispatches and can't host this proof shape)
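The accumulator discipline in the first point can be sketched as follows
(a minimal illustration, not the verifier's actual code):

```rust
// Sum in i128 so adversarial intermediate sums never wrap, and narrow to
// i64 only at the entry point, with an explicit overflow error.
fn narrow_verified_sum(child_sums: &[i64]) -> Result<i64, String> {
    // No "if sum == 0" short-circuit: a signed sum can validly be 0 with
    // non-zero children (e.g. [5, -5]).
    let total: i128 = child_sums.iter().map(|&s| s as i128).sum();
    i64::try_from(total).map_err(|_| format!("aggregate sum {} overflows i64", total))
}
```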

Tests: 35 new tests total (14 merk-level in aggregate_sum.rs, 21 GroveDB-
level in aggregate_sum_query_tests.rs) covering empty trees, single-key
ranges, full/sub/boundary ranges, negative sums, mixed-sign extremes
including i64::MAX + i64::MIN = -1, tampering rejection, wrong-tree
rejection, validation rejection of nested/Key/RangeFull/orthogonal-aggregate
inners, multi-layer paths, NotSummed-wrapped subtree exclusion, V0 envelope
round-trip. Workspace test count: 2898 → 2938, zero failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: ProvableSumTree element + AggregateSumOnRange query

Final phase of the ProvableSumTree feature — documentation. Adds:

- `docs/book/src/aggregate-sum-on-range-queries.md`: new dedicated chapter
  describing the AggregateSumOnRange query, the ProvableSumTree tree type
  it operates on, why the existing sum trees can't be queried this way,
  the proof node vocabulary (KVSum / KVHashSum / HashWithSum / KVDigestSum
  / KVRefValueHashSum at wire tags 0x30..=0x3D), and the signed-sum
  correctness notes (no zero-sum short-circuit; i128 accumulator with
  i64 narrowing at the entry points; overflow handling at i64::MAX
  extremes).
- `docs/book/src/element-system.md`: ProvableSumTree row added to the
  aggregate-tree table; ProvableSummedMerkNode added to the
  TreeFeatureType enum block; NonCounted/NotSummed wrapper indices
  surfaced; explanation of when to choose ProvableSumTree over plain
  SumTree (sum is part of the protocol invariant vs metadata) and the
  rationale for the explicit `NotSummedProvableSumTree = 177` slot.
- `docs/book/src/hashing.md`: parallel "Aggregate Hashing for
  ProvableSumTree" section showing node_hash_with_sum's i64 BE input
  layout and the wire-vs-hash encoding split.
- `docs/book/src/appendix-a.md`: rows for NonCounted (15), NotSummed
  (16), and ProvableSumTree (17) added to the discriminant table.
- `docs/book/src/aggregate-sum-queries.md`: disambiguation banner at the
  top distinguishing the existing sum-budget iterator from the new
  AggregateSumOnRange query, with a cross-link.
- `docs/book/src/SUMMARY.md`: registers the new chapter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: targeted coverage for ProvableSumTree code paths

Adds focused unit tests for production code added by this PR that the
existing test suite happened to exercise indirectly. None of these tests
add new behavior; they only pin down branches that codecov flagged as
uncovered on the patch.

Covers:

- grovedb-query/src/proofs/encoding.rs: long-value (>= 65536 bytes)
  round-trips for the four KV-style ProvableSumTree wire variants
  (KVSum, KVRefValueHashSum in both Push and PushInverted directions
  -- tag bytes 0x31, 0x34, 0x38, 0x3b).
- grovedb-query/src/proofs/mod.rs: Display tests for KVSum, KVHashSum,
  KVRefValueHashSum, KVDigestSum, and HashWithSum proof nodes.
- grovedb-element/src/element_type.rs: proof_node_type dispatch on
  ProvableSumTree parents (Items -> KvSum, References -> KvRefValueHashSum),
  plus as_str / Display for ProvableSumTree, NonCountedProvableSumTree,
  NotSummedProvableSumTree.
- grovedb-element/tests/element_constructors_helpers.rs: every
  ProvableSumTree constructor + is_provable_sum_tree /
  as_provable_sum_tree_value / into_provable_sum_tree_value helpers,
  including the wrong-element error paths.
- grovedb-element/tests/element_display_and_serialization.rs: extends
  the all-variants Display test to include ProvableSumTree.
- merk/src/proofs/tree.rs: forged-sum sensitivity for KVDigestSum and
  KVRefValueHashSum (the latter exercises the full
  combine(referenced_value_hash, node_value_hash) -> node_hash_with_sum
  path), aggregate_data() returning ProvableSum for KVSum / HashWithSum,
  and key() returning the right thing for all five sum node variants.
- merk/src/proofs/query/aggregate_sum.rs: two new regression tests for
  Merk::prove (regular query) on a ProvableSumTree, asserting that the
  emitted proof contains KVSum + KVHashSum (and KVDigestSum for an
  absent-key boundary). These hit the to_kv_sum_node / to_kvhash_sum_node
  / to_kvdigest_sum_node helpers whose only callers are inside
  create_proof_internal's ProvableSumTree branches.

All 2956 workspace tests pass (was 2938 before this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(verify): reject KVRefValueHashSum in trunk/branch chunk proofs

The trunk/branch proof extractor was rejecting Node::KVRefValueHash and
Node::KVRefValueHashCount as having an opaque value_hash, but the new
Phase 2 KVRefValueHashSum variant was missing from the rejection arm.
Without this guard, get_key_value_from_node still surfaces (key, value)
for KVRefValueHashSum nodes and the verifier would deserialize and
insert the value bytes into the elements map. The embedded
node_value_hash is opaque (combine_hash of the node_value_hash and the
referenced_value_hash) and cannot be recomputed from the value bytes
alone, so a forged value could ride along while the per-node merk
hash chain still appears valid.

Add KVRefValueHashSum to the rejection arm in
extract_elements_and_leaf_keys, alongside KVRefValueHash and
KVRefValueHashCount, with a regression test that mirrors the existing
KVRefValueHash trunk-proof rejection test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(query): scan conditional-branch selectors for AggregateSumOnRange

has_aggregate_sum_on_range_anywhere previously walked only
branch.subquery for each conditional branch, ignoring the branch
selector (the IndexMap key). Selectors are themselves QueryItems and
the type system permits an AggregateSumOnRange tag there even though
it is not a meaningful conditional matcher. The shape check is meant
to be exhaustive — if any ASOR is present "anywhere", the prover must
refuse to route through the regular-proof path — so the walker must
surface a selector-tagged ASOR too.

Iterate `(selector, branch)` instead of `branches.values()` and
short-circuit on `selector.is_aggregate_sum_on_range()` before
recursing into `branch.subquery`. Add a regression test that mirrors
the existing count walker test and explicitly covers the selector
case.
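The shape of the fix, sketched with a BTreeMap standing in for the real
IndexMap and a two-variant stand-in QueryItem:

```rust
use std::collections::BTreeMap;

#[derive(PartialEq, Eq, PartialOrd, Ord)]
enum QueryItem {
    Key(Vec<u8>),
    AggregateSumOnRange,
}

struct Query {
    branches: BTreeMap<QueryItem, Query>,
}

// Selectors are themselves QueryItems, so the walk must inspect the map
// keys, not just each branch's subquery.
fn has_aggregate_sum_on_range_anywhere(q: &Query) -> bool {
    q.branches.iter().any(|(selector, branch)| {
        // Short-circuit on a selector-tagged ASOR before recursing.
        matches!(selector, QueryItem::AggregateSumOnRange)
            || has_aggregate_sum_on_range_anywhere(branch)
    })
}
```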

The same gap exists for has_aggregate_count_on_range_anywhere but is
left as-is here since it predates this PR and changing it would be
out of scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: error classification and entry preservation in aggregate-sum paths

Three small correctness improvements flagged by review:

* grovedb/src/lib.rs verify_merk_and_submerks_in_transaction: when both
  the cryptographic combined_value_hash check and the
  aggregate_consistency check fail for the same subtree path, the
  aggregate-consistency branch's `issues.insert(...)` clobbered the
  cryptographic mismatch entry. The real merk-hash chain mismatch is
  the more diagnostic message, so switch to `.entry().or_insert(...)`
  to preserve the first-inserted entry per path.

* merk/src/proofs/query/aggregate_sum.rs: the prover walks our *own*
  in-memory merk. If `aggregate_data()` refuses to surface a
  `ProvableSum` for a node in a tree we already gated as
  `ProvableSumTree`, that is local storage/state corruption, not a
  peer-supplied invalid proof. Reclassify three sites from
  `Error::InvalidProofError` to `Error::CorruptedData` to match the
  repo error-handling convention.

* grovedb/src/operations/proof/generate.rs: the two ASOR call sites
  forwarded bare `Error::MerkError` for `prove_aggregate_sum_on_range`
  failures, making them indistinguishable from surrounding
  proof-generation merk errors. Wrap each with
  `Error::CorruptedData(format!("prove_aggregate_sum_on_range failed: {}", e))`
  per the repo convention.
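The entry-preservation idiom from the first point, reduced to a
self-contained sketch (string keys and messages stand in for the real
path/hash types):

```rust
use std::collections::HashMap;

// The first-reported issue per path wins; a later, less diagnostic entry
// must not clobber it. `entry().or_insert_with()` gives exactly that.
fn report_issue(issues: &mut HashMap<String, String>, path: &str, msg: &str) {
    issues.entry(path.to_string()).or_insert_with(|| msg.to_string());
}
```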

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: strengthen aggregate-sum regression coverage

Address review feedback on test gaps in the ProvableSumTree /
AggregateSumOnRange suite:

* provable_sum_tree_tests::populated_provable_sum_tree_round_trips:
  the per-iteration loop fetched the parent psum but only asserted its
  variant — the running aggregate after each insert was never checked.
  Track an `expected_sum` accumulator and assert
  `as_provable_sum_tree_value() == expected_sum` after every insert so
  the running aggregate (7 → 20 → 40) is pinned down, not just the
  final sum.

* direct_insert_provable_sum_tree_with_root_key_and_sum: the test was
  named for a direct-insert path but never actually performed one — it
  built a populated template tree and inspected its on-disk shape,
  then stopped. Capture the template's root_key + sum, then run a
  batch `insert_only_known_to_not_already_exist_op` over a
  ProvableSumTree element carrying those values at a fresh top-level
  key (the non-batch insert path forbids non-empty Tree elements with
  "a tree should be empty at the moment of insertion when not using
  batches", so the documented direct-insert semantics are only
  reachable via the batch API). Assert the round-tripped element
  preserves the captured root_key + sum.

* element/tree_type.rs get_feature_type_zeros_sum_for_not_summed_in_sum_parents:
  extend the NotSummed wrapper regression to include
  `TreeType::ProvableSumTree` so the new Phase 2 sum-bearing parent
  variant's zero-sum semantics stay pinned alongside SumTree,
  BigSumTree, CountSumTree, and ProvableCountSumTree.

* tree/link.rs round_trip_aggregate_data_provable_sum_negative: pin
  down the new wire tag 7 introduced for
  `AggregateData::ProvableSum`. Encode a negative-sum reference link
  (also exercises the signed-i64 varint path), assert tag 7 is
  present in the bytes, then decode and assert the link's fields
  round-trip identically.

* aggregate_sum_query_tests::aggregate_sum_with_subquery_is_rejected_at_validation:
  previously only fed a dummy proof to the verifier-side validator,
  leaving the new prover-side gate
  (`prove_query_non_serialized` short-circuit) unasserted. Build the
  malformed PathQuery, set up the standard 15-key fixture, and assert
  `db.prove_query(...)` returns `Err` so a regression in the prover
  gate can't slip through with this test still passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: refresh ProvableSumTree doc comments

The TreeType::ProvableSumTree doc said "Phase 1: behaves identically
to SumTree everywhere except in inner_node_type /
empty_tree_feature_type. Phase 2 will diverge the hash computation."
Phase 2 has shipped — the hash dispatch now goes through
node_hash_with_sum and the new proof-node families (KVSum, KVHashSum,
KVDigestSum, KVRefValueHashSum, HashWithSum) plus the
AggregateSumOnRange query are all in place. Rewrite the comment to
describe the current (post-Phase-2) semantics: an i64 sum baked into
every node's hash, making sum tampering catchable via proof
verification, as the sum-side counterpart of ProvableCountTree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(verify): require provable tree type at aggregate query terminal layer

Security finding (Codex): the `verify_aggregate_sum_query` and
`verify_aggregate_count_query` chain walkers only checked
`element.is_any_tree()` for path elements. At the terminal (leaf) layer
this is insufficient — if the honest tree at the queried path happens
to be an EMPTY Merk-backed tree of any type (NormalTree, SumTree,
BigSumTree, CountTree, CountSumTree, ProvableCountTree,
ProvableCountSumTree, ProvableSumTree), its stored
`value_hash = combine_hash(H(element_bytes), NULL_HASH)`. The merk
verifier accepts empty proof bytes as `(NULL_HASH, 0)`, so an attacker
can construct a forged proof with:

  - layer 0: honest single-key proof of the leaf path key in its parent
  - layer 1: empty bytes (forged)

and the chain check passes uniformly. The verifier returns `sum = 0`
(or `count = 0`) against the trusted root hash, even though the leaf
isn't a Provable{Sum,Count}Tree.

The numeric answer is correct (an empty tree has sum 0 / count 0), so
this isn't a value forgery — but it IS a type-confusion soundness gap:
a caller that infers "leaf is a ProvableSumTree" from "the aggregate
verifier accepted" is deceived. The prover-side gate in
`Merk::prove_aggregate_{sum,count}_on_range` already rejects
non-provable inputs, but the verifier didn't mirror that invariant.

THE FIX

In `enforce_lower_chain`, add an `is_terminal: bool` parameter. At
intermediate depths nothing changes (`is_any_tree()` still suffices —
the GroveDB grove can route through any tree type on the way down). At
the terminal depth — passed `is_terminal = true` when
`depth + 1 == path_keys.len()` — the verifier now requires:

  - aggregate-sum:   `matches!(element, Element::ProvableSumTree(..))`
  - aggregate-count: `matches!(element, Element::ProvableCountTree(..) |
                                        Element::ProvableCountSumTree(..))`

Wrapper variants (NonCounted, NotSummed) are stripped via the existing
`into_underlying()` so they continue to work transparently.
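A simplified sketch of the terminal gate for the sum side (a toy Element
enum; the real check also covers the count-side variants and sits inside
the chain walker):

```rust
#[allow(dead_code)]
#[derive(Debug)]
enum Element {
    NormalTree,
    SumTree,
    ProvableSumTree,
    ProvableCountTree,
    NotSummed(Box<Element>),
}

impl Element {
    // Wrapper variants are stripped so they keep working transparently.
    fn into_underlying(self) -> Element {
        match self {
            Element::NotSummed(inner) => inner.into_underlying(),
            other => other,
        }
    }
    fn is_any_tree(&self) -> bool {
        true // every variant here is a tree; the real enum also has items
    }
}

fn enforce_sum_chain_element(element: Element, is_terminal: bool) -> Result<(), String> {
    let element = element.into_underlying();
    if is_terminal {
        // Terminal layer: only a ProvableSumTree may answer an
        // aggregate-sum query, even when the honest leaf tree is empty.
        match element {
            Element::ProvableSumTree => Ok(()),
            other => Err(format!("terminal element {:?} is not a ProvableSumTree", other)),
        }
    } else if element.is_any_tree() {
        Ok(()) // intermediate depths: any tree type may route downward
    } else {
        Err("path element is not a tree".to_string())
    }
}
```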

TESTS

Three new regression tests that surgically construct the forgery from a
real honest single-key envelope and confirm the verifier now rejects:

  - `empty_leaf_type_confusion_forgery_rejected` (sum side, empty
    NormalTree at leaf)
  - `empty_provable_count_tree_at_leaf_rejected_for_sum` (sum side,
    empty ProvableCountTree at leaf — confirms type-specificity)
  - `empty_leaf_type_confusion_forgery_rejected` (count side, empty
    NormalTree at leaf)

The path == 0 case is unaffected: the merk-level hash divergence
between `node_hash` and `node_hash_with_sum` / `node_hash_with_count`
makes it computationally infeasible to forge a proof that matches the
trusted root, so the path-elements check is unnecessary at the root.
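The divergence argument can be illustrated structurally: the sum-bearing
hash mixes the aggregate into its input, so the two functions cannot
agree on any node short of a hash collision. A toy sketch with std's
DefaultHasher standing in for blake3 (digest width and names are not the
real code):

```rust
use std::hash::{Hash, Hasher};

fn digest(parts: &[&[u8]]) -> u64 {
    let mut h = std::collections::hash_map::DefaultHasher::new();
    for p in parts {
        p.hash(&mut h);
    }
    h.finish()
}

fn node_hash(kv_hash: &[u8], left: &[u8], right: &[u8]) -> u64 {
    digest(&[kv_hash, left, right])
}

// The sum-bearing variant feeds the i64 aggregate (big-endian) into the
// digest as an extra input, so it diverges from node_hash on every node.
fn node_hash_with_sum(kv_hash: &[u8], left: &[u8], right: &[u8], sum: i64) -> u64 {
    let sum_be = sum.to_be_bytes();
    digest(&[kv_hash, left, right, &sum_be])
}
```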

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(verify): reject empty-path aggregate-sum/count queries at validation

Codex follow-up + CodeRabbit: the previous fix added a terminal-type
gate in `enforce_lower_chain`, but `verify_v0_layer` and
`verify_v1_layer` short-circuit to the leaf verifier when
`depth == path_keys.len()`. With an empty path (`path == []`) that's
true at depth 0, so the type gate is never invoked.

In practice the empty-path case is already protected by hash divergence:
the GroveDB root merk is always a `NormalTree` (built with
`Element::empty_tree()` by API), so its root_hash uses `node_hash`. An
attacker's forged proof of `HashWithSum` / `HashWithCount` ops would
reconstruct via `node_hash_with_sum` / `node_hash_with_count` — distinct
hash functions, no collision. So the caller's root-hash compare catches
the forgery cryptographically.

But the defense-in-depth principle says: don't rely on the cryptographic
divergence implicitly. Reject up-front, before any proof handling.
PathQuery::validate_aggregate_{sum,count}_on_range now check
`self.path.is_empty()` and return a clear InvalidQuery error naming why
(root is always NormalTree, no valid Provable* target at root).

The check fires at the entry of `verify_aggregate_{sum,count}_query`
(which call `validate_*` first thing) and at `prove_query` (the
generator also validates the path query before dispatch).

TESTS

  - `empty_path_aggregate_sum_rejected_at_validation`
  - `empty_path_aggregate_count_rejected_at_validation`

Both pin the rejection at both the PathQuery validator and the verify
entrypoint. 2964 workspace tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(merk,grovedb): add no-proof query_aggregate_sum entry point

Mirrors PR #662's `query_aggregate_count` for the signed-sum side.
Callers that need a sum value but not a proof (e.g. server handlers
answering `prove=false` sum requests) can now bypass proof
construction, serialization, and verification entirely.

The merk-level walk is `O(log n + |boundary|)` in the number of
distinct keys, identical complexity to the prover but without the
proof-op allocations or hash recomputations. The signed-sum
arithmetic carries the same `i128` accumulator the prover and
verifier use (so adversarial intermediate sums never wrap), and
narrows to `i64` at the public entry point. An out-of-i64 result is
classified as `Error::CorruptedData` since a real `ProvableSumTree`
maintains every aggregate as `i64` at every level.

NEW APIS

- `Merk::sum_aggregate_on_range(&inner_range, grove_version)
   -> CostResult<i64, Error>` in `merk/src/merk/get.rs`. Checks
  `tree_type == ProvableSumTree`; rejects any other tree type with
  `Error::InvalidProofError`. Returns 0 for an empty merk.
- `RefWalker::sum_aggregate_on_range(&inner_range, grove_version)`
  in `merk/src/proofs/query/aggregate_sum.rs`. Walks the same
  Contained / Disjoint / Boundary classification path as
  `create_aggregate_sum_on_range_proof`, but emits no proof ops.
- `GroveDb::query_aggregate_sum(path_query, transaction, grove_version)
   -> CostResult<i64, Error>` in `grovedb/src/operations/get/query.rs`.
  Validates the PathQuery up-front via
  `validate_aggregate_sum_on_range` (same gate the prover and
  verifier use — catches malformed ASOR queries plus the
  empty-path rejection from the prior commit before any storage
  reads), opens the leaf merk at `path_query.path`, and delegates
  to the merk-level walk.
- New `query_aggregate_sum_on_range` field on
  `GroveDBOperationsQueryVersions`, wired through v1/v2/v3 at
  version `0`.

NotSummed-correctness is preserved via the same
`own_sum = node_sum - left_struct - right_struct` derivation the
prover uses. NotSummed-wrapped subtrees have stored aggregate 0, so
the subtraction yields 0 at the wrapper boundary, so they do not
contribute to the in-range total.
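That derivation in isolation (an illustrative sketch; a node's stored
aggregate covers its whole subtree, so subtracting both child subtree
totals leaves the node's own contribution):

```rust
// Widen to i128 before subtracting so adversarial or extreme stored
// aggregates cannot wrap. A NotSummed-wrapped child stores aggregate 0,
// so it drops out of the subtraction here.
fn own_sum(node_sum: i64, left_subtree_sum: i64, right_subtree_sum: i64) -> i128 {
    node_sum as i128 - left_subtree_sum as i128 - right_subtree_sum as i128
}
```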

The returned sum is **not** independently verifiable: callers are
trusting their own merk read path. For a verifiable sum, continue
using `prove_query` + `verify_aggregate_sum_query`. Documented
explicitly on both entry points.

TESTS

- 10 new merk-level cross-checks
  (`merk/src/proofs/query/aggregate_sum.rs::tests`): each range
  variant against `prove_aggregate_sum_on_range`'s computed sum,
  plus empty-merk-returns-0, NormalTree rejection, ProvableCountTree
  rejection (precise tree-type match, not "any provable aggregate
  tree"), and a mixed-positive/negative scenario that exercises the
  signed `own_sum` subtraction.
- 11 new GroveDB-level cross-checks
  (`grovedb/src/tests/aggregate_sum_query_tests.rs::tests`): every
  range shape on a populated `ProvableSumTree`, empty subtree
  returns 0, negative-sum scenario, invalid-inner-range
  (`Key`) rejected with `InvalidQuery`, empty-path rejected with
  `InvalidQuery`, NormalTree leaf rejected with `MerkError` from
  the merk-level gate.

Workspace `cargo test --all-features`: 2985 passing / 0 failing
(was 2964 / 0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(query_item): mirror AggregateCountOnRange tests for AggregateSumOnRange

Adds parallel coverage for the variant-11 dispatch paths in encode/
decode_with_depth/borrow_decode_with_depth/Display/serde, plus helper
accessors (lower_bound, upper_bound, is_aggregate_sum_on_range,
aggregate_sum_inner, ...). Mirrors the existing AggregateCountOnRange
test set so each match arm has direct exercise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(element/helpers): per-variant flag-accessor coverage

Adds direct round-trip tests of get_flags_owned / get_flags_mut /
set_flags on every aggregate-bearing variant (CountSumTree,
ProvableCountTree, ProvableCountSumTree, ProvableSumTree,
ItemWithSumItem, CommitmentTree, MmrTree, BulkAppendTree,
DenseAppendOnlyFixedSizeTree, BigSumTree, CountTree) plus
NotSummed / NonCounted delegation arms. Each test pins one match arm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(grovedb): unit coverage for aggregate_consistency_labels

Direct unit tests for the previously untestable internal helper.
Each aggregate-bearing tree variant now has both a matching (returns
None) and mismatching (returns Some) case, plus tests for the
empty-merk identity arms (zero-recorded with NoAggregateData),
non-Merk data tree arms (always None), and the catch-all
variant/shape mismatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(grovedb/proof): verifier error-path coverage for aggregate-sum

Adds 10 mutation-style tests for the GroveDB-side aggregate-sum
verifier (verify_v0_layer / verify_v1_layer / verify_sum_leaf /
verify_single_key_layer_proof_v0 / enforce_lower_chain). Each test
pins one previously-uncovered error arm:

  - V1 unexpected non-merk leaf bytes
  - V0 and V1 missing lower_layer for path key
  - Malformed leaf sum proof (Phase 1 rejection)
  - Corrupted non-leaf merk bytes (single-key proof failure)
  - Non-leaf proof without target key
  - KV replaced by KVDigest in non-leaf (no value bytes)
  - Undeserializable value bytes on the path
  - Intermediate-tree non-tree element rejection
  - Unparsable envelope bincode decode error

Plus mirrors of the count-side AggregateSumOnRange rejection tests
in proof/generate.rs for dense/MMR/BulkAppend index helpers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(merk/aggregate_sum): unit coverage for helpers and edge cases

Direct unit tests for previously-uncovered internal helpers in
merk/src/proofs/query/aggregate_sum.rs:

  - provable_sum_from_aggregate Err arm for every non-ProvableSum
    AggregateData variant (CorruptedData classification check)
  - provable_sum_from_aggregate happy path including i64::MIN/MAX
  - is_provable_sum_bearing false for every non-ProvableSumTree
    TreeType variant
  - classify_subtree additional disjoint-above / contained-within /
    boundary-overlapping-upper cases
  - key_strictly_inside unbounded endpoint and equality cases
  - empty ProvableSumTree prove+verify round-trip

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(non-merk trees): aggregate-sum/count rejection in index helpers

Adds parallel-variant rejection tests in the BulkAppendTree and
dense fixed-size Merkle tree proof modules. Both tree types have
no count or sum commitment in their node hash, so their
index-resolution helpers reject AggregateCountOnRange and
AggregateSumOnRange query items outright. This exercises the
previously-uncovered rejection arms in both proof modules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: per-variant tree_type extensions + sum-proof Display arms

Two small targeted coverage additions:

  - merk/src/element/tree_type.rs: direct per-variant tests for the
    ProvableSumTree / CommitmentTree / BulkAppendTree /
    DenseAppendOnlyFixedSizeTree / MmrTree arms of
    root_key_and_tree_type, tree_flags_and_type, tree_type,
    maybe_tree_type, and tree_feature_type, plus a
    ProvableSumTree-through-NotSummed delegation test.

  - grovedb/src/tests/aggregate_sum_query_tests.rs: tests that
    drive node_to_string's KVSum / KVHashSum / KVDigestSum /
    HashWithSum / KVRefValueHashSum Display arms in
    grovedb/src/operations/proof/mod.rs by formatting real
    ProvableSumTree proofs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* nit(coderabbit): doc-comment + test + error-wrap polish

Four low-value but clean tweaks from CodeRabbit on PR #661:

- `grovedb-query/src/query_item/mod.rs`: refresh the stale
  `NonAggregateInner::deserialize` inline comment to mention both
  excluded aggregate variants (Count + Sum), matching the struct-level
  doc and `NON_AGGREGATE_VARIANTS`.
- `grovedb/src/tests/aggregate_sum_query_tests.rs`: drop the redundant
  disjunction `msg.contains("must be a ProvableSumTree") ||
  msg.contains("ProvableSumTree")` — the first clause already implies
  the second; pin the exact phrase.
- `grovedb/src/tests/aggregate_sum_query_tests.rs`: harden
  `provable_sum_tree_overflow_at_i64_max_is_rejected` so it no longer
  silently passes when insert AND prove AND verify all accept an
  overflow. Replace the early-return-on-both-inserts-accepted with an
  explicit "at least one stage must reject" assertion.
- `grovedb/src/operations/get/query.rs`: wrap the MerkError from
  `Merk::sum_aggregate_on_range` (and the count sibling) with
  contextual `CorruptedData(format!("query_aggregate_{sum,count} at
  path {:?}: {}", path_slices, e))` per the repo
  error-wrapping convention. Two test assertions updated from
  `MerkError(_)` to `CorruptedData(_)` to match.

Workspace `cargo test --all-features`: 3102 pass / 0 fail (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: clarify appendix-a Element/TreeType discriminant columns

CodeRabbit flagged the TreeType column for ProvableSumTree as "should be
17" — the technical claim was wrong (11 is the correct TreeType
discriminant per `merk/src/tree_type/mod.rs:77`), but the confusion was
fair: the column header "TreeType" was ambiguous and the table had
several pre-existing inaccuracies in adjacent rows. This commit fixes
the ambiguity AND the bugs.

Changes:
- Rename column header from "TreeType" to "TreeType disc" and add an
  intro paragraph explaining that "Element disc" and "TreeType disc"
  are discriminants of two SEPARATE enums.
- Add the TreeType-variant label to every tree row for consistency
  (some had it, most didn't). The new format is `N (VariantName)` —
  e.g. `5 (ProvableCountTree)` — which CodeRabbit-style auto-review
  can't misread.
- Fix three pre-existing wrong TreeType disc values:
    `BigSumTree`:        4 -> 2
    `CountTree`:         2 -> 3
    `CountSumTree`:      3 -> 4
  (These were drift from the actual `TreeType::discriminant()`
  implementation; the file had `4 (BigSumTree)` etc. but those labels
  were wrong.)
- Swap the row order at Element discriminants 8 and 9 to match the
  actual `Element` enum order:
    8 = `ProvableCountTree`  (was incorrectly listed as `ItemWithSumItem`)
    9 = `ItemWithSumItem`    (was incorrectly listed as `ProvableCountTree`)
- Tighten the `ProvableCountSumTree` Purpose blurb to note "only count
  in hash" since the sum is tracked metadata, not bound — this is the
  half-step variant a future `ProvableCountAndSumTree` would replace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(verify): accept Sum-family boundary nodes in range bound checks

Codex security finding: the regular query verifier's lower- and
upper-bound `last_push` match arms in
`merk/src/proofs/query/verify.rs::execute_proof` (lines 206 / 241)
accept the Count-family boundary node variants (`KVCount`,
`KVDigestCount`, `KVRefValueHashCount`) but omit the parallel Sum
family (`KVSum`, `KVDigestSum`, `KVRefValueHashSum`). Regular proofs
against a `ProvableSumTree` can legitimately emit `KVDigestSum` as the
absence-boundary node for a queried key, so verifying a multi-item
query like `Key("aa")` followed by `Range("g".."j")` would reject the
perfectly valid proof with `Cannot verify lower bound of queried range`
whenever the preceding boundary happened to be sum-flavored.

The downstream absence check at line ~572 already handled all six
node types (Count + Sum), making the omission an asymmetry between
the two checks within the same function.

THE FIX

Add `KVSum`, `KVDigestSum`, `KVRefValueHashSum` to both the lower- and
upper-bound `last_push` match arms. While at it, also extend
`boundaries_in_proof` (line ~742) to surface `KVDigestSum` boundary
keys alongside `KVDigest` and `KVDigestCount` — same class of
omission, same trivial extension.
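The shape of the fix can be sketched with a toy model: the bound-check match arm must accept every boundary-node flavor a tree family can emit. The enum below is a stand-in mirroring the real proof-node variant names, not the actual merk type, and the predicate is illustrative.

```rust
// Toy model of the bound-check coverage: before the fix, the three
// Sum-family arms were missing from the lower/upper-bound matches, so
// a sum-flavored boundary node failed the range bound check.
#[allow(dead_code)]
#[derive(Debug)]
enum Node {
    Hash, // structural node — never a valid range boundary
    KV,
    KVDigest,
    KVCount,
    KVDigestCount,
    KVRefValueHashCount,
    KVSum,
    KVDigestSum,
    KVRefValueHashSum,
}

fn is_acceptable_boundary(node: &Node) -> bool {
    matches!(
        node,
        Node::KV
            | Node::KVDigest
            | Node::KVCount
            | Node::KVDigestCount
            | Node::KVRefValueHashCount
            | Node::KVSum
            | Node::KVDigestSum
            | Node::KVRefValueHashSum
    )
}

fn main() {
    // Sum- and Count-family boundary nodes both pass the check...
    assert!(is_acceptable_boundary(&Node::KVDigestSum));
    assert!(is_acceptable_boundary(&Node::KVDigestCount));
    // ...while structural nodes still do not.
    assert!(!is_acceptable_boundary(&Node::Hash));
}
```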

TESTS

New `provable_sum_tree_bound_regression_tests` module at the bottom of
`verify.rs` covering:

  - `key_plus_range_on_provable_sum_tree_left_to_right_verifies` — the
    exact `[Key("aa"), Range("g".."j")]` shape Codex flagged, in
    forward iteration. Without the fix this returns
    `InvalidProofError("Cannot verify lower bound of queried range")`.
  - `key_plus_range_on_provable_sum_tree_right_to_left_verifies` —
    same query with `left_to_right = false`. The bug is symmetric, so
    the regression coverage is too.
  - `kv_digest_sum_appears_in_boundaries_in_proof` — proves that
    `boundaries_in_proof` now surfaces `KVDigestSum`-flavor boundary
    keys produced by `ProvableSumTree` proofs.

Workspace `cargo test --all-features`: 3150 pass / 0 fail (was 3147 /
0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(query): split aggregate validator error label per count/sum variant

Develop landed a regression test
(`query_validation_error_to_static_str_projects_invalid_operation_and_catches_other_variants`,
commit 7a649386) that pins the catch-all fallback string returned by
`query_validation_error_to_static_str` to
`"AggregateCountOnRange query validation failed"`. This PR had
generalised the helper to serve both count and sum, returning
`"aggregate query validation failed"`, which broke the develop test
under GitHub's "merge into base" CI workflow.

Split the helper into two so each aggregate variant's error surface
stays self-describing:

  - `query_validation_error_to_static_str` — count side, restored to
    the `"AggregateCountOnRange query validation failed"` label so
    develop's regression test stays green.
  - `sum_query_validation_error_to_static_str` — new sum-side helper
    returning `"AggregateSumOnRange query validation failed"`. Used by
    `SizedQuery::validate_aggregate_sum_on_range`.

Both follow the same projection contract: `InvalidOperation(msg)`
passes the static string through unchanged; any other variant
(unreachable from real validators) gets the variant-specific
fallback.

No behavior change at the InvalidOperation happy path, which is all
real callers reach.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(verify): key_exists_as_boundary_in_proof must accept KVDigestSum

CodeRabbit symmetry finding on top of commit 5b3a0460: that fix
extended `boundaries_in_proof` to recognize `KVDigestSum` boundary
nodes from `ProvableSumTree` proofs, but missed the parallel helper
`key_exists_as_boundary_in_proof`. The two public helpers are
documented to behave identically; without this fix they disagreed on
valid `ProvableSumTree` absence proofs.

Add `Op::Push(Node::KVDigestSum(..))` and the PushInverted variant
to the match in `key_exists_as_boundary_in_proof`. Tighten the
doc-comment to spell out that the two helpers share node-type
coverage.

Extended the regression test
`kv_digest_sum_appears_in_boundaries_in_proof` (now renamed to
`kv_digest_sum_appears_in_both_boundary_helpers`) so every boundary
key surfaced by `boundaries_in_proof` is also reported by
`key_exists_as_boundary_in_proof`, pinning the symmetry.

Workspace `cargo test --all-features` for the affected module: 3 of
3 regression tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(comments): strip implementation-phase labels from ProvableSumTree code

Remove "Phase 1 / Phase 2 / etc." prefixes that referred to the PR's
implementation timeline. Retain the "Phase 1 / Phase 2" labels that
describe runtime decode-vs-walk algorithm steps of the aggregate-count/sum verifiers
(documented in docs/book/src/aggregate-count-queries.md). In the
test-fixture stack-builder (merk/src/proofs/tree.rs) and the
provable_sum_tree direct-insert test, renamed enumeration-style "Phase N"
labels to "Step N" for clarity. Also renamed the phase2_* test fn prefixes
in encoding.rs and tree.rs to drop the timeline label.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(serde): QueryItem Serialize emits snake_case variant tags

CodeRabbit re-flagged a pre-existing asymmetry in `QueryItem`'s manual
serde implementations: the `Serialize` impl emitted PascalCase variant
tags via `serialize_newtype_variant` (e.g. `"AggregateSumOnRange"`,
`"Range"`), but the `Deserialize` impl uses `Field` enums marked
`#[serde(field_identifier, rename_all = "snake_case")]`, expecting
snake_case (`"aggregate_sum_on_range"`, `"range"`).

The asymmetry was invisible to bincode (the format GroveDB actually
uses in proofs and storage) because bincode identifies variants by
index, not by the textual tag. But it broke round-trip through every
text-based format that carries variant names verbatim (JSON, YAML,
TOML). An in-code comment at the existing token-stream test site
even documented the issue as "a pre-existing mismatch ... that
breaks JSON round-trip but is invisible to formats that don't carry
variant names textually."

THE FIX

Change every `serialize_newtype_variant` (and the one
`serialize_unit_variant` for `RangeFull`) call in
`grovedb-query/src/query_item/mod.rs` to emit snake_case variant
tags. The variant indices stay the same so the bincode wire format
is unchanged — only textual formats see the new tag names.

Affected variants: `Key`, `Range`, `RangeInclusive`, `RangeFull`,
`RangeFrom`, `RangeTo`, `RangeToInclusive`, `RangeAfter`,
`RangeAfterTo`, `RangeAfterToInclusive`, `AggregateCountOnRange`,
`AggregateSumOnRange` — i.e. every variant, not just the aggregate
ones. This is the symmetric fix; doing only the new
`AggregateSumOnRange` variant would have diverged it from the
existing `AggregateCountOnRange` (and from the other ten variants
that have always been broken the same way).

Also updated the in-code comment at the token-stream test site to
reflect the new contract.

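The tag mapping in play is serde's `rename_all = "snake_case"` convention, which turns PascalCase variant names like `AggregateSumOnRange` into `aggregate_sum_on_range`. A minimal free-function sketch of that mapping for ASCII variant names (not serde's actual implementation):

```rust
// Illustrative reproduction of the snake_case tag convention the
// Deserialize side's `rename_all = "snake_case"` Field enums expect:
// insert '_' before each interior uppercase letter, then lowercase it.
fn snake_case(variant: &str) -> String {
    let mut out = String::new();
    for (i, c) in variant.chars().enumerate() {
        if c.is_ascii_uppercase() {
            if i > 0 {
                out.push('_');
            }
            out.push(c.to_ascii_lowercase());
        } else {
            out.push(c);
        }
    }
    out
}

fn main() {
    assert_eq!(snake_case("AggregateSumOnRange"), "aggregate_sum_on_range");
    assert_eq!(snake_case("RangeFull"), "range_full");
    assert_eq!(snake_case("Key"), "key");
}
```

Emitting these strings from `serialize_newtype_variant` (rather than the PascalCase names) is what restores symmetry with the deserializer.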
TESTS

Two new `serde_test::assert_tokens` round-trip regression tests pin
the snake_case contract on both aggregate variants:

  - serde_round_trip_aggregate_sum_on_range_uses_snake_case_tag
  - serde_round_trip_aggregate_count_on_range_uses_snake_case_tag

assert_tokens exercises both Serialize AND Deserialize against the
same token stream, so any future regression on either side fails the
test immediately.

Workspace cargo test --all-features: 3152 pass / 0 fail (was 3150 / 0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(aggregate-sum): mirror PR #663 — split into subdir, reject V0 envelopes

V0 (`MerkOnlyLayerProof`) envelopes predate the aggregate-sum feature and
cannot legitimately carry an aggregate-sum proof, so the V0 layer walker
was unreachable in any honestly-produced proof. Mirror the count-side
changes from PR #663:

  - Convert `aggregate_sum.rs` into a subdirectory mirroring
    `aggregate_count/`: `mod.rs` (public API + V0 rejection),
    `helpers.rs` (envelope decode, single-key layer verification,
    chain enforcement, leaf sum verification), `leaf_chain.rs` (V1
    leaf-chain walker). Removes the dead `verify_v0_layer` path.

  - Add a prover-side V0 gate in `prove_query_non_serialized`: when
    grove_version dispatches to V0 and the path query carries an
    `AggregateSumOnRange` anywhere, return `NotSupported` instead of
    emitting a V0 envelope the verifier would (correctly) reject.

  - Update tests: replace the V0 round-trip test with a V0-rejection
    test; broaden the empty-leaf type-confusion test to accept either
    the V0-rejection or the terminal-type-gate error; remove the now-
    unreachable V0 missing-lower-layer test (V1 counterpart already
    pins the missing-layer behavior); refresh stale doc-comments that
    pointed at `aggregate_sum.rs` line numbers.

No carrier-shape support yet — `AggregateSumOnRange` is still leaf-only
on the merk side, so there is no `classification.rs` / `per_key.rs`
mirror. The leaf-only shape can be extended later in parallel with
matching merk-level work, just like the count side did.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* harden(aggregate-sum): strict lower_layers shape + V1 terminal-type test

Addresses CodeRabbit findings on PR #661.

**Finding 1 (Major, fixed)** — `verify_v1_leaf_chain` accepted arbitrary
`lower_layers` entries at non-leaf depths and under the leaf merk. That
let two byte-distinct envelopes verify to the same `(root, sum)` because
the smuggled siblings/children were never inspected, and gave downstream
syntactic scanners an unverified surface to read from. Added a strict
shape gate that mirrors the honest prover (`prove_aggregate_sum_on_range`
short-circuit always emits `lower_layers: BTreeMap::new()` at the leaf,
and each path-prefix wrapper inserts exactly one entry):

  - At `depth == path_keys.len()` (leaf merk): require
    `lower_layers.is_empty()`.
  - At non-leaf depths: require `lower_layers.len() == 1` and the sole
    key to equal the expected descent key `path_keys[depth]`.

Two new regression tests (`sum_v1_envelope_with_extra_lower_layer_*`
and `sum_v1_envelope_with_lower_layers_under_leaf_*`) construct
byte-modified envelopes that the gate rejects.
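The two-arm shape gate can be sketched as a standalone check. Types and names here are stand-ins for the real envelope structs (the layer payload is elided to `()`); `depth` counts path segments already consumed, so the leaf merk sits at `depth == path_keys.len()`.

```rust
use std::collections::BTreeMap;

// Illustrative sketch of the strict lower_layers shape gate: empty at
// the leaf merk; exactly one entry, under the expected descent key, at
// every non-leaf depth. Mirrors what the honest prover emits.
fn check_lower_layers_shape(
    lower_layers: &BTreeMap<Vec<u8>, ()>,
    path_keys: &[Vec<u8>],
    depth: usize,
) -> Result<(), String> {
    if depth == path_keys.len() {
        // Leaf merk: the honest prover always emits an empty map here.
        if !lower_layers.is_empty() {
            return Err(format!("unexpected lower-layer entries at leaf depth {}", depth));
        }
    } else {
        // Non-leaf: exactly one entry, keyed by the expected descent key.
        if lower_layers.len() != 1 {
            return Err(format!(
                "expected exactly 1 lower-layer entry at depth {}, got {}",
                depth,
                lower_layers.len()
            ));
        }
        if !lower_layers.contains_key(&path_keys[depth]) {
            return Err(format!("lower-layer entry under wrong key at depth {}", depth));
        }
    }
    Ok(())
}

fn main() {
    let path: Vec<Vec<u8>> = vec![b"st".to_vec()];
    let mut layers = BTreeMap::new();
    layers.insert(b"st".to_vec(), ());
    // Honest shape at the wrapper depth passes.
    assert!(check_lower_layers_shape(&layers, &path, 0).is_ok());
    // A smuggled entry under the leaf merk is rejected.
    assert!(check_lower_layers_shape(&layers, &path, 1).is_err());
    // A single entry under an impostor key is rejected at non-leaf depth.
    let mut wrong = BTreeMap::new();
    wrong.insert(b"impostor".to_vec(), ());
    assert!(check_lower_layers_shape(&wrong, &path, 0).is_err());
}
```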

**Finding 4 (nitpick, addressed)** — the existing empty-leaf
type-confusion tests build V0 envelopes and now hit the V0-rejection
gate before the terminal-type gate in `enforce_lower_chain` runs.
Added `empty_leaf_type_confusion_forgery_rejected_under_v1_envelope`
which builds the same forgery under a V1 `LayerProof` envelope so the
terminal-type gate fires directly. The test asserts the specific
"must be a ProvableSumTree" error from `enforce_lower_chain` so
future refactors that drop the gate are caught.

**Finding 2 (Major, skipped with reason)** — CodeRabbit asks to make
`hash_for_link` fail-closed (panic/Err) when a `ProvableSumTree` node's
`aggregate_data()` doesn't return `ProvableSum`. The current fallback
to `self.hash()` is identical across all three `Provable*` variants
(`ProvableCountTree`, `ProvableCountSumTree`, `ProvableSumTree`) and
also appears in commit-time dispatch (lines 1233-1304). Fixing only the
sum arm creates asymmetry with the count side; the broader refactor
(plus the matching dispatch-centralization in Finding 3) is out of
scope for this sum-feature PR and is documented in MEMORY M1 as
intentional.

**Finding 3 (nitpick, skipped with reason)** — the duplicated
`AggregateData → hash` dispatch in `merk/src/tree/mod.rs` predates this
PR and applies to all three `Provable*` variants. Centralizing it
would touch hot proof-emission paths; out of scope here.

Also updated `sum_v1_envelope_with_missing_lower_layer_is_rejected`
to accept the new strict-shape error message — removing an entry now
trips the shape gate first instead of the older missing-layer arm.
Both messages pin the same property.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(aggregate-sum): split strict-shape gate + cover trailing-bytes / wrong-key

After splitting the strict-shape gate in `verify_v1_leaf_chain` into
two reachable error arms (entry-count vs. entry-key), the
ok_or_else-arm on the subsequent `lower_layers.get(&next_key)` became
naturally reachable for the wrong-key case (single entry but under
an unexpected key). Previously the combined-OR gate made both error
sites mutually exclusive, leaving the get-arm dead.

Coverage impact (aggregate_sum/ subdir, local tarpaulin):
  - leaf_chain.rs: 36/41 (88%) -> 44/44 (100%)
  - Subdir total:  ~80%       -> 89.94%

New tests in aggregate_sum_query_tests:
  - sum_v1_envelope_with_wrong_keyed_lower_layer_is_rejected — single
    `lower_layers` entry under "impostor" instead of "st". Exercises
    the key-shape gate distinctly from the count-shape gate.
  - sum_proof_with_trailing_bytes_is_rejected — mirror of
    `aggregate_count_proof_with_trailing_bytes_is_rejected`. Pins the
    canonical-decode invariant in `decode_grovedb_proof`.

Tightened assertion in
`sum_v1_envelope_with_extra_lower_layer_is_rejected` to match the new
"lower-layer entries at depth N" error string, and broadened the
assertion in `sum_v1_envelope_with_missing_lower_layer_is_rejected`
to accept either the old missing-layer message or the new
entry-count message (removing the only entry trips the count gate
first).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(proof): hoist canonical proof decoder to operations/proof/mod.rs

The same `decode_grovedb_proof` function lived in both
`aggregate_count/helpers.rs` and `aggregate_sum/helpers.rs`, differing
only in whether the error string named "aggregate-count" or
"aggregate-sum" as the offending shape. Two copies of the same
canonical-decode contract is a maintenance hazard — drift between the
two would mean one aggregate path could (e.g.) accept trailing bytes
that the other rejects, breaking the equality-by-bytes assumption the
contract is meant to guarantee.

Hoist the function to `operations/proof/mod.rs` as
`decode_grovedb_proof_canonical` with `pub(super)` visibility. Both
helper modules now call `super::decode_grovedb_proof_canonical`. The
error message generalizes from "aggregate-{count,sum} proof has N
trailing bytes" to "proof has N trailing bytes" since the call site
provides the surrounding context; the existing
`*_proof_with_trailing_bytes_is_rejected` tests assert
`msg.contains("trailing bytes")` and remain green.

No behavior change beyond the wording adjustment in the trailing-
bytes error. Tests: workspace 3088 / 0 fail; aggregate tests
223 / 0 fail (113 sum + 75 count + 35 others).
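The canonical-decode contract being deduplicated here can be sketched in miniature: a decoder reports how many bytes it consumed, and the wrapper rejects any trailing bytes after the envelope. `decode_u32_le` below is a toy stand-in for the bincode decode the real `decode_grovedb_proof_canonical` performs; names are illustrative.

```rust
// Sketch of the canonical-decode invariant: exactly one byte encoding
// per logical proof. A decoder that returns (value, consumed) lets the
// wrapper detect and reject trailing bytes.
fn decode_u32_le(buf: &[u8]) -> Result<(u32, usize), String> {
    if buf.len() < 4 {
        return Err("truncated envelope".into());
    }
    let mut le = [0u8; 4];
    le.copy_from_slice(&buf[..4]);
    Ok((u32::from_le_bytes(le), 4))
}

fn decode_canonical(buf: &[u8]) -> Result<u32, String> {
    let (value, consumed) = decode_u32_le(buf)?;
    if consumed < buf.len() {
        // Discarding `consumed` (the old `.0` pattern) silently accepted
        // these bytes, giving one logical proof many byte encodings.
        return Err(format!(
            "proof has {} trailing bytes after the encoded envelope",
            buf.len() - consumed
        ));
    }
    Ok(value)
}

fn main() {
    assert_eq!(decode_canonical(&[1, 0, 0, 0]), Ok(1));
    assert!(decode_canonical(&[1, 0, 0, 0, 0xff]).is_err());
}
```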

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* harden(proof): general verify paths now use canonical decoder

Migrate the four `bincode::decode_from_slice(...)?.0` sites in
`operations/proof/verify.rs` (`verify_query_with_options`,
`verify_query_get_parent_tree_info_with_options`, `verify_query_raw`,
`verify_trunk_chunk_proof`) to call
`super::decode_grovedb_proof_canonical(proof)?` — the same canonical
decoder the aggregate-count and aggregate-sum entry points already use.

Before this change, the general verifier silently accepted trailing
bytes after the encoded `GroveDBProof` envelope, because the `.0`
discarded the bincode `consumed` count. The aggregate-count and
aggregate-sum entry points rejected trailing bytes via their own
private decoder. That asymmetry meant the same logical proof could
have many distinct byte encodings through the general-verify surface
but only one through aggregate-verify — a malleability gap that
breaks any equality-by-bytes / caching / dedup assumption a consumer
might rely on. The cryptographic chain still bound the answer, so
this wasn't a soundness break.

The general-verify path now matches: trailing bytes are rejected with
`Error::CorruptedData("proof has N trailing bytes after the encoded
envelope")`.

Added `verify_query_rejects_proof_with_trailing_bytes` in
`proof_advanced_tests.rs` to lock the new behavior in place, mirror
of `sum_proof_with_trailing_bytes_is_rejected` and
`aggregate_count_proof_with_trailing_bytes_is_rejected`.

Tests: workspace 3089 / 0 fail (3088 + 1 new regression test).
No honest test/proof emitted trailing bytes, so no existing test
needed updating.

Co-Authored-…
QuantumExplorer and others added 2 commits May 17, 2026 10:07
Resolves conflicts with develop's PR #661 (Element::ProvableSumTree +
AggregateSumOnRange query). PR #661 added another new Element variant
at byte 19 — the third consecutive develop-side enum landing that has
collided with this PR's cidx discriminants.

**Third on-disk discriminant collision fixed.** Develop's
`ElementType::ProvableSumTree = 19` collides with this PR's prior
post-merge `ElementType::CountIndexedTree = 19`. Same shape as the
prior two rounds (NotCountedOrSummed at 17, then ReferenceWithSumItem
at 18).

Resolution: shift cidx discriminants by another 1 and re-derive the
NonCounted twin bytes:
  - `ElementType::CountIndexedTree`              19 → 20
  - `ElementType::ProvableCountIndexedTree`      20 → 21
  - `ElementType::NonCountedCountIndexedTree`   147 → 148 (`0x80 | 20`)
  - `ElementType::NonCountedProvableCountIndexedTree`
                                                148 → 149 (`0x80 | 21`)

`TreeType` discriminants also shift (develop added `ProvableSumTree = 11`):
  - `TreeType::CountIndexedTree`             11 → 12
  - `TreeType::ProvableCountIndexedTree`     12 → 13

The `Element` enum variant order is updated to match: `ProvableSumTree`
keeps variant index 19, cidx variants move to 20 and 21. NonCounted
inner-byte allowlist becomes `0..=14 | 18 | 19 | 20 | 21`. NonCounted
twin range expands to `[128, 149]`. The base-variant round-trip
canonical test grows from 17 → 19 cases.
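The twin-byte arithmetic above can be checked directly: NonCounted twin bytes are the base ElementType byte with the high bit set, and the twin-range upper bound tracks the highest base discriminant. Values mirror the commit message; the constant names are illustrative, not the real enum.

```rust
// Worked check of the discriminant shift: NonCounted twins are derived
// by OR-ing 0x80 onto the base ElementType byte.
const COUNT_INDEXED_TREE: u8 = 20;
const PROVABLE_COUNT_INDEXED_TREE: u8 = 21;

const fn non_counted_twin(base: u8) -> u8 {
    0x80 | base
}

fn main() {
    assert_eq!(non_counted_twin(COUNT_INDEXED_TREE), 148);
    assert_eq!(non_counted_twin(PROVABLE_COUNT_INDEXED_TREE), 149);
    // Twin range [128, 149]: from 0x80 | 0 up to 0x80 | 21.
    assert_eq!(non_counted_twin(0), 128);
}
```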

Other resolutions union match arms in:
  - `grovedb-element/element/helpers.rs` (is_non_empty_merk_tree)
  - `grovedb-element/element/mod.rs` (Element enum, ElementShadow,
    element_type() dispatch)
  - `grovedb-element/element_type.rs` (display strings, try_from,
    NonCounted wrapper allowlist, from_serialized_value tests, range
    tests grown to 19 cases)
  - `merk/element/costs.rs` (get_specialized_cost, value_defined_cost)
  - `merk/element/tree_type.rs` (get_feature_type's CountIndexedTree
    and ProvableCountIndexedTree arms preserved alongside develop's
    ProvableSumTree)
  - `merk/tree_type/mod.rs` (9 conflicts unioning ProvableSumTree
    alongside cidx variants — discriminant, try_from, Display,
    allows_sum_item, inner_node_type, empty_tree_feature_type,
    to_element_type, and the invalid-discriminant test)
  - `merk/tree_type/costs.rs` (cost_size)
  - `merk/tree/mod.rs` (re-export hash both `combine_hash_three` and
    `node_hash_with_sum`)
  - `grovedb/operations/get/query.rs` (two QueryItemOrSumReturnType
    dispatches — both gained ProvableSumTree alongside cidx arms)
  - `grovedb/operations/proof/generate.rs` (Element variant-list match
    in result-truncation count)
  - `docs/book/src/SUMMARY.md` (chapter ordering)
  - `docs/book/src/appendix-a.md` (rebuilt the canonical
    discriminant table with new byte layout)

And the post-merge non-exhaustive-match fix:
  - `grovedb/operations/count_indexed_tree.rs:438` — added
    `ProvableSumTree` to the subtree-insert arm.

The `mod tests` name collision in `merk/src/tree/hash.rs` (develop's
new `node_hash_with_sum_tests` block) was renamed to avoid colliding
with the existing `combine_hash_three_tests`-style block from this PR.

**KNOWN REGRESSION:** Four cidx batch-apply tests fail post-merge
with H1-A drift on cidx-primary CountTree children:
  - batch_apply_two_inserts_into_cidx_propagation_visits_cidx_level
  - batch_insert_multiple_items_into_same_cidx_primary_propagates_correctly
  - cidx_batch_pipeline_exercises_propagation_and_query
  - cidx_batch_with_provable_cidx_propagation_round_trip
The drift surfaces as identical (stored_value_hash, computed_value_hash)
for both children, suggesting the batch path now stores the same hash
for distinct children when the parent is a cidx primary. The direct
`insert_into_count_indexed_tree` API still works; the failure is
specific to the batch path interacting with PR #661's
`AggregateData::ProvableSum` plumbing. Requires follow-up investigation
into how `hash_for_link` / `aggregate_data` flows through the cidx
primary's apply pipeline now that develop has tightened the
`expect(...)` panics on `aggregate_data()` results. The remaining
~3000 tests pass.

Verified:
- `cargo check -p grovedb` clean
- `cargo fmt --all` applied

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Develop's PR #661 added a generic `aggregate_consistency_labels` check in
`verify_grovedb`: for every Merk-backed tree element, the recorded
aggregate field on the element bytes must match the inner Merk's actual
`aggregate_data()`. The check emits an issue (with a blake3-hashed
placeholder label) when they disagree.

This check is invalid for children of a `CountIndexedTree` /
`ProvableCountIndexedTree` primary. In a cidx primary, each child's
`count_value` field is the **cidx index key** the child sorts under in
the count-ordered secondary index — set explicitly by the caller (e.g.
`Element::new_count_tree_with_flags_and_count_value(None, 42, None)`).
It is NOT a claim about the child's inner-Merk count. The cidx-specific
consistency check at the primary level already validates that each
child's recorded count agrees with the corresponding secondary index
entry.

Without this fix, every cidx primary child with non-zero `count_value`
and an empty inner Merk falsely reports as inconsistent. Worse, the
placeholder labels in `aggregate_consistency_labels`' catch-all arm
mention only the variant string ("count tree") not the values, so all
such children collide on the **same blake3 hash** — making the failure
output look like every child has identical drift, masking the real
cause.

Fix: in `verify_merk_and_submerks_in_transaction`, gate the
aggregate-consistency check on `!merk.tree_type.is_count_indexed_primary()`
in addition to the existing `!uses_non_merk_data_storage()` guard. The
cryptographic H1-A check above still runs (and validates the cidx
chain), and the cidx primary-level consistency checks at lines
~1380-1497 still catch any real cidx drift.

Tests:
- `batch_apply_two_inserts_into_cidx_propagation_visits_cidx_level`
- `batch_insert_multiple_items_into_same_cidx_primary_propagates_correctly`
- `cidx_batch_pipeline_exercises_propagation_and_query`
- `cidx_batch_with_provable_cidx_propagation_round_trip`

All 4 failing tests now pass. Full workspace lib test run: 3200+ tests,
0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>