memtable: Shrink sparse raft-engine memtable entry index buffers more proactively after compaction. #401
Conversation
… proactively after compaction. Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
📝 WalkthroughWalkthroughEntry-index shrinking now uses utilization and capacity-planning gates: it computes Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/memtable.rs (1)
69-74: Document why these thresholds are written asN - 1.
128 - 1and64 - 1look like accidental off-by-one values at first glance. A short note explaining the intent would make this harder to “clean up” incorrectly later.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/memtable.rs` around lines 69 - 74, Explain in a short doc comment above the three constants that the subtraction (e.g. `128 - 1`, `64 - 1`) is intentional to produce an inclusive threshold (power-of-two minus one) rather than an accidental off-by-one; mention that this form makes the intent explicit for comparisons that use <= or bitmasking and prevents future accidental simplification to just `128` or `64`. Update the comment above CAPACITY_SHRINK_THRESHOLD, CAPACITY_SHRINK_UTILIZATION_FACTOR, and CAPACITY_SHRINK_MIN_RECLAIM to state this rationale succinctly so maintainers understand the deliberate `N - 1` choice.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/memtable.rs`:
- Around line 69-74: Explain in a short doc comment above the three constants
that the subtraction (e.g. `128 - 1`, `64 - 1`) is intentional to produce an
inclusive threshold (power-of-two minus one) rather than an accidental
off-by-one; mention that this form makes the intent explicit for comparisons
that use <= or bitmasking and prevents future accidental simplification to just
`128` or `64`. Update the comment above CAPACITY_SHRINK_THRESHOLD,
CAPACITY_SHRINK_UTILIZATION_FACTOR, and CAPACITY_SHRINK_MIN_RECLAIM to state
this rationale succinctly so maintainers understand the deliberate `N - 1`
choice.
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/memtable.rs (1)
627-640: Consider making reclaim math explicit for readability.The logic looks correct; a small cleanup can make the “minimum reclaim” gate easier to scan and maintain.
♻️ Suggested refactor
fn maybe_shrink_entry_indexes(&mut self) { let len = self.entry_indexes.len(); let capacity = self.entry_indexes.capacity(); let target = std::cmp::max(len.saturating_add(CAPACITY_INIT), CAPACITY_INIT); + let reclaimable = capacity.saturating_sub(target); if capacity < CAPACITY_SHRINK_THRESHOLD || len.saturating_mul(CAPACITY_SHRINK_UTILIZATION_FACTOR) > capacity - || capacity <= target.saturating_add(CAPACITY_SHRINK_MIN_RECLAIM) + || reclaimable <= CAPACITY_SHRINK_MIN_RECLAIM { return; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/memtable.rs` around lines 627 - 640, The capacity-shrink gating is correct but hard to read—refactor the condition in the method that manipulates self.entry_indexes (using symbols: entry_indexes, CAPACITY_INIT, CAPACITY_SHRINK_THRESHOLD, CAPACITY_SHRINK_UTILIZATION_FACTOR, CAPACITY_SHRINK_MIN_RECLAIM, target, len, capacity, shrink_to) by computing named intermediate values (e.g., target, reclaimable = capacity.saturating_sub(target), utilization_limit = capacity / CAPACITY_SHRINK_UTILIZATION_FACTOR or len.saturating_mul(...), and a boolean for min_reclaim_ok) and then use those named booleans in a single if that returns early; this makes the “minimum reclaim” gate explicit and improves readability without changing behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/memtable.rs`:
- Around line 627-640: The capacity-shrink gating is correct but hard to
read—refactor the condition in the method that manipulates self.entry_indexes
(using symbols: entry_indexes, CAPACITY_INIT, CAPACITY_SHRINK_THRESHOLD,
CAPACITY_SHRINK_UTILIZATION_FACTOR, CAPACITY_SHRINK_MIN_RECLAIM, target, len,
capacity, shrink_to) by computing named intermediate values (e.g., target,
reclaimable = capacity.saturating_sub(target), utilization_limit = capacity /
CAPACITY_SHRINK_UTILIZATION_FACTOR or len.saturating_mul(...), and a boolean for
min_reclaim_ok) and then use those named booleans in a single if that returns
early; this makes the “minimum reclaim” gate explicit and improves readability
without changing behavior.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #401 +/- ##
==========================================
+ Coverage 97.78% 97.82% +0.04%
==========================================
Files 33 33
Lines 11315 11400 +85
==========================================
+ Hits 11064 11152 +88
+ Misses 251 248 -3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: glorv, overvenus The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
… proactively after compaction. (#401) (#402) close tikv/tikv#19544 Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
Problem
Close tikv/tikv#19544
raft-enginememtable memory can grow slowly over time even when workload and region count stay relatively stable.A notable source is
MemTable::entry_indexes: after entries are compacted away, the underlying deque may continue to retain a large amount of spare capacity. The previous shrink policy only triggered once capacity grew beyond1023, and when triggered it usedshrink_to_fit(), which was coarse and could still leave room for improvement in long-running stable clusters.Solution
This PR makes entry index shrinking more proactive, but keeps it conservative enough to avoid unnecessary capacity churn.
The new policy still only runs after
compact_to(), and shrinksentry_indexesonly when all of the following are true:Instead of shrinking all the way down to
len, the new logic keeps a small headroom window (len + CAPACITY_INIT) so a freshly compacted memtable does not immediately regrow on the next small append burst.In short, this change targets long-lived sparse memtables while avoiding a simple “lower the threshold and always shrink harder” strategy.
Why this should not introduce a meaningful performance regression
The new logic does not run on the append hot path. It is only evaluated after
compact_to(), and the fast path is just a few integer checks.Actual reallocations become possible earlier than before, but only for clearly sparse memtables. The added utilization gate, minimum reclaim gate, and retained headroom together are intended to avoid frequent shrink/regrow oscillation.
This means the tradeoff is:
Given that the logic stays off the append hot path and avoids
shrink_to_fit()churn, the risk of a meaningful performance regression is low.Validation
Added targeted regression tests in
src/memtable.rs:test_memtable_compact_shrinks_sparse_entry_indexeslen.test_memtable_compact_keeps_busy_entry_indexes_capacitytest_memtable_compact_keeps_capacity_when_reclaim_is_too_smalltest_memtable_append_after_compact_shrink_keeps_span_consistentBuild validation:
cargo +nightly-2025-04-03 fmt --all --checkcargo +nightly-2025-04-03 check --libcargo +nightly-2025-04-03 check --lib --no-default-features --features internalsNote: full
cargo testis currently blocked in this environment by the existing Darwin/arm64grpcio-systoolchain issue rather than by this patch itself.Summary by CodeRabbit
Refactor
Tests
Style