HBASE-29039 Seek past delete markers instead of skipping one at a time #8001
Open
junegunn wants to merge 1 commit into apache:master
TestVisibilityLabelsWithDeletes is failing, which likely explains the additional changes in #6557. I'll try to fix it, but if it ends up resembling the previous approach, I'll drop this.
When a DeleteColumn or DeleteFamily marker is encountered during a normal user scan, the matcher currently returns SKIP, forcing the scanner to advance one cell at a time. This causes read latency to degrade linearly with the number of accumulated delete markers for the same row or column.

Since these are range deletes that mask all remaining versions of the column, seek past the entire column immediately via columns.getNextRowOrNextColumn(). This is safe because cells arrive in timestamp descending order, so any puts newer than the delete have already been processed.

For DeleteFamily, also fix getKeyForNextColumn in ScanQueryMatcher to bypass the empty-qualifier guard (HBASE-18471) when the cell is a DeleteFamily marker. Without this, the seek barely advances past the current cell instead of jumping to the first real qualified column.

The optimization is only applied with plain ScanDeleteTracker, and skipped when:
- seePastDeleteMarkers is true (KEEP_DELETED_CELLS)
- newVersionBehavior is enabled (sequence IDs determine visibility)
- visibility labels are in use (delete/put label mismatch)
018a268 to 3a87682
Fixed by:
- !seePastDeleteMarkers && !(deletes instanceof NewVersionBehaviorTracker)
+ !seePastDeleteMarkers && deletes.getClass() == ScanDeleteTracker.class
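To illustrate why the exact-class check is stricter than the `instanceof` exclusion, here is a minimal sketch. The nested types are empty stand-ins, not the real HBase classes, and the hierarchy (a visibility-aware tracker subclassing `ScanDeleteTracker`) is an assumption made for the example.

```java
class DeleteTrackerGuardSketch {
  // Stand-in types only; the inheritance shown here is an assumption for illustration.
  interface DeleteTracker {}
  static class ScanDeleteTracker implements DeleteTracker {}
  static class VisibilityScanDeleteTracker extends ScanDeleteTracker {}
  static class NewVersionBehaviorTracker implements DeleteTracker {}

  // Previous guard: excludes only the new-version-behavior tracker.
  static boolean oldGuard(boolean seePastDeleteMarkers, DeleteTracker deletes) {
    return !seePastDeleteMarkers && !(deletes instanceof NewVersionBehaviorTracker);
  }

  // Revised guard: accepts only the plain tracker, so every subclass is excluded.
  static boolean newGuard(boolean seePastDeleteMarkers, DeleteTracker deletes) {
    return !seePastDeleteMarkers && deletes.getClass() == ScanDeleteTracker.class;
  }

  public static void main(String[] args) {
    DeleteTracker[] trackers = {
      new ScanDeleteTracker(), new VisibilityScanDeleteTracker(), new NewVersionBehaviorTracker()
    };
    for (DeleteTracker t : trackers) {
      System.out.println(t.getClass().getSimpleName()
          + " old=" + oldGuard(false, t) + " new=" + newGuard(false, t));
    }
  }
}
```

In this model the old guard still enables the optimization for the visibility-aware subclass, while the new guard does not, which is consistent with the failing visibility-label test mentioned above.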
Context
HBASE-30036 (#7993) consolidates redundant delete markers on flush, preventing them from growing unbounded in HFiles. However, markers still accumulate in the memstore before flush, degrading read performance. HBASE-29039 addresses this from the read path side. Both are needed for full coverage. There is an open PR (#6557), but the review process has been stalled. This is an alternative approach with fewer code changes, hopefully making it easier to reach consensus.
Test result
Using the test code in HBASE-30036.
[Results not preserved; panels: DeleteFamily, DeleteColumnContiguous, DeleteColumnInterleaved]
Description
When a DeleteColumn or DeleteFamily marker is encountered during a normal user scan, the matcher currently returns SKIP, forcing the scanner to advance one cell at a time. This causes read latency to degrade linearly with the number of accumulated delete markers for the same row or column.
Since these are range deletes that mask all remaining versions of the column, seek past the entire column immediately via columns.getNextRowOrNextColumn(). This is safe because cells arrive in timestamp descending order, so any puts newer than the delete have already been processed.
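The difference between skipping and seeking can be sketched with a toy scanner. This is not HBase code: `Cell`, `Result`, `scan`, and `demoCells` are hypothetical names, max-versions handling is ignored, and the inner loop that jumps to the next qualifier stands in for an index-based seek, so it is not counted as examined cells.

```java
import java.util.ArrayList;
import java.util.List;

class SeekPastDeletesSketch {
  enum Type { PUT, DELETE_COLUMN }

  static final class Cell {
    final String qualifier;
    final long ts;
    final Type type;
    Cell(String qualifier, long ts, Type type) {
      this.qualifier = qualifier;
      this.ts = ts;
      this.type = type;
    }
  }

  static final class Result {
    final List<String> visible = new ArrayList<>();
    int examined;
  }

  // Cells must be sorted by qualifier ascending, then timestamp descending.
  static Result scan(List<Cell> cells, boolean seekPastDeletes) {
    Result r = new Result();
    String deletedQualifier = null;
    int i = 0;
    while (i < cells.size()) {
      Cell c = cells.get(i);
      r.examined++;
      if (c.type == Type.DELETE_COLUMN) {
        // Everything after this marker in the same column is older, hence masked.
        deletedQualifier = c.qualifier;
        if (seekPastDeletes) {
          // Stand-in for a seek: jump straight to the next qualifier.
          i++;
          while (i < cells.size() && cells.get(i).qualifier.equals(c.qualifier)) {
            i++;
          }
          continue;
        }
      } else if (!c.qualifier.equals(deletedQualifier)) {
        r.visible.add(c.qualifier + "@" + c.ts);
      }
      i++;
    }
    return r;
  }

  // One newer put, `markers` DeleteColumn markers, one older put, then column "b".
  static List<Cell> demoCells(int markers) {
    List<Cell> cells = new ArrayList<>();
    cells.add(new Cell("a", markers + 1, Type.PUT));
    for (long ts = markers; ts >= 1; ts--) {
      cells.add(new Cell("a", ts, Type.DELETE_COLUMN));
    }
    cells.add(new Cell("a", 0, Type.PUT));
    cells.add(new Cell("b", 1, Type.PUT));
    return cells;
  }

  public static void main(String[] args) {
    List<Cell> cells = demoCells(100);
    Result skip = scan(cells, false);
    Result seek = scan(cells, true);
    System.out.println("skip: " + skip.visible + " examined=" + skip.examined);
    System.out.println("seek: " + seek.visible + " examined=" + seek.examined);
  }
}
```

Both modes return the same visible cells in this model; only the number of cells examined changes, growing with the marker count when skipping and staying constant when seeking.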
For DeleteFamily, also fix getKeyForNextColumn in ScanQueryMatcher to bypass the empty-qualifier guard (HBASE-18471) when the cell is a DeleteFamily marker. Without this, the seek barely advances past the current cell instead of jumping to the first real qualified column.
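A rough model of why the guard matters for DeleteFamily markers, under stated assumptions: keys sort by qualifier ascending and timestamp descending, DeleteFamily markers carry an empty qualifier, timestamps are positive, and "barely advances" is modeled as seeking to the next-lower timestamp on the same column. `Key`, `rowWithFamilyDeletes`, and `seeksToFirstQualifiedColumn` are hypothetical names, not HBase APIs.

```java
import java.util.Comparator;
import java.util.TreeSet;

class NextColumnKeySketch {
  static final class Key {
    final String qualifier; // empty string models a DeleteFamily marker
    final long ts;
    Key(String qualifier, long ts) {
      this.qualifier = qualifier;
      this.ts = ts;
    }
  }

  // Qualifier ascending, then newer timestamps first.
  static final Comparator<Key> ORDER = (a, b) -> {
    int c = a.qualifier.compareTo(b.qualifier);
    return c != 0 ? c : Long.compare(b.ts, a.ts);
  };

  // A row with `markers` DeleteFamily markers followed by two qualified columns.
  static TreeSet<Key> rowWithFamilyDeletes(int markers) {
    TreeSet<Key> row = new TreeSet<>(ORDER);
    for (long ts = markers; ts >= 1; ts--) {
      row.add(new Key("", ts));
    }
    row.add(new Key("a", 1));
    row.add(new Key("b", 1));
    return row;
  }

  // Counts seeks needed to move from the first key to the first qualified column.
  static int seeksToFirstQualifiedColumn(TreeSet<Key> row, boolean bypassGuard) {
    Key current = row.first();
    int seeks = 0;
    while (current != null && current.qualifier.isEmpty()) {
      Key target = bypassGuard
          ? new Key("", Long.MIN_VALUE)     // sorts after every version of this column
          : new Key("", current.ts - 1);    // just past the current cell
      current = row.ceiling(target);
      seeks++;
    }
    return seeks;
  }

  public static void main(String[] args) {
    TreeSet<Key> row = rowWithFamilyDeletes(100);
    System.out.println("guarded seeks:  " + seeksToFirstQualifiedColumn(row, false));
    System.out.println("bypassed seeks: " + seeksToFirstQualifiedColumn(row, true));
  }
}
```

In this model the guarded path needs one seek per marker, while the bypassed path reaches the first qualified column in a single seek.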
The optimization is skipped when: