
Shrinked QEB #241

Draft

gshigin wants to merge 117 commits into pp from shrinked_encoding_bimap

Conversation


@gshigin gshigin commented Feb 19, 2026

No description provided.

@gshigin gshigin requested a review from cherep58 February 19, 2026 15:38
@gshigin gshigin self-assigned this Feb 19, 2026
@vporoshok vporoshok marked this pull request as draft April 22, 2026 09:12
Comment thread pp/bare_bones/bitset.h Outdated
Comment thread pp/bare_bones/bitset.h Outdated
Comment thread pp/bare_bones/bitset.h Outdated
Comment thread pp/bare_bones/bitset.h
Comment thread pp/bare_bones/tests/bitset_tests.cpp Outdated
Comment thread pp/bare_bones/tests/bitset_tests.cpp
Comment thread pp/bare_bones/snug_composite.h Outdated
}

[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t size() const noexcept { return storage_.count(); }
[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t series_count() const noexcept {

Suggested change
[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t series_count() const noexcept {
[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t items_count() const noexcept {

or

Suggested change
[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t series_count() const noexcept {
[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t count() const noexcept {

}
}
// Returns exclusive upper bound for ids that may be requested from the table.
[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t max_item_index() const noexcept {

The only implementation of max_item_index_impl in the code is:

[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t max_item_index_impl() const noexcept { return next_item_index_impl(); }

Maybe remove this method?

Comment thread pp/wal/hashdex/go_head.h Outdated
Comment thread pp/series_index/queryable_encoding_bimap.h Outdated
Comment thread pp/series_index/queryable_encoding_bimap.h Outdated
Comment thread pp/series_index/queryable_encoding_bimap.h Outdated
Comment thread pp/series_index/queryable_encoding_bimap.h Outdated
Comment thread pp/series_index/queryable_encoding_bimap.h Outdated
}

uint32_t max_ls_id = 0;
for (auto ls_id : ls_id_set_) {

Accept max_ls_id as an input parameter instead of recomputing it here.
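A minimal sketch of the suggested change; rebuild and bitmap_ are illustrative names, not the real code:

// Hypothetical: the caller already knows the maximum ls_id, so accept it
// instead of recomputing it from ls_id_set_ inside this method.
void rebuild(uint32_t max_ls_id) {
  bitmap_.resize(max_ls_id + 1);  // size the structure once, up front
  for (const auto ls_id : ls_id_set_) {
    bitmap_.set(ls_id);
  }
}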

const auto from_find = lss_.find(ls3);

// Assert
EXPECT_GE(new_id, shrink_boundary);

Instead of EXPECT/ASSERT_GE/LT, use EXPECT/ASSERT with a concrete value.
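For example, if the fixture makes the assigned id deterministic (the value 5U here is purely illustrative):

// Hypothetical concrete expectation; the actual id depends on the fixture setup.
EXPECT_EQ(5U, new_id);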

Lss loaded;
load_lss_with_single_series(ls, loaded, target_id);
const bool was_active_before = target_id < loaded.added_series().size() && loaded.added_series()[target_id];
EXPECT_FALSE(was_active_before);

EXPECT in the arrange section.

And what is this test testing?

EXPECT_EQ(2U, lss_copy.series_count());
}

TEST_F(BimapCopierFixture, CopyFindsCopiedSeries) {

CopyFindsCopiedSeries and CopyKeepsSeriesCount can be merged into a single test.

copier2.copy_added_series_and_build_indexes();

// Act
[[maybe_unused]] const auto ls_id = lss_copy_of_copy.find_or_emplace(label_set3);

Suggested change
[[maybe_unused]] const auto ls_id = lss_copy_of_copy.find_or_emplace(label_set3);
std::ignore = lss_copy_of_copy.find_or_emplace(label_set3);

EXPECT_EQ(ls3_, lss_[2]);
}

TEST_F(BimapShrinkFixture, ShrunkStateSeriesCountMatchesStorage) {

This test can be merged with FinalizeShrinkMapsSeriesInOrder

EXPECT_EQ(3U, lss_.max_item_index());
}

TEST_F(BimapShrinkFixture, IndexWriteContextDedupesSymbolsAfterFullShrink) {

IndexWriteContextDedupesSymbolsAfterFullShrink and IndexWriteContextResolvesRefsAfterFullShrink are tests for the index writer.


struct ExportSymbolIdHasher {
[[nodiscard]] size_t operator()(const ExportSymbolId& id) const noexcept {
const uint64_t composite = (static_cast<uint64_t>(id.source) << 62U) ^ (static_cast<uint64_t>(id.name_id) << 31U) ^ static_cast<uint64_t>(id.value_id);

Why xor and not or?
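For context, a small self-contained sketch of why the choice matters (assuming name_id and value_id can use all 32 bits, as the casts suggest): with these shift amounts the packed ranges overlap, and in the overlapping bits OR can only set them, so it collapses more distinct inputs than XOR does:

#include <cstdint>
#include <cstdio>

// value_id occupies bits 0..31, name_id << 31 occupies bits 31..62,
// source << 62 occupies bits 62..63 -- the ranges overlap at bits 31 and 62.
int main() {
  const uint64_t name_part = uint64_t{1} << 31;  // name_id == 1
  const uint64_t value_a = 0x00000000U;          // value_id == 0
  const uint64_t value_b = 0x80000000U;          // value_id with bit 31 set

  // OR collapses the two pairs to the same hash input:
  std::printf("or : %llx vs %llx\n",
              static_cast<unsigned long long>(name_part | value_a),   // 80000000
              static_cast<unsigned long long>(name_part | value_b));  // 80000000
  // XOR keeps them distinct:
  std::printf("xor: %llx vs %llx\n",
              static_cast<unsigned long long>(name_part ^ value_a),   // 80000000
              static_cast<unsigned long long>(name_part ^ value_b));  // 0
  return 0;
}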

public:
#pragma pack(push, 1)
struct ExportSymbolId {
SymbolSource source{SymbolSource::kCurrent};

Most likely, name_id will not be a big value, so you could use its highest bit for source.
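A minimal sketch of that packing, assuming name_id always fits in 31 bits and source only needs one bit; all names here are hypothetical:

#include <cstdint>

// Hypothetical layout: source in bit 31, name_id in bits 0..30.
struct PackedSymbolId {
  uint32_t source_and_name;  // bit 31: source flag, bits 0..30: name_id
  uint32_t value_id;

  static constexpr uint32_t kSourceBit = uint32_t{1} << 31;

  static PackedSymbolId make(bool from_snapshot, uint32_t name_id, uint32_t value_id) {
    // Precondition (assumed): name_id < 2^31.
    return {(from_snapshot ? kSourceBit : 0U) | name_id, value_id};
  }

  [[nodiscard]] bool from_snapshot() const noexcept { return (source_and_name & kSourceBit) != 0; }
  [[nodiscard]] uint32_t name_id() const noexcept { return source_and_name & ~kSourceBit; }
};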


template <class Callback>
void for_each_symbol(Callback&& callback) const {
for (uint32_t symbol_ref = 0; symbol_ref < symbols_.size(); ++symbol_ref) {

I think a loop with a pointer to the symbols_ item, or a loop over iterators, would be more efficient.
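For instance, a range-based variant that avoids re-indexing symbols_ on every iteration (a sketch; the callback arguments are assumed from the loop header above):

template <class Callback>
void for_each_symbol(Callback&& callback) const {
  uint32_t symbol_ref = 0;
  for (const auto& symbol : symbols_) {  // iterator walk instead of symbols_[symbol_ref]
    callback(symbol_ref++, symbol);
  }
}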


void collect_current_symbols_from_shrunk_series(std::vector<SymbolIdWithView>& symbol_ids) const {
for (uint32_t ls_id = lss_.shrink_state().shift; ls_id < lss_.max_item_index(); ++ls_id) {
if (lss_.symbol_source_for_series(ls_id) != SymbolSource::kCurrent) {

Any item added after the shrink will have the type SymbolSource::kCurrent, so this if is redundant.
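That is, the loop could presumably drop the check (a sketch; the per-series collection step is a hypothetical placeholder for whatever follows the if in the real body):

void collect_current_symbols_from_shrunk_series(std::vector<SymbolIdWithView>& symbol_ids) const {
  // Every series at or past shrink_state().shift was added after the shrink,
  // so its symbols are always SymbolSource::kCurrent -- no per-id check needed.
  for (uint32_t ls_id = lss_.shrink_state().shift; ls_id < lss_.max_item_index(); ++ls_id) {
    collect_symbols_for_series(ls_id, symbol_ids);  // hypothetical helper
  }
}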


using SymbolReference = PromPP::Prometheus::tsdb::index::SymbolReference;
using SymbolReferencesMap = phmap::flat_hash_map<ExportSymbolId, SymbolReference, ExportSymbolIdHasher>;
using SymbolIdWithView = std::pair<std::string_view, ExportSymbolId>;

std::string_view adds 16 bytes of overhead per symbol. It may be expensive and should be benchmarked.
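The 16 bytes are the pointer-plus-length pair on 64-bit targets:

#include <string_view>

static_assert(sizeof(std::string_view) == 16);  // data pointer (8) + length (8) on 64-bit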

symbols_.clear();
symbol_refs_.clear();

std::vector<SymbolIdWithView> symbol_ids;

symbol_ids and the methods collect_empty_symbol, collect_current_symbols, and collect_snapshot_symbols can be extracted into a separate class, SymbolIdsCollector.
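A rough shape for that class; the names follow the comment, while the constructor parameter and method bodies are assumptions:

// Hypothetical extraction: owns the scratch vector and the three collection
// passes that currently live inline in the index write context.
class SymbolIdsCollector {
 public:
  explicit SymbolIdsCollector(const QueryableEncodingBimap& lss) : lss_(lss) {}

  [[nodiscard]] std::vector<SymbolIdWithView> collect() const {
    std::vector<SymbolIdWithView> symbol_ids;
    collect_empty_symbol(symbol_ids);
    collect_current_symbols(symbol_ids);
    collect_snapshot_symbols(symbol_ids);
    return symbol_ids;
  }

 private:
  void collect_empty_symbol(std::vector<SymbolIdWithView>& out) const;     // body as in the original
  void collect_current_symbols(std::vector<SymbolIdWithView>& out) const;  // body as in the original
  void collect_snapshot_symbols(std::vector<SymbolIdWithView>& out) const; // body as in the original

  const QueryableEncodingBimap& lss_;
};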

QueryableEncodingBimap lss_;
SymbolReferencesMap symbol_references_;
LabelIndicesWriter<QueryableEncodingBimap, decltype(stream_)> label_indices_writer{lss_, symbol_references_, stream_writer_};
std::optional<series_index::prometheus::tsdb::index::IndexWriteContext<QueryableEncodingBimap>> index_write_context_;

std::optional is not a good pattern for unit tests. Instead of std::optional, create the variables in the place where they are used: TEST_P(LabelIndicesWriterFixture, Test).
Or you can manually call index_write_context_.rebuild() after filling the lss.
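A sketch of the first option, constructing the context locally once the fixture has filled lss_ (the fill helper and the constructor signature are assumptions):

TEST_P(LabelIndicesWriterFixture, Test) {
  // Arrange: populate lss_ first, then build the context over its final state.
  fill_lss();  // hypothetical fixture helper
  series_index::prometheus::tsdb::index::IndexWriteContext<QueryableEncodingBimap> context{lss_};

  // Act / Assert against `context` ...
}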

@@ -49,16 +54,18 @@ class LabelIndicesWriterFixture : public testing::TestWithParam<LabelIndicesWrit

std::ostringstream stream;
StreamWriter<decltype(stream_)> stream_writer{&stream};

use stream_writer_

QueryableEncodingBimap lss_;
SymbolReferencesMap symbol_references_;
SeriesReferencesMap series_references_;
std::optional<series_index::prometheus::tsdb::index::IndexWriteContext<QueryableEncodingBimap>> index_write_context_;

use index_write_context_.rebuild instead of std::optional

QueryableEncodingBimap lss_;
SymbolReferencesMap symbol_references_;
SeriesReferencesMap series_references_;
std::optional<series_index::prometheus::tsdb::index::IndexWriteContext<QueryableEncodingBimap>> index_write_context_;

use index_write_context_.rebuild instead of std::optional

shrunk_lss_.finalize_copy_and_shrink(*snapshot_copy_, dst_src_ids_mapping);
}

static std::string write_index(const Lss& lss) {

You removed the IndexWriter::write method but reimplement it here. Better to restore IndexWriter::write and use it with std::stringstream.
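Roughly like this, assuming the restored IndexWriter::write takes the lss and an output stream (the exact signature is an assumption):

static std::string write_index(const Lss& lss) {
  std::stringstream stream;  // <sstream>
  IndexWriter::write(lss, stream);  // restored method; signature assumed
  return stream.str();
}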

}
}

std::shared_ptr<Lss> load_lss_from_file() {

Why do we need std::unique_ptr?

return {};
}

void assert_added_series_suffix_marked(const Lss& lss, uint32_t begin_id) {

It's a benchmark, not a unit test. We should not validate data here.

Comment thread pp/wal/decoder.h
decoder_.process_segment(processor);
decoder_.process_segment([this, &processor](Primitives::LabelSetID ls_id, Primitives::Timestamp timestamp, double value) PROMPP_LAMBDA_INLINE {
if constexpr (requires(LSS& label_set, uint32_t id) { label_set.mark_active(id); }) {
label_set_.mark_active(ls_id);

Why should we do this in the decoder?

Comment thread pp/wal/decoder.h
decoder_.process_segment([&last_ls_id, &samples, &container](uint32_t ls_id, int64_t ts, double v) PROMPP_LAMBDA_INLINE {
decoder_.process_segment([this, &last_ls_id, &samples, &container](uint32_t ls_id, int64_t ts, double v) PROMPP_LAMBDA_INLINE {
if constexpr (requires(LSS& label_set, uint32_t id) { label_set.mark_active(id); }) {
label_set_.mark_active(ls_id);

Why should we do this in GenericDecoder?

Comment thread pp/wal/wal.h
inline __attribute__((always_inline)) const checkpoint_type& label_sets_checkpoint() const noexcept { return label_sets_checkpoint_; }

// Exclusive upper bound for label set ids written to WAL.
inline __attribute__((always_inline)) uint32_t max_item_index() const noexcept { return label_sets_checkpoint_.next_item_index(); }

Suggested change
inline __attribute__((always_inline)) uint32_t max_item_index() const noexcept { return label_sets_checkpoint_.next_item_index(); }
inline __attribute__((always_inline)) uint32_t max_written_item_index() const noexcept { return label_sets_checkpoint_.next_item_index(); }

@@ -0,0 +1,120 @@
#include <benchmark/benchmark.h>

All comments also apply to queryable_encoding_bimap_resolve_benchmark.cpp.

symbols_ids_sequences_type new_symbols_ids_sequences;
// reserve 3 extra bytes to avoid problems with streamvbyte
new_symbols_ids_sequences.reserve(symbols_ids_sequences_.size() - drop_seq_offset + 3);
std::ranges::copy(symbols_ids_sequences_.begin() + drop_seq_offset, symbols_ids_sequences_.end(), std::back_inserter(new_symbols_ids_sequences));

I think resize + std::memcpy will be faster
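A sketch of the suggested pattern for the copy above, assuming the container exposes std::vector-like reserve/resize/data and the element type is trivially copyable (a requirement for std::memcpy); the same shape applies to the items_ copy below:

#include <cstring>  // std::memcpy

const size_t keep_count = symbols_ids_sequences_.size() - drop_seq_offset;
symbols_ids_sequences_type new_symbols_ids_sequences;
// reserve 3 extra bytes to avoid problems with streamvbyte, as in the original
new_symbols_ids_sequences.reserve(keep_count + 3);
new_symbols_ids_sequences.resize(keep_count);  // make the destination range live before memcpy
std::memcpy(new_symbols_ids_sequences.data(),
            symbols_ids_sequences_.data() + drop_seq_offset,
            keep_count * sizeof(new_symbols_ids_sequences[0]));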

symbols_ids_sequences_.shrink_to_fit();
Vector<item_type> new_items;
new_items.reserve(items_.size() - drop_count);
std::ranges::copy(items_.begin() + drop_count, items_.end(), std::back_inserter(new_items));

I think resize + std::memcpy will be faster

Comment thread pp/entrypoint/head/lss.h
class ShrinkAwareSnapshotLSS : public SnapshotLSS {
public:
explicit ShrinkAwareSnapshotLSS(const QueryableEncodingBimap& lss)
: SnapshotLSS(lss), shrink_state_(lss.shrink_state().clone_for_snapshot()), added_series_(lss.added_series()) {}

Inside ShrinkState we copy a BareBones::Vector<uint32_t>. Use BareBones::SharedVector instead.

Comment thread pp/entrypoint/head/lss.h
return resolve_shrunk_series(id);
}

if (shrink_state_.is_fixed() && is_hidden_in_fixed_state(id)) {

is_normal and is_fixed should have similar behavior

}
}

extern "C" void prompp_head_wal_encoder_max_item_index(void* args, void* res) {

Suggested change
extern "C" void prompp_head_wal_encoder_max_item_index(void* args, void* res) {
extern "C" void prompp_head_wal_encoder_max_written_item_index(void* args, void* res) {

Comment thread pp/go/cppbridge/primitives_lss_test.go
Comment thread pp/go/cppbridge/primitives_lss_test.go Outdated
Comment thread pp/go/cppbridge/primitives_lss_test.go Outdated
Comment thread pp/go/cppbridge/primitives_lss_test.go Outdated
}

// makeRotatedLSS creates a rotated LSS from the original LSS.
func (s *RotateLSSSuite) makeRotatedLSS(lName string) *rotatedLSS {

Simplify the method. Fill the fields of rotatedLSS manually, not programmatically.


But filling in the structure is what a constructor does.


Yes, but don't use loops, slice.Sort, if i%2 == 0, or other logic blocks. Only manual filling.

u-veles-a added 2 commits May 5, 2026 14:26
Signed-off-by: Alexandr Yudin <57181751+u-veles-a@users.noreply.github.com>
Signed-off-by: Alexandr Yudin <57181751+u-veles-a@users.noreply.github.com>