
Shrinked QEB #241

Draft

gshigin wants to merge 117 commits into pp from shrinked_encoding_bimap

Conversation


@gshigin gshigin commented Feb 19, 2026

No description provided.

@gshigin gshigin requested a review from cherep58 February 19, 2026 15:38
@gshigin gshigin self-assigned this Feb 19, 2026
@vporoshok vporoshok marked this pull request as draft April 22, 2026 09:12
Comment thread pp/bare_bones/bitset.h Outdated
Comment thread pp/bare_bones/bitset.h Outdated
Comment thread pp/bare_bones/bitset.h Outdated
Comment thread pp/bare_bones/bitset.h
Comment thread pp/bare_bones/tests/bitset_tests.cpp Outdated
Comment thread pp/bare_bones/tests/bitset_tests.cpp
Comment thread pp/bare_bones/snug_composite.h Outdated
}

[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t size() const noexcept { return storage_.count(); }
[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t series_count() const noexcept {

Suggested change
[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t series_count() const noexcept {
[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t items_count() const noexcept {

or

Suggested change
[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t series_count() const noexcept {
[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t count() const noexcept {

}
}
// Returns exclusive upper bound for ids that may be requested from the table.
[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t max_item_index() const noexcept {

The only implementation of max_item_index_impl in the code is:

[[nodiscard]] PROMPP_ALWAYS_INLINE uint32_t max_item_index_impl() const noexcept { return next_item_index_impl(); }

Maybe remove this method?

Comment thread pp/wal/hashdex/go_head.h Outdated
Comment thread pp/series_index/queryable_encoding_bimap.h Outdated
Comment thread pp/series_index/queryable_encoding_bimap.h Outdated
Comment thread pp/series_index/queryable_encoding_bimap.h Outdated
Comment thread pp/series_index/queryable_encoding_bimap.h Outdated
Comment thread pp/series_index/queryable_encoding_bimap.h Outdated
}

uint32_t max_ls_id = 0;
for (auto ls_id : ls_id_set_) {

Accept max_ls_id as an input parameter instead of recomputing it here.
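A minimal sketch of the suggested change; rebuild and bitmap_ are illustrative names, not the real code:

// Hypothetical: the caller already knows the maximum ls_id, so accept it
// instead of recomputing it from ls_id_set_ inside this method.
void rebuild(uint32_t max_ls_id) {
  bitmap_.resize(max_ls_id + 1);  // size the structure once, up front
  for (const auto ls_id : ls_id_set_) {
    bitmap_.set(ls_id);
  }
}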

const auto from_find = lss_.find(ls3);

// Assert
EXPECT_GE(new_id, shrink_boundary);

Instead of EXPECT/ASSERT_GE/LT, use EXPECT/ASSERT with a concrete value.
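For example, if the fixture makes the assigned id deterministic (the value 5U here is purely illustrative):

// Hypothetical concrete expectation; the actual id depends on the fixture setup.
EXPECT_EQ(5U, new_id);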

Lss loaded;
load_lss_with_single_series(ls, loaded, target_id);
const bool was_active_before = target_id < loaded.added_series().size() && loaded.added_series()[target_id];
EXPECT_FALSE(was_active_before);

EXPECT in the arrange section.

And what is this test testing?

EXPECT_EQ(2U, lss_copy.series_count());
}

TEST_F(BimapCopierFixture, CopyFindsCopiedSeries) {

CopyFindsCopiedSeries and CopyKeepsSeriesCount can be merged into a single test.

copier2.copy_added_series_and_build_indexes();

// Act
[[maybe_unused]] const auto ls_id = lss_copy_of_copy.find_or_emplace(label_set3);

Suggested change
[[maybe_unused]] const auto ls_id = lss_copy_of_copy.find_or_emplace(label_set3);
std::ignore = lss_copy_of_copy.find_or_emplace(label_set3);

EXPECT_EQ(ls3_, lss_[2]);
}

TEST_F(BimapShrinkFixture, ShrunkStateSeriesCountMatchesStorage) {

This test can be merged with FinalizeShrinkMapsSeriesInOrder

EXPECT_EQ(3U, lss_.max_item_index());
}

TEST_F(BimapShrinkFixture, IndexWriteContextDedupesSymbolsAfterFullShrink) {

IndexWriteContextDedupesSymbolsAfterFullShrink and IndexWriteContextResolvesRefsAfterFullShrink are tests for the index writer.


struct ExportSymbolIdHasher {
[[nodiscard]] size_t operator()(const ExportSymbolId& id) const noexcept {
const uint64_t composite = (static_cast<uint64_t>(id.source) << 62U) ^ (static_cast<uint64_t>(id.name_id) << 31U) ^ static_cast<uint64_t>(id.value_id);

Why xor and not or?
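For context, a small self-contained sketch of why the choice matters (assuming name_id and value_id can use all 32 bits, as the casts suggest): with these shift amounts the packed ranges overlap, and in the overlapping bits OR can only set them, so it collapses more distinct inputs than XOR does:

#include <cstdint>
#include <cstdio>

// value_id occupies bits 0..31, name_id << 31 occupies bits 31..62,
// source << 62 occupies bits 62..63 -- the ranges overlap at bits 31 and 62.
int main() {
  const uint64_t name_part = uint64_t{1} << 31;  // name_id == 1
  const uint64_t value_a = 0x00000000U;          // value_id == 0
  const uint64_t value_b = 0x80000000U;          // value_id with bit 31 set

  // OR collapses the two pairs to the same hash input:
  std::printf("or : %llx vs %llx\n",
              static_cast<unsigned long long>(name_part | value_a),   // 80000000
              static_cast<unsigned long long>(name_part | value_b));  // 80000000
  // XOR keeps them distinct:
  std::printf("xor: %llx vs %llx\n",
              static_cast<unsigned long long>(name_part ^ value_a),   // 80000000
              static_cast<unsigned long long>(name_part ^ value_b));  // 0
  return 0;
}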

public:
#pragma pack(push, 1)
struct ExportSymbolId {
SymbolSource source{SymbolSource::kCurrent};

Most likely, name_id will not be a big value, so you could use its highest bit for source.
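A minimal sketch of that packing, assuming name_id always fits in 31 bits and source only needs one bit; all names here are hypothetical:

#include <cstdint>

// Hypothetical layout: source in bit 31, name_id in bits 0..30.
struct PackedSymbolId {
  uint32_t source_and_name;  // bit 31: source flag, bits 0..30: name_id
  uint32_t value_id;

  static constexpr uint32_t kSourceBit = uint32_t{1} << 31;

  static PackedSymbolId make(bool from_snapshot, uint32_t name_id, uint32_t value_id) {
    // Precondition (assumed): name_id < 2^31.
    return {(from_snapshot ? kSourceBit : 0U) | name_id, value_id};
  }

  [[nodiscard]] bool from_snapshot() const noexcept { return (source_and_name & kSourceBit) != 0; }
  [[nodiscard]] uint32_t name_id() const noexcept { return source_and_name & ~kSourceBit; }
};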


template <class Callback>
void for_each_symbol(Callback&& callback) const {
for (uint32_t symbol_ref = 0; symbol_ref < symbols_.size(); ++symbol_ref) {

I think a loop with a pointer to the symbols_ item, or a loop over iterators, would be more efficient.
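For instance, a range-based variant that avoids re-indexing symbols_ on every iteration (a sketch; the callback arguments are assumed from the loop header above):

template <class Callback>
void for_each_symbol(Callback&& callback) const {
  uint32_t symbol_ref = 0;
  for (const auto& symbol : symbols_) {  // iterator walk instead of symbols_[symbol_ref]
    callback(symbol_ref++, symbol);
  }
}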


void collect_current_symbols_from_shrunk_series(std::vector<SymbolIdWithView>& symbol_ids) const {
for (uint32_t ls_id = lss_.shrink_state().shift; ls_id < lss_.max_item_index(); ++ls_id) {
if (lss_.symbol_source_for_series(ls_id) != SymbolSource::kCurrent) {

Any item added after the shrink will have the type SymbolSource::kCurrent, so this if is redundant.
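That is, the loop could presumably drop the check (a sketch; the per-series collection step is a hypothetical placeholder for whatever follows the if in the real body):

void collect_current_symbols_from_shrunk_series(std::vector<SymbolIdWithView>& symbol_ids) const {
  // Every series at or past shrink_state().shift was added after the shrink,
  // so its symbols are always SymbolSource::kCurrent -- no per-id check needed.
  for (uint32_t ls_id = lss_.shrink_state().shift; ls_id < lss_.max_item_index(); ++ls_id) {
    collect_symbols_for_series(ls_id, symbol_ids);  // hypothetical helper
  }
}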


using SymbolReference = PromPP::Prometheus::tsdb::index::SymbolReference;
using SymbolReferencesMap = phmap::flat_hash_map<ExportSymbolId, SymbolReference, ExportSymbolIdHasher>;
using SymbolIdWithView = std::pair<std::string_view, ExportSymbolId>;

std::string_view adds 16 bytes of overhead per symbol. It may be expensive and should be benchmarked.
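The 16 bytes are the pointer-plus-length pair on 64-bit targets:

#include <string_view>

static_assert(sizeof(std::string_view) == 16);  // data pointer (8) + length (8) on 64-bit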

symbols_.clear();
symbol_refs_.clear();

std::vector<SymbolIdWithView> symbol_ids;

symbol_ids and the methods collect_empty_symbol, collect_current_symbols, and collect_snapshot_symbols can be extracted into a separate class, SymbolIdsCollector.
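A rough shape for that class; the names follow the comment, while the constructor parameter and method bodies are assumptions:

// Hypothetical extraction: owns the scratch vector and the three collection
// passes that currently live inline in the index write context.
class SymbolIdsCollector {
 public:
  explicit SymbolIdsCollector(const QueryableEncodingBimap& lss) : lss_(lss) {}

  [[nodiscard]] std::vector<SymbolIdWithView> collect() const {
    std::vector<SymbolIdWithView> symbol_ids;
    collect_empty_symbol(symbol_ids);
    collect_current_symbols(symbol_ids);
    collect_snapshot_symbols(symbol_ids);
    return symbol_ids;
  }

 private:
  void collect_empty_symbol(std::vector<SymbolIdWithView>& out) const;     // body as in the original
  void collect_current_symbols(std::vector<SymbolIdWithView>& out) const;  // body as in the original
  void collect_snapshot_symbols(std::vector<SymbolIdWithView>& out) const; // body as in the original

  const QueryableEncodingBimap& lss_;
};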

QueryableEncodingBimap lss_;
SymbolReferencesMap symbol_references_;
LabelIndicesWriter<QueryableEncodingBimap, decltype(stream_)> label_indices_writer{lss_, symbol_references_, stream_writer_};
std::optional<series_index::prometheus::tsdb::index::IndexWriteContext<QueryableEncodingBimap>> index_write_context_;

std::optional is not a good pattern for unit tests. Instead of std::optional, create the variables in the place where they are used: TEST_P(LabelIndicesWriterFixture, Test).
Or you can manually call index_write_context_.rebuild() after filling the lss.
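A sketch of the first option, constructing the context locally once the fixture has filled lss_ (the fill helper and the constructor signature are assumptions):

TEST_P(LabelIndicesWriterFixture, Test) {
  // Arrange: populate lss_ first, then build the context over its final state.
  fill_lss();  // hypothetical fixture helper
  series_index::prometheus::tsdb::index::IndexWriteContext<QueryableEncodingBimap> context{lss_};

  // Act / Assert against `context` ...
}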

@@ -49,16 +54,18 @@ class LabelIndicesWriterFixture : public testing::TestWithParam<LabelIndicesWrit

std::ostringstream stream;
StreamWriter<decltype(stream_)> stream_writer{&stream};

use stream_writer_

QueryableEncodingBimap lss_;
SymbolReferencesMap symbol_references_;
SeriesReferencesMap series_references_;
std::optional<series_index::prometheus::tsdb::index::IndexWriteContext<QueryableEncodingBimap>> index_write_context_;

use index_write_context_.rebuild instead of std::optional

QueryableEncodingBimap lss_;
SymbolReferencesMap symbol_references_;
SeriesReferencesMap series_references_;
std::optional<series_index::prometheus::tsdb::index::IndexWriteContext<QueryableEncodingBimap>> index_write_context_;

use index_write_context_.rebuild instead of std::optional

shrunk_lss_.finalize_copy_and_shrink(*snapshot_copy_, dst_src_ids_mapping);
}

static std::string write_index(const Lss& lss) {

You removed the IndexWriter::write method but reimplement it here. Better to restore IndexWriter::write and use it with std::stringstream.
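Roughly like this, assuming the restored IndexWriter::write takes the lss and an output stream (the exact signature is an assumption):

static std::string write_index(const Lss& lss) {
  std::stringstream stream;  // <sstream>
  IndexWriter::write(lss, stream);  // restored method; signature assumed
  return stream.str();
}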

}
}

std::shared_ptr<Lss> load_lss_from_file() {

Why do we need std::unique_ptr?

return {};
}

void assert_added_series_suffix_marked(const Lss& lss, uint32_t begin_id) {

It's a benchmark, not a unit test. We should not validate data here.

Comment thread pp/wal/decoder.h
decoder_.process_segment(processor);
decoder_.process_segment([this, &processor](Primitives::LabelSetID ls_id, Primitives::Timestamp timestamp, double value) PROMPP_LAMBDA_INLINE {
if constexpr (requires(LSS& label_set, uint32_t id) { label_set.mark_active(id); }) {
label_set_.mark_active(ls_id);

Why should we do this in the decoder?

Comment thread pp/wal/decoder.h
decoder_.process_segment([&last_ls_id, &samples, &container](uint32_t ls_id, int64_t ts, double v) PROMPP_LAMBDA_INLINE {
decoder_.process_segment([this, &last_ls_id, &samples, &container](uint32_t ls_id, int64_t ts, double v) PROMPP_LAMBDA_INLINE {
if constexpr (requires(LSS& label_set, uint32_t id) { label_set.mark_active(id); }) {
label_set_.mark_active(ls_id);

Why should we do this in GenericDecoder?

Comment thread pp/wal/wal.h
inline __attribute__((always_inline)) const checkpoint_type& label_sets_checkpoint() const noexcept { return label_sets_checkpoint_; }

// Exclusive upper bound for label set ids written to WAL.
inline __attribute__((always_inline)) uint32_t max_item_index() const noexcept { return label_sets_checkpoint_.next_item_index(); }

Suggested change
inline __attribute__((always_inline)) uint32_t max_item_index() const noexcept { return label_sets_checkpoint_.next_item_index(); }
inline __attribute__((always_inline)) uint32_t max_written_item_index() const noexcept { return label_sets_checkpoint_.next_item_index(); }

@@ -0,0 +1,120 @@
#include <benchmark/benchmark.h>

All comments also apply to queryable_encoding_bimap_resolve_benchmark.cpp.

symbols_ids_sequences_type new_symbols_ids_sequences;
// reserve 3 extra bytes to avoid problems with streamvbyte
new_symbols_ids_sequences.reserve(symbols_ids_sequences_.size() - drop_seq_offset + 3);
std::ranges::copy(symbols_ids_sequences_.begin() + drop_seq_offset, symbols_ids_sequences_.end(), std::back_inserter(new_symbols_ids_sequences));

I think resize + std::memcpy will be faster
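A sketch of the suggested pattern for the copy above, assuming the container exposes std::vector-like reserve/resize/data and the element type is trivially copyable (a requirement for std::memcpy); the same shape applies to the items_ copy below:

#include <cstring>  // std::memcpy

const size_t keep_count = symbols_ids_sequences_.size() - drop_seq_offset;
symbols_ids_sequences_type new_symbols_ids_sequences;
// reserve 3 extra bytes to avoid problems with streamvbyte, as in the original
new_symbols_ids_sequences.reserve(keep_count + 3);
new_symbols_ids_sequences.resize(keep_count);  // make the destination range live before memcpy
std::memcpy(new_symbols_ids_sequences.data(),
            symbols_ids_sequences_.data() + drop_seq_offset,
            keep_count * sizeof(new_symbols_ids_sequences[0]));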

symbols_ids_sequences_.shrink_to_fit();
Vector<item_type> new_items;
new_items.reserve(items_.size() - drop_count);
std::ranges::copy(items_.begin() + drop_count, items_.end(), std::back_inserter(new_items));

I think resize + std::memcpy will be faster

Comment thread pp/entrypoint/head/lss.h
class ShrinkAwareSnapshotLSS : public SnapshotLSS {
public:
explicit ShrinkAwareSnapshotLSS(const QueryableEncodingBimap& lss)
: SnapshotLSS(lss), shrink_state_(lss.shrink_state().clone_for_snapshot()), added_series_(lss.added_series()) {}

Inside ShrinkState we copy a BareBones::Vector<uint32_t>. Use BareBones::SharedVector instead.

Comment thread pp/entrypoint/head/lss.h
return resolve_shrunk_series(id);
}

if (shrink_state_.is_fixed() && is_hidden_in_fixed_state(id)) {

is_normal and is_fixed should have similar behavior

}
}

extern "C" void prompp_head_wal_encoder_max_item_index(void* args, void* res) {

Suggested change
extern "C" void prompp_head_wal_encoder_max_item_index(void* args, void* res) {
extern "C" void prompp_head_wal_encoder_max_written_item_index(void* args, void* res) {

Comment thread pp/go/cppbridge/primitives_lss_test.go
Comment thread pp/go/cppbridge/primitives_lss_test.go Outdated
Comment thread pp/go/cppbridge/primitives_lss_test.go Outdated
Comment thread pp/go/cppbridge/primitives_lss_test.go Outdated
}

// makeRotatedLSS creates a rotated LSS from the original LSS.
func (s *RotateLSSSuite) makeRotatedLSS(lName string) *rotatedLSS {

Simplify the method. Fill the fields of rotatedLSS manually, not programmatically.


But filling in the structure is what a constructor does.


Yes, but don't use loops, slice.Sort, if i%2 == 0, or other logic blocks. Only manual filling.

u-veles-a added 2 commits May 5, 2026 14:26
Signed-off-by: Alexandr Yudin <57181751+u-veles-a@users.noreply.github.com>
Signed-off-by: Alexandr Yudin <57181751+u-veles-a@users.noreply.github.com>