Skip to content

Use bitpack encoding for LID blocks #312

@cheb0

Description

@cheb0

LIDs (posting lists) blocks are the most crucial data structure for search along with tokens. Currently, we utilize varints, which are slow and inefficient.

The proposed solution is to use bitpack encoding. There are two libraries to test in Go:

  • ready to use library intcomp which works on top of 128 numbers chunks. Bit width is defined for every 32 numbers.
  • Daniel Lemire's fastPFOR delta which is ported to Go. It tries to find optimal bit width for 128 numbers at the same time cutting large detas and placing them in a footer of the page. The algorithm is tuned for 64k pages

Need to test:

  • dataset size overhead
  • search and throughput performance metrics
  • estimate how much is the overhead if we disable zstd for LID blocks

Metadata

Metadata

Assignees

Labels

performanceFeatures or improvements that positively affect seq-db performance

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions