-
Notifications
You must be signed in to change notification settings - Fork 9
Open
Labels
performanceFeatures or improvements that positively affect seq-db performanceFeatures or improvements that positively affect seq-db performance
Description
LIDs (posting lists) blocks are the most crucial data structure for search along with tokens. Currently, we utilize varints, which are slow and inefficient.
The proposed solution is to use bitpack encoding. There are two libraries to test in Go:
- ready to use library intcomp which works on top of 128 numbers chunks. Bit width is defined for every 32 numbers.
- Daniel Lemire's fastPFOR delta which is ported to Go. It tries to find optimal bit width for 128 numbers at the same time cutting large detas and placing them in a footer of the page. The algorithm is tuned for 64k pages
Need to test:
- dataset size overhead
- search and throughput performance metrics
- estimate how much is the overhead if we disable zstd for LID blocks
Metadata
Metadata
Assignees
Labels
performanceFeatures or improvements that positively affect seq-db performanceFeatures or improvements that positively affect seq-db performance