We are in the process of adding ALP, a new encoding for floating point values, to the Parquet specification.
The change to the parquet-format specification from @prtkgaur is going well.
However, the spec largely focuses on the low level technical details, so it will be somewhat awkward to refer to when helping people understand what ALP is, when they should use it, and why they should add it to their systems. Therefore I think it would help to have a high level overview that we can link to and point people at.
A lot of this content is already in the Google Doc draft "ALP Proposal: ALP: Floating point encoding for Parquet", but it is intermixed with low level technical implementation details and commentary.
Thus, I suggest a blog post on https://parquet.apache.org/blog/ with a high level overview that:
- Explains the use case of ALP (decimal data that was stored in FLOAT/DOUBLE columns) and single row decode
- References the ALP paper (and probably mentions its adoption by other formats like Vortex)
- Gives a technical overview of the encoding (with diagrams)
- Gives some more examples
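
To make the use case in the first bullet concrete, here is a minimal sketch of the decimal-scaling idea. It is only my rough illustration, not the encoding as written in the spec or the paper: the names `alp_like_encode` / `alp_like_decode` are hypothetical, the single fixed `exponent` is an assumption, and a real implementation would pick exponents adaptively and bit-pack the integers.

```python
import math


def alp_like_encode(values, exponent):
    """Scale each double by 10**exponent and keep it as an integer when
    that round-trips exactly; otherwise record it as an exception.
    (Hypothetical helper for illustration, not the spec's layout.)"""
    scale = 10.0 ** exponent
    encoded, exceptions = [], {}
    for i, v in enumerate(values):
        if math.isfinite(v):
            n = round(v * scale)
            if n / scale == v:  # lossless round trip
                encoded.append(n)
                continue
        encoded.append(0)       # placeholder slot for the exception
        exceptions[i] = v       # original value stored verbatim
    return encoded, exceptions


def alp_like_decode(encoded, exceptions, exponent):
    """Inverse of alp_like_encode: divide back, patch in exceptions."""
    scale = 10.0 ** exponent
    return [exceptions[i] if i in exceptions else n / scale
            for i, n in enumerate(encoded)]


prices = [1.5, 2.25, 19.99, math.pi]
ints, excs = alp_like_encode(prices, exponent=2)
restored = alp_like_decode(ints, excs, exponent=2)
```

In this sketch the decimal-looking values become small integers that compress well, while a value such as `math.pi` does not round-trip and is kept as an exception.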
I would also like this post to continue a story arc that highlights the major new additions we are making to the Parquet spec, how we are adopting cutting edge research, and how these additions are being adopted by implementations (e.g. https://parquet.apache.org/docs/file-format/implementationstatus/)