Describe the enhancement requested
Hi, I'm new around here, please let me know if this request is better elsewhere.
I'd like to propose an optional type parameter called Offset for the TIMESTAMP logical type.
In my typical use of Parquet files, the data is a running log with many rows, so any one row group is unlikely to span more than a few days.
The idea of the Offset parameter would be to store, for each row group, an offset from the Unix epoch (as an INT64); the data would then be stored relative to that offset.
This provides a couple of benefits:
- Row groups could be selectively downsized (when possible) to the INT32 physical type. If I understand correctly, this could reduce file size significantly. At millisecond precision, INT32 could support row groups spanning up to ~49 days.[1]
- The docs note that all TIMESTAMPs, particularly those with NANOS precision, have a limited range because they are stored as INT64. Adding an Offset would allow a practically unlimited range for TIMESTAMPs.
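To make the idea concrete, here is a minimal Python sketch of what a writer and reader might do for one row group. It is only an illustration of the arithmetic: `encode_row_group` and `decode_row_group` are hypothetical names, not part of any Parquet library, and placing the offset at the midpoint of the row group's values is an assumption.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
MS_PER_DAY = 24 * 60 * 60 * 1000


def encode_row_group(timestamps_ms):
    """Hypothetical writer step: pick an offset and store values relative to it.

    Returns (offset, deltas, fits_int32). The offset is placed at the midpoint
    of the row group's values so the signed INT32 range is used on both sides.
    """
    lo, hi = min(timestamps_ms), max(timestamps_ms)
    offset = lo + (hi - lo) // 2
    deltas = [t - offset for t in timestamps_ms]
    fits_int32 = all(INT32_MIN <= d <= INT32_MAX for d in deltas)
    return offset, deltas, fits_int32


def decode_row_group(offset, deltas):
    """Hypothetical reader step: recover the absolute timestamps."""
    return [offset + d for d in deltas]


# Widest span (in days) that fits in INT32 at millisecond precision.
int32_span_days = 2**32 / MS_PER_DAY

# A row group covering 3 days fits in INT32; one covering 60 days does not.
start = 1_700_000_000_000  # milliseconds since the Unix epoch
short_group = [start, start + 3 * MS_PER_DAY]
long_group = [start, start + 60 * MS_PER_DAY]

offset, deltas, fits = encode_row_group(short_group)
print(fits, decode_row_group(offset, deltas) == short_group)
print(encode_row_group(long_group)[2])
print(round(int32_span_days, 1))
```

With a centered offset, 2^32 milliseconds works out to roughly 49.7 days, so a writer could fall back to INT64 only for row groups wider than that.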
Footnotes
1. With the offset set in the middle of the row group's values, given the signed nature of INT32.