Added a new documentation page for faster GRIB aggregations #495
martindurant merged 7 commits into fsspec:main from
Conversation
| GRIB Aggregations |
| ----------------- |
| |
| This new method for reference aggregation, developed by **Camus Energy**, is based on GRIB2 files. Utilizing |
It won't be "new" for long, so drop this.
I would put the restrictions first:
- must have .idx files
- specialised for time-series data, each file having identical message structure
@emfdavid : an opinion on whether Camus wants to be referenced here and how that would look.
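To make the first restriction concrete, here is a small sketch of what a text ``.idx`` sidecar file looks like and how its byte ranges could be recovered. The field layout follows the common NOAA-style convention (message number, byte offset, reference date, variable, level, forecast description, colon-separated); the parsing code is illustrative only and is not kerchunk's implementation.

```python
# Sample of a NOAA-style text .idx file: one line per GRIB message.
sample_idx = """\
1:0:d=2023010100:PRMSL:mean sea level:anl:
2:990:d=2023010100:TMP:2 m above ground:anl:
3:2100:d=2023010100:UGRD:10 m above ground:anl:
"""

def parse_idx(text):
    """Parse .idx lines into dicts and derive each message's byte length."""
    entries = []
    for line in text.strip().splitlines():
        num, offset, date, var, level, forecast = line.split(":")[:6]
        entries.append({
            "message": int(num),
            "offset": int(offset),
            "date": date.removeprefix("d="),
            "varname": var,
            "level": level,
            "forecast": forecast,
        })
    # A message's length is the gap to the next offset; the last one
    # is open-ended (read to end of file).
    for cur, nxt in zip(entries, entries[1:]):
        cur["length"] = nxt["offset"] - cur["offset"]
    return entries

entries = parse_idx(sample_idx)
print(entries[0]["varname"], entries[0]["length"])  # PRMSL 990
```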
@emfdavid Should I mention Camus Energy like @martindurant asked here?
I checked in with the team and it would be great if you could link https://www.camus.energy/ and if there is a place for attribution you can include my github handle @emfdavid, otherwise the contributors list is fine.
| The index in the ``idx`` file indexes the GRIB messages, whereas the ``k_index`` (kerchunk index) |
| we build as part of this workflow indexes the variables in those messages. |
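The distinction quoted above can be illustrated with two records side by side: an ``.idx`` entry addresses a whole GRIB *message* by byte range, while a ``k_index`` row addresses a *variable* within a message and carries enough metadata to place it on an aggregation axis. The field names and path below are hypothetical, not kerchunk's exact schema.

```python
# An .idx-style entry: one whole GRIB message, located only by bytes.
idx_entry = {"message": 2, "offset": 990, "length": 1110}

# A k_index-style row (illustrative fields): the variable inside that
# message, plus the coordinates needed to aggregate across files.
k_index_row = {
    "varname": "TMP",
    "level": "2 m above ground",
    "valid_time": "2023-01-01T00:00",
    "uri": "s3://bucket/model.grib2",  # hypothetical file path
    "offset": 990,                     # same byte range as the .idx entry
    "length": 1110,
}

# Both point at the same bytes; only the k_index row knows *what* they are.
print(k_index_row["offset"] == idx_entry["offset"])  # True
```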
This note explains my question above, but is not very clear. Map out the steps we need to do before launching into the details.
Steps for how we build the index? Should I include the code for building the index?
No code, just brief points.
emfdavid left a comment
Great start - one suggestion re limitations.
| - The ``.idx`` file must be of *text* type. |
| - Only specialised for time-series data, where GRIB files |
|   have *identical* structure. |
| - Aggregation only works for files of a specific **forecast horizon**. |
The reference index can be combined across many horizons but each horizon must be indexed separately.
Looking forward to seeing what you make of the reinflate api... there you can see all of the FMRC slices are supported against a collection of indexed data from many horizons, runtimes and valid times.
the reinflate api
ooh, what is this?
The method to turn the k_index and the metadata back into a ref_spec you can use in zarr/xarray
https://github.com/asascience-open/nextgen-dmac/blob/main/grib_index_aggregation/dynamic_zarr_store.py#L198
I think @Anu-Ra-g is already working on adding it into kerchunk?
Oh, in that case I suspect it works already, right @Anu-Ra-g : but you can only work on one set of horizons OR one set of timepoints, not both at once? Something like that.
I think you can return an array with multiple dimensions.
I didn't have a strong use for this so struggled to do something general and practical.
For instance, if you request by horizon, you can provide multiple horizon axes, and your dimensions should include 'horizon' and 'valid_time'. Similarly, you can request multiple runtimes, and then your dimensions should include 'runtime' and 'step'.
Honestly not sure if this is helpful or over complicated.
@martindurant I tried it out with one set of horizons with the original code. Actually, I'm still figuring out the reinflating part of the code, aggregation types and the new indexes.
I noticed that reinflating can also work with a grib_tree model made from a single grib file.
@emfdavid can you confirm this in the notebook that I made?
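Since "reinflation" comes up repeatedly in this thread, here is a minimal, hypothetical sketch of the shape of that transformation: turning ``k_index``-style rows plus metadata back into a kerchunk/zarr reference spec (``{"refs": {key: [uri, offset, length], ...}}``). This is NOT the real ``dynamic_zarr_store`` API linked above; the row fields, chunk-key layout, and helper name are all assumptions for illustration.

```python
# Hypothetical k_index rows: one variable chunk per (file, step).
rows = [
    {"varname": "TMP", "step": 0, "uri": "s3://bucket/f000.grib2",
     "offset": 990, "length": 1110},
    {"varname": "TMP", "step": 1, "uri": "s3://bucket/f001.grib2",
     "offset": 1024, "length": 1200},
]

def reinflate(rows):
    """Sketch: rebuild a reference spec from index rows.

    Each row becomes one chunk reference; the chunk key places the
    variable along a time/step axis (key layout is illustrative).
    """
    refs = {}
    for row in rows:
        key = f"{row['varname']}/{row['step']}.0.0"
        refs[key] = [row["uri"], row["offset"], row["length"]]
    return {"version": 1, "refs": refs}

spec = reinflate(rows)
print(sorted(spec["refs"]))  # ['TMP/0.0.0', 'TMP/1.0.0']
```

A spec of this shape is what fsspec's ReferenceFileSystem consumes, which is how the reinflated index becomes openable in zarr/xarray.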
Let me know when this PR is ready for another look.

@martindurant I've made the changes as you suggested. It is ready for review.
This PR adds a new page to the kerchunk documentation for faster reference consolidation for GRIB files.