Added a new documentation page for faster GRIB aggregations #495
martindurant merged 7 commits into fsspec:main from
Conversation
| GRIB Aggregations |
| ----------------- |
| |
| This new method for reference aggregation, developed by **Camus Energy**, is based on GRIB2 files. Utilizing |
It won't be "new" for long, so drop this.
I would put the restrictions first:
- must have .idx files
- specialised for time-series data, each file having identical message structure
@emfdavid : an opinion on whether Camus wants to be referenced here and how that would look.
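To make the first restriction concrete, here is a small sketch of what a text ``.idx`` sidecar file looks like and how its byte ranges could be recovered. The field layout follows the common NOAA-style convention (message number, byte offset, reference date, variable, level, forecast description, colon-separated); the parsing code is illustrative only and is not kerchunk's implementation.

```python
# Sample of a NOAA-style text .idx file: one line per GRIB message.
sample_idx = """\
1:0:d=2023010100:PRMSL:mean sea level:anl:
2:990:d=2023010100:TMP:2 m above ground:anl:
3:2100:d=2023010100:UGRD:10 m above ground:anl:
"""

def parse_idx(text):
    """Parse .idx lines into dicts and derive each message's byte length."""
    entries = []
    for line in text.strip().splitlines():
        num, offset, date, var, level, forecast = line.split(":")[:6]
        entries.append({
            "message": int(num),
            "offset": int(offset),
            "date": date.removeprefix("d="),
            "varname": var,
            "level": level,
            "forecast": forecast,
        })
    # A message's length is the gap to the next offset; the last one
    # is open-ended (read to end of file).
    for cur, nxt in zip(entries, entries[1:]):
        cur["length"] = nxt["offset"] - cur["offset"]
    return entries

entries = parse_idx(sample_idx)
print(entries[0]["varname"], entries[0]["length"])  # PRMSL 990
```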
@emfdavid Should I mention Camus Energy like @martindurant asked here?
I checked in with the team and it would be great if you could link https://www.camus.energy/ and if there is a place for attribution you can include my github handle @emfdavid, otherwise the contributors list is fine.
| The index in the ``idx`` file indexes the GRIB messages, whereas the ``k_index`` (kerchunk index) |
| we build as part of this workflow indexes the variables in those messages. |
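The distinction quoted above can be illustrated with two records side by side: an ``.idx`` entry addresses a whole GRIB *message* by byte range, while a ``k_index`` row addresses a *variable* within a message and carries enough metadata to place it on an aggregation axis. The field names and path below are hypothetical, not kerchunk's exact schema.

```python
# An .idx-style entry: one whole GRIB message, located only by bytes.
idx_entry = {"message": 2, "offset": 990, "length": 1110}

# A k_index-style row (illustrative fields): the variable inside that
# message, plus the coordinates needed to aggregate across files.
k_index_row = {
    "varname": "TMP",
    "level": "2 m above ground",
    "valid_time": "2023-01-01T00:00",
    "uri": "s3://bucket/model.grib2",  # hypothetical file path
    "offset": 990,                     # same byte range as the .idx entry
    "length": 1110,
}

# Both point at the same bytes; only the k_index row knows *what* they are.
print(k_index_row["offset"] == idx_entry["offset"])  # True
```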
This note explains my question above, but is not very clear. Map out the steps we need to do before launching into the details.
Steps for how we build the index? Should I include the code for building the index?
No code, just brief points.
emfdavid left a comment
Great start - one suggestion re limitations.
| - The ``.idx`` file must be of *text* type. |
| - Only specialised for time-series data, where GRIB files |
|   have *identical* structure. |
| - Aggregation only works for files of a specific **forecast horizon**. |
The reference index can be combined across many horizons but each horizon must be indexed separately.
Looking forward to seeing what you make of the reinflate api... there you can see all of the FMRC slices are supported against a collection of indexed data from many horizons, runtimes and valid times.
the reinflate api
ooh, what is this?
The method to turn the k_index and the metadata back into a ref_spec you can use in zarr/xarray
https://github.com/asascience-open/nextgen-dmac/blob/main/grib_index_aggregation/dynamic_zarr_store.py#L198
I think @Anu-Ra-g is already working on adding it into kerchunk?
Oh, in that case I suspect it works already, right @Anu-Ra-g : but you can only work on one set of horizons OR one set of timepoints, not both at once? Something like that.
I think you can return an array with multiple dimensions.
I didn't have a strong use for this so struggled to do something general and practical.
For instance, if you request by horizon, you can provide multiple horizon axes, and your dimensions should include 'horizon' and 'valid_time'. Similarly, you can request multiple runtimes, and then your dimensions should include 'runtime' and 'step'.
Honestly not sure if this is helpful or over complicated.
@martindurant I tried it out with one set of horizons with the original code. Actually, I'm still figuring out the reinflating part of the code, aggregation types and the new indexes.
I noticed that reinflating can also work with a grib_tree model made from a single grib file.
@emfdavid can you confirm this in the notebook that I made?
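Since "reinflation" comes up repeatedly in this thread, here is a minimal, hypothetical sketch of the shape of that transformation: turning ``k_index``-style rows plus metadata back into a kerchunk/zarr reference spec (``{"refs": {key: [uri, offset, length], ...}}``). This is NOT the real ``dynamic_zarr_store`` API linked above; the row fields, chunk-key layout, and helper name are all assumptions for illustration.

```python
# Hypothetical k_index rows: one variable chunk per (file, step).
rows = [
    {"varname": "TMP", "step": 0, "uri": "s3://bucket/f000.grib2",
     "offset": 990, "length": 1110},
    {"varname": "TMP", "step": 1, "uri": "s3://bucket/f001.grib2",
     "offset": 1024, "length": 1200},
]

def reinflate(rows):
    """Sketch: rebuild a reference spec from index rows.

    Each row becomes one chunk reference; the chunk key places the
    variable along a time/step axis (key layout is illustrative).
    """
    refs = {}
    for row in rows:
        key = f"{row['varname']}/{row['step']}.0.0"
        refs[key] = [row["uri"], row["offset"], row["length"]]
    return {"version": 1, "refs": refs}

spec = reinflate(rows)
print(sorted(spec["refs"]))  # ['TMP/0.0.0', 'TMP/1.0.0']
```

A spec of this shape is what fsspec's ReferenceFileSystem consumes, which is how the reinflated index becomes openable in zarr/xarray.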
Let me know when this PR is ready for another look.

@martindurant I've made the changes as you suggested. It is ready for review.
This PR adds a new page to the kerchunk documentation for faster reference consolidation for GRIB files.