Added a new documentation page for faster GRIB aggregations #495

Merged
martindurant merged 7 commits into fsspec:main from Anu-Ra-g:fast_grib_doc
Sep 9, 2024
@Anu-Ra-g
Contributor

This PR adds a new page in the kerchunk documentation for faster reference consolidation for GRIB files.

@Anu-Ra-g changed the title from "Added a new page for faster aggregations" to "Added a new documentation page for faster GRIB aggregations" on Aug 27, 2024
Comment thread docs/source/reference_aggregation.rst Outdated
GRIB Aggregations
-----------------

This new method for reference aggregation, developed by **Camus Energy**, is based on GRIB2 files. Utilizing
Member

It won't be "new" for long, so drop this.

I would put the restrictions first:

  • must have .idx files
  • specialised for time-series data, each file having identical message structure

@emfdavid: do you have an opinion on whether Camus wants to be referenced here, and how that would look?

Contributor Author

@emfdavid Should I mention Camus Energy like @martindurant asked here?

Contributor

I checked in with the team, and it would be great if you could link https://www.camus.energy/. If there is a place for attribution, you can include my GitHub handle @emfdavid; otherwise the contributors list is fine.

Comment thread docs/source/reference_aggregation.rst Outdated
Comment on lines +114 to +115
The index in ``idx`` file indexes the GRIB messages where as the ``k_index`` (kerchunk index)
we build as part of this workflow index the variables in those messages.
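
As background for the distinction drawn in the quoted lines, here is a minimal sketch of what "indexes the GRIB messages" can mean for a text-type ``.idx`` file. It assumes the colon-delimited inventory layout (message number, byte offset, date, variable, level, forecast) used by wgrib2-style sidecar files. The sample lines, `IdxEntry` and `parse_idx` are illustrative names and are not part of kerchunk, and the sketch does not attempt to build the ``k_index``.

```python
from typing import NamedTuple


class IdxEntry(NamedTuple):
    message: int   # 1-based GRIB message number
    offset: int    # byte offset of the message within the GRIB file
    length: int    # byte length, or -1 when it cannot be inferred
    date: str      # reference date field, e.g. "d=2024082700"
    variable: str
    level: str
    forecast: str


def parse_idx(text: str) -> list[IdxEntry]:
    """Parse a text .idx inventory into one entry per GRIB message."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split(":")
        # Assumed field order: number, offset, date, variable, level, forecast.
        rows.append((int(parts[0]), int(parts[1]), *parts[2:6]))

    entries = []
    for i, (msg, offset, date, var, level, fcst) in enumerate(rows):
        # A message ends where the next one starts; the last one is open-ended.
        length = rows[i + 1][1] - offset if i + 1 < len(rows) else -1
        entries.append(IdxEntry(msg, offset, length, date, var, level, fcst))
    return entries


# Made-up sample inventory, for illustration only.
SAMPLE_IDX = """\
1:0:d=2024082700:REFC:entire atmosphere:anl:
2:375877:d=2024082700:TMP:2 m above ground:anl:
3:1653208:d=2024082700:UGRD:10 m above ground:anl:
"""

entries = parse_idx(SAMPLE_IDX)
for entry in entries:
    print(entry.message, entry.variable, entry.offset, entry.length)
```

Each entry locates one message by byte range. The length is inferred from the next message's offset, so the last entry's length is left as unknown here.
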
Member

This note explains my question above, but is not very clear. Map out the steps we need to do before launching into the details.

Contributor Author

Steps for how we build the index? Should I include the code for building the index?

Member

No code, just brief points.

Contributor

@emfdavid left a comment

Great start - one suggestion re limitations.

Comment thread docs/source/reference_aggregation.rst Outdated
- The ``.idx`` file must be of *text* type.
- Only specialised for time-series data, where GRIB files
have *identical* structure.
- Aggregation only works for a specific **forecast horizon** files.
Contributor

@emfdavid Aug 28, 2024

The reference index can be combined across many horizons, but each horizon must be indexed separately.
Looking forward to seeing what you make of the reinflate api... there you can see that all of the FMRC slices are supported against a collection of indexed data from many horizons, runtimes and valid times.

Member

the reinflate api

ooh, what is this?

Contributor

It is the method that turns the k_index and the metadata back into a ref_spec you can use in zarr/xarray:
https://github.com/asascience-open/nextgen-dmac/blob/main/grib_index_aggregation/dynamic_zarr_store.py#L198
I think @Anu-Ra-g is already working on adding it into kerchunk?

Member

Oh, in that case I suspect it works already, right @Anu-Ra-g? But you can only work on one set of horizons OR one set of timepoints, not both at once? Something like that.

Contributor

I think you can return an array with multiple dimensions.
I didn't have a strong use for this so struggled to do something general and practical.
For instance, if you request by Horizon, you can provide multiple horizon axes and your dimensions should include 'horizon' and 'valid_time'. Similarly, you can request multiple runtimes, and then your dimensions should include 'runtime' and 'step'.
Honestly not sure if this is helpful or over complicated.
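
The two request styles described in this comment can be pictured with a small standard-library sketch. It relies only on the usual forecast relationship that valid time equals run time plus step; the record layout and the `by_horizon` / `by_runtime` helpers are hypothetical and are not the reinflate API or any kerchunk function.

```python
from datetime import datetime, timedelta

# Hypothetical collection: three model runs six hours apart, steps of 0..3 hours.
runtimes = [datetime(2024, 8, 27, hour) for hour in (0, 6, 12)]
steps = [timedelta(hours=h) for h in range(4)]

# One record per (runtime, step); valid_time is always runtime + step.
records = [
    {"runtime": rt, "step": st, "valid_time": rt + st}
    for rt in runtimes
    for st in steps
]


def by_horizon(records, horizon):
    """Constant-horizon slice: one record per run, ordered by valid_time."""
    picked = [r for r in records if r["step"] == horizon]
    return sorted(picked, key=lambda r: r["valid_time"])


def by_runtime(records, runtime):
    """Single-run slice: every step of one forecast, ordered by step."""
    picked = [r for r in records if r["runtime"] == runtime]
    return sorted(picked, key=lambda r: r["step"])


one_hour_ahead = by_horizon(records, timedelta(hours=1))
first_run = by_runtime(records, runtimes[0])

print([r["valid_time"].isoformat() for r in one_hour_ahead])
print([r["step"] for r in first_run])
```

Requesting several horizons at once would stack slices like `one_hour_ahead` along a 'horizon' axis next to 'valid_time', while requesting several runs would stack slices like `first_run` along 'runtime' next to 'step'.
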

Contributor Author

@martindurant I tried it out with one set of horizons with the original code. Actually, I'm still figuring out the reinflating part of the code, aggregation types and the new indexes.

I noticed that reinflating can also work with a grib_tree model made from a single GRIB file.
@emfdavid can you confirm this in this notebook that I made?

@martindurant
Member

Let me know when this PR is ready for another look.

@Anu-Ra-g
Contributor Author

@martindurant I've made the changes as you suggested. It is ready for review.

3 participants