Performance issues of domain cropping #83

Description

@observingClouds

This issue collects a few performance problems with the cropping implementation from #45 that I run into when cropping an 11M-cell grid. Cropping a grid of that size takes more than 2 days.

Main function

For a quick overview, the main function is create_convex_hull_mask in mllam_data_prep/ops/cropping.py:

    da_lon, da_lat = _get_latlon_coords(ds)
    da_lon_ref, da_lat_ref = _get_latlon_coords(ds_reference)

    assert da_lat.dims == da_lon.dims
    assert da_lat_ref.dims == da_lon_ref.dims

    # latlon to (x, y, z) on unit sphere
    da_ref_xyz = _latlon_to_unit_sphere_xyz(da_lat=da_lat_ref, da_lon=da_lon_ref)

    chull_lam = SphericalPolygon.convex_hull(da_ref_xyz.values)

    # call .load() to avoid using dask arrays in the following apply_ufunc
    da_interior_mask = xr.apply_ufunc(
        chull_lam.contains_lonlat, da_lon.load(), da_lat.load(), vectorize=True
    ).astype(bool)
    da_interior_mask.attrs[
        "long_name"
    ] = "contained in convex hull of source dataset (da_ref)"
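
For context on the cost of this call: the xarray documentation describes `vectorize=True` as wrapping the function with `numpy.vectorize`, which is a convenience loop in Python rather than compiled vectorization. The sketch below uses `numpy.vectorize` directly with a hypothetical stand-in predicate (`contains_stub` is an assumption for illustration, not the `spherical_geometry` method) to show that the wrapped function is invoked at least once per element:

```python
import numpy as np

calls = {"n": 0}


def contains_stub(lon, lat):
    # Hypothetical stand-in for chull_lam.contains_lonlat: it receives
    # one scalar lon/lat pair per call, like the real method would.
    calls["n"] += 1
    return (0.0 <= lon <= 10.0) and (50.0 <= lat <= 60.0)


lon = np.linspace(-5.0, 15.0, 100)
lat = np.linspace(45.0, 65.0, 100)

# np.vectorize loops over the elements in Python and calls the
# function for each of them (plus possibly one extra probing call
# to infer the output dtype).
mask = np.vectorize(contains_stub)(lon, lat).astype(bool)

print("elements:", lon.size, "function calls:", calls["n"])
```

For an 11M-cell grid this means on the order of 11M Python-level calls, so any per-call overhead inside the predicate is multiplied accordingly.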

Redundant calculations

The slow part of create_convex_hull_mask is testing whether each point lies inside the hull, i.e. chull_lam.contains_lonlat. While apply_ufunc might parallelize this computation, contains_lonlat itself is still quite inefficient.
For example, vector.lonlat_to_vector(lon, lat, degrees=degrees) is executed on every call of contains_lonlat, i.e. once per point, even though it does not change.

contains_lonlat:

    def contains_lonlat(self, lon, lat, degrees=True):
        point = vector.lonlat_to_vector(lon, lat, degrees=degrees)  # is executed for each point call, though it does not change
        return self._contains_point(point, self._points, self._inside)
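
One possible way to avoid the per-point conversion is to convert all coordinates to unit vectors in a single NumPy operation before the containment test. The following is a minimal sketch, assuming the usual spherical-to-Cartesian convention (x = cos(lat)·cos(lon), y = cos(lat)·sin(lon), z = sin(lat)); the helper name `lonlat_to_xyz` is hypothetical, and this is not the `spherical_geometry` implementation:

```python
import numpy as np


def lonlat_to_xyz(lon, lat, degrees=True):
    """Convert arrays of lon/lat to unit vectors, shape (..., 3).

    Hypothetical helper for illustration only.
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    if degrees:
        lon = np.deg2rad(lon)
        lat = np.deg2rad(lat)
    cos_lat = np.cos(lat)
    # Stack the components along a new last axis: one (x, y, z) row per point.
    return np.stack(
        [cos_lat * np.cos(lon), cos_lat * np.sin(lon), np.sin(lat)], axis=-1
    )


# Three sample points: on the x axis, on the y axis, and at the north pole.
lon = np.array([0.0, 90.0, 0.0])
lat = np.array([0.0, 0.0, 90.0])
xyz = lonlat_to_xyz(lon, lat)
print(xyz.shape)
```

Whether the precomputed vectors can then be handed to the containment test in one call depends on `spherical_geometry` internals that are not shown in this issue, so that step is left open here.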
