style: add missing type hints to public functions in ops and config#93
style: add missing type hints to public functions in ops and config#93arnavsharma990 wants to merge 3 commits into
Conversation
- loading.py: load_input_dataset(fp: str) -> xr.Dataset - selection.py: select_by_kwargs, check_point_in_dataset, check_step; _normalize_slice_startstop, _normalize_slice_step - subsetting.py: extract_variable with Union/Optional types; use None default for coords_to_sample to support Optional type - mapping.py: map_dims_and_variables, _check_for_malformed_list_arg; dim_mapping: Dict[str, DimMapping] - stacking.py: stack_variables_as_coord_values, stack_variables_by_coord_values - chunking.py: check_chunk_size, chunk_dataset with Dict[str, int] - config.py: validate_config(config_inputs: Dict[str, InputDataset]) -> None Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Pull request overview
This PR adds type annotations to public functions across the ops and config modules to improve static analysis, IDE support, and code maintainability. The changes are purely additive and do not modify runtime behavior.
Changes:
- Added complete type hints (parameters and return types) to 13 public functions across 7 modules
- Improved mutable default argument handling by using Optional with None defaults
- Added necessary typing imports to support the new type annotations
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| mllam_data_prep/ops/subsetting.py | Added type hints to extract_variable function and changed default argument from dict() to None to avoid mutable default |
| mllam_data_prep/ops/stacking.py | Added type hints to stack_variables_as_coord_values and stack_variables_by_coord_values |
| mllam_data_prep/ops/selection.py | Added type hints to select_by_kwargs, check_point_in_dataset, check_step, and private normalize functions |
| mllam_data_prep/ops/mapping.py | Added type hints to map_dims_and_variables and _check_for_malformed_list_arg, importing DimMapping type |
| mllam_data_prep/ops/loading.py | Added type hint to load_input_dataset |
| mllam_data_prep/ops/chunking.py | Added type hints to check_chunk_size and chunk_dataset |
| mllam_data_prep/config.py | Added type hint to validate_config using forward reference for InputDataset |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| def extract_variable( | ||
| ds: xr.Dataset, | ||
| var_name: Union[str, Dict[str, Any], List[str]], |
There was a problem hiding this comment.
The type hint for var_name is incorrect. Based on the actual usage in create_dataset.py (lines 192-200), this parameter is always a string representing a single variable name to extract from the dataset. The function implementation at line 37 (da = ds[var_name]) also requires a string to index into the dataset. The type hint should be str instead of Union[str, Dict[str, Any], List[str]]. The docstring at lines 20-23 also incorrectly describes this parameter as accepting Dict or List types.
| def _normalize_slice_startstop( | ||
| s: Union[pd.Timestamp, str, Any] | ||
| ) -> Union[pd.Timestamp, str, Any]: |
There was a problem hiding this comment.
The type hint uses the catch-all Any type instead of the more specific types from Range. Based on the Range class definition (config.py lines 78-79), start and end values can be Union[str, int, float]. The type should be Union[pd.Timestamp, str, int, float] for both input and return to provide better type safety while covering all valid input types.
| def _normalize_slice_step( | ||
| s: Union[pd.Timedelta, str, Any] | ||
| ) -> Union[pd.Timedelta, str, Any]: |
There was a problem hiding this comment.
The type hint has multiple issues: 1) It uses the catch-all Any type instead of the more specific types from Range (int, float). 2) It doesn't account for None values - Range.step is Optional (config.py line 80), and the function returns None unchanged in the else branch (line 35). The type should be Optional[Union[pd.Timedelta, str, int, float]] for both input and return to accurately reflect the actual behavior and provide better type safety.
| def extract_variable( | ||
| ds: xr.Dataset, | ||
| var_name: Union[str, Dict[str, Any], List[str]], | ||
| coords_to_sample: Optional[Dict[str, Any]] = None, |
There was a problem hiding this comment.
The type hint Dict[str, Any] could be more specific. Based on the actual usage in create_dataset.py (line 192-196) and the InputDataset.variables type (config.py line 238), coords_to_sample should be Dict[str, ValueSelection]. This would provide better type safety and IDE support. You can import ValueSelection from ..config similar to how selection.py imports Range.
- Change var_name type from Union[str, Dict, List] to str (var_name is always a string when used to index ds[var_name]) - Change coords_to_sample type from Dict[str, Any] to Dict[str, ValueSelection] (matches actual usage where sampling.values and sampling.units are accessed) - Update docstring to reflect correct types - Import ValueSelection from ..config Co-authored-by: Cursor <cursoragent@cursor.com>
- _normalize_slice_startstop: Change from Union[pd.Timestamp, str, Any] to Union[pd.Timestamp, str, int, float] to match Range.start/end types - _normalize_slice_step: Change from Union[pd.Timedelta, str, Any] to Optional[Union[pd.Timedelta, str, int, float]] to match Range.step (which is Optional) and account for None values - Remove unused Any import, add Optional import Co-authored-by: Cursor <cursoragent@cursor.com>
|
Please revert to using the default PR template here. Once you've done that you can tag me for a review :) thank you! |
|
@leifdenby I've updated the PR to follow the default template. Please let me know if any adjustments are needed. Thanks! |
Describe your changes
This PR adds missing type annotations to several public functions across the codebase to improve static analysis, readability, and maintainability.
Type hints were added or refined across multiple modules to make function contracts clearer and improve compatibility with static type checking tools.
Updated modules and functions
loading.py
load_input_dataset(fp: str) -> xr.Datasetselection.py
select_by_kwargscheck_point_in_datasetcheck_step_normalize_slice_startstop_normalize_slice_stepsubsetting.py
extract_variablewithUnion/Optionaltypes_sampleto support optional coordinate typesmapping.py
map_dims_and_variables_check_for_malformed_list_argdim_mapping: Dict[str, DimMapping]stacking.py
stack_variables_as_coord_valuesstack_variables_by_coord_valueschunking.py
check_chunk_sizechunk_datasetwithDict[str, int]config.py
validate_config(config_inputs: Dict[str, InputDataset]) -> NoneMotivation
Adding type hints:
No runtime behavior was modified.
Dependencies:
No new dependencies were introduced.
Issue Link
Improves type safety and maintainability of the codebase.
Type of change
Checklist before requesting a review
pullwith--rebaseoption if possible).Checklist for reviewers
Each PR comes with its own improvements and flaws. The reviewer should check the following:
Author checklist after completed review
Checklist for assignee