Skip to content

style: add missing type hints to public functions in ops and config#93

Open
arnavsharma990 wants to merge 3 commits into
mllam:mainfrom
arnavsharma990:missing-type-hint
Open

style: add missing type hints to public functions in ops and config#93
arnavsharma990 wants to merge 3 commits into
mllam:mainfrom
arnavsharma990:missing-type-hint

Conversation

@arnavsharma990

@arnavsharma990 arnavsharma990 commented Feb 20, 2026

Copy link
Copy Markdown

Describe your changes

This PR adds missing type annotations to several public functions across the codebase to improve static analysis, readability, and maintainability.

Type hints were added or refined across multiple modules to make function contracts clearer and improve compatibility with static type checking tools.

Updated modules and functions

loading.py

  • load_input_dataset(fp: str) -> xr.Dataset

selection.py

  • select_by_kwargs
  • check_point_in_dataset
  • check_step
  • _normalize_slice_startstop
  • _normalize_slice_step

subsetting.py

  • extract_variable with Union / Optional types
  • updated _sample to support optional coordinate types

mapping.py

  • map_dims_and_variables
  • _check_for_malformed_list_arg
  • dim_mapping: Dict[str, DimMapping]

stacking.py

  • stack_variables_as_coord_values
  • stack_variables_by_coord_values

chunking.py

  • check_chunk_size
  • chunk_dataset with Dict[str, int]

config.py

  • validate_config(config_inputs: Dict[str, InputDataset]) -> None

Motivation

Adding type hints:

  • Improves IDE support and autocompletion
  • Strengthens static type checking (mypy compatibility)
  • Makes function contracts clearer for contributors
  • Improves long-term maintainability

No runtime behavior was modified.

Dependencies:
No new dependencies were introduced.


Issue Link

Improves type safety and maintainability of the codebase.


Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form
  • I have requested a reviewer and an assignee

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section reflecting type of change:
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog
  • Once the PR is ready to be merged, squash commits and merge the PR.

- loading.py: load_input_dataset(fp: str) -> xr.Dataset
- selection.py: select_by_kwargs, check_point_in_dataset, check_step;
  _normalize_slice_startstop, _normalize_slice_step
- subsetting.py: extract_variable with Union/Optional types; use None
  default for coords_to_sample to support Optional type
- mapping.py: map_dims_and_variables, _check_for_malformed_list_arg;
  dim_mapping: Dict[str, DimMapping]
- stacking.py: stack_variables_as_coord_values, stack_variables_by_coord_values
- chunking.py: check_chunk_size, chunk_dataset with Dict[str, int]
- config.py: validate_config(config_inputs: Dict[str, InputDataset]) -> None

Co-authored-by: Cursor <cursoragent@cursor.com>
Copilot AI review requested due to automatic review settings February 20, 2026 07:43

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds type annotations to public functions across the ops and config modules to improve static analysis, IDE support, and code maintainability. The changes are purely additive and do not modify runtime behavior.

Changes:

  • Added complete type hints (parameters and return types) to 13 public functions across 7 modules
  • Improved mutable default argument handling by using Optional with None defaults
  • Added necessary typing imports to support the new type annotations

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
mllam_data_prep/ops/subsetting.py Added type hints to extract_variable function and changed default argument from dict() to None to avoid mutable default
mllam_data_prep/ops/stacking.py Added type hints to stack_variables_as_coord_values and stack_variables_by_coord_values
mllam_data_prep/ops/selection.py Added type hints to select_by_kwargs, check_point_in_dataset, check_step, and private normalize functions
mllam_data_prep/ops/mapping.py Added type hints to map_dims_and_variables and _check_for_malformed_list_arg, importing DimMapping type
mllam_data_prep/ops/loading.py Added type hint to load_input_dataset
mllam_data_prep/ops/chunking.py Added type hints to check_chunk_size and chunk_dataset
mllam_data_prep/config.py Added type hint to validate_config using forward reference for InputDataset

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mllam_data_prep/ops/subsetting.py Outdated

def extract_variable(
ds: xr.Dataset,
var_name: Union[str, Dict[str, Any], List[str]],

Copilot AI Feb 20, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type hint for var_name is incorrect. Based on the actual usage in create_dataset.py (lines 192-200), this parameter is always a string representing a single variable name to extract from the dataset. The function implementation at line 37 (da = ds[var_name]) also requires a string to index into the dataset. The type hint should be str instead of Union[str, Dict[str, Any], List[str]]. The docstring at lines 20-23 also incorrectly describes this parameter as accepting Dict or List types.

Copilot uses AI. Check for mistakes.
Comment thread mllam_data_prep/ops/selection.py Outdated
Comment on lines +10 to +12
def _normalize_slice_startstop(
s: Union[pd.Timestamp, str, Any]
) -> Union[pd.Timestamp, str, Any]:

Copilot AI Feb 20, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type hint uses the catch-all Any type instead of the more specific types from Range. Based on the Range class definition (config.py lines 78-79), start and end values can be Union[str, int, float]. The type should be Union[pd.Timestamp, str, int, float] for both input and return to provide better type safety while covering all valid input types.

Copilot uses AI. Check for mistakes.
Comment thread mllam_data_prep/ops/selection.py Outdated
Comment on lines +24 to +26
def _normalize_slice_step(
s: Union[pd.Timedelta, str, Any]
) -> Union[pd.Timedelta, str, Any]:

Copilot AI Feb 20, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type hint has multiple issues: 1) It uses the catch-all Any type instead of the more specific types from Range (int, float). 2) It doesn't account for None values - Range.step is Optional (config.py line 80), and the function returns None unchanged in the else branch (line 35). The type should be Optional[Union[pd.Timedelta, str, int, float]] for both input and return to accurately reflect the actual behavior and provide better type safety.

Copilot uses AI. Check for mistakes.
Comment thread mllam_data_prep/ops/subsetting.py Outdated
def extract_variable(
ds: xr.Dataset,
var_name: Union[str, Dict[str, Any], List[str]],
coords_to_sample: Optional[Dict[str, Any]] = None,

Copilot AI Feb 20, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type hint Dict[str, Any] could be more specific. Based on the actual usage in create_dataset.py (line 192-196) and the InputDataset.variables type (config.py line 238), coords_to_sample should be Dict[str, ValueSelection]. This would provide better type safety and IDE support. You can import ValueSelection from ..config similar to how selection.py imports Range.

Copilot uses AI. Check for mistakes.
arnavsharma990 and others added 2 commits February 20, 2026 13:20
- Change var_name type from Union[str, Dict, List] to str
  (var_name is always a string when used to index ds[var_name])
- Change coords_to_sample type from Dict[str, Any] to Dict[str, ValueSelection]
  (matches actual usage where sampling.values and sampling.units are accessed)
- Update docstring to reflect correct types
- Import ValueSelection from ..config

Co-authored-by: Cursor <cursoragent@cursor.com>
- _normalize_slice_startstop: Change from Union[pd.Timestamp, str, Any]
  to Union[pd.Timestamp, str, int, float] to match Range.start/end types
- _normalize_slice_step: Change from Union[pd.Timedelta, str, Any]
  to Optional[Union[pd.Timedelta, str, int, float]] to match Range.step
  (which is Optional) and account for None values
- Remove unused Any import, add Optional import

Co-authored-by: Cursor <cursoragent@cursor.com>
@leifdenby

Copy link
Copy Markdown
Member

Please revert to using the default PR template here. Once you've done that you can tag me for a review :) thank you!

@arnavsharma990

Copy link
Copy Markdown
Author

@leifdenby I've updated the PR to follow the default template. Please let me know if any adjustments are needed. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants