Conversation
d943446 to
c8ece6b
Compare
7c91c97 to
8ed6d54
Compare
8ed6d54 to
b7c57c1
Compare
There was a problem hiding this comment.
Many nice changes, to assess the functional improvements, I'd need to more time to understand the changes in depth.
Some higher-level comments:
- For typing, it might be nice to add some static typechecking like pyright or ty (Astral)
- Would be helpful to have more docstrings/explainers for the new classes and functions introduced in Writer
- What do we use for logging? I've seen a couple of raise and specific errors but no logging.
I'll test it on our HPC once we've implemented the fixes.
clessig
left a comment
There was a problem hiding this comment.
See comments. I will also rebase.
| self._append_dataset(self.source, "source") | ||
|
|
||
| if self.key.with_target(forecast_offset): | ||
| # TODO requiring target data currently prevents predictions with unknowable target |
There was a problem hiding this comment.
Does the code work when target is None? Or should we have a tensor of size (0,num_channels)?
There was a problem hiding this comment.
No currently neither option works. But it should work with target=None Writing an entire extra dataset to zarr that is totally empty is a bad band aid solution. I think we should rename ItemKey.with_target to ItemKey.with_prediction (since it is the prediction we are always interested in if the target is present it just means some ground truth is available and if not that is also fine) and require just that prediction is not None. Will implement.
src/weathergen/datasets/batch.py
Outdated
|
|
||
| # device of the tensors in the batch | ||
| device: str | torch.device | ||
| The data a instance contains is associated with one particular inital sampling window. |
There was a problem hiding this comment.
This should replace the whole paragraph:
Class representing one batch processed by the model.
This is not the place to explain how some of the general processing works.
There was a problem hiding this comment.
At least I was missing this context when processing the data, where should I put the information instead?
There was a problem hiding this comment.
MultiStreamDataSampler is responsible for assembling batches
| targets = [ | ||
| targets | ||
| for loss_name, targets in targets_and_auxs.items() | ||
| if loss_name == PHYSICAL_LOSS_KEY |
There was a problem hiding this comment.
What about latent outputs?
There was a problem hiding this comment.
If possible I would like to keep this out of scope for now and wait for #1860 to be finalized.
src/weathergen/utils/output.py
Outdated
| try: # TODO: do this in config | ||
| _output_streams = val_cfg.output.streams | ||
| if _output_streams is None: | ||
| raise ConfigAttributeError("") |
There was a problem hiding this comment.
This should generate a warning and then terminate regularly
There was a problem hiding this comment.
Currently not every training/validation/test config implements a "output" section, this guarantees a fallback/default. If the output section is missing should the default instead be no output/ no-op writer? Failing instead would mean that configs where no output is needed would require to specify a "output" section.
There was a problem hiding this comment.
If there's no output section or no streams are specified or the specified streams are not present should all generate a warning and terminate regularly.
src/weathergen/utils/output.py
Outdated
| name: config for name, config in streams.items() if name in _output_streams | ||
| } | ||
| try: # TODO: do this in config | ||
| self._forecast_offset = val_cfg.forecast.offset |
There was a problem hiding this comment.
Why not use self._forecast_offset = val_cfg.forecast.get( "offset", 0)--that's much more readable than the try/except
There was a problem hiding this comment.
The try except also handles the case where a training/val/test config does not contain a "forecast" section. Can I always assume this is always there I am happy to change.
There was a problem hiding this comment.
val_cfg.get( "forecast", {}).get( "offset, 0) is better
src/weathergen/utils/output.py
Outdated
| data = self._predictions.get_physical_prediction_normalized(key, self._normalizer) | ||
| except Exception as e: | ||
| # TODO: if preds are empty so create copy of target and add ensemble dimension | ||
| # preds = [targets[0].clone().unsqueeze(0)] |
There was a problem hiding this comment.
We need this. What's the problem?
| return self | ||
|
|
||
| @property | ||
| def sample_idxs(self) -> list[int]: |
There was a problem hiding this comment.
Just quality of life: Use batch.target_samples.sample_idxs instead of sample_idxs = [sample .sample_idx for sample in batch.target_samples.samples]
There was a problem hiding this comment.
Why do we need sample .sample_idx--this is not on current develop
…nto sgrasse/develop/1526-update-io
Description
Issue Number
Closes #1526
Is this PR a draft? Mark it as draft.
Checklist before asking for review
./scripts/actions.sh lint./scripts/actions.sh unit-test./scripts/actions.sh integration-testlaunch-slurm.py --time 60