Sophiex/dev/feat qk rmsnorm #2033
Conversation
```python
    return config


def _check_qk_norm_type(config: Config) -> Config:
```
This backfilling leads to problems elsewhere. We can just use cf.get(...) instead.
This was necessary, I think, for loading the teacher model somehow; I forget the details.
Remove. This should not be added here. _check_logging above led to all kinds of problems lately. Using cf.get("config.qk_norm_type", "LayerNorm") is much more robust and useful.
```python
    return tokens_global_c


class Local2GlobalSumEngine(torch.nn.Module):
```
Assuming the PR with this change is merged first
clessig left a comment
Thanks, PR looks good (has been rebased to latest develop). Can be merged with two changes:
- Remove the sum aggregation engine (or merge a clean PR with this before)
- Move the fix(?) / change to plot_utils.py to a separate PR (also see my comment there)
@shmh40: can you approve and merge?
Description
Improves performance; see https://gitlab.jsc.fz-juelich.de/hedgedoc/OgGOfs2-RVOZcEEjB0108w?view
Issue Number
Closes #2032
Is this PR a draft? Mark it as draft.
Checklist before asking for review
- `./scripts/actions.sh lint`
- `./scripts/actions.sh unit-test`
- `./scripts/actions.sh integration-test`
- `launch-slurm.py --time 60`