PSGD_Nuon

Not Muon

Uses single-sided whitening that is dynamic and learned, rather than instantaneous as in Muon. Because the whitening is learned, it does not have to be recomputed on every iteration, which saves compute.
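
To make the cost difference concrete, here is a minimal NumPy sketch that contrasts the two styles. It is an illustration under stated assumptions and is not meant to reproduce the code in psgd_nuon.py: the learned side is approximated with a running average of `G @ G.T` whose inverse square root is refreshed only every `update_every` steps. The names `inv_sqrt_psd`, `instantaneous_whiten`, `LearnedLeftWhitener`, `beta`, and `update_every` are all hypothetical.

```python
import numpy as np


def inv_sqrt_psd(A, eps=1e-8):
    """Inverse square root of a symmetric PSD matrix via eigendecomposition."""
    evals, evecs = np.linalg.eigh(A)
    # Scale each eigenvector column by 1/sqrt(eigenvalue), clamped for stability.
    return (evecs / np.sqrt(np.maximum(evals, eps))) @ evecs.T


def instantaneous_whiten(G):
    """Left-whiten G using only the current gradient: (G G^T)^(-1/2) G."""
    return inv_sqrt_psd(G @ G.T) @ G


class LearnedLeftWhitener:
    """Hypothetical stand-in for a learned, single-sided preconditioner.

    Keeps a running average of G G^T and refreshes its inverse square root
    only every `update_every` steps, reusing the cached matrix in between.
    """

    def __init__(self, rows, beta=0.9, update_every=10):
        self.cov = np.eye(rows)
        self.P = np.eye(rows)
        self.beta = beta
        self.update_every = update_every
        self.step = 0
        self.refreshes = 0  # counts the expensive recomputations

    def __call__(self, G):
        self.cov = self.beta * self.cov + (1.0 - self.beta) * (G @ G.T)
        if self.step % self.update_every == 0:
            self.P = inv_sqrt_psd(self.cov)
            self.refreshes += 1
        self.step += 1
        return self.P @ G  # cheap matrix multiply on every step


rng = np.random.default_rng(0)
whitener = LearnedLeftWhitener(rows=4, update_every=10)
for _ in range(25):
    G = rng.standard_normal((4, 8))
    update = whitener(G)

W = instantaneous_whiten(G)  # rows of W are orthonormal
```

Over the 25 steps above, the instantaneous version would pay for a decomposition on every step, while the cached version pays for it only on the refresh steps and otherwise does a single matrix multiply.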

SIREN Example

On the SIREN example, PSGD Nuon beats a tuned Muon.

Muon (Keller) reaches a loss of 0.000982; PSGD Nuon reaches a loss of 0.000898. Hyperparameters used for Muon:

    # Assuming Muon is defined elsewhere
    optimizer = Muon(
        muon_params,
        lr=0.005,
        momentum=0.9,
        adamw_params=adamw_params,
        adamw_lr=3e-4,
        adamw_betas=(0.90, 0.95),
        adamw_wd=0
    )
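
The snippet above leaves `muon_params` and `adamw_params` undefined. One common convention, assumed here and not confirmed by this README, is to send 2-D weight matrices to Muon and everything else (biases, for example) to AdamW. The sketch below shows that split; `split_params` is a hypothetical helper, and NumPy arrays stand in for the `(name, parameter)` pairs a PyTorch model would yield from `named_parameters()`.

```python
import numpy as np


def split_params(named_params):
    """Split (name, tensor) pairs into a Muon group and an AdamW group.

    Assumption: 2-D tensors (weight matrices) go to Muon; all others go to AdamW.
    """
    muon_params, adamw_params = [], []
    for _name, p in named_params:
        if p.ndim == 2:
            muon_params.append(p)
        else:
            adamw_params.append(p)
    return muon_params, adamw_params


# Stand-in arrays shaped like a tiny MLP (2 -> 64 -> 1); the sizes are illustrative only.
named = [
    ("net.0.weight", np.zeros((64, 2))),
    ("net.0.bias", np.zeros(64)),
    ("net.1.weight", np.zeros((1, 64))),
    ("net.1.bias", np.zeros(1)),
]
muon_params, adamw_params = split_params(named)
```

With a real model, the same function could be called as `split_params(model.named_parameters())`, since PyTorch parameters also expose `.ndim`.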
