opooladz/PSGD_Nuon

PSGD_Nuon

Not Muon

Nuon uses single-sided whitening that is dynamic and learned, instead of instantaneous like Muon. Because the whitening is carried over between steps, we don't have to recompute it every iteration. Think of the savings.
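To make the contrast concrete, here is a minimal NumPy sketch. It is not the code in this repo and not the PSGD update rule: it swaps in a running average of `G @ G.T` plus an eigendecomposition as a stand-in for a learned preconditioner, purely to illustrate "whiten on one side, refresh only every few steps". The names `orthogonalize`, `LazyLeftWhitener`, `beta`, `every`, and `eps` are hypothetical.

```python
import numpy as np


def orthogonalize(G):
    """Instantaneous whitening: (G G^T)^(-1/2) G = U V^T, computed here via SVD."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt


class LazyLeftWhitener:
    """Hypothetical helper: a left-side whitening matrix P that persists across
    steps and is refreshed only every `every` calls (illustration only)."""

    def __init__(self, rows, beta=0.9, every=10, eps=1e-8):
        self.C = np.eye(rows)   # running estimate of E[G G^T]
        self.P = np.eye(rows)   # current whitening matrix, roughly C^(-1/2)
        self.beta, self.every, self.eps = beta, every, eps
        self.step = 0
        self.refreshes = 0

    def precondition(self, G):
        # Every step: one matrix product to update the running statistics.
        self.C = self.beta * self.C + (1.0 - self.beta) * (G @ G.T)
        # Only every `every` steps: recompute the inverse square root.
        if self.step % self.every == 0:
            w, V = np.linalg.eigh(self.C)
            self.P = (V / np.sqrt(np.maximum(w, self.eps))) @ V.T
            self.refreshes += 1
        self.step += 1
        # Single-sided: P acts on the rows of G only.
        return self.P @ G
```

With `beta=0` and `every=1` the whitener only sees the current gradient, so `P @ G` reduces to the instantaneous `U V^T` that `orthogonalize` returns. With a larger `every`, most steps reuse the stored `P` and skip the decomposition.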

SIREN Example

SIREN example: Nuon beats tuned Muon

Muon (Keller) reaches a loss of 0.000982; PSGD Nuon reaches a loss of 0.000898. Hyperparameters for Muon:

    # Assuming Muon, muon_params, and adamw_params are defined elsewhere
    optimizer = Muon(
        muon_params,
        lr=0.005,
        momentum=0.9,
        adamw_params=adamw_params,
        adamw_lr=3e-4,
        adamw_betas=(0.90, 0.95),
        adamw_wd=0
    )
