
(feat) add attention logits to model output, add attention soft_cap to vanilla attention; (fix) DP sharding of batch, update dtype of memory tracking interval #209

Open
dvruette wants to merge 88 commits into erfanzar:main from dvruette:main

make training loop compatible with new data sampler

69f5617
