Hi authors,
Thanks for your impressive work. The paper states training was performed on 32 GPUs with a global batch size of 32, yet details regarding per-GPU VRAM requirements are not specified. Could you kindly share the minimum GPU memory specification needed to reproduce your training setup?
Thanks in advance.