About GPU requirements for training

Hi authors,

Thanks for your impressive work. The paper states that training was performed on 32 GPUs with a global batch size of 32, but it does not specify the per-GPU VRAM requirements. Could you kindly share the minimum GPU memory needed to reproduce your training setup?

Thanks in advance.