Hi authors, thanks for providing the code for this excellent work! I noticed that the paper describes optimizations for the non-linear functions (GELU, Softmax, LayerNorm), but I could not find these functions in the code. Could you point me to where they are implemented? Any help would be much appreciated.