
AccelMoE: Accelerated Mixture-of-Experts model

AccelMoE accelerates a CPU-based mixture-of-experts (MoE) model by porting it to the GPU with CUDA kernel programming. The project was awarded 3rd place at the Accelerator Programming School competition.

Note

This project was conducted as part of the Accelerator Programming School at Seoul National University.

Optimization Overview

Architecture

Optimization Techniques

  • Porting the CPU implementation to the GPU with CUDA kernel programming
  • Kernel fusion combining a Conv1D or Linear layer with its ReLU activation (see the fused-kernel sketch after this list)
  • CUDA streams to overlap host-device transfers with kernel execution (see the streaming sketch after this list)
  • Batch processing to maximize throughput
  • Warp occupancy optimization
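
The fusion idea, in a minimal sketch (the kernel name, shapes, and memory layout are our assumptions, not the project's actual code): applying ReLU inside the same kernel that computes the Linear layer saves one full round trip of the intermediate tensor through global memory, which a separate activation kernel would otherwise pay.

#include <cuda_runtime.h>

// Hypothetical fused Linear + ReLU kernel: each thread computes one
// output feature of one batch element, then applies ReLU in place.
__global__ void linear_relu_fused(const float *x,  // input  [batch, in_dim]
                                  const float *w,  // weight [out_dim, in_dim]
                                  const float *b,  // bias   [out_dim]
                                  float *y,        // output [batch, out_dim]
                                  int batch, int in_dim, int out_dim) {
  int row = blockIdx.y;                             // batch element
  int col = blockIdx.x * blockDim.x + threadIdx.x;  // output feature
  if (row >= batch || col >= out_dim) return;

  float acc = b[col];
  for (int k = 0; k < in_dim; ++k)
    acc += x[row * in_dim + k] * w[col * in_dim + k];

  y[row * out_dim + col] = fmaxf(acc, 0.0f);        // fused ReLU, no extra kernel launch
}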
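
Streaming and batching combine naturally: if the batch is split into chunks and each chunk is issued on its own stream, the host-to-device copy of one chunk can overlap with the kernel execution of another. A minimal sketch (identifiers such as predict_chunked and run_moe_kernel are illustrative; host buffers must be page-locked, e.g. via cudaMallocHost, for the async copy to actually overlap):

#include <cuda_runtime.h>

// Hypothetical chunked inference driver using a small pool of streams.
void predict_chunked(const float *h_in, float *d_in,
                     int n_chunks, size_t chunk_elems) {
  const int kStreams = 4;
  cudaStream_t streams[kStreams];
  for (int i = 0; i < kStreams; ++i) cudaStreamCreate(&streams[i]);

  for (int c = 0; c < n_chunks; ++c) {
    cudaStream_t s = streams[c % kStreams];
    // Copy this chunk while earlier chunks are still computing.
    cudaMemcpyAsync(d_in + c * chunk_elems, h_in + c * chunk_elems,
                    chunk_elems * sizeof(float),
                    cudaMemcpyHostToDevice, s);
    // run_moe_kernel<<<grid, block, 0, s>>>(d_in + c * chunk_elems, ...);
  }
  cudaDeviceSynchronize();                          // wait for all streams
  for (int i = 0; i < kStreams; ++i) cudaStreamDestroy(streams[i]);
}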

Improved Performance

Achieved a throughput speedup of roughly 650× when executed on the GPU. In the sample runs below, throughput rises from 0.68 to 432 sentences/sec (≈634×).

CPU version

Initializing inputs and parameters...Done!
Predicting sentiment...Done!
Elapsed time: 1.467701 (sec)
Throughput: 0.681338 (sentences/sec)
Finalizing...Done!
Saving outputs to ./data/outputs.bin...Done!
Validating...PASSED!

GPU version

Initializing inputs and parameters...Done!
Predicting sentiment...Done!
Elapsed time: 0.074036 (sec)
Throughput: 432.224966 (sentences/sec)
Finalizing...Done!
Saving outputs to ./data/outputs.bin...Done!
Validating...PASSED!

Contributors

  • Haeseung Jeon (Ewha Womans Univ.)
  • Suyeon Jo (Myongji Univ.)