
AccelMoE: Accelerated Mixture-of-Experts model

AccelMoE accelerates a CPU-based mixture-of-experts (MoE) model by porting it to the GPU with CUDA kernel programming. The project was awarded 3rd place at the Accelerator Programming School competition.

Note

This project was conducted as part of the Accelerator Programming School at Seoul National University.

Optimization Overview

Architecture

Optimization Techniques

  • Porting the CPU implementation to the GPU with CUDA kernel programming
  • Kernel fusion combining a Conv1D or Linear layer with its ReLU activation (see the fused-kernel sketch after this list)
  • CUDA streams to overlap host-device transfers with kernel execution (see the streaming sketch after this list)
  • Batch processing to maximize throughput
  • Warp occupancy optimization
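
The fusion idea, in a minimal sketch (the kernel name, shapes, and memory layout are our assumptions, not the project's actual code): applying ReLU inside the same kernel that computes the Linear layer saves one full round trip of the intermediate tensor through global memory, which a separate activation kernel would otherwise pay.

#include <cuda_runtime.h>

// Hypothetical fused Linear + ReLU kernel: each thread computes one
// output feature of one batch element, then applies ReLU in place.
__global__ void linear_relu_fused(const float *x,  // input  [batch, in_dim]
                                  const float *w,  // weight [out_dim, in_dim]
                                  const float *b,  // bias   [out_dim]
                                  float *y,        // output [batch, out_dim]
                                  int batch, int in_dim, int out_dim) {
  int row = blockIdx.y;                             // batch element
  int col = blockIdx.x * blockDim.x + threadIdx.x;  // output feature
  if (row >= batch || col >= out_dim) return;

  float acc = b[col];
  for (int k = 0; k < in_dim; ++k)
    acc += x[row * in_dim + k] * w[col * in_dim + k];

  y[row * out_dim + col] = fmaxf(acc, 0.0f);        // fused ReLU, no extra kernel launch
}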
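
Streaming and batching combine naturally: if the batch is split into chunks and each chunk is issued on its own stream, the host-to-device copy of one chunk can overlap with the kernel execution of another. A minimal sketch (identifiers such as predict_chunked and run_moe_kernel are illustrative; host buffers must be page-locked, e.g. via cudaMallocHost, for the async copy to actually overlap):

#include <cuda_runtime.h>

// Hypothetical chunked inference driver using a small pool of streams.
void predict_chunked(const float *h_in, float *d_in,
                     int n_chunks, size_t chunk_elems) {
  const int kStreams = 4;
  cudaStream_t streams[kStreams];
  for (int i = 0; i < kStreams; ++i) cudaStreamCreate(&streams[i]);

  for (int c = 0; c < n_chunks; ++c) {
    cudaStream_t s = streams[c % kStreams];
    // Copy this chunk while earlier chunks are still computing.
    cudaMemcpyAsync(d_in + c * chunk_elems, h_in + c * chunk_elems,
                    chunk_elems * sizeof(float),
                    cudaMemcpyHostToDevice, s);
    // run_moe_kernel<<<grid, block, 0, s>>>(d_in + c * chunk_elems, ...);
  }
  cudaDeviceSynchronize();                          // wait for all streams
  for (int i = 0; i < kStreams; ++i) cudaStreamDestroy(streams[i]);
}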

Improved Performance

Achieved a throughput speedup of roughly 650× when executed on the GPU. In the sample runs below, throughput rises from 0.68 to 432 sentences/sec (≈634×).

CPU version

Initializing inputs and parameters...Done!
Predicting sentiment...Done!
Elapsed time: 1.467701 (sec)
Throughput: 0.681338 (sentences/sec)
Finalizing...Done!
Saving outputs to ./data/outputs.bin...Done!
Validating...PASSED!

GPU version

Initializing inputs and parameters...Done!
Predicting sentiment...Done!
Elapsed time: 0.074036 (sec)
Throughput: 432.224966 (sentences/sec)
Finalizing...Done!
Saving outputs to ./data/outputs.bin...Done!
Validating...PASSED!

Contributors

  • Haeseung Jeon (Ewha Womans Univ.)
  • Suyeon Jo (Myongji Univ.)