
Conversation

@Spico197 Spico197 commented Dec 7, 2024

What's New

Add MegaBlocks support for the MLP MoE. The dumping & reloading test passes, verified by observing a continuous decline in training loss; downstream metrics have not been evaluated yet, so please use this feature with caution.

  1. Conversion from the dense LLaMA model: smoe/utils/expert_construction/convert_llama_to_mixtral_mb.py
  2. moe_type="megablocks" support added in smoe/models/mixtral/modeling_mixtral.py (see the sketch after this list)
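A minimal usage sketch of the new MoE path. This is illustrative only: the loading API of smoe.models.mixtral is assumed to mirror the Hugging Face transformers interface, the checkpoint path is a placeholder, and only the moe_type="megablocks" field is taken from the PR description.

```python
# Illustrative sketch (not from the PR diff): load a checkpoint produced by
# convert_llama_to_mixtral_mb.py and enable the MegaBlocks MoE path.
# Assumption: smoe.models.mixtral exposes a transformers-style config/model API.
from smoe.models.mixtral import MixtralConfig, MixtralForCausalLM  # assumed import path

ckpt = "/path/to/converted-llama3-8x8b"  # placeholder path

config = MixtralConfig.from_pretrained(ckpt)
config.moe_type = "megablocks"  # field name taken from the PR description

model = MixtralForCausalLM.from_pretrained(
    ckpt,
    config=config,
    torch_dtype="bfloat16",  # assumption: bf16, as is typical for LLaMA-3 training
)
```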

Performance Test

  • Experiments are conducted on 4×A100 GPUs with parameters converted from LLaMA-3-8B (8 experts, top-2 routing).
  • The dataset consists of 50 samples from OpenHermes-2.5.
  • Batch size = 2, gradient accumulation = 4, sequence length = 4096.
Setting          Tokens/GPU/Second
w/o MegaBlocks   13485
w/ MegaBlocks    19051
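As a rough sanity check (mine, not part of the PR), the per-step timings implied by the table and the batch settings above, assuming the reported batch size is per device:

```python
# Back-of-the-envelope check using only the numbers reported above.
tokens_per_step_per_gpu = 2 * 4 * 4096  # bsz * grad_accum * seq_len = 32768 tokens

for name, tps in [("w/o MegaBlocks", 13485), ("w/ MegaBlocks", 19051)]:
    # seconds per optimizer step on a single GPU at the reported throughput
    print(f"{name}: ~{tokens_per_step_per_gpu / tps:.2f} s/step")

print(f"speedup: {19051 / 13485:.2f}x")  # ~1.41x, i.e. roughly 40% higher throughput
```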

@Spico197 Spico197 requested a review from XiaoYee December 7, 2024 18:44