
Conversation

@Spico197 Spico197 commented Dec 7, 2024

What's New

Add MegaBlocks support for the MLP MoE. The dumping & reloading test passes, verified by observing a continuous decline in training loss; downstream metrics have not been evaluated yet, so please use this feature with caution.

  1. Conversion from the dense LLaMA model: smoe/utils/expert_construction/convert_llama_to_mixtral_mb.py
  2. moe_type="megablocks" support added in smoe/models/mixtral/modeling_mixtral.py (see the sketch after this list)
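A minimal usage sketch of the new MoE path. This is illustrative only: the loading API of smoe.models.mixtral is assumed to mirror the Hugging Face transformers interface, the checkpoint path is a placeholder, and only the moe_type="megablocks" field is taken from the PR description.

```python
# Illustrative sketch (not from the PR diff): load a checkpoint produced by
# convert_llama_to_mixtral_mb.py and enable the MegaBlocks MoE path.
# Assumption: smoe.models.mixtral exposes a transformers-style config/model API.
from smoe.models.mixtral import MixtralConfig, MixtralForCausalLM  # assumed import path

ckpt = "/path/to/converted-llama3-8x8b"  # placeholder path

config = MixtralConfig.from_pretrained(ckpt)
config.moe_type = "megablocks"  # field name taken from the PR description

model = MixtralForCausalLM.from_pretrained(
    ckpt,
    config=config,
    torch_dtype="bfloat16",  # assumption: bf16, as is typical for LLaMA-3 training
)
```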

Performance Test

  • Experiments are conducted on 4×A100 GPUs with parameters converted from LLaMA-3-8B (8 experts, top-2 routing).
  • The dataset consists of 50 samples from OpenHermes-2.5.
  • Batch size = 2, gradient accumulation = 4, sequence length = 4096.
Setting          Tokens/GPU/Second
w/o MegaBlocks   13485
w/ MegaBlocks    19051
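As a rough sanity check (mine, not part of the PR), the per-step timings implied by the table and the batch settings above, assuming the reported batch size is per device:

```python
# Back-of-the-envelope check using only the numbers reported above.
tokens_per_step_per_gpu = 2 * 4 * 4096  # bsz * grad_accum * seq_len = 32768 tokens

for name, tps in [("w/o MegaBlocks", 13485), ("w/ MegaBlocks", 19051)]:
    # seconds per optimizer step on a single GPU at the reported throughput
    print(f"{name}: ~{tokens_per_step_per_gpu / tps:.2f} s/step")

print(f"speedup: {19051 / 13485:.2f}x")  # ~1.41x, i.e. roughly 40% higher throughput
```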

@Spico197 Spico197 requested a review from XiaoYee December 7, 2024 18:44