Adding ScatterMoE kernel support for Granite models. #41458
base: main
Conversation
MekkCyber
left a comment
Thanks a lot for this PR @shawntan! Can you open a PR here: https://github.com/huggingface/kernels-community to add your kernel without the build folder, so that we can review the source code? Once it's merged we will upload the builds to the Hub, and then we can merge this PR so everyone can use the kernel. If you don't have time, I'm happy to help 🤗
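(For context, a minimal sketch of how a published kernels-community kernel is typically pulled in with the `kernels` library once the builds are on the Hub; the repo id `kernels-community/scattermoe` below is an assumption, not the final name.)

```python
# Minimal sketch, assuming the kernel is eventually published as
# kernels-community/scattermoe (hypothetical repo id).
from kernels import get_kernel

# Downloads the prebuilt binaries for the current torch/CUDA combination from the Hub.
scattermoe = get_kernel("kernels-community/scattermoe")
# The exported ops are then available as attributes of the returned module,
# e.g. scattermoe.<op_name>(...); the exact entry points depend on the kernel.
```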
So far I've simply been copying from … Should I follow the example here: https://github.com/huggingface/kernels-community/tree/main/trimul_gpumode? It's another Triton-based kernel with …
Should I be targeting …?
(Commit history pulled in by the rebase: the upstream MoE standardization refactor touching Mixtral, MiniMax, OlmoE, Qwen2/Qwen3 MoE, DeepSeek-V3 (#40538), GLM4V MoE (#40539), VitPose (#40549), DBRX, Ernie 4.5 (#40547), Phi-MoE, Switch Transformers, Jamba, JetMoE, NLLB-MoE, Aria, GraniteMoE and related models. Co-authored-by: Lysandre Debut <hi@lysand.re>, Vasqu <antonprogamer@gmail.com>, Cyril Vallez <cyril.vallez@huggingface.co>.)
Some changes have been made to GraniteMoE in the latest …, and I haven't been able to test out the current … yet.
Force-pushed: 4fd05b1 → 2208a35
@MekkCyber I will need some help here. Which branch should I target for the PR: …?
Hey @shawntan, let's target main! I think you did some faulty rebase; let's only keep the relevant files in the PR.
I've reverted the PR back to the original one against …. My issue was this line:
Yes, it was removed in the MoE refactor for vLLM compatibility with transformers, but this shouldn't break anything in the transformers implementation. What do you mean?
I see. I will need to change the layer definition in the kernel, since it will only produce one output instead of the tuple it returns right now. I am also confused as to how the auxiliary loss will work if the MoE doesn't return the router logits. Update: the attribute is checked for during the forward call:
But it is never passed to the model: transformers/src/transformers/models/granitemoe/modeling_granitemoe.py, lines 696–704 (at ac81541).
There is also no pathway for passing the router logits, or recomputing them, in transformers/src/transformers/models/granitemoe/modeling_granitemoe.py, lines 492–551 (at ac81541).
What's the ideal way the HF team is thinking of allowing this while still maintaining vLLM compatibility? I can make the necessary changes.
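(To illustrate the mismatch being discussed: a minimal sketch of a refactored single-output MoE block versus the old tuple-returning one, and where the auxiliary load-balancing loss used to get its router logits from. This is illustrative PyTorch, not the actual GraniteMoE or ScatterMoE code; all names and shapes are assumptions.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoeBlockSketch(nn.Module):
    """Toy MoE block, for illustration only (dense routing, no top-k)."""

    def __init__(self, hidden_size: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size) for _ in range(num_experts)
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        router_logits = self.gate(hidden_states)           # (tokens, num_experts)
        routing_weights = F.softmax(router_logits, dim=-1)
        out = sum(
            w.unsqueeze(-1) * expert(hidden_states)
            for w, expert in zip(routing_weights.unbind(-1), self.experts)
        )
        # Post-refactor style: only the hidden states are returned.
        # The pre-refactor style returned (out, router_logits), and the model
        # stacked the per-layer router logits to compute the auxiliary
        # load-balancing loss -- that return path is what no longer exists.
        return out

block = MoeBlockSketch(hidden_size=16, num_experts=4)
print(block(torch.randn(8, 16)).shape)  # torch.Size([8, 16])
```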
Hmm, sorry, I can't seem to reproduce the same issue I saw before with the jinja files. Still, there are some incompatibilities from the Mamba end of things, right now with both repositories on ….
TL;DR: …
ArthurZucker
left a comment
Hey! #41580 was merged. IDK if you need more changes, but indeed for collecting router logits you might need a small change! You could put the collection on the gate linear layer?
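(A minimal sketch of that suggestion: collecting router logits with a forward hook on the router/gate linear layers. The checkpoint id and the `.gate` attribute name below are assumptions about the module layout, not a confirmed API.)

```python
import torch
from transformers import AutoModelForCausalLM

# Example GraniteMoE checkpoint (assumed); any MoE Granite model would do.
model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-3.0-1b-a400m-base")

collected_router_logits = []

def collect_router_logits(module, inputs, output):
    # The output of the gate projection is exactly the per-token router logits.
    collected_router_logits.append(output.detach())

handles = [
    module.register_forward_hook(collect_router_logits)
    for name, module in model.named_modules()
    if name.endswith(".gate") and isinstance(module, torch.nn.Linear)  # assumed attribute name
]

input_ids = torch.randint(0, model.config.vocab_size, (1, 8))
model(input_ids)  # collected_router_logits now holds one tensor per hooked MoE layer

for handle in handles:
    handle.remove()
```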
[For maintainers] Suggested jobs to run (before merge): run-slow: granitemoe, granitemoehybrid, granitemoeshared
What does this PR do?
Adds ScatterMoE kernel support for Granite MoE models.
Started in #40365 but has significantly deviated in approach, so starting a new pull request.
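(For reviewers unfamiliar with ScatterMoE: a rough pure-PyTorch sketch of the scatter-style grouped expert computation that the Triton kernels fuse, shown here with top-1 routing for brevity. This is a conceptual reference under assumed shapes, not the kernel itself.)

```python
import torch

def scatter_moe_reference(hidden: torch.Tensor,
                          expert_weights: torch.Tensor,
                          expert_idx: torch.Tensor) -> torch.Tensor:
    """hidden: (tokens, d); expert_weights: (num_experts, d, d); expert_idx: (tokens,)."""
    out = torch.empty_like(hidden)
    for e in range(expert_weights.shape[0]):
        token_idx = (expert_idx == e).nonzero(as_tuple=True)[0]
        if token_idx.numel() == 0:
            continue
        # Gather the tokens routed to expert e, apply its weight, and scatter the
        # results back in place -- no padded (tokens, num_experts, d) tensor is built.
        out[token_idx] = hidden[token_idx] @ expert_weights[e]
    return out

tokens = torch.randn(8, 16)
experts = torch.randn(4, 16, 16)
assignment = torch.randint(0, 4, (8,))
print(scatter_moe_reference(tokens, experts, assignment).shape)  # torch.Size([8, 16])
```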
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@MekkCyber already started to provide some comments in #40365.