Commit 7938e91
MoE + vllm = 😻 (#40132)
* update modeling mixtral
* oups
* fix
* better naming?
* compute softmax and top_k inside the experts
* update minimax as well
* models that will need an update
* more models that need a fix
* stash
* fix mixtral
* update olmoe
* update
* update
* current changes
* nits
* molmoe is now fixed
* olmoe is good to go!
* refactor qwen2_moe
* fixes
* fixed moe
* fix qwen2 modular
* nit
* qwen2_moe test script works
* tricky rope !
* fix qwen3
* DeepSeek v3 MoE Standardization (#40538)
* DeepSeek-v3
Shared
Shared
* Dependents of DS3
* Standardize GLM4V MoE (#40539)
* up
* Standardize VitPose's MoE (#40549)
* VitPose
* outside
* outside
* outside
* fix
* update dbrx
* dbrx... the magix
* Refactor Ernie 4.5's MoE (#40547)
* Isolate Ernie fixes
* fix moe
---------
Co-authored-by: Vasqu <antonprogamer@gmail.com>
* fix style
* style
* fix copies
* style
* latest changes
* fixes
* had to stage
* current updaters
* up
* another modular
* modular graniteMoe
* some update
* draft another modular moe
* updaters
* up
* fix nit
* q3 nit
* fix phi moe
* we're going up up up up its our mooooment
* fix switch transformers this time around
* up
* gptsan japanese is deprecated forget about it
* fix mixtral to not be a linear (gives us more freedom)
* update
* fix copies gone wrong try catch nothing
* fix mixtral
* new refactor again
* update aria as well
* up dbrx and deepseekv3
* nit
* fix phimoe?
* fix deepseek v3
* nits
* don't bother with this one please
* up olmoe
* ??
* fix olmoe
* yups
* fiupx
* ish
* hot patch
* new qwen3
* updates
* up
* nit
* fix copies
* fix
* nits
* we're going up up up
* nits
* switch_transformers edge case
* lol modular gptsan?
* fix deepseek
* finally all modeling match modular
* update
* up
* up
* dang
* up
* up aria
* fix dbrx
* nits here and there
* finish fixing dbrx
* fix deepseek
* upd
* up
* fix flex olmo
* updated
* update jamba
* JAMBA is still a bit todo
* forward forward
* fix dots1
* update
* fix hunyuan
* fix some other
* update phimoe
* fuck you phimoe you are now submitted
* submit granitemoe as well
* try to fix some other models, reduces some of the failures
* fix olmoe and qwen2moe
* up
* up
* fix qwen2_moe
* update modular make it again, simpler
* nits
* up
* up
* fix
* some switch reductions
* up
* fix qwen3vl
* some fixes to jetmoe
* these should be shipped to the modular to fix jetmoe
* fix most of the nllb failures
* more nllb fixes
* fix the modular
* remove nllb modular as it sucks for now
* ?
* fix granitemoe
* granitemoehybrid doesn't have rope
* use rope when rope, no rope when no rope
* updates
* finish fixing dumbgrainite
* fix most of minimax
* fix
* update modular
* ?
* up
* up jetmoe still broken
* up
* fix, now align the moe
* fix jetmoe
* fix styling and qwen3 repo consistency
* update
* up up
* update ruff?
* nits
* modeling is good now for switch
* fix
* more fixes to switch!
* fix some switch test
* ?
* ?
* up
* fix switch modular!
* nit?
* uip
* subtest
* can't believe I wasted so much time on this...
* fix
* updates
* nits
* nit jamba is fucking annoying
* ?
* fix?
* oups
* good good
* styling
* up
* make sure qwen2 sliding works!
* fix dbrx small
* lol
* nits
* fix one test
* fix load balancing loss issue
* fix jamba
* fix nllbmoe
* fix jamba consistency and doc?
* up
* these are correct
* up
* up
* up
* some of the final cleanup
* update
* up
* fix some revert in granitemoe
* bring back attention multipliers for the granite family we'll see later on if they need removal
* small jamba fix docstring and typing
* fix phimoe
* yup
* fix unk returndict in granitemoes
* up
* fix qwen config
* fix phimoe check quality
* nits
* update based on caught non relative imports!
* fix dbrx
* Apply suggestions from code review
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* fix copies
* fiuxp
* fix dots1 regression!
* fix phimoe issue
* fix phi moe
* fix float() for some models
* fix jamba regression
* ui
* more dtype issues
* fix deepseek2 and 3?
* proper update
* fix modular deepseek!
* jamba jambaaaaaa
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

1 parent e6a8e7d
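One of the squashed commits is titled "compute softmax and top_k inside the experts". As a rough illustration only, here is a minimal pure-Python sketch of Mixtral-style top-k routing performed inside an experts container. It assumes the router logits for a single token are already computed. The names `softmax`, `route_top_k`, and `Experts` are hypothetical and are not the transformers API; this is not the code from this commit.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[List[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits: Sequence[float], top_k: int) -> List[Tuple[int, float]]:
    """Softmax over all experts, keep the top_k, then renormalize their weights."""
    probs = softmax(router_logits)
    # Highest probabilities first; ties go to the lower expert index.
    chosen = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


class Experts:
    """Hypothetical container that owns both routing and the expert computations."""

    def __init__(self, experts: List[Expert], top_k: int) -> None:
        self.experts = experts
        self.top_k = top_k

    def __call__(self, hidden: List[float], router_logits: Sequence[float]) -> List[float]:
        out = [0.0] * len(hidden)
        # Only the selected experts run; each output is scaled by its routing weight.
        for idx, weight in route_top_k(router_logits, self.top_k):
            expert_out = self.experts[idx](hidden)
            out = [o + weight * e for o, e in zip(out, expert_out)]
        return out


# Usage: four toy "experts" that scale the token's hidden vector by 1, 2, 3 and 4.
moe = Experts([lambda h, s=s: [s * x for x in h] for s in (1.0, 2.0, 3.0, 4.0)], top_k=2)
print(moe([1.0, 2.0], router_logits=[0.0, 0.0, 10.0, 10.0]))  # → [3.5, 7.0]
```

Renormalizing after the top-k cut makes the selected weights sum to 1. Some MoE variants skip that step, so treat it as an assumption of this sketch.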
86 files changed (+8433, -9185 lines)

File tree:
- docs/source/en/model_doc
- src/transformers
  - models
    - aria
    - bamba
    - dbrx
    - deepseek_v2
    - deepseek_v3
    - deprecated/gptsan_japanese
    - dots1
    - ernie4_5_moe
    - falcon_h1
    - flex_olmo
    - glm4_moe
    - glm4v_moe
    - gpt_oss
    - granitemoehybrid
    - granitemoe
    - hunyuan_v1_moe
    - jamba
    - jetmoe
    - longcat_flash
    - minimax
    - mixtral
    - moshi
    - nllb_moe
    - olmoe
    - phimoe
    - qwen2_moe
    - qwen3_moe
    - qwen3_next
    - qwen3_omni_moe
    - qwen3_vl_moe
    - seamless_m4t_v2
    - seamless_m4t
    - switch_transformers
    - vitpose_backbone
    - zamba2
    - zamba
- tests
  - models
    - dbrx
    - jamba
    - nllb_moe
    - switch_transformers
- utils
Diff (file name and line contents not captured): @@ -105,7 +105,6 @@, 1 line removed.
Diff (file name and line contents not captured): @@ -357,6 +357,7 @@, 1 line added; @@ -494,6 +495,7 @@, 1 line added.
Diff (file name and line contents not captured): @@ -309,117 +309,65 @@, 83 lines removed, 31 lines added.
Diff (file name and line contents not captured): @@ -1120,117 +1120,65 @@, 83 lines removed, 31 lines added.
Diff (file name and line contents not captured): @@ -159,16 +159,24 @@, 10 lines removed, 18 lines added.