Commit 6930085
feat: MxInt4 x Bf16 TRT-LLM Gen MoE support (#2159)
<!-- .github/pull_request_template.md -->
## Description
Add the MxInt4 x BF16 TRT-LLM Gen MoE path.
## Related Issues
<!-- Link any related issues here -->
## Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## Tests
- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).
## Reviewer Notes
<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
  * MXInt4 MoE inference path with public API and test coverage; MXInt4 + BF16 supported end-to-end.
  * Exposed new MXInt4 op and helper in the package exports.
* **Refactor**
  * Block-scale/interleave routines generalized to support uint8 and bfloat16 inputs and outputs.
  * GEMM/BatchedGemm configs now include an element-wise activation option and are arch-aware (CUDA arch).
* **Tests**
  * Added MXInt4 quantization and runtime tests for MoE.
* **Chores**
  * Updated packaged artifact path/checksum.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
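For readers unfamiliar with block-scaled 4-bit formats, the following is a minimal sketch of the general idea behind the "MXInt4 quantization" mentioned in the summary: a block of values shares one scale, and each value is stored as a signed 4-bit integer, two per byte. This is illustrative only and is not the FlashInfer implementation. The block size of 32, the `amax / 7` scale rule, the low-nibble-first packing, and all function names are assumptions made for this example.

```python
BLOCK = 32  # assumed block size, for illustration only


def quantize_block_int4(values):
    """Quantize one block of floats to signed int4 values plus a shared scale."""
    amax = max(abs(v) for v in values)
    scale = amax / 7.0 if amax > 0 else 1.0  # assumed scale rule
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale


def pack_nibbles(q):
    """Pack pairs of signed int4 values into bytes, low nibble first (assumed layout)."""
    return bytes((q[i] & 0x0F) | ((q[i + 1] & 0x0F) << 4) for i in range(0, len(q), 2))


def unpack_nibbles(packed):
    """Inverse of pack_nibbles: recover signed int4 values from packed bytes."""
    out = []
    for b in packed:
        for nib in (b & 0x0F, b >> 4):
            out.append(nib - 16 if nib >= 8 else nib)
    return out


def quantize(values, block=BLOCK):
    """Quantize a flat list block by block; each block must have even length."""
    packed, scales = bytearray(), []
    for i in range(0, len(values), block):
        q, s = quantize_block_int4(values[i:i + block])
        packed += pack_nibbles(q)
        scales.append(s)
    return bytes(packed), scales


def dequantize(packed, scales, block=BLOCK):
    """Reconstruct approximate floats from packed int4 bytes and per-block scales."""
    q = unpack_nibbles(packed)
    return [v * scales[i // block] for i, v in enumerate(q)]
```

With these assumptions, 64 input values yield 32 packed bytes and 2 scales, and each reconstructed value differs from its input by at most half of that block's scale.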
---------
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>

1 parent 4db4ac0 commit 6930085
File tree
21 files changed, +1167 -172 lines changed
- csrc
- nv_internal
- cpp/kernels
- tensorrt_llm
- kernels
- thop
- flashinfer
- fused_moe
- include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export
- trtllm/gen
- tests/moe