Commit fdc8338
misc: Various Updates to Attention Microbenchmark Suite (#1891)
## 📌 Description
This PR brings a number of updates to the attention microbenchmark
suites in `flashinfer_benchmark.py`:
* `testBatchPrefillWithPagedKVCacheWrapper`:
  * Added a `trtllm-gen-native` backend that calls
    `flashinfer.prefill.trtllm_batch_context_with_kv_cache`. It is disabled
    for batch size 1 due to various errors; an issue will be filed to track
    them.
  * The `trtllm-gen` and `trtllm-gen-native` backends can now be benchmarked
    for FP8.
  * `trtllm-gen` and `trtllm-gen-native` are now disabled for
    `causal=False`. Previously the flag was silently ignored and these
    backends ran with `causal=True`.
* `testBatchPrefillWithRaggedKVCacheWrapper`:
  * Added a `trtllm-gen-native` backend that calls
    `flashinfer.prefill.trtllm_ragged_attention_deepseek`. It is disabled
    for batch size 1 due to various errors; an issue will be filed to track
    them.
* `testBatchMLAPagedAttentionWrapper`:
  * Added `cutlass` as a backend that can be benchmarked.
* Miscellaneous minor fixes, such as corrected refcheck failure messages.
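The backend gating described in the list above can be illustrated with a minimal sketch. This is not the code in `flashinfer_benchmark.py`: the helper name `filter_backends` and its arguments are hypothetical, and it encodes only the two rules stated in this description for the prefill routines (no `causal=False` for `trtllm-gen` and `trtllm-gen-native`, no batch size 1 for `trtllm-gen-native`).

```python
def filter_backends(backends, causal, batch_size):
    """Hypothetical helper: drop backends this description says are disabled.

    Returns (kept, skipped), where skipped maps a backend name to the reason
    it was removed. The real benchmark script may be structured differently.
    """
    kept, skipped = [], {}
    for backend in backends:
        if backend in ("trtllm-gen", "trtllm-gen-native") and not causal:
            # Previously the flag was silently ignored; now the backend is skipped.
            skipped[backend] = "causal=False is not supported"
        elif backend == "trtllm-gen-native" and batch_size == 1:
            skipped[backend] = "disabled for batch size 1"
        else:
            kept.append(backend)
    return kept, skipped


kept, skipped = filter_backends(
    ["fa2", "cudnn", "trtllm-gen", "trtllm-gen-native"],
    causal=False,
    batch_size=16,
)
print("running:", kept)
for name, reason in skipped.items():
    print(f"skipping {name}: {reason}")
```

Reporting the reason for each skipped backend, rather than dropping it silently, matches the intent of the `causal=False` change: the user sees that a requested configuration was not run.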
Examples:
```
# python3 flashinfer_benchmark.py --routine BatchMLAPagedAttentionWrapper --backends trtllm-gen-native fa2 cutlass --page_size 32 --batch_size 16 --s_qo 1 --s_kv 8192 --num_qo_heads 128 --num_kv_heads 128 --head_dim_ckv 512 --head_dim_kpe 64 --random_actual_seq_len --refcheck --q_dtype bfloat16 --kv_dtype bfloat16
[PERF] trtllm-gen-nati:: median time 0.031 ms; std 0.000 ms; achieved tflops 553.684 TFLOPs/sec; achieved tb_per_sec 4.960 TB/sec
[PERF] fa2 :: median time 0.091 ms; std 0.001 ms; achieved tflops 190.364 TFLOPs/sec; achieved tb_per_sec 1.705 TB/sec
[PERF] cutlass :: median time 0.221 ms; std 0.000 ms; achieved tflops 78.342 TFLOPs/sec; achieved tb_per_sec 0.702 TB/sec
# python3 flashinfer_benchmark.py --routine BatchPrefillWithPagedKVCacheWrapper --backends fa2 cudnn trtllm-gen trtllm-gen-native --page_size 16 --batch_size 16 --s_qo 8192 --s_kv 8192 --num_qo_heads 64 --num_kv_heads 8 --head_dim_qk 128 --head_dim_vo 128 --random_actual_seq_len --causal --refcheck --q_dtype bfloat16 --kv_dtype bfloat16
[PERF] fa2 :: median time 17.342 ms; std 0.011 ms; achieved tflops 397.579 TFLOPs/sec; achieved tb_per_sec 0.161 TB/sec
[PERF] cudnn :: median time 6.230 ms; std 0.032 ms; achieved tflops 1106.685 TFLOPs/sec; achieved tb_per_sec 0.449 TB/sec
[PERF] trtllm-gen :: median time 7.181 ms; std 0.040 ms; achieved tflops 960.135 TFLOPs/sec; achieved tb_per_sec 0.390 TB/sec
[PERF] trtllm-gen-nati:: median time 6.453 ms; std 0.012 ms; achieved tflops 1068.434 TFLOPs/sec; achieved tb_per_sec 0.434 TB/sec
# python3 flashinfer_benchmark.py --routine BatchPrefillWithRaggedKVCacheWrapper --backends fa2 cutlass cudnn trtllm-gen-native --batch_size 16 --s_qo 8192 --s_kv 8192 --num_qo_heads 128 --num_kv_heads 128 --head_dim_qk 192 --head_dim_vo 128 --random_actual_seq_len --refcheck --causal --q_dtype bfloat16 --kv_dtype bfloat16
[PERF] fa2 :: median time 39.797 ms; std 0.023 ms; achieved tflops 433.137 TFLOPs/sec; achieved tb_per_sec 0.312 TB/sec
[PERF] cutlass :: median time 18.509 ms; std 0.348 ms; achieved tflops 931.281 TFLOPs/sec; achieved tb_per_sec 0.672 TB/sec
[PERF] cudnn :: median time 14.778 ms; std 0.336 ms; achieved tflops 1166.391 TFLOPs/sec; achieved tb_per_sec 0.841 TB/sec
[PERF] trtllm-gen-nati:: median time 14.339 ms; std 0.291 ms; achieved tflops 1202.155 TFLOPs/sec; achieved tb_per_sec 0.867 TB/sec
```
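To compare backends from output like the above, the `[PERF]` lines can be parsed with a small script. The following is a sketch only: the regular expression is inferred from the sample output shown here rather than from the benchmark source, and the names `PERF_RE`, `parse_median_ms`, and `speedup_vs` are hypothetical. Backend names are kept exactly as printed (for example the truncated `trtllm-gen-nati`).

```python
import re

# Inferred from the sample output above; the real format may differ.
PERF_RE = re.compile(
    r"\[PERF\]\s*(?P<backend>\S+?)\s*::\s*median time (?P<median_ms>[\d.]+) ms"
)


def parse_median_ms(lines):
    """Map each backend name (as printed) to its median time in milliseconds."""
    medians = {}
    for line in lines:
        match = PERF_RE.search(line)
        if match:
            medians[match.group("backend")] = float(match.group("median_ms"))
    return medians


def speedup_vs(medians, baseline):
    """Return baseline_time / backend_time for every backend (>1 means faster)."""
    base = medians[baseline]
    return {name: base / t for name, t in medians.items()}


# Lines copied from the BatchMLAPagedAttentionWrapper example above.
sample = [
    "[PERF] trtllm-gen-nati:: median time 0.031 ms; std 0.000 ms; achieved tflops 553.684 TFLOPs/sec; achieved tb_per_sec 4.960 TB/sec",
    "[PERF] fa2 :: median time 0.091 ms; std 0.001 ms; achieved tflops 190.364 TFLOPs/sec; achieved tb_per_sec 1.705 TB/sec",
    "[PERF] cutlass :: median time 0.221 ms; std 0.000 ms; achieved tflops 78.342 TFLOPs/sec; achieved tb_per_sec 0.702 TB/sec",
]

medians = parse_median_ms(sample)
for name, ratio in speedup_vs(medians, baseline="fa2").items():
    print(f"{name}: {ratio:.2f}x relative to fa2")
```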
**No changes to library code**
## 🔍 Related Issues
## 🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### ✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## 🧪 Tests
- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).
## Reviewer Notes
1 parent d910f9a commit fdc8338
3 files changed (+125, -36 lines) under `benchmarks` and `routines`.