Conversation

@leonling-ll
Contributor

@leonling-ll leonling-ll commented Mar 31, 2025

Initial enabling of the sglang benchmarks.
Adds the sglang prefill/decode/extend attention kernels and the fp8 quant gemm to the third-party benchmarks.

@leonling-ll leonling-ll self-assigned this Mar 31, 2025
@leonling-ll leonling-ll requested a review from vlad-penkin March 31, 2025 05:56
@leonling-ll leonling-ll force-pushed the liyang/init_sglang_benchmark branch 3 times, most recently from b43b5e2 to bbd10a8 Compare April 9, 2025 02:34
@leonling-ll leonling-ll force-pushed the liyang/init_sglang_benchmark branch 2 times, most recently from 34e464a to 29e2711 Compare April 10, 2025 06:07
@leonling-ll leonling-ll marked this pull request as ready for review April 10, 2025 07:47
@leonling-ll leonling-ll force-pushed the liyang/init_sglang_benchmark branch from 29e2711 to 4b7241c Compare April 11, 2025 05:33
@leonling-ll leonling-ll force-pushed the liyang/init_sglang_benchmark branch 2 times, most recently from 1296327 to 7d4d837 Compare April 14, 2025 08:15
@leonling-ll
Contributor Author

The benchmark is still blocked by #3748 and #3749.
Let's merge this PR once the blocking issues are resolved by the new agama release, expected in late April.

@leonling-ll leonling-ll force-pushed the liyang/init_sglang_benchmark branch from 9651ebe to e609f5d Compare April 15, 2025 08:14
@etiotto etiotto marked this pull request as draft April 17, 2025 14:27
@etiotto
Contributor

etiotto commented Apr 24, 2025

@LiyangLingIntel the tests are passing. Is this PR still going to wait on #3748 and #3749?

@leonling-ll
Contributor Author

@LiyangLingIntel the tests are passing. Is this PR still going to wait on #3748 and #3749?

@etiotto Yes, these 2 depend on the new agama release.
The target workflow is the Triton third-party benchmark; it is not part of CI and is scheduled to run once per day.

@leonling-ll leonling-ll force-pushed the liyang/init_sglang_benchmark branch 2 times, most recently from 48b96ec to 458e06d Compare May 23, 2025 12:53
@leonling-ll
Contributor Author

leonling-ll commented May 26, 2025

@whitneywhtsang
Contributor

@Egor-Krivov @vlad-penkin Please take another look.

@leonling-ll leonling-ll force-pushed the liyang/init_sglang_benchmark branch 2 times, most recently from 10da795 to ead512a Compare May 29, 2025 07:18
@leonling-ll leonling-ll changed the title [benchmarks][ci] Initial integration of sglang kernels to benchmarks [SGLANG] [Benchmarks] Initial integration of sglang kernels to benchmarks Jun 3, 2025
@leonling-ll leonling-ll requested a review from vlad-penkin June 3, 2025 07:06
@leonling-ll
Contributor Author

@vlad-penkin If there is no more mature plan for integrating third-party benchmarks in the short term, what do you think of merging this PR for now? It's in good shape and uses the same logic as the existing Liger-Kernel benchmarks.
We can leave the refactoring or restructuring in separate PRs.

@whitneywhtsang
Contributor

@vlad-penkin If there is no more mature plan for integrating third-party benchmarks in the short term, what do you think of merging this PR for now? It's in good shape and uses the same logic as the existing Liger-Kernel benchmarks. We can leave the refactoring or restructuring in separate PRs.

I agree with merging this PR first and doing the refactoring and restructuring in a separate PR, so that we can start testing the sglang benchmarks. @vlad-penkin WDYT?

@whitneywhtsang
Contributor

@vlad-penkin ping.

leonling-ll and others added 3 commits July 22, 2025 05:42
Port prefill attn and decode attn from sglang

Add validation

temp add extend attention

disable debug ir dump

Update three stage attention benchmark

Add sglang kernel benchmark to action

use 1e-3 atol

remove sglang benchmark from triton-benchmarks

Fix setup bdist_wheel

Add sglang to thirdparty test

Address review comments

Remove sglang from tests

Fix CI

Address review comments

Integrate sglang prefill/decode/extend kernel to benchmarks

Port prefill attn and decode attn from sglang

Add validation

temp add extend attention

disable debug ir dump

Update three stage attention benchmark

Add sglang kernel benchmark to action

use 1e-3 atol

remove sglang benchmark from triton-benchmarks

Fix setup bdist_wheel

Add sglang to thirdparty test

Address review comments

Remove sglang from tests

Adjust params term

Adjust tflops computation
fix bugs

rtol

atol

Move fp8 gemm to sglang benchmark
Address review comments

Fix CI XPU not found
@leonling-ll leonling-ll force-pushed the liyang/init_sglang_benchmark branch from e1cb11b to a7a69e2 Compare July 22, 2025 09:10
@leonling-ll leonling-ll force-pushed the liyang/init_sglang_benchmark branch from a7a69e2 to c61d0ea Compare July 23, 2025 01:27
@airMeng
Contributor

airMeng commented Jul 23, 2025

@vlad-penkin are there any concerns about merging this PR? It is critical for us to maintain SGLang.

cc @mingfeima

@whitneywhtsang
Contributor

@Egor-Krivov please check whether anything left in this PR should still be merged.

@Egor-Krivov
Contributor

Egor-Krivov commented Nov 11, 2025

@whitneywhtsang
I think these are interesting benchmarks, although I don't understand who "owns" them. Do you have any idea about the business source of these benchmarks?

@whitneywhtsang
Contributor

According to the comment in the benchmark file, they should be from SGLang.

Block FP8 Gemm benchmark
============================
This benchmark is come from SGLang kernels.
https://github.com/sgl-project/sglang/blob/07f944631e747d7489fde1f11de93e503afa90ba/python/sglang/srt/layers/quantization/fp8_kernel.py#L375
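For readers unfamiliar with what a block FP8 GEMM benchmark measures, here is a minimal pure-Python sketch. It is not the SGLang kernel or the benchmark code from this PR. It assumes the activation matrix carries one scale per row and K-block, and the weight matrix one scale per (N-block, K-block) tile; that layout is an assumption made for illustration. The helper names `block_scaled_matmul` and `gemm_tflops` are hypothetical, and the quantized values are plain Python numbers rather than real FP8.

```python
def block_scaled_matmul(a_q, a_scale, b_q, b_scale, block_n, block_k):
    """Reference for C = dequant(A) @ dequant(B)^T with block-wise scales.

    Assumed layout (illustrative only):
      a_q     : M x K quantized activations
      a_scale : M x ceil(K / block_k) scales, one per row and K-block
      b_q     : N x K quantized weights
      b_scale : ceil(N / block_n) x ceil(K / block_k) scales, one per tile
    """
    m, k = len(a_q), len(a_q[0])
    n = len(b_q)
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for kb in range(0, k, block_k):
                # Accumulate the raw dot product inside one K-block ...
                partial = sum(
                    a_q[i][kk] * b_q[j][kk]
                    for kk in range(kb, min(kb + block_k, k))
                )
                # ... then apply both scales once per block.
                acc += partial * a_scale[i][kb // block_k] * b_scale[j // block_n][kb // block_k]
            out[i][j] = acc
    return out


def gemm_tflops(m, n, k, seconds):
    """Throughput of an M x N x K GEMM, counting 2 FLOPs per multiply-add."""
    return 2.0 * m * n * k / seconds / 1e12


if __name__ == "__main__":
    a_q = [[1, 2, 3, 4]]
    a_scale = [[0.5, 2.0]]
    b_q = [[1, 1, 1, 1], [2, 0, 0, 1]]
    b_scale = [[1.0, 0.25]]
    print(block_scaled_matmul(a_q, a_scale, b_q, b_scale, block_n=2, block_k=2))
    print(gemm_tflops(1024, 1024, 1024, 1e-3))
```

A reference like this is only useful for checking a real kernel's output within a tolerance on small shapes; the throughput helper uses the conventional 2·M·N·K FLOP count for a GEMM.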

8 participants