Commit c795abd

[vllm][benchmarks] Remove one memory allocation (#5340)
Two changes:
1. Remove an unnecessary memory allocation.
2. Fix a bug where benchmark results were overwritten.
1 parent a3f0372 commit c795abd

2 files changed: +4 -4 lines changed

.github/workflows/third-party-benchmarks.yml

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ jobs:
 
           cd benchmarks/third_party/vllm
           FP8="1" python batched_moe_benchmark.py --reports $REPORTS
-          python transform_results.py $REPORTS/moe-gemm-performance.csv $REPORTS/moe-gemm-report.csv --tag $TAG --benchmark moe-fp8-benchmark
+          python transform_results.py $REPORTS/moe-gemm-performance.csv $REPORTS/moe-gemm-fp8-report.csv --tag $TAG --benchmark moe-fp8-benchmark
 
       - name: Run Liger-Kernel benchmarks
         if: ${{ steps.install.outcome == 'success' && !cancelled() && (inputs.benchmarks == '' || contains(fromJson(inputs.benchmarks || '[]'), 'liger-kernel')) }}
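The overwrite fixed by this hunk can be illustrated with a minimal Python sketch. It assumes that an earlier, non-FP8 step (not shown in this diff) writes its report to `moe-gemm-report.csv`, and that the report writer opens its output in write mode. `write_report`, `run_both`, and the `non-fp8-run` label are hypothetical stand-ins, not part of `transform_results.py`.

```python
import csv
import tempfile
from pathlib import Path


def write_report(path: Path, benchmark: str, tflops: float) -> None:
    """Hypothetical report writer: mode "w" replaces any existing file."""
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["benchmark", "tflops"])
        writer.writeheader()
        writer.writerow({"benchmark": benchmark, "tflops": tflops})


def read_benchmarks(path: Path) -> set:
    """Return the set of benchmark names stored in a report file."""
    with path.open(newline="") as f:
        return {row["benchmark"] for row in csv.DictReader(f)}


def run_both(reports: Path, fp8_report_name: str) -> dict:
    """Write a non-FP8 report, then an FP8 report, and list what survives."""
    # Assumption: the non-FP8 step targets moe-gemm-report.csv.
    write_report(reports / "moe-gemm-report.csv", "non-fp8-run", 1.0)
    write_report(reports / fp8_report_name, "moe-fp8-benchmark", 2.0)
    return {p.name: read_benchmarks(p) for p in sorted(reports.glob("*.csv"))}


# Before the fix: both steps share one file name, so only the FP8 rows remain.
with tempfile.TemporaryDirectory() as tmp:
    shared = run_both(Path(tmp), "moe-gemm-report.csv")

# After the fix: the FP8 step has its own file, so both reports are kept.
with tempfile.TemporaryDirectory() as tmp:
    distinct = run_both(Path(tmp), "moe-gemm-fp8-report.csv")

print(shared)
print(distinct)
```

Giving each run its own output name avoids the need for append logic or merging inside the writer.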

benchmarks/third_party/vllm/batched_moe_benchmark.py

Lines changed: 3 additions & 3 deletions
@@ -25,7 +25,7 @@
 from vllm.model_executor.layers.fused_moe.utils import normalize_batched_scales_shape
 
 # Import utility functions from vLLM tests
-from tests.kernels.moe.utils import make_quantized_test_activations, make_test_weights
+from tests.kernels.moe.utils import make_quantized_test_activations, make_test_weight
 from tests.kernels.quant_utils import native_batched_masked_quant_matmul
 
 

@@ -552,9 +552,9 @@ def benchmark(num_experts, max_tokens_per_expert, K, N, fp8, block_quant, provid
     )
 
     # Create test weights (only need B matrix for batched MM)
-    (B, B_q, B_scale, _), _ = make_test_weights(
+    B, B_q, B_scale, _ = make_test_weight(
         num_experts,
-        N // 2,
+        N,
         K,
         in_dtype=act_dtype,
         quant_dtype=quant_dtype,
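
The allocation removed here can be sketched with pure-Python stand-ins. This assumes the paired helper builds a `2n x k` matrix set plus a second `k x n` set, which the `N // 2` to `N` change suggests but the diff does not confirm. `make_weight` and `make_weights` below are hypothetical simplifications, not the vLLM test utilities, and return plain nested lists instead of quantized tensors.

```python
allocations = []  # records (num_experts, rows, cols) for every buffer created


def make_weight(num_experts: int, rows: int, cols: int) -> list:
    """Hypothetical single-matrix helper: one buffer set per call."""
    allocations.append((num_experts, rows, cols))
    return [[[0.0] * cols for _ in range(rows)] for _ in range(num_experts)]


def make_weights(num_experts: int, n: int, k: int) -> tuple:
    """Hypothetical paired helper (assumed shapes): (2n x k) and (k x n)."""
    return make_weight(num_experts, 2 * n, k), make_weight(num_experts, k, n)


E, N, K = 2, 8, 4

# Old pattern: request the pair, then throw the second matrix set away.
allocations.clear()
B_old, _ = make_weights(E, N // 2, K)
old_allocs = list(allocations)

# New pattern: request only the matrix the batched matmul needs.
allocations.clear()
B_new = make_weight(E, N, K)
new_allocs = list(allocations)

print(old_allocs)  # two buffer sets, one of them unused
print(new_allocs)  # one buffer set with the same shape as B_old
```

Under these assumptions both patterns yield an `N x K` matrix per expert, but the old one also allocates a second set that is discarded immediately.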
