Commit c86e36f

[None][test] add deepseek and qwen cases for rtx series (#8839)

Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>

1 parent: c37924f

File tree

2 files changed: +38, -0 lines changed


tests/integration/defs/perf/pytorch_model_config.py

Lines changed: 21 additions & 0 deletions

@@ -119,6 +119,27 @@ def get_model_yaml_config(model_label: str,
                 'enable_chunked_prefill': True,
             }
         },
+        # Deepseek R1 model with CUTLASS backend
+        {
+            'patterns': [
+                'deepseek_r1_nvfp4-bench-pytorch-streaming-float4-maxbs:512-maxnt:5220-input_output_len:4000,2000',
+            ],
+            'config': {
+                'enable_attention_dp': True,
+                'moe_config': {
+                    'backend': 'CUTLASS',
+                    'max_num_tokens': 3072,
+                },
+                'kv_cache_config': {
+                    'dtype': 'fp8',
+                    'free_gpu_memory_fraction': 0.5,
+                },
+                'cuda_graph_config': {
+                    'enable_padding': True,
+                    'batch_sizes': [1, 2, 4, 8, 16, 32, 64],
+                },
+            }
+        },
         # Deepseek_v3_lite_cases
         {
             'patterns':
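For readers unfamiliar with this table: each entry pairs a list of benchmark-label patterns with the extra config to apply when a label matches. Below is a minimal sketch of that lookup, assuming simple substring matching of the label against each entry's patterns; lookup_model_config is a hypothetical helper for illustration, and the repo's actual get_model_yaml_config may match and merge differently.

# Illustrative sketch only: shows how a 'patterns'/'config' table like the
# one added above could be consumed. Matching and merge rules are assumptions.
MODEL_CONFIGS = [
    {
        'patterns': [
            'deepseek_r1_nvfp4-bench-pytorch-streaming-float4-maxbs:512-maxnt:5220-input_output_len:4000,2000',
        ],
        'config': {
            'enable_attention_dp': True,
            'moe_config': {'backend': 'CUTLASS', 'max_num_tokens': 3072},
            'kv_cache_config': {'dtype': 'fp8', 'free_gpu_memory_fraction': 0.5},
        },
    },
]

def lookup_model_config(model_label: str) -> dict:
    """Merge the 'config' of every entry whose pattern occurs in the label."""
    merged = {}
    for entry in MODEL_CONFIGS:
        if any(pattern in model_label for pattern in entry['patterns']):
            merged.update(entry['config'])
    return merged

label = ('deepseek_r1_nvfp4-bench-pytorch-streaming-float4-'
         'maxbs:512-maxnt:5220-input_output_len:4000,2000')
print(lookup_model_config(label))  # prints the CUTLASS/fp8 override dict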

tests/integration/test_lists/qa/llm_perf_core.yml

Lines changed: 17 additions & 0 deletions

@@ -397,10 +397,27 @@ llm_perf_core:
   - perf/test_perf.py::test_perf[llama_v3.3_nemotron_super_49b-bench-pytorch-bfloat16-input_output_len:128,128-tp:2-gpus:2]
   #deepseek_v3_lite
   - perf/test_perf.py::test_perf[deepseek_v3_lite_nvfp4-bench-pytorch-float4-input_output_len:128,128]
+  - perf/test_perf.py::test_perf[deepseek_v3_lite_nvfp4-bench-pytorch-float4-maxbs:1-input_output_len:1000,2000-reqs:10-ep:4-tp:8-gpus:8]
+  - perf/test_perf.py::test_perf[deepseek_v3_lite_nvfp4-bench-pytorch-float4-maxbs:384-maxnt:1536-input_output_len:1000,2000-reqs:10000-con:3072-ep:8-tp:8-gpus:8] TIMEOUT(120) #max throughput test
   - perf/test_perf.py::test_perf[deepseek_v3_lite_nvfp4-bench-pytorch-streaming-float4-input_output_len:128,128]
   - perf/test_perf.py::test_perf[deepseek_v3_lite_fp8-bench-pytorch-float8-input_output_len:128,128]
   #mixtral_8x7b_v0.1
   - perf/test_perf.py::test_perf[mixtral_8x7b_v0.1-bench-pytorch-float16-input_output_len:128,128-tp:2-gpus:2]
   - perf/test_perf.py::test_perf[mixtral_8x7b_v0.1_instruct_fp8-bench-pytorch-float8-input_output_len:128,128-tp:2-gpus:2]
   - perf/test_perf.py::test_perf[mixtral_8x7b_v0.1_instruct_fp4-bench-pytorch-float4-input_output_len:128,128-tp:2-gpus:2]
   - perf/test_perf.py::test_perf[mixtral_8x7b_v0.1_instruct_fp4-bench-pytorch-float4-input_output_len:128,128-kv_cache_dtype:fp8-tp:2-gpus:2]
+
+- condition:
+    ranges:
+      system_gpu_count:
+        gte: 8
+    wildcards:
+      gpu:
+      - '*6000*'
+      linux_distribution_name: '*'
+  tests:
+  - perf/test_perf.py::test_perf[qwen3_235b_a22b_fp4-bench-pytorch-float4-input_output_len:1000,2000-con:512-ep:4-gpus:4]
+  - perf/test_perf.py::test_perf[qwen3_235b_a22b_fp4-bench-pytorch-float4-input_output_len:1000,2000-con:512-ep:8-tp:8-gpus:8]
+  - perf/test_perf.py::test_perf[deepseek_r1_nvfp4-bench-pytorch-float4-maxbs:1-input_output_len:1000,2000-reqs:10-ep:4-tp:8-gpus:8] TIMEOUT(120)
+  - perf/test_perf.py::test_perf[deepseek_r1_nvfp4-bench-pytorch-float4-maxbs:384-maxnt:1536-input_output_len:1000,2000-reqs:10000-con:3072-ep:8-tp:8-gpus:8] TIMEOUT(120) #max throughput test
+  - perf/test_perf.py::test_perf[deepseek_r1_nvfp4-bench-pytorch-streaming-float4-maxbs:512-maxnt:5220-input_output_len:4000,2000-reqs:512-ep:8-tp:8-gpus:8]
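The new condition block gates these cases to nodes with at least 8 GPUs whose product name matches '*6000*' (RTX 6000-class boards), on any Linux distribution. A rough sketch of how such a gate can be evaluated, assuming fnmatch-style wildcard semantics; condition_matches is a hypothetical helper, not the harness's real code.

# Illustrative only: evaluates a condition block like the one added above.
# Field names mirror the yml; the harness's actual matching may differ.
from fnmatch import fnmatch

def condition_matches(condition: dict, system: dict) -> bool:
    # ranges: numeric bounds, e.g. system_gpu_count: {gte: 8}
    for key, bounds in condition.get('ranges', {}).items():
        value = system[key]
        if 'gte' in bounds and value < bounds['gte']:
            return False
        if 'lte' in bounds and value > bounds['lte']:
            return False
    # wildcards: glob patterns; a list means any one pattern may match
    for key, patterns in condition.get('wildcards', {}).items():
        if isinstance(patterns, str):
            patterns = [patterns]
        if not any(fnmatch(str(system[key]), p) for p in patterns):
            return False
    return True

condition = {
    'ranges': {'system_gpu_count': {'gte': 8}},
    'wildcards': {'gpu': ['*6000*'], 'linux_distribution_name': '*'},
}
system = {
    'system_gpu_count': 8,
    'gpu': 'NVIDIA RTX PRO 6000',        # assumed example name
    'linux_distribution_name': 'ubuntu',
}
print(condition_matches(condition, system))  # True on an 8x 6000-class node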
