2 files changed, +7 -6 lines changed
@@ -1540,14 +1540,15 @@ bash -c 'python ./examples/mmlu.py --test_trt_llm \
 ## Run LLaMa-3.3 70B Model on PyTorch Backend
 This section provides the steps to run the LLaMa-3.3 70B model in FP8 precision on the PyTorch backend by launching the TensorRT LLM server and running performance benchmarks.
 
-
 ### Prepare TensorRT LLM extra configs
 ```bash
 cat > ./extra-llm-api-config.yml << EOF
-stream_interval: 2
+stream_interval: 10
 cuda_graph_config:
   max_batch_size: 1024
   enable_padding: true
+kv_cache_config:
+  dtype: fp8
 EOF
 ```
 Explanation:
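This config file is consumed when the server is launched. As a minimal sketch of that step (the checkpoint name, `tp_size`, and batch limit below are assumed values, not taken from this diff):

```bash
# Sketch: start the OpenAI-compatible server with the extra config above.
# Model id, parallelism, and batch size are assumptions -- adjust to your setup.
trtllm-serve nvidia/Llama-3.3-70B-Instruct-FP8 \
    --backend pytorch \
    --tp_size 4 \
    --max_batch_size 1024 \
    --extra_llm_api_options ./extra-llm-api-config.yml
```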
@@ -1581,5 +1582,5 @@ python -m tensorrt_llm.serve.scripts.benchmark_serving \
     --random-input-len 1024 \
     --random-output-len 2048 \
     --random-ids \
-    --max-concurrency 1024 \
+    --max-concurrency 1024
 ```
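For context, a complete benchmark invocation consistent with the flags above might look like the following sketch; the model id, dataset choice, and prompt count are assumptions rather than values from this diff:

```bash
# Sketch: serving benchmark against the launched server (assumed values).
python -m tensorrt_llm.serve.scripts.benchmark_serving \
    --model nvidia/Llama-3.3-70B-Instruct-FP8 \
    --dataset-name random \
    --num-prompts 1024 \
    --random-input-len 1024 \
    --random-output-len 2048 \
    --random-ids \
    --max-concurrency 1024
```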
@@ -27,7 +27,7 @@ This section provides the steps to launch TensorRT LLM server and run performance benchmarks
 ```bash
 cat > ./extra-llm-api-config.yml << EOF
 enable_attention_dp: true
-stream_interval: 2
+stream_interval: 10
 cuda_graph_config:
   max_batch_size: 512
   enable_padding: true
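For reference, here is the same max-throughput config with explanatory comments; the descriptions are my reading of these knobs, not wording from the original document:

```bash
cat > ./extra-llm-api-config.yml << EOF
# Run attention data-parallel across ranks; favors throughput at high concurrency.
enable_attention_dp: true
# Emit streamed responses every N decoding iterations rather than every token,
# cutting streaming overhead at a small cost in token-level granularity.
stream_interval: 10
cuda_graph_config:
  # Capture CUDA graphs for batch sizes up to this limit.
  max_batch_size: 512
  # Pad smaller batches up to a captured size so graphs can be reused.
  enable_padding: true
EOF
```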
@@ -78,7 +78,7 @@ python -m tensorrt_llm.serve.scripts.benchmark_serving \
 cat > ./extra-llm-api-config.yml << EOF
 enable_attention_dp: false
 enable_min_latency: true
-stream_interval: 2
+stream_interval: 10
 cuda_graph_config:
   max_batch_size: 8
   enable_padding: true
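Before benchmarking the min-latency setup, a quick smoke test of the endpoint can catch config errors early. A minimal sketch, assuming the server's OpenAI-compatible API on the default port 8000 (the model id is a placeholder):

```bash
# Sketch: verify the server responds; substitute the id returned by /v1/models.
curl -s http://localhost:8000/v1/models
curl -s http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "<served-model-name>", "prompt": "Hello", "max_tokens": 16}'
```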
@@ -126,7 +126,7 @@ python -m tensorrt_llm.serve.scripts.benchmark_serving \
 #### 1. Prepare TensorRT LLM extra configs
 ```bash
 cat > ./extra-llm-api-config.yml << EOF
-stream_interval: 2
+stream_interval: 10
 cuda_graph_config:
   max_batch_size: 1024
   enable_padding: true