Commit de6088e
[None][doc] update llama and llama4 example doc (#9048)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
1 parent 0b9bc5a

File tree: 2 files changed (+7, -6 lines)

examples/models/core/llama/README.md

Lines changed: 4 additions & 3 deletions
````diff
@@ -1540,14 +1540,15 @@ bash -c 'python ./examples/mmlu.py --test_trt_llm \
 ## Run LLaMa-3.3 70B Model on PyTorch Backend
 This section provides the steps to run the LLaMa-3.3 70B model in FP8 precision on the PyTorch backend by launching the TensorRT LLM server and running performance benchmarks.
 
-
 ### Prepare TensorRT LLM extra configs
 ```bash
 cat >./extra-llm-api-config.yml <<EOF
-stream_interval: 2
+stream_interval: 10
 cuda_graph_config:
   max_batch_size: 1024
   enable_padding: true
+kv_cache_config:
+  dtype: fp8
 EOF
 ```
 Explanation:
@@ -1581,5 +1582,5 @@ python -m tensorrt_llm.serve.scripts.benchmark_serving \
   --random-input-len 1024 \
   --random-output-len 2048 \
   --random-ids \
-  --max-concurrency 1024 \
+  --max-concurrency 1024
 ```
````
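The end state of the first hunk can be reproduced locally. A minimal sketch, assuming you simply want to materialize the post-commit config file the same way the README's heredoc does (the file name and every key/value come from the diff above):

```shell
#!/bin/sh
# Write the post-commit extra-llm-api config for the LLaMa-3.3 70B example.
# Values mirror the "+" and context lines of the diff: stream_interval is
# raised from 2 to 10, and a kv_cache_config section with fp8 dtype is added.
cat > ./extra-llm-api-config.yml <<EOF
stream_interval: 10
cuda_graph_config:
  max_batch_size: 1024
  enable_padding: true
kv_cache_config:
  dtype: fp8
EOF
```

The README's later server-launch step points TensorRT LLM at this file as its extra-options config.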

examples/models/core/llama4/README.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -27,7 +27,7 @@ This section provides the steps to launch TensorRT LLM server and run performance
 ```bash
 cat >./extra-llm-api-config.yml <<EOF
 enable_attention_dp: true
-stream_interval: 2
+stream_interval: 10
 cuda_graph_config:
   max_batch_size: 512
   enable_padding: true
@@ -78,7 +78,7 @@ python -m tensorrt_llm.serve.scripts.benchmark_serving \
 cat >./extra-llm-api-config.yml <<EOF
 enable_attention_dp: false
 enable_min_latency: true
-stream_interval: 2
+stream_interval: 10
 cuda_graph_config:
   max_batch_size: 8
   enable_padding: true
@@ -126,7 +126,7 @@ python -m tensorrt_llm.serve.scripts.benchmark_serving \
 #### 1. Prepare TensorRT LLM extra configs
 ```bash
 cat >./extra-llm-api-config.yml <<EOF
-stream_interval: 2
+stream_interval: 10
 cuda_graph_config:
   max_batch_size: 1024
   enable_padding: true
````
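Likewise, the llama4 README's min-latency variant (the second hunk above) can be written out directly. A sketch whose keys and values are all taken from the diff lines — nothing here is invented beyond reusing the README's own file name:

```shell
#!/bin/sh
# Post-commit min-latency extra config from the llama4 README's second hunk:
# attention data-parallelism off, min-latency mode on, stream_interval now 10,
# and a small CUDA-graph batch size suited to low-latency serving.
cat > ./extra-llm-api-config.yml <<EOF
enable_attention_dp: false
enable_min_latency: true
stream_interval: 10
cuda_graph_config:
  max_batch_size: 8
  enable_padding: true
EOF
```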
