Skip to content

Conversation

@Wanli-Jiang
Copy link
Collaborator

Features:

This PR stacked the following PRs on top of TRTLLM main.

CMD to launch trtllm-serve for bench:

trtllm-serve <ckpt_folder_path> \
--host 0.0.0.0 \
--port 8000 \
--backend _autodeploy \
--trust_remote_code \
--extra_llm_api_options examples/auto_deploy/nano_v3_bench.yaml

2ez4bz and others added 6 commits November 10, 2025 19:07
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

Add conv act fusion

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

fix unit tests

Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>

fix tests

Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>

Address reviewer's comments

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>

fix typo

Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>

Fixes and UT

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>

Use trtllm moe for relu2 mlp case

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

Fix the runGemmProfile

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

Replace the FP8 fused MoE backend

Before: torch.ops.auto_deploy.triton_quant_fp8_moe
After: torch.ops.auto_deploy.trtllm_quant_fp8moe_fused
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>

Code refactoring

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>

syntax error fixes

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>

remove dead code

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>

fix moe operator function name

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>

Add skips if not hopper+

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>

remove unused code

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
@Wanli-Jiang Wanli-Jiang force-pushed the user/williamj/nano-v3-fp8-stack branch from 49a0c40 to 7664abb Compare November 11, 2025 06:14
@lucaslie
Copy link
Member

fyi, just merged #8812 --> so you can drop the PR from the list next time you rebase

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants