
Commit f6f6e1f

[#9102][feat] AutoDeploy: Support fp8 kv cache (#9107)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
1 parent: c6cce39


2 files changed: +2, -2 lines


tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/cuda_backend_causal_conv.py

Lines changed: 1 addition & 1 deletion
@@ -284,7 +284,7 @@ def _get_conv_cache(si: SequenceInfo):
         in_channels,
         max(1, kernel_size - 1),
         device=si.device,
-        dtype=cache_config.dtype or inp_fake.dtype,
+        dtype=inp_fake.dtype,
     )

     return {"conv_state_cache": _get_conv_cache}
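For context, a minimal sketch of what this allocation presumably looks like after the change. The hunk only shows the trailing arguments, so the enclosing torch.empty call, the function signature, and the leading shape dimensions are assumptions here; si, inp_fake, cache_config, in_channels, and kernel_size are names taken from the diff itself:

import torch

def _get_conv_cache_sketch(si, inp_fake, cache_config, in_channels, kernel_size):
    # Hypothetical reconstruction of the cache allocation; only the
    # arguments shown in the hunk are taken from the source.
    return torch.empty(
        in_channels,
        max(1, kernel_size - 1),  # CUDA backend keeps kernel_size - 1 conv states
        device=si.device,
        # Before this commit: dtype=cache_config.dtype or inp_fake.dtype.
        # With an fp8 kv-cache dtype configured, that expression would have
        # forced the conv state cache to fp8 as well; after the change the
        # conv state always follows the activation (fake input) dtype.
        dtype=inp_fake.dtype,
    )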

tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_causal_conv.py

Lines changed: 1 addition & 1 deletion
@@ -342,7 +342,7 @@ def _get_conv_cache(si: SequenceInfo):
         in_channels,
         kernel_size,
         device=si.device,
-        dtype=cache_config.dtype or inp_fake.dtype,
+        dtype=inp_fake.dtype,
     )

     return {"conv_state_cache": _get_conv_cache}
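Taken together, the two hunks decouple the Mamba causal-conv state cache dtype from cache_config.dtype in both the CUDA and Torch backends. Judging by the commit title, the intent appears to be that cache_config.dtype can now carry an fp8 setting for the attention kv cache without also quantizing the causal-conv state cache, which stays in the activation dtype.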
