Commit 13271c6

Adding Compute-Context-Length (CCL)

Signed-off-by: Vahid Janfaza <vjanfaza@qti.qualcomm.com>
1 parent 5410733 commit 13271c6

File tree

1 file changed: +1 −1 lines changed

examples/intern_example/ccl_internvl_inference.py

Lines changed: 1 addition & 1 deletion
@@ -251,7 +251,7 @@ def run_intern_on_aic(
     # The Dual QPC approach splits the model to perform Image Encoding and Output generation in 2 different QPCs.
     # The outputs of the Vision Encoder are then passed to the Language model via host in this case.

-    kv_offload = False
+    kv_offload = True

     # InternVL is an Early-Fusion model that uses placeholder tokens within the input_ids to interleave text embeddings with
     # Image embeddings and generate final input_embeds for output generation. Hence we need a very large prefill_seq_len (3840 in this case) to
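The flag flipped in this diff selects the Dual QPC path described in the comments: with `kv_offload = True`, image encoding and text generation run as two separate programs, and the host ferries the vision embeddings between them. A minimal sketch of that control flow, with placeholder functions in place of the real QPCs (`run_dual_qpc`, `vision_encoder`, and `language_model` are illustrative names, not the QEfficient API):

```python
# Illustrative sketch of the Dual QPC split -- the inner functions are
# stand-ins for the compiled vision and language programs, not real ones.

def run_dual_qpc(pixel_values, input_ids, kv_offload=True):
    """Mimics the control flow the diff's comments describe."""

    def vision_encoder(pixels):
        # Stand-in for the vision QPC: one embedding per image patch.
        return [p * 0.5 for p in pixels]

    def language_model(embeds, tokens):
        # Stand-in for the language QPC: consumes interleaved embeddings.
        return len(embeds) + len(tokens)

    if kv_offload:
        # Dual QPC: run the two programs separately; the host passes
        # the vision encoder's outputs to the language model.
        vision_embeds = vision_encoder(pixel_values)
        return language_model(vision_embeds, input_ids)

    # Single QPC: one program performs both stages end to end.
    return language_model(vision_encoder(pixel_values), input_ids)

print(run_dual_qpc([1.0, 2.0], [101, 102, 103]))  # -> 5
```

Either branch computes the same result here; the point of the split in the real example is deployment (two QPCs with a host-side hand-off) rather than a different numeric outcome.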
