1 parent 5410733 commit 13271c6
examples/intern_example/ccl_internvl_inference.py
@@ -251,7 +251,7 @@ def run_intern_on_aic(
# The Dual QPC approach splits the model to perform Image Encoding and Output generation in 2 different QPCs.
# The outputs of the Vision Encoder are then passed to the Language model via host in this case.

- kv_offload = False
+ kv_offload = True

# InternVL is an Early-Fusion model that uses placeholder tokens within the input_ids to interleave text_embeddings with
# Image embeddings and generate final input_embeds for output generation. Hence we need very large prefill_seq_len (3840 in this case) to
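The comments in the hunk describe early fusion: placeholder tokens in `input_ids` mark where image embeddings replace text embeddings before generation. The commit sets `kv_offload = True`, which the neighbouring comments appear to associate with the Dual QPC approach, where the vision encoder output reaches the language model through the host. The following is a minimal, framework-free sketch of that host-side merge step only. It is not the code from `ccl_internvl_inference.py`; `IMG_CONTEXT_ID`, `embed_text`, and `fuse_embeddings` are hypothetical names invented for illustration, and the toy embeddings stand in for real model outputs.

```python
from typing import List

# Hypothetical placeholder token id, chosen only for this sketch.
IMG_CONTEXT_ID = -1


def embed_text(input_ids: List[int], dim: int) -> List[List[float]]:
    """Toy text embedding: each token id maps to a constant vector of length dim."""
    return [[float(tok)] * dim for tok in input_ids]


def fuse_embeddings(
    input_ids: List[int],
    text_embeds: List[List[float]],
    image_embeds: List[List[float]],
    placeholder_id: int = IMG_CONTEXT_ID,
) -> List[List[float]]:
    """Replace each placeholder position with the next image embedding, in order."""
    n_slots = sum(1 for tok in input_ids if tok == placeholder_id)
    if n_slots != len(image_embeds):
        raise ValueError(
            f"{n_slots} placeholder tokens but {len(image_embeds)} image embeddings"
        )
    image_iter = iter(image_embeds)
    return [
        next(image_iter) if tok == placeholder_id else emb
        for tok, emb in zip(input_ids, text_embeds)
    ]


# Two text tokens surrounding two image placeholder slots.
input_ids = [5, IMG_CONTEXT_ID, IMG_CONTEXT_ID, 7]
text_embeds = embed_text(input_ids, dim=2)
image_embeds = [[0.1, 0.2], [0.3, 0.4]]  # stand-in for vision encoder output
fused = fuse_embeddings(input_ids, text_embeds, image_embeds)
```

The fused sequence keeps one position per entry in `input_ids`, placeholders included, so the sequence length seen during prefill grows with the number of image placeholder tokens as well as with the text length.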