removed redundancies from QEFFHybridCache #582

ochougul · 2025-10-02T20:38:33Z

Should improve performance for models that use sliding-window attention, such as Gemma and Mistral.

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

quic-rishinr · 2025-11-04T05:46:20Z

QEfficient/transformers/cache_utils.py

            # Original Gather
            ctx_len = self.key_cache[layer_idx].shape[2]
            ctx_indices = torch.arange(ctx_len)[None, None, ...]
-            gather_limit = kv_position_ids.max(1, keepdim=True).values.unsqueeze(1)


If we are using position_ids, it would go overboard for sliding window, right?
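
To make the question above concrete, here is a minimal plain-Python sketch of the concern. It is not the code from this PR: the window size, the modulo wrapping of positions, and the helper names `gather_limit` and `valid_slots` are all assumptions made for illustration, and the comparison against the limit is only a guess at how the gather mask is applied.

```python
# Illustrative only: a plain-Python stand-in for the tensor logic in the diff.
# `window`, the modulo wrapping, and both helpers are hypothetical.

def gather_limit(position_ids):
    """Largest position in one batch row, analogous to `.max(1, keepdim=True)`."""
    return max(position_ids)


def valid_slots(ctx_len, limit):
    """Cache slot indices that do not exceed the gather limit (assumed mask rule)."""
    return [i for i in range(ctx_len) if i <= limit]


window = 4                      # hypothetical sliding-window cache length
position_ids = [8, 9]           # absolute token positions, already past the window
kv_position_ids = [p % window for p in position_ids]  # assumed wrapped positions

raw_limit = gather_limit(position_ids)          # derived from absolute positions
wrapped_limit = gather_limit(kv_position_ids)   # derived from wrapped positions

print("raw limit:", raw_limit, "outside cache:", raw_limit >= window)
print("wrapped limit:", wrapped_limit, "inside cache:", wrapped_limit < window)
print("slots kept with raw limit:", valid_slots(window, raw_limit))
print("slots kept with wrapped limit:", valid_slots(window, wrapped_limit))
```

Under these assumptions, a limit taken from absolute positions is larger than any slot index in the window-sized cache, so the two limits select different sets of slots. Whether that difference matters for the actual cache layout is exactly what the review comment asks.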

removed redundancies from QEFFHybridCache

3729740

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

ochougul requested review from quic-amitraj, quic-hemagnih and quic-rishinr as code owners October 2, 2025 20:38

quic-rishinr reviewed Nov 4, 2025

3 participants
