Feature Request: Kimi Linear model (Kimi Delta Attention) #16930

@fizzAI

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Support Kimi Linear architecture models, such as moonshotai/Kimi-Linear-48B-A3B-Instruct.

Motivation

It's a good model, what can I say :)
Supporting it would also add early support for an architecture and attention method that Moonshot devs have hinted at using in their next big model; see e.g. https://x.com/bigeagle_xd/status/1983911519541981247

Possible Implementation

Likely blocked for now by the work going on in #16095, since the token-mixing mechanism used (Kimi Delta Attention) is a variant of the Gated DeltaNet used in Qwen3-Next. See also the technical report for more details.
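
For anyone skimming, here is a rough sketch of why the two mechanisms are related, based on my reading of the technical report (so treat the exact form as an assumption, not the reference implementation). Both keep a recurrent state matrix and update it with a gated delta rule; the difference, as I understand it, is that Gated DeltaNet uses one scalar forget gate per head while Kimi Delta Attention uses a per-channel gate. The function names `kda_step` and `read` are made up for illustration, and this is plain single-head Python with no chunking or kernels.

```python
def kda_step(S, k, v, beta, alpha):
    """One recurrent step of a gated delta rule (illustrative sketch only).

    S     : d_k x d_v state matrix (list of rows)
    k, v  : key (length d_k) and value (length d_v) for this token
    beta  : scalar write strength in [0, 1]
    alpha : per-channel forget gate, length d_k
            (pass [a] * d_k to mimic a single scalar gate)
    """
    d_k, d_v = len(S), len(S[0])
    # 1) Decay: each key channel i is scaled by its own gate alpha[i].
    S = [[alpha[i] * S[i][j] for j in range(d_v)] for i in range(d_k)]
    # 2) What the decayed state currently returns for key k (S^T k).
    pred = [sum(k[i] * S[i][j] for i in range(d_k)) for j in range(d_v)]
    # 3) Delta-rule correction: move the stored value for k toward v.
    return [[S[i][j] + beta * k[i] * (v[j] - pred[j]) for j in range(d_v)]
            for i in range(d_k)]


def read(S, q):
    """Read out the state with query q (S^T q)."""
    return [sum(q[i] * S[i][j] for i in range(len(S))) for j in range(len(S[0]))]


S = [[0.0, 0.0], [0.0, 0.0]]
S = kda_step(S, k=[1.0, 0.0], v=[2.0, 3.0], beta=1.0, alpha=[1.0, 1.0])
# Second token writes to a different key; only channel 0 is decayed.
S = kda_step(S, k=[0.0, 1.0], v=[5.0, 7.0], beta=1.0, alpha=[0.5, 1.0])
print(read(S, [1.0, 0.0]), read(S, [0.0, 1.0]))
```

If that reading is right, most of the state-update plumbing from the Qwen3-Next work should carry over, with the gate changing from a scalar to a vector.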

Labels: enhancement
