Open
Labels
enhancement (New feature or request)
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Support Kimi Linear architecture models such as moonshotai/Kimi-Linear-48B-A3B-Instruct
Motivation
It's a good model, what can I say :)
Supporting it would also preemptively cover an architecture and attention method that Moonshot devs have hinted at using in their next big model; see e.g. https://x.com/bigeagle_xd/status/1983911519541981247
Possible Implementation
Likely blocked for now by the work going on in #16095, as the token-mixing mechanism used (Kimi Delta Attention) is a variant of the Gated DeltaNet used in Qwen3-Next. See also the technical report for more details.
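To make the relationship between the two mechanisms concrete, here is a rough single-head, single-token sketch of the gated delta rule in plain Python. It is based on my reading of the technical report, not on any llama.cpp or Moonshot code: the names `delta_step` and `read` are hypothetical, and the sketch ignores the chunked/parallel form, normalization, how the gates are parametrized, and everything else a real kernel would need. The only difference modeled is the decay gate: a single scalar per step (Gated DeltaNet style) versus one value per key channel (the finer-grained gating that Kimi Delta Attention is described as using).

```python
# Illustrative sketch only: recurrent form of a gated delta rule for one head.
# State S has shape d_k x d_v (list of lists). All names here are hypothetical.

def delta_step(S, k, v, beta, alpha):
    """One recurrent update: S <- (I - beta * k k^T) * Diag(alpha) * S + beta * k v^T.

    alpha may be a scalar (one decay for the whole state, Gated DeltaNet style)
    or a list of length d_k (per-key-channel decay, assumed KDA style).
    """
    d_k, d_v = len(S), len(S[0])
    gates = [alpha] * d_k if isinstance(alpha, (int, float)) else list(alpha)

    # 1) Decay the previous state row by row.
    decayed = [[gates[i] * S[i][j] for j in range(d_v)] for i in range(d_k)]

    # 2) What the decayed state currently predicts for key k.
    pred = [sum(k[i] * decayed[i][j] for i in range(d_k)) for j in range(d_v)]

    # 3) Delta rule: move the stored value for k toward v by a step of size beta.
    return [
        [decayed[i][j] + beta * k[i] * (v[j] - pred[j]) for j in range(d_v)]
        for i in range(d_k)
    ]


def read(S, q):
    """Output for query q: o = S^T q."""
    return [sum(q[i] * S[i][j] for i in range(len(S))) for j in range(len(S[0]))]


if __name__ == "__main__":
    S = [[0.0, 0.0], [0.0, 0.0]]
    k = [1.0, 0.0]

    # Scalar gate: write a value, then read it back with the same key.
    S = delta_step(S, k, [2.0, 3.0], beta=1.0, alpha=1.0)
    print(read(S, k))

    # Per-channel gate: decay, then overwrite the value stored under k.
    S = delta_step(S, k, [5.0, 7.0], beta=1.0, alpha=[0.5, 0.5])
    print(read(S, k))
```

If this reading is right, most of the recurrence could in principle be shared with whatever lands for Qwen3-Next, with the gate shape being the main thing that changes. That is a guess on my part and would need checking against the report and the work in #16095.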