
Conversation

@aditya-29

This pull request introduces the capability to merge DeepSeekV2 Mixture-of-Experts (MoE) models using MergeKit. To facilitate this, a deepseekv2.json configuration file has been added to the architecture directory. Additionally, a custom class analogous to Mixtral has been implemented to enable model merging based on the JSON configuration.
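
For orientation, architecture files in mergekit's architecture directory follow a common JSON shape; a rough, abridged skeleton of what deepseekv2.json might contain is sketched below. The field set is an approximation of the existing templates and the weight list is truncated to a single entry, so it should be checked against the files already in that directory.

```json
{
  "model_type": "deepseek_v2",
  "architectures": ["DeepseekV2ForCausalLM"],
  "layer_templates": {
    "weights": [
      {"name": "model.layers.${layer_index}.input_layernorm.weight"}
    ]
  }
}
```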

@metric-space self-requested a review August 16, 2024 23:23

@metric-space (Contributor) left a comment

Apologies for the wait

I think this is a good first stab, but it will need a bit more work to push it over the finish line.

Please see the comments, and once you have tested it on your end it would be beneficial for both parties if you could mention how you tested it. It could be as simple as a YAML file for a linear merge that uses the DeepseekV2 architecture (a sketch of such a config follows below).

You've probably noticed the failing formatting check; to be explicit, please do address that as well.
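
To illustrate, the kind of test config described above could be as small as the following sketch. It is hypothetical: the two model paths are placeholders for DeepseekV2 checkpoints, and the options should be double-checked against mergekit's linear-merge examples.

```yaml
# Hypothetical linear merge of two DeepseekV2 checkpoints; the paths are placeholders.
models:
  - model: ./deepseek-v2-finetune-a
    parameters:
      weight: 0.5
  - model: ./deepseek-v2-finetune-b
    parameters:
      weight: 0.5
merge_method: linear
dtype: bfloat16
```

Running a config like this through mergekit's CLI (mergekit-yaml) and confirming the merged model loads would be a simple way to document how the change was tested.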

"layer_templates": {
"weights": [
{
"name" : "model.layers.${layer_index}.self_attn.q_proj.weight"

@metric-space (Contributor)

Unless I'm mistaken, this weight doesn't exist in the model. Please see here: https://huggingface.co/deepseek-ai/DeepSeek-V2/raw/main/model.safetensors.index.json

    "model.layers.0.self_attn.q_a_proj.weight": "model-00001-of-000055.safetensors",
    "model.layers.0.self_attn.q_a_layernorm.weight": "model-00001-of-000055.safetensors",
    "model.layers.0.self_attn.q_b_proj.weight": "model-00001-of-000055.safetensors",

    def layer_weights(
        self, index: int, config: PretrainedConfig
    ) -> Optional[List[WeightInfo]]:
        num_experts = self.num_local_experts

@metric-space (Contributor)

Nit: any particular reason for the local aliasing? I see it is also present in the Mixtral code, but is there a reason to keep it here?
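
For clarity, the alias in question is the num_experts = self.num_local_experts line quoted above; the loop could read the attribute directly instead. The sketch below is illustrative only: it assumes the surrounding class and the module's existing imports (List, Optional, WeightInfo, PretrainedConfig), and the per-expert entries are elided.

```python
def layer_weights(
    self, index: int, config: PretrainedConfig
) -> Optional[List[WeightInfo]]:
    weights: List[WeightInfo] = []
    # Use the attribute directly rather than aliasing it to a local variable.
    for expert_idx in range(self.num_local_experts):
        # Per-expert WeightInfo entries for layer `index` would be appended here.
        ...
    return weights
```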

DEEPSEEKV2_INFO = _load_json_arch("deepseekv2.json")


def get_architecture_info(config: PretrainedConfig) -> ArchitectureInfo:

@metric-space (Contributor)

Unless I am mistaken (which I could be), there needs to be a clause within the get_architecture_info function similar to Mixtral's; otherwise the code at https://github.com/arcee-ai/mergekit/blob/main/mergekit/merge.py#L51 will pull in only the info associated with the JSON template.
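
For reference, the clause being asked for would mirror the existing Mixtral special case in get_architecture_info. The sketch below is an assumption-heavy illustration: DeepseekV2TensorNames is a placeholder for whatever the PR's custom class is actually named, and it is assumed to follow the Mixtral pattern of exposing an ARCHITECTURE_NAME class variable and a from_config constructor; the rest of the function is elided.

```python
def get_architecture_info(config: PretrainedConfig) -> ArchitectureInfo:
    if len(config.architectures) != 1:
        raise RuntimeError("More than one architecture in config?")

    arch_name = config.architectures[0]

    # Existing special case: Mixtral MoE models are handled by a dedicated class.
    if arch_name == MixtralTensorNames.ARCHITECTURE_NAME:
        return MixtralTensorNames.from_config(config)

    # Analogous clause for DeepseekV2 (placeholder class name), so merge.py gets
    # the expert-aware weight list instead of only the JSON template info.
    if arch_name == DeepseekV2TensorNames.ARCHITECTURE_NAME:
        return DeepseekV2TensorNames.from_config(config)

    # ... fall through to the JSON-template lookup for all other architectures ...
    ...
```

Without such a clause, the function would fall straight through to the JSON template, which has no way to enumerate the per-expert weights of the MoE layers.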

@shamanez (Contributor)

@aditya-29 can you please respond? :) Sorry for the late reply.

@aditya-29 (Author)

Thanks @metric-space and @shamanez. I didn't get a chance to go over the comments earlier. I will work on the suggested changes and reach out to you if I need any clarification.

@shamanez (Contributor) commented Aug 28, 2024 via email

@ehartford

@cg123 is there a plan to support DeepseekV2 and DeepseekV3?
