
Commit 4b7a3aa

Add support for ERNIE-4.5 (#1354)
1 parent fc2847c · commit 4b7a3aa

File tree: 5 files changed (+17, −0 lines)


README.md

Lines changed: 1 addition & 0 deletions
@@ -323,6 +323,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://huggingface.co/papers/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
 1. **[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://huggingface.co/papers/1905.11946) by Mingxing Tan, Quoc V. Le.
 1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://huggingface.co/papers/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
+1. **ERNIE-4.5** (from Baidu ERNIE Team) released with the blog post [Announcing the Open Source Release of the ERNIE 4.5 Model Family](https://ernie.baidu.com/blog/posts/ernie4.5/) by the Baidu ERNIE Team.
 1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2 and ESMFold** were released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
 1. **EXAONE** (from LG AI Research) released with the papers [EXAONE 3.0 7.8B Instruction Tuned Language Model](https://huggingface.co/papers/2408.03541) and [EXAONE 3.5: Series of Large Language Models for Real-world Use Cases](https://huggingface.co/papers/2412.04862) by the LG AI Research team.
 1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme.

docs/snippets/6_supported-models.snippet

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@
 1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://huggingface.co/papers/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
 1. **[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://huggingface.co/papers/1905.11946) by Mingxing Tan, Quoc V. Le.
 1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://huggingface.co/papers/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
+1. **ERNIE-4.5** (from Baidu ERNIE Team) released with the blog post [Announcing the Open Source Release of the ERNIE 4.5 Model Family](https://ernie.baidu.com/blog/posts/ernie4.5/) by the Baidu ERNIE Team.
 1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2 and ESMFold** were released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
 1. **EXAONE** (from LG AI Research) released with the papers [EXAONE 3.0 7.8B Instruction Tuned Language Model](https://huggingface.co/papers/2408.03541) and [EXAONE 3.5: Series of Large Language Models for Real-world Use Cases](https://huggingface.co/papers/2412.04862) by the LG AI Research team.
 1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme.

src/configs.js

Lines changed: 1 addition & 0 deletions
@@ -134,6 +134,7 @@ function getNormalizedConfig(config) {
         case 'gemma3n_text':
         case 'glm':
         case 'helium':
+        case 'ernie4_5':
             mapping['num_heads'] = 'num_key_value_heads';
             mapping['num_layers'] = 'num_hidden_layers';
             mapping['dim_kv'] = 'head_dim';

src/models.js

Lines changed: 11 additions & 0 deletions
@@ -6732,6 +6732,15 @@ export class MistralModel extends MistralPreTrainedModel { }
 export class MistralForCausalLM extends MistralPreTrainedModel { }
 //////////////////////////////////////////////////

+//////////////////////////////////////////////////
+// ERNIE-4.5 models
+export class Ernie4_5_PretrainedModel extends PreTrainedModel { }
+
+export class Ernie4_5_Model extends Ernie4_5_PretrainedModel { }
+
+export class Ernie4_5_ForCausalLM extends Ernie4_5_PretrainedModel { }
+//////////////////////////////////////////////////
+

 //////////////////////////////////////////////////
 // Starcoder2 models
@@ -7806,6 +7815,7 @@ const MODEL_MAPPING_NAMES_DECODER_ONLY = new Map([
     ['mpt', ['MptModel', MptModel]],
     ['opt', ['OPTModel', OPTModel]],
     ['mistral', ['MistralModel', MistralModel]],
+    ['ernie4_5', ['Ernie4_5_Model', Ernie4_5_Model]],
     ['starcoder2', ['Starcoder2Model', Starcoder2Model]],
     ['falcon', ['FalconModel', FalconModel]],
     ['stablelm', ['StableLmModel', StableLmModel]],
@@ -7910,6 +7920,7 @@ const MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = new Map([
     ['opt', ['OPTForCausalLM', OPTForCausalLM]],
     ['mbart', ['MBartForCausalLM', MBartForCausalLM]],
     ['mistral', ['MistralForCausalLM', MistralForCausalLM]],
+    ['ernie4_5', ['Ernie4_5_ForCausalLM', Ernie4_5_ForCausalLM]],
     ['starcoder2', ['Starcoder2ForCausalLM', Starcoder2ForCausalLM]],
     ['falcon', ['FalconForCausalLM', FalconForCausalLM]],
     ['trocr', ['TrOCRForCausalLM', TrOCRForCausalLM]],

src/tokenizers.js

Lines changed: 3 additions & 0 deletions
@@ -4323,6 +4323,8 @@ export class CohereTokenizer extends PreTrainedTokenizer { }

 export class MgpstrTokenizer extends PreTrainedTokenizer { }

+export class Ernie4_5_Tokenizer extends PreTrainedTokenizer { }
+
 /**
  * Helper class which is used to instantiate pretrained tokenizers with the `from_pretrained` function.
  * The chosen tokenizer class is determined by the type specified in the tokenizer config.
@@ -4377,6 +4379,7 @@ export class AutoTokenizer {
     Grok1Tokenizer,
     CohereTokenizer,
     MgpstrTokenizer,
+    Ernie4_5_Tokenizer,

     // Base case:
     PreTrainedTokenizer,
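Registering `Ernie4_5_Tokenizer` in the `AutoTokenizer` lookup means the class is selected whenever a checkpoint's tokenizer config declares `"tokenizer_class": "Ernie4_5_Tokenizer"`; since it subclasses `PreTrainedTokenizer` without overrides, its behaviour comes entirely from the checkpoint's tokenizer files. A sketch, reusing the same placeholder repo id as above:

```js
import { AutoTokenizer } from '@huggingface/transformers';

// Placeholder repo id; AutoTokenizer reads the tokenizer class name from the
// checkpoint's tokenizer config and picks Ernie4_5_Tokenizer accordingly.
const tokenizer = await AutoTokenizer.from_pretrained('onnx-community/ERNIE-4.5-0.3B-ONNX');

const { input_ids } = tokenizer('Hello world!');
console.log(input_ids.tolist());
```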
