Commit 8598421
Much more efficient and clear weight initialization and tie weights (#42191)
* everything until informer
* everything until perceiver
* all of them finally
* style
* replace by transformers init everywhere
* use relative import instead
* deprecated models
* style
* start contexts
* small fixes
* fix modular
* remove class switch
* do not initialize tied weights
* typo
* fix
* improve
* improve comments
* improve
* improve
* fix zamba
* fix import
* add the post_init
* more post_init
* fix
* protect
* more post_init
* fix
* fixes
* fix
* fix
* switch flag name
* more fixes
* fixes
* fixes
* copies
* fix
* finally find the culprit
* style
* last small
* big bird
* better
* update init check
* final touch
* do it everywhere

1 parent 16c7afd, commit 8598421
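The commit message lists "do not initialize tied weights" without further detail. As a minimal sketch of the general idea only (plain Python, not the transformers or PyTorch implementation; `Param`, `TinyLM`, `named_params` and `init_weights` are hypothetical names made up for illustration): when an output head reuses the embedding weight, a loop that initializes weights by name reaches the same object twice, so tracking object identity lets each shared weight be initialized exactly once.

```python
import random


class Param:
    """Hypothetical stand-in for a weight tensor: a flat list of floats."""

    def __init__(self, size: int):
        self.values = [0.0] * size
        self.initialized = False


class TinyLM:
    """Hypothetical model whose output head ties (shares) the embedding weight."""

    def __init__(self, vocab: int, dim: int):
        self.embed = Param(vocab * dim)
        self.hidden = Param(dim * dim)
        self.lm_head = self.embed  # tied: two names, one object

    def named_params(self):
        # Lists every attribute name, so the tied weight shows up twice.
        return [("embed", self.embed), ("hidden", self.hidden), ("lm_head", self.lm_head)]


def init_weights(model, std: float = 0.02, seed: int = 0) -> list:
    """Initialize each distinct Param once and return the names actually initialized."""
    rng = random.Random(seed)
    seen = set()  # ids of Param objects already initialized
    done = []
    for name, param in model.named_params():
        if id(param) in seen:
            continue  # alias of a tied weight that was already initialized: skip it
        seen.add(id(param))
        param.values = [rng.gauss(0.0, std) for _ in param.values]
        param.initialized = True
        done.append(name)
    return done


model = TinyLM(vocab=10, dim=4)
print(init_weights(model))  # ['embed', 'hidden']
print(model.lm_head is model.embed)  # True: the tie survives initialization
```

In PyTorch itself, `nn.Module.named_parameters()` already yields a shared parameter only once, but a per-module function applied with `nn.Module.apply` visits every submodule that holds the shared weight, which is where a tied weight can end up being written more than once.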
416 files changed (+3341, -5443 lines)

File tree:
- src/transformers
  - generation
  - models
    - aimv2
    - albert
    - align
    - altclip
    - aria
    - audio_spectrogram_transformer
    - audioflamingo3
    - autoformer
    - bamba
    - bark
    - bart
    - beit
    - bert_generation
    - bert
    - big_bird
    - bigbird_pegasus
    - bit
    - blenderbot_small
    - blenderbot
    - blip_2
    - blip
    - bloom
    - bridgetower
    - bros
    - camembert
    - canine
    - chinese_clip
    - clap
    - clipseg
    - clip
    - clvp
    - codegen
    - colpali
    - colqwen2
    - conditional_detr
    - convbert
    - convnextv2
    - convnext
    - cpmant
    - csm
    - ctrl
    - cvt
    - d_fine
    - dab_detr
    - dac
    - data2vec
    - dbrx
    - deberta_v2
    - deberta
    - decision_transformer
    - deepseek_v2
    - deepseek_v3
    - deepseek_vl_hybrid
    - deepseek_vl
    - deformable_detr
    - deit
    - deprecated
      - deta
      - efficientformer
      - ernie_m
      - gptsan_japanese
      - graphormer
      - jukebox
      - mctct
      - mega
      - nat
      - nezha
      - open_llama
      - qdqbert
      - realm
      - retribert
      - speech_to_text_2
      - trajectory_transformer
      - transfo_xl
      - tvlt
      - van
      - vit_hybrid
      - xlm_prophetnet
    - depth_anything
    - depth_pro
    - detr
    - diffllama
    - dinat
    - dinov2_with_registers
    - dinov2
    - dinov3_convnext
    - dinov3_vit
    - distilbert
    - doge
    - donut
    - dots1
    - dpr
    - dpt
    - edgetam_video
    - edgetam
    - efficientloftr
    - efficientnet
    - electra
    - emu3
    - encodec
    - encoder_decoder
    - eomt
    - ernie4_5_moe
    - ernie
    - esm
    - evolla
    - falcon_h1
    - falcon_mamba
    - falcon
    - fastspeech2_conformer
    - flaubert
    - flava
    - flex_olmo
    - fnet
    - focalnet
    - fsmt
    - funnel
    - fuyu
    - gemma2
    - gemma3n
    - gemma3
    - gemma
    - git
    - glm4_moe
    - glm4v_moe
    - glpn
    - got_ocr2
    - gpt2
    - gpt_bigcode
    - gpt_neox_japanese
    - gpt_neo
    - gpt_oss
    - gptj
    - granite_speech
    - granitemoehybrid
    - granitemoe
    - grounding_dino
    - groupvit
    - hiera
    - hubert
    - hunyuan_v1_dense
    - ibert
    - idefics2
    - idefics3
    - idefics
    - ijepa
    - imagegpt
    - informer
    - instructblipvideo
    - instructblip
    - internvl
    - jamba
    - jetmoe
    - kosmos2_5
    - kosmos2
    - kyutai_speech_to_text
    - layoutlmv2
    - layoutlmv3
    - layoutlm
    - led
    - levit
    - lilt
    - llama4
    - llava_next_video
    - llava_next
    - llava_onevision
    - longcat_flash
    - longformer
    - longt5
    - luke
    - lxmert
    - m2m_100
    - mamba2
    - mamba
    - marian
    - markuplm
    - mask2former
    - maskformer
    - mbart
    - megatron_bert
    - metaclip_2
    - mgp_str
    - mimi
    - minimax
    - mixtral
    - mlcd
    - mllama
    - mm_grounding_dino
    - mobilebert
    - mobilenet_v1
    - mobilenet_v2
    - mobilevitv2
    - mobilevit
    - modernbert_decoder
    - modernbert
    - moshi
    - mpnet
    - mpt
    - mra
    - mt5
    - musicgen_melody
    - musicgen
    - mvp
    - nemotron
    - nllb_moe
    - nystromformer
    - omdet_turbo
    - oneformer
    - openai
    - opt
    - owlv2
    - owlvit
    - paligemma
    - parakeet
    - patchtsmixer
    - patchtst
    - pegasus_x
    - pegasus
    - perceiver
    - persimmon
    - phi4_multimodal
    - phimoe
    - pix2struct
    - pixtral
    - plbart
    - poolformer
    - pop2piano
    - prophetnet
    - pvt_v2
    - pvt
    - qwen2_audio
    - qwen2_moe
    - qwen3_moe
    - qwen3_next
    - qwen3_omni_moe
    - qwen3_vl_moe
    - recurrent_gemma
    - reformer
    - regnet
    - rembert
    - resnet
    - roberta_prelayernorm
    - roberta
    - roc_bert
    - roformer
    - rt_detr_v2
    - rt_detr
    - rwkv
    - sam2_video
    - sam2
    - sam_hq
    - sam
    - seamless_m4t_v2
    - seamless_m4t
    - segformer
    - seggpt
    - sew_d
    - sew
    - siglip2
    - siglip
    - smolvlm
    - speech_encoder_decoder
    - speech_to_text
    - speecht5
    - splinter
    - squeezebert
    - stablelm
    - superglue
    - superpoint
    - swiftformer
    - swin2sr
    - swinv2
    - swin
    - switch_transformers
    - t5gemma
    - t5
    - table_transformer
    - tapas
    - textnet
    - time_series_transformer
    - timesfm
    - timesformer
    - timm_wrapper
    - trocr
    - tvp
    - udop
    - umt5
    - unispeech_sat
    - unispeech
    - univnet
    - upernet
    - vaultgemma
    - video_llava
    - videomae
    - vilt
    - vision_encoder_decoder
    - vision_text_dual_encoder
    - visual_bert
    - vit_mae
    - vit_msn
    - vitdet
    - vitmatte
    - vitpose_backbone
    - vitpose
    - vits
    - vit
    - vivit
    - vjepa2
    - voxtral
    - wav2vec2_bert
    - wav2vec2_conformer
    - wav2vec2
    - wavlm
    - whisper
    - x_clip
    - xcodec
    - xglm
    - xlm_roberta_xl
    - xlm_roberta
    - xlm
    - xlnet
    - xlstm
    - xmod
    - yolos
    - yoso
    - zamba2
    - zamba
    - zoedepth
- tests
  - deepspeed
  - models
    - auto
    - informer
    - roformer
  - trainer
  - utils
- utils
  - test_module
Diff 1 (file name and line contents hidden):
- hunk at original lines 26-32 (new 26-31): original line 29 removed
- hunk at original lines 313-432 (new 312-317): original lines 316-429 removed (114 lines)
- hunk at original lines 527-533 (new 412-417): original line 530 removed
Diff 2 (file name and line contents hidden):
- hunk at original lines 23-28 (new 23-29): one line added as new line 26
- hunk at original lines 387-393 (new 388-394): original line 390 replaced by new line 391
Diff 3 (file name and line contents hidden):
- new lines 1-191 added, with no corresponding original lines