Refactor weight loading #41580
Merged
Commits
387 commits
4d79709
ah actually we don't discard lm head if missing -> needs to be moved …
ArthurZucker d1e84db
fix some tests
ArthurZucker f2938df
small fixes
ArthurZucker 22fcdaf
up
ArthurZucker 7d78aa1
up
ArthurZucker 80517f5
dik why we tie weights twice but,..,,.
ArthurZucker 2ff8532
ups
ArthurZucker d923061
removeunused
ArthurZucker ce8c1c1
fix hunyuan
ArthurZucker 23e3ed7
small fix
ArthurZucker a8fb554
nits
ArthurZucker ab6ee8a
ish
ArthurZucker 77ccbb1
up
ArthurZucker 8a8beff
rev
ArthurZucker 02386ce
fix more tie weights keys
ArthurZucker 1c87945
small fixes
ArthurZucker 00b95ee
nit
ArthurZucker a170f29
update
ArthurZucker 8b924a3
fix and fix
ArthurZucker 8f7b1d0
fix a test
ArthurZucker 9386217
glubs
ArthurZucker 4894a25
current shitty changes
ArthurZucker da7dc10
ship validated ones
ArthurZucker d7c8171
more
ArthurZucker e088408
more update
ArthurZucker 4f212de
more
ArthurZucker dc5a22c
more
ArthurZucker 675b2bc
more
ArthurZucker f85f239
mllama
ArthurZucker 76b6a92
more up
ArthurZucker ba1a8b6
fix ernie
ArthurZucker ba3de5a
fix xopies
ArthurZucker 8fd255c
up more
ArthurZucker 5d7507b
more fixes
ArthurZucker 0fb2340
up
ArthurZucker 32b9273
up
ArthurZucker 0b95826
fix-copies
ArthurZucker 5794d27
fix more
ArthurZucker 5e71bd4
more updates
ArthurZucker 20d1b34
AI UPDATE
ArthurZucker 89846e7
up
ArthurZucker a581fd7
hoey
ArthurZucker 1652c9c
make it fast
Cyrilvallez dcad703
fix
Cyrilvallez c921ced
lol
ArthurZucker 50714d8
Merge branch 'refactor-weight-loading' of github.com:huggingface/tran…
ArthurZucker 8936cc4
fix asjusting
ArthurZucker 5c54332
more fixes
ArthurZucker ff10878
_dtype nit
ArthurZucker 9601b82
up
ArthurZucker db02b9d
nit
ArthurZucker 42fd4c4
update
ArthurZucker 4527171
update
ArthurZucker bd36211
remove semaphores
Cyrilvallez e2aefee
fix import to avoid jit execution
Cyrilvallez 74a0e9c
try to remove custom tiing logic when its stupid
ArthurZucker ead2ac3
Merge branch 'refactor-weight-loading' of github.com:huggingface/tran…
ArthurZucker e7165da
fix more individual models
ArthurZucker 2ff765e
fix whisper as well
ArthurZucker 912562c
fix?
ArthurZucker c43495a
fox umt5
ArthurZucker 57988f2
improve tqdm bar
Cyrilvallez 8c16de1
cleanup a bit
Cyrilvallez b8927d6
oupsi
Cyrilvallez 2733ff6
some updates
ArthurZucker 8baa3fe
Merge branch 'refactor-weight-loading' of github.com:huggingface/tran…
ArthurZucker d91701f
improve
Cyrilvallez 5146dec
Merge branch 'refactor-weight-loading' of github.com:huggingface/tran…
Cyrilvallez acc5b24
remove all buffering -> much faster without it
Cyrilvallez 58389a1
remove some tie_weights custome funcs when not needed
ArthurZucker 92c0229
more fixes related to strict matching regex
ArthurZucker d9e7fe6
Merge branch 'refactor-weight-loading' of github.com:huggingface/tran…
ArthurZucker b57d789
remove ALL custom tie weights
ArthurZucker ef8b6c3
small update
ArthurZucker a228fd0
revert change to init scheme (no need for params)
Cyrilvallez 07574dd
Merge branch 'refactor-weight-loading' of github.com:huggingface/tran…
ArthurZucker 2526cc5
mixtral init
Cyrilvallez 6cb3794
try less strict source check
ArthurZucker e4cadfb
Merge branch 'refactor-weight-loading' of github.com:huggingface/tran…
ArthurZucker 3fea865
tied weight first shot to the fiiiixxxxxx
Cyrilvallez 82f94b8
does this help?
ArthurZucker 84dd6eb
:)
ArthurZucker cc08195
fix some ppolry defined tied_weights_keys for now
ArthurZucker f692f4b
subclass nn.Parameters
ArthurZucker 2fa058f
up
ArthurZucker 78d4622
lol
ArthurZucker 8ff4ad5
Ouiiii
ArthurZucker 3222678
fix led
ArthurZucker 9a76a6e
fix long cat flash
ArthurZucker 9fde9f7
fix qwen and long cat flash
ArthurZucker 074a449
properly fix qwen init
ArthurZucker dde5500
just push this for now
ArthurZucker 0e7d2d0
propnet is dumb
ArthurZucker 18b02ee
update
ArthurZucker 9c0db72
push
ArthurZucker 75d3afc
remove explict sharing of some tied keys.
ArthurZucker 85ab085
update decoder.bias
ArthurZucker 443573a
moe case
ArthurZucker f8f0973
more changes to untangle old hardcoded ting
ArthurZucker 5c9d56c
fixup
ArthurZucker a0029f2
Merge branch 'main' into refactor-weight-loading
ArthurZucker 44943fb
fix big faileurs
ArthurZucker 76d66be
fix prophnet
ArthurZucker d176b48
Merge branch 'refactor-weight-loading' of github.com:huggingface/tran…
ArthurZucker 3ffc59e
fix resize token embeddings
ArthurZucker 2a00e49
nits
ArthurZucker f7d0183
fix xcodex
ArthurZucker bbf5b00
asyncio?
ArthurZucker 0412832
fix smart apply
ArthurZucker c137ea3
fix data-2-vec
ArthurZucker 7b7c990
[build-ci-image]
ArthurZucker de74aeb
checkout
ArthurZucker 94a53d4
uupdate
ArthurZucker 8755a4b
fix hunyuan
ArthurZucker 5be67b9
update error message
ArthurZucker 86a4e51
fix deformable detr
ArthurZucker 09bcd2e
fixes
ArthurZucker 7b457fd
fix init weights for non param gate up projs
ArthurZucker e033947
shared todo?
ArthurZucker f93f357
update some models
ArthurZucker 2f0a6ae
big revert, don't break this behaviour
ArthurZucker 3c8c757
ty @SunMarc this fixes the buffers
ArthurZucker f5a7c33
mt5 fuck
ArthurZucker 647f720
fix lxmbert
ArthurZucker bed6ea1
nuke slow test fetcher
ArthurZucker 2ec0a5f
fix zamba and deepcopy for now
ArthurZucker f9c7ef8
fix zamba tied weight keys! ~
ArthurZucker 8df3ffd
fix-copies
ArthurZucker e76481b
update fetch terst
ArthurZucker de00751
fix gradient for test modeling common!
ArthurZucker cdd1a9b
break "shared" for now I will fix tomorrow changes are properly isoal…
ArthurZucker d3f6476
does this fix marian? probably not
ArthurZucker 0a7db83
fix some vlms
ArthurZucker 1814200
D fine seems to handle this well
ArthurZucker b77825d
glob is fine actually
ArthurZucker 5dbb783
fix dab detr
ArthurZucker 9edc81b
small steps
ArthurZucker 970f4e5
opusy
ArthurZucker 0361d47
fix some more models?
ArthurZucker dc75773
yups
ArthurZucker cdb1284
better erro
ArthurZucker de9a2d9
fix?
ArthurZucker b9a9f4d
fix double escape
ArthurZucker c944619
escape wehere it makes sense
ArthurZucker f910524
??
ArthurZucker 4aa2ade
fix ibert
ArthurZucker 2ef1c2b
fix tvp as well
ArthurZucker b98a7bc
more fxes
ArthurZucker 74e6c87
try always download ref PR
ArthurZucker 5064edd
ONONONO
ArthurZucker 3f8a304
big fixup
ArthurZucker 3ecaa63
more fixup
ArthurZucker f384524
small step
ArthurZucker 290337a
small nits
ArthurZucker 76b388c
nits
ArthurZucker e69b988
brut force some stuff
ArthurZucker c2781f5
fix vilt
ArthurZucker f64ee96
make sure special models that always need tie always tie
ArthurZucker a3e4015
cleaning up
ArthurZucker 9eecbd2
small nits
ArthurZucker b2fa432
fix zamba and bridge tower!
ArthurZucker dbbfdf2
just fixup
ArthurZucker ab4890c
potential culprits
ArthurZucker 937ebf3
revert bark and fix bridgetower
ArthurZucker e4f9697
Merge branch 'main' of github.com:huggingface/transformers into refac…
ArthurZucker 17803ce
remove now non existant tie_weights
ArthurZucker 9f6838a
?
ArthurZucker 1afb3eb
lol reformer actually had nothing tied!
ArthurZucker f01a149
wow these two fucking models were really not well made
ArthurZucker 0b36980
fix sam family!
ArthurZucker d740c82
fix bark revision
ArthurZucker 6f3940e
fix speech2test ?
ArthurZucker b2f6f61
push this for now....
ArthurZucker ade8dab
upsy
ArthurZucker f956ccf
the fuck
ArthurZucker 99c6fd4
fix rtdetr
ArthurZucker 1ffcfc3
update
ArthurZucker ee62aec
proper
ArthurZucker 6ec80f8
wow that one 's annoying
ArthurZucker b05e329
update
ArthurZucker 2606596
try to find the culprit
ArthurZucker d9e8a09
get some help on common
ArthurZucker 581665a
nit about general init and cls.padding_idx
ArthurZucker c43bc68
revert num workers update
ArthurZucker b6fe415
remove old loading func
Cyrilvallez 4bb8e5c
fix glob
ArthurZucker 7d52b06
Merge branch 'refactor-weight-loading' of github.com:huggingface/tran…
ArthurZucker 455bcc7
add annotations
Cyrilvallez fc884c0
Merge branch 'refactor-weight-loading' of github.com:huggingface/tran…
Cyrilvallez 2e0ed5d
fix re
ArthurZucker 3ddd1cc
Merge branch 'refactor-weight-loading' of github.com:huggingface/tran…
ArthurZucker 1f86a10
small improvements
Cyrilvallez 4d56fbf
fix conflict
Cyrilvallez 67a8eeb
clean some stuff
Cyrilvallez e9168ff
improvements
Cyrilvallez feda22d
someone did not understannnnnnd what I tried to dooo or does BNB not …
ArthurZucker 70841c9
Merge branch 'refactor-weight-loading' of github.com:huggingface/tran…
ArthurZucker 52248ba
gluos
ArthurZucker e8dd4a4
fix case when `.` is just not there
ArthurZucker 1c67fc4
remove unused arg
Cyrilvallez e20ed00
recover orignal parameter/buffer using _original
SunMarc 827c42a
fix glob issu
ArthurZucker e5e4d28
Merge branch 'refactor-weight-loading' of github.com:huggingface/tran…
ArthurZucker 4db2aa6
this?
ArthurZucker 2b16c17
deepspeed best-effort
Cyrilvallez c411ddb
remove unused stuff
Cyrilvallez 56d368b
Update tie weight keys as they were just wroong
ArthurZucker 85d0ac1
up
ArthurZucker daa642c
Merge branch 'refactor-weight-loading' of github.com:huggingface/tran…
ArthurZucker bbf71b9
augustuc clauss, a gloubs gloups gloubs
ArthurZucker 127e4d5
fixup
ArthurZucker 7954185
fixup
ArthurZucker f7cd4b3
there was fucking typo
ArthurZucker f9e747e
mrain
ArthurZucker 57bf5b2
nits
ArthurZucker c38ad24
fix marian 3 remaining tests
ArthurZucker d7be7df
one more
ArthurZucker 729e3df
fix some of the copies, not all :)
ArthurZucker c95a3f1
small cleanup
ArthurZucker 8778840
one propertest
ArthurZucker 1181e3f
fix core model loadig tes
ArthurZucker b750e6b
attempt a new test
ArthurZucker 3178c3f
fix some of the annoying tests by supporting reading .bin sometimes
ArthurZucker d6ab250
push
ArthurZucker 0695197
push more small fixes
ArthurZucker fd5a75a
Merge branch 'main' of github.com:huggingface/transformers into refac…
ArthurZucker f54b528
remove 1 useless test
ArthurZucker 1abf6a9
up
ArthurZucker 3014290
fix audio flamingo post rebase
ArthurZucker 1f1bea3
fixup
ArthurZucker c2dbca0
some small updatess
ArthurZucker 347b966
fix sam models
ArthurZucker 40ed636
nits
ArthurZucker 3b2f934
up
ArthurZucker fb0fb89
updates
ArthurZucker 92e2771
onem ore
ArthurZucker 06f2ba9
skip this stupid test
ArthurZucker 3d5c86c
some other fixes
ArthurZucker 15bc48e
fixup
ArthurZucker 47743f8
update
ArthurZucker d77cf57
skip more offloaded stuff
ArthurZucker 75f2bd4
oups
ArthurZucker 08ad69b
ups
ArthurZucker b605e1a
update mixtral
ArthurZucker 91d40b8
skip this one
ArthurZucker 638bbfc
LET"SGO
ArthurZucker 7daacb4
fixup
ArthurZucker 22c19a7
rope delta order
ArthurZucker 6d89354
fix csm
ArthurZucker 9ccb693
small nit
ArthurZucker
Is there a reason we do zeros? Might make sense to have 1s instead?
I guess this is tied to not using init weights.