Fix gpt oss weight loading with EP + bf16 #28765
Conversation
Signed-off-by: ashors1 <ashors@nvidia.com>
Code Review
This pull request correctly identifies and fixes a bug in the _load_weights_other method within the GptOssModel class. The parameters ep_rank_start and ep_rank_end were swapped in the function signature, leading to incorrect weight slicing when using expert parallelism. The change aligns the signature with its call site and makes it consistent with the _load_weights_mxfp4 method, resolving the indexing issue. The fix is accurate and well-contained.
jikunshang left a comment
nice catch! thanks for fixing.
Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Bram Wasti <bwasti@meta.com>
Purpose
The signature for _load_weights_other is incorrect. The start and end indices are flipped, which causes an indexing issue when attempting to extract the weights on the current EP rank. This fixes the signature to be consistent with _load_weights_mxfp4.
Test Plan
Test Result
Running _load_weights_other with EP now works as expected.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.