
Conversation

@wukaixingxp
Contributor

Some models, such as Llama, do not define a pad_token_id, so pad_token() returns None, which causes training to fail. This PR falls back to eos_token_id when no pad_token_id is set.
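
For illustration, here is a minimal sketch of that fallback, assuming a Hugging Face-style tokenizer (the resolve_pad_id helper is hypothetical, not the repo's actual code):

```python
from transformers import AutoTokenizer

def resolve_pad_id(model_name: str) -> int:
    # Hypothetical helper illustrating the fallback described in this PR.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if tokenizer.pad_token_id is not None:
        return tokenizer.pad_token_id
    # Llama tokenizers ship without a pad token, so fall back to EOS;
    # padded positions are masked out of the loss, so any valid id works.
    return tokenizer.eos_token_id
```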

@meta-cla meta-cla bot added the CLA Signed label on Nov 8, 2025
@felipemello1
Contributor

felipemello1 commented Nov 8, 2025

hey @wukaixingxp, thanks for the PR! The code changes seem minimal, but I am skeptical. The Llama model was tested multiple times; why would the error only appear now? Was it because previously the context was too small and was never padded?

Could it be that there is another root cause and this PR is only fixing the symptom?

@wukaixingxp
Contributor Author

Previously, Claude gave me a 'workaround' that got my Llama 8B training running, but it was WRONG!! My GRPO loss was constantly in the ±1k to ±180k range. With this fix, my GRPO loss is about ±1.

@felipemello1
Contributor

felipemello1 commented Nov 8, 2025

@wukaixingxp cool, thank you, nice finding! Before I merge, do you mind posting one of your loss curves? Does it go down?

@wukaixingxp
Contributor Author

[Screenshots: GRPO loss and reward curves] It could be my setup, but the reward and loss in the graphs still don't look great; the GRPO loss, however, is now in a normal range.

@felipemello1 felipemello1 merged commit 4410e90 into meta-pytorch:main Nov 9, 2025
10 checks passed