Fix ReduceLROnPlateau scheduler error when validation doesn't run every epoch #21266
base: master
Conversation
- Only update plateau schedulers on epochs when validation runs
- This prevents errors when monitored metrics are not available
- Added test case for this scenario

Co-authored-by: Borda <6035284+Borda@users.noreply.github.com>
⛈️ Required checks status: Has failure 🔴

Groups summary:
- 🔴 pytorch_lightning: Tests workflow
- 🟢 pytorch_lightning: lit GPU
- 🟢 Benchmarks
- 🟢 pytorch_lightning: Docs
- 🟢 mypy
- 🟢 install

These checks are required after the changes in this PR. Thank you for your contribution! 💜
❌ 1 Tests Failed:

View the full list of 1 ❄️ flaky test(s)
To view more test analytics, go to the Test Analytics Dashboard
What does this PR do?
Fixes a `MisconfigurationException` that occurs when using the `ReduceLROnPlateau` scheduler with `check_val_every_n_epoch > 1`. The scheduler was attempting to access validation metrics on epochs where validation didn't run, causing an error.

Issue
When `check_val_every_n_epoch` is set to a value greater than 1, validation only runs on specific epochs (e.g., every 2nd epoch). However, the `ReduceLROnPlateau` scheduler was being updated at the end of every epoch, attempting to access the monitored metric (e.g., `val/loss`) even when it wasn't available.

Example error: a `MisconfigurationException` reporting that `ReduceLROnPlateau` is conditioned on a metric (here `val/loss`) which is not available.
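For context, a minimal setup that exercises this code path might look like the sketch below; the module, dataset, and metric name `val/loss` are illustrative and not taken from the PR:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

import lightning.pytorch as pl  # on older installs: `import pytorch_lightning as pl`


class PlateauModule(pl.LightningModule):
    """Illustrative module: `val/loss` is only logged on epochs where validation runs."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        self.log("train/loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        self.log("val/loss", torch.nn.functional.mse_loss(self.layer(x), y))

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
        # The plateau scheduler is conditioned on a metric that only exists
        # on validation epochs.
        return {
            "optimizer": optimizer,
            "lr_scheduler": {"scheduler": scheduler, "monitor": "val/loss"},
        }


if __name__ == "__main__":
    loader = DataLoader(TensorDataset(torch.randn(64, 32), torch.randn(64, 1)), batch_size=16)
    # Validation only runs every 2nd epoch, but the plateau scheduler used to be
    # stepped at the end of every epoch, when `val/loss` may not have been logged yet.
    trainer = pl.Trainer(max_epochs=4, check_val_every_n_epoch=2, logger=False, enable_checkpointing=False)
    trainer.fit(PlateauModule(), train_dataloaders=loader, val_dataloaders=loader)
```

Before the fix, a run like this fails at the end of the first training epoch, where `val/loss` has not yet been produced.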
Before submitting
Solution
Modified the scheduler update logic in `fit_loop.py` to only update plateau schedulers when validation actually runs. This ensures the monitored metrics are available when the scheduler needs them.

Before:
After:
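As a standalone illustration of the idea (not the literal `fit_loop.py` diff; the function name and simplified epoch arithmetic below are assumptions), the guard amounts to stepping plateau schedulers only on epochs where validation produced the monitored metric:

```python
def should_update_plateau_schedulers(current_epoch: int, check_val_every_n_epoch: int) -> bool:
    """Step plateau schedulers only on epochs where validation ran, i.e. where
    the monitored metric (e.g. `val/loss`) was actually logged."""
    return (current_epoch + 1) % check_val_every_n_epoch == 0


if __name__ == "__main__":
    # With validation every 2nd epoch, only epochs 1, 3, ... step the scheduler.
    for epoch in range(4):
        print(epoch, should_update_plateau_schedulers(epoch, check_val_every_n_epoch=2))
```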
Testing
Added test case `test_reducelronplateau_with_check_val_every_n_epoch` that verifies the fix works correctly when validation runs every N epochs.
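The actual test added by the PR may differ; a rough sketch of such a test, reusing the illustrative `PlateauModule` and imports from the snippet above, could look like this:

```python
def test_reducelronplateau_with_check_val_every_n_epoch(tmp_path):
    """Fit should complete without a MisconfigurationException even though
    `val/loss` is only logged every 2nd epoch."""
    loader = DataLoader(TensorDataset(torch.randn(64, 32), torch.randn(64, 1)), batch_size=16)
    trainer = pl.Trainer(
        default_root_dir=tmp_path,
        max_epochs=4,
        check_val_every_n_epoch=2,
        logger=False,
        enable_checkpointing=False,
        enable_progress_bar=False,
    )
    trainer.fit(PlateauModule(), train_dataloaders=loader, val_dataloaders=loader)
    # Reaching max_epochs means no error was raised on the non-validation epochs.
    assert trainer.current_epoch == 4
```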
"strict": Falseto avoid the errorFixes #<issue_number>
Original prompt
📚 Documentation preview 📚: https://pytorch-lightning--21266.org.readthedocs.build/en/21266/