Log effective batch size during training when gradient accumulation and/or replication is used
Summary:
Unless the user has prior knowledge, it isn't obvious that the batch size provided to `fit()` (or `dataset.batch(batch_size)`) is per replica, rather than the effective batch size from an ML perspective. This confusion is compounded when gradient accumulation is also used.
This diff adds a log message stating the effective batch size from the training optimizer's perspective. The message is built dynamically by considering the following 3 cases (see the sketch after this list):
- gradient accumulation > 1 - log only mentions gradient accumulation
- number of replicas > 1 - log only mentions replication
- number of replicas > 1 and gradient accumulation > 1 - log mentions both
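In every case the effective batch size is the per-replica batch size multiplied by the gradient accumulation factor and the number of replicas. As an illustration only, here is a minimal sketch of how such a message could be assembled; the function name, signature, and wording are hypothetical and not the diff's actual code:

```python
# Hypothetical sketch: not the actual code added by this diff.
import logging

logger = logging.getLogger(__name__)


def log_effective_batch_size(per_replica_batch_size,
                             gradient_accumulation_steps=1,
                             num_replicas=1):
    """Log the effective batch size seen by the training optimizer.

    effective = per_replica_batch_size * gradient_accumulation_steps * num_replicas
    """
    effective = (per_replica_batch_size
                 * gradient_accumulation_steps
                 * num_replicas)

    # Mention only the factors that actually apply, matching the
    # three cases described above.
    reasons = []
    if gradient_accumulation_steps > 1:
        reasons.append(
            f"gradient accumulation over {gradient_accumulation_steps} steps")
    if num_replicas > 1:
        reasons.append(f"replication across {num_replicas} replicas")

    if reasons:
        logger.info(
            "Effective batch size is %d (per-replica batch size %d, due to %s).",
            effective, per_replica_batch_size, " and ".join(reasons))
```

For example, a per-replica batch size of 16 with 4 accumulation steps on 2 replicas would be reported as an effective batch size of 128, mentioning both factors.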
TF 2.5 only.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, markf, vladimirm, christiana
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, christiana
Subscribers: vladimirm
Maniphest Tasks: T56300
Differential Revision: https://phabricator.sourcevertex.net/D61155