Summary:
These changes are an update to the elementwise clustering algorithm to allow RTS to work with the LAMB optimiser.
RTS did not work with LAMB because the clusters in LAMB have intermediate scalar values, which the old algorithm did not include in the clusters.
In the old algorithm, all scalars were excluded by default to prevent otherwise unrelated clusters from being joined via hyperparameters.
There was a mechanism to allow certain scalar values into clusters:
- Recursively search operands for all non-scalar values which the scalar depends on.
- If all of the non-scalar values have the same shape as the top of the cluster, add the scalar to the cluster.
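A minimal sketch of the old admission check described above, not the actual implementation. `Op`, `CollectNonScalarDeps` and `OldScalarAllowedInCluster` are hypothetical names on a toy graph. Two details are assumptions: the search stops descending once it reaches a non-scalar value, and a scalar with no non-scalar dependencies (e.g. a hyperparameter) is rejected.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an instruction; not the real XLA type.
struct Op {
  std::vector<int64_t> shape;       // An empty shape means a scalar.
  std::vector<const Op*> operands;  // Values this op reads.
  bool IsScalar() const { return shape.empty(); }
};

// Walk the operands recursively and collect the non-scalar values the op
// depends on. Assumption: the walk does not descend past a non-scalar value.
void CollectNonScalarDeps(const Op* op,
                          std::unordered_set<const Op*>& visited,
                          std::vector<const Op*>& deps) {
  for (const Op* operand : op->operands) {
    if (!visited.insert(operand).second) continue;  // Already seen.
    if (operand->IsScalar()) {
      CollectNonScalarDeps(operand, visited, deps);
    } else {
      deps.push_back(operand);
    }
  }
}

// A scalar joins the cluster only if every non-scalar value it depends on
// has the same shape as the top of the cluster.
bool OldScalarAllowedInCluster(const Op& scalar,
                               const std::vector<int64_t>& top_shape) {
  assert(scalar.IsScalar());
  std::unordered_set<const Op*> visited;
  std::vector<const Op*> deps;
  CollectNonScalarDeps(&scalar, visited, deps);
  if (deps.empty()) return false;  // Assumption: pure hyperparameters stay out.
  for (const Op* dep : deps) {
    if (dep->shape != top_shape) return false;
  }
  return true;
}
```

Note that nothing in this sketch asks whether the visited ops are themselves clusterable, which is the weakness described next.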
The problem with this was that in LAMB with BERT, there are scalars which depend on a large number of non-scalar values. The algorithm doesn't consider whether the ops it's looking at are actually clusterable (replica-identical and of a supported op type), so it ends up checking lots of irrelevant ops and failing.
With the new algorithm, all ops in a given cluster are the same shape when the clusters are first created.
Then the merging step is used to allow for clusters with differently shaped intermediate values.
This guarantees that only clusterable ops get considered.
Since the clusters are already constructed, we can more robustly determine if one cluster contains intermediates for another.
Changes to the algorithm:
- Renaming some functions to be less confusing.
- Using `ReplicaIdenticalDataflowAnalysis` to identify more accurately which ops are replica identical. This allowed the removal of code which precalculated which fusion computations are clusterable.
- Moving all logic which determines if an op is clusterable into one function (IsClusterable).
- Tightening the requirements for which ops can be added to a cluster during cluster creation. The shape of the op must match the shape of the top of the cluster. This means that initially all ops in a cluster have the same shape.
- Changing the merging logic so that normally clusters will only be merged if their top elements have the same shape. The exception to this is when a cluster `b` is surrounded by another cluster `a` (`a` directly uses the outputs of `b` but `b`'s inputs are reachable from `a`), in which case they are merged even if the shapes don't match. This essentially allows clusters to have differently shaped intermediate values.
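The shape rules in the last two items can be sketched as follows. This is an illustration on a toy graph, not the code in this change. `Op`, `Cluster`, `CanAddDuringCreation` and `ShapesAllowMerge` are hypothetical names. "`b`'s inputs are reachable from `a`" is simplified here to "every input of `b` that comes from outside `b` is produced by an op in `a`", which is an assumption. Only the shape condition is modelled; any other merge conditions are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

using Shape = std::vector<int64_t>;  // An empty shape means a scalar.

// Hypothetical toy types; operands are indices into the graph vector.
struct Op {
  Shape shape;
  std::vector<int> operands;
};
struct Cluster {
  Shape top_shape;
  std::set<int> ops;
};

// Cluster creation: an op may only join if it matches the top's shape.
bool CanAddDuringCreation(const Cluster& cluster, const Op& op) {
  return op.shape == cluster.top_shape;
}

// True if some op in `a` reads a value produced by an op in `b`.
bool UsesOutputsOf(const std::vector<Op>& graph, const Cluster& a,
                   const Cluster& b) {
  for (int id : a.ops) {
    for (int operand : graph[id].operands) {
      if (b.ops.count(operand)) return true;
    }
  }
  return false;
}

// True if every input of `b` from outside `b` is produced by an op in `a`
// (the simplified reading of "reachable from `a`").
bool InputsComeFrom(const std::vector<Op>& graph, const Cluster& b,
                    const Cluster& a) {
  for (int id : b.ops) {
    for (int operand : graph[id].operands) {
      assert(operand >= 0 && operand < static_cast<int>(graph.size()));
      if (!b.ops.count(operand) && !a.ops.count(operand)) return false;
    }
  }
  return true;
}

// Merging: same top shape, or `b` is surrounded by `a`.
bool ShapesAllowMerge(const std::vector<Op>& graph, const Cluster& a,
                      const Cluster& b) {
  if (a.top_shape == b.top_shape) return true;
  return UsesOutputsOf(graph, a, b) && InputsComeFrom(graph, b, a);
}
```

With this rule, a scalar cluster that sits between two parts of a larger cluster is absorbed as an intermediate value, while a scalar cluster fed by unrelated ops is left alone.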
Test Plan:
There are existing tests (`resource_update_elementwise_clustering_test.cc` and `replicated_resource_update_elementwise_clustering_test.cc`).
I have added a test based on the issue which was preventing RTS working with LAMB. The test fails before the changes and passes after.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, gauthamg, georgep
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, gauthamg
Subscribers: gauthamg
Maniphest Tasks: T65700
Differential Revision: https://phabricator.sourcevertex.net/D72891