
Commit 08d74b2 (1 parent: a736b60)

Mark GA reduction_method as experimental

Summary: Ref T57087 TF2.5 Only
Reviewers: gauthamg, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved
Reviewed By: gauthamg, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved
Maniphest Tasks: T57087
Differential Revision: https://phabricator.sourcevertex.net/D62596

File tree: 6 files changed (+47, -47 lines)

tensorflow/python/ipu/ipu_pipeline_estimator.py (7 additions, 7 deletions)

@@ -104,13 +104,13 @@ def __new__(cls,
         during evaluation.
       prediction_hooks: List of instances of `tf.estimator.SessionRunHook` used
         during prediction.
-      reduction_method: Reduction method to use when accumulating gradients.
-        During the iterations in each optimizer step, the computed gradients
-        can either be directly summed up or scaled such that we compute a mean
-        of all gradients for each variable. Computing a mean avoids potential
-        issues with overflow during accumulation especially when using
-        float16, but gives smaller gradients and might require adjusting
-        the learning-rate accordingly.
+      reduction_method: (Experimental) Reduction method to use when accumulating
+        gradients. During the iterations in each optimizer step, the computed
+        gradients can either be directly summed up or scaled such that we
+        compute a mean of all gradients for each variable. Computing a mean
+        avoids potential issues with overflow during accumulation especially
+        when using float16, but gives smaller gradients and might require
+        adjusting the learning-rate accordingly.
         Defaults to `GradientAccumulationReductionMethod.SUM`
         (see :class:`~tensorflow.python.ipu.optimizers.GradientAccumulationReductionMethod`) # pylint: disable=line-too-long
       pipeline_op_kwargs: All remaining keyword arguments are forwarded to
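The docstring above describes a trade-off between summing and averaging accumulated gradients. A minimal NumPy sketch (purely illustrative, not the IPU API) shows the float16 overflow that direct summation can hit and that a running mean avoids:

```python
import numpy as np

# Hypothetical illustration: accumulate 100 gradients of value 700.0 in
# float16. The largest finite float16 value is ~65504, so a direct sum of
# 100 * 700 = 70000 overflows to inf, while a running mean stays at ~700.
grads = [np.float16(700.0)] * 100

# SUM-style accumulation: the buffer grows by ~700 per step and overflows.
acc_sum = np.float16(0.0)
for g in grads:
    acc_sum = np.float16(acc_sum + g)

# MEAN-style accumulation via a running mean: the buffer stays near the
# per-step gradient magnitude, so float16 range is never stressed.
acc_mean = np.float16(0.0)
for n, g in enumerate(grads):
    acc_mean = np.float16(acc_mean + (g - acc_mean) / np.float16(n + 1))

print(np.isinf(acc_sum))   # True: the sum overflowed
print(acc_mean)            # 700.0
```

Note the resulting mean (~700) is 100x smaller than the intended sum (~70000), which is why the docstring warns that the learning rate may need adjusting when switching reduction methods.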

tensorflow/python/ipu/keras/extensions/functional_extensions.py (7 additions, 7 deletions)

@@ -377,13 +377,13 @@ def set_pipelining_options(
         before being added to the gradient accumulation buffer. Note that this
         option is experimental and the behavior might change in future releases.
         This value is saved/loaded when the model is saved/loaded.
-      reduction_method: Reduction method to use when accumulating gradients.
-        During the iterations in each optimizer step, the computed gradients
-        can either be directly summed up or scaled such that we compute a mean
-        of all gradients for each variable. Computing a mean avoids potential
-        issues with overflow during accumulation especially when using
-        float16, but gives smaller gradients and might require adjusting
-        the learning-rate accordingly.
+      gradient_accumulation_reduction_method: (Experimental) Reduction method
+        to use when accumulating gradients. During the iterations in each
+        optimizer step, the computed gradients can either be directly summed up
+        or scaled such that we compute a mean of all gradients for each
+        variable. Computing a mean avoids potential issues with overflow during
+        accumulation especially when using float16, but gives smaller gradients
+        and might require adjusting the learning-rate accordingly.
         Defaults to `GradientAccumulationReductionMethod.SUM`
         (see :class:`~tensorflow.python.ipu.optimizers.GradientAccumulationReductionMethod`) # pylint: disable=line-too-long
       pipelining_kwargs: All remaining keyword arguments are forwarded to

tensorflow/python/ipu/keras/extensions/model_extensions.py (7 additions, 7 deletions)

@@ -486,13 +486,13 @@ def set_pipelining_options(
         before being added to the gradient accumulation buffer. Note that this
         option is experimental and the behavior might change in future releases.
         This value is saved/loaded when the model is saved/loaded.
-      reduction_method: Reduction method to use when accumulating gradients.
-        During the iterations in each optimizer step, the computed gradients
-        can either be directly summed up or scaled such that we compute a mean
-        of all gradients for each variable. Computing a mean avoids potential
-        issues with overflow during accumulation especially when using
-        float16, but gives smaller gradients and might require adjusting
-        the learning-rate accordingly.
+      gradient_accumulation_reduction_method: (Experimental) Reduction method
+        to use when accumulating gradients. During the iterations in each
+        optimizer step, the computed gradients can either be directly summed up
+        or scaled such that we compute a mean of all gradients for each
+        variable. Computing a mean avoids potential issues with overflow during
+        accumulation especially when using float16, but gives smaller gradients
+        and might require adjusting the learning-rate accordingly.
         Defaults to `GradientAccumulationReductionMethod.SUM`
         (see :class:`~tensorflow.python.ipu.optimizers.GradientAccumulationReductionMethod`) # pylint: disable=line-too-long
       pipelining_kwargs: All remaining keyword arguments are forwarded to

tensorflow/python/ipu/keras/extensions/sequential_extensions.py (7 additions, 7 deletions)

@@ -317,13 +317,13 @@ def set_pipelining_options(
         before being added to the gradient accumulation buffer. Note that this
         option is experimental and the behavior might change in future releases.
         This value is saved/loaded when the model is saved/loaded.
-      reduction_method: Reduction method to use when accumulating gradients.
-        During the iterations in each optimizer step, the computed gradients
-        can either be directly summed up or scaled such that we compute a mean
-        of all gradients for each variable. Computing a mean avoids potential
-        issues with overflow during accumulation especially when using
-        float16, but gives smaller gradients and might require adjusting
-        the learning-rate accordingly.
+      gradient_accumulation_reduction_method: (Experimental) Reduction method
+        to use when accumulating gradients. During the iterations in each
+        optimizer step, the computed gradients can either be directly summed up
+        or scaled such that we compute a mean of all gradients for each
+        variable. Computing a mean avoids potential issues with overflow during
+        accumulation especially when using float16, but gives smaller gradients
+        and might require adjusting the learning-rate accordingly.
         Defaults to `GradientAccumulationReductionMethod.SUM`
         (see :class:`~tensorflow.python.ipu.optimizers.GradientAccumulationReductionMethod`) # pylint: disable=line-too-long
       pipelining_kwargs: All remaining keyword arguments are forwarded to

tensorflow/python/ipu/ops/pipelining_ops.py (5 additions, 5 deletions)

@@ -693,11 +693,11 @@ def model(lr):
       outfeed queue, and if it is set to `True` it is not enqueued. Cannot be
       set when `outfeed_loss` is set. Can only be used when `optimizer_function`
       has been set.
-      reduction_method: Reduction method to use when accumulating gradients.
-        During the iterations in each optimizer step, the computed gradients
-        can either be directly summed up or scaled such that we compute a mean
-        of all gradients for each variable. Computing a mean avoids potential
-        issues with overflow during accumulation especially when using
+      reduction_method: (Experimental) Reduction method to use when accumulating
+        gradients. During the iterations in each optimizer step, the computed
+        gradients can either be directly summed up or scaled such that we compute
+        a mean of all gradients for each variable. Computing a mean avoids
+        potential issues with overflow during accumulation especially when using
         float16, but gives smaller gradients and might require adjusting
         the learning-rate accordingly.
         Defaults to `GradientAccumulationReductionMethod.SUM`

tensorflow/python/ipu/optimizers/gradient_accumulation_optimizer.py (14 additions, 14 deletions)

@@ -118,13 +118,13 @@ def __init__(self,
         a cast is needed at some point to make them compatible. If you want
         to cast the gradients immediately, you can wrap your optimizer in the
         `MapGradientOptimizer` with a `tf.cast`.
-      reduction_method: Reduction method to use when accumulating gradients.
-        During the iterations in each optimizer step, the computed gradients
-        can either be directly summed up or scaled such that we compute a mean
-        of all gradients for each variable. Computing a mean avoids potential
-        issues with overflow during accumulation especially when using
-        float16, but gives smaller gradients and might require adjusting
-        the learning-rate accordingly.
+      reduction_method: (Experimental) Reduction method to use when accumulating
+        gradients. During the iterations in each optimizer step, the computed
+        gradients can either be directly summed up or scaled such that we
+        compute a mean of all gradients for each variable. Computing a mean
+        avoids potential issues with overflow during accumulation especially
+        when using float16, but gives smaller gradients and might require
+        adjusting the learning-rate accordingly.
         Defaults to `GradientAccumulationReductionMethod.SUM`
         (see :class:`~tensorflow.python.ipu.optimizers.GradientAccumulationReductionMethod`) # pylint: disable=line-too-long
       name: Optional name prefix for the operations created when applying

@@ -258,13 +258,13 @@ def __init__(self,
         a cast is needed at some point to make them compatible. If you want
         to cast the gradients immediately, you can wrap your optimizer in the
         `MapGradientOptimizer` with a `tf.cast`.
-      reduction_method: Reduction method to use when accumulating gradients.
-        During the iterations in each optimizer step, the computed gradients
-        can either be directly summed up or scaled such that we compute a mean
-        of all gradients for each variable. Computing a mean avoids potential
-        issues with overflow during accumulation especially when using
-        float16, but gives smaller gradients and might require adjusting
-        the learning-rate accordingly.
+      reduction_method: (Experimental) Reduction method to use when accumulating
+        gradients. During the iterations in each optimizer step, the computed
+        gradients can either be directly summed up or scaled such that we
+        compute a mean of all gradients for each variable. Computing a mean
+        avoids potential issues with overflow during accumulation especially
+        when using float16, but gives smaller gradients and might require
+        adjusting the learning-rate accordingly.
         Defaults to `GradientAccumulationReductionMethod.SUM`
         (see :class:`~tensorflow.python.ipu.optimizers.GradientAccumulationReductionMethod`) # pylint: disable=line-too-long
       name: Optional name prefix for the operations created when applying
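The optimizer docstrings describe accumulating gradients over several mini-batches before an apply step, with the reduction method deciding whether the buffer holds a sum or a mean. A minimal pure-Python sketch of that bookkeeping (hypothetical, not the IPU `GradientAccumulationOptimizerV2` implementation) looks like this:

```python
class GradientAccumulator:
    """Illustrative sketch: buffer gradients over `num_mini_batches` steps,
    reducing with either "SUM" or "MEAN" before they are applied.

    With "MEAN", each incoming gradient is pre-scaled by 1/num_mini_batches,
    so the buffer never grows past the typical per-batch gradient magnitude
    (the overflow-avoidance property the docstrings describe for float16).
    """

    def __init__(self, num_mini_batches, reduction="SUM"):
        self.num_mini_batches = num_mini_batches
        self.reduction = reduction
        self.buffer = 0.0
        self.step = 0

    def accumulate(self, grad):
        # Scale before adding when computing a mean; add directly for a sum.
        if self.reduction == "MEAN":
            grad = grad / self.num_mini_batches
        self.buffer += grad
        self.step += 1
        # Signal when the accumulated gradient is ready to be applied.
        return self.step == self.num_mini_batches

    def reset(self):
        self.buffer, self.step = 0.0, 0
```

Feeding four gradients of 8.0 leaves a buffer of 32.0 under "SUM" but 8.0 under "MEAN", which is the "smaller gradients, may need a different learning rate" effect the diff warns about.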
