
Commit 08d74b2 (1 parent: a736b60)

Mark GA reduction_method as experimental

Summary: Ref T57087 TF2.5 Only
Reviewers: gauthamg, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved
Reviewed By: gauthamg, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved
Maniphest Tasks: T57087
Differential Revision: https://phabricator.sourcevertex.net/D62596

File tree: 6 files changed (+47, -47 lines)

tensorflow/python/ipu/ipu_pipeline_estimator.py (7 additions, 7 deletions)

@@ -104,13 +104,13 @@ def __new__(cls,
         during evaluation.
       prediction_hooks: List of instances of `tf.estimator.SessionRunHook` used
         during prediction.
-      reduction_method: Reduction method to use when accumulating gradients.
-        During the iterations in each optimizer step, the computed gradients
-        can either be directly summed up or scaled such that we compute a mean
-        of all gradients for each variable. Computing a mean avoids potential
-        issues with overflow during accumulation especially when using
-        float16, but gives smaller gradients and might require adjusting
-        the learning-rate accordingly.
+      reduction_method: (Experimental) Reduction method to use when accumulating
+        gradients. During the iterations in each optimizer step, the computed
+        gradients can either be directly summed up or scaled such that we
+        compute a mean of all gradients for each variable. Computing a mean
+        avoids potential issues with overflow during accumulation especially
+        when using float16, but gives smaller gradients and might require
+        adjusting the learning-rate accordingly.
         Defaults to `GradientAccumulationReductionMethod.SUM`
         (see :class:`~tensorflow.python.ipu.optimizers.GradientAccumulationReductionMethod`) # pylint: disable=line-too-long
       pipeline_op_kwargs: All remaining keyword arguments are forwarded to
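The docstring above describes a trade-off between summing and averaging accumulated gradients. A minimal NumPy sketch (purely illustrative, not the IPU API) shows the float16 overflow that direct summation can hit and that a running mean avoids:

```python
import numpy as np

# Hypothetical illustration: accumulate 100 gradients of value 700.0 in
# float16. The largest finite float16 value is ~65504, so a direct sum of
# 100 * 700 = 70000 overflows to inf, while a running mean stays at ~700.
grads = [np.float16(700.0)] * 100

# SUM-style accumulation: the buffer grows by ~700 per step and overflows.
acc_sum = np.float16(0.0)
for g in grads:
    acc_sum = np.float16(acc_sum + g)

# MEAN-style accumulation via a running mean: the buffer stays near the
# per-step gradient magnitude, so float16 range is never stressed.
acc_mean = np.float16(0.0)
for n, g in enumerate(grads):
    acc_mean = np.float16(acc_mean + (g - acc_mean) / np.float16(n + 1))

print(np.isinf(acc_sum))   # True: the sum overflowed
print(acc_mean)            # 700.0
```

Note the resulting mean (~700) is 100x smaller than the intended sum (~70000), which is why the docstring warns that the learning rate may need adjusting when switching reduction methods.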

tensorflow/python/ipu/keras/extensions/functional_extensions.py (7 additions, 7 deletions)

@@ -377,13 +377,13 @@ def set_pipelining_options(
         before being added to the gradient accumulation buffer. Note that this
         option is experimental and the behavior might change in future releases.
         This value is saved/loaded when the model is saved/loaded.
-      reduction_method: Reduction method to use when accumulating gradients.
-        During the iterations in each optimizer step, the computed gradients
-        can either be directly summed up or scaled such that we compute a mean
-        of all gradients for each variable. Computing a mean avoids potential
-        issues with overflow during accumulation especially when using
-        float16, but gives smaller gradients and might require adjusting
-        the learning-rate accordingly.
+      gradient_accumulation_reduction_method: (Experimental) Reduction method
+        to use when accumulating gradients. During the iterations in each
+        optimizer step, the computed gradients can either be directly summed up
+        or scaled such that we compute a mean of all gradients for each
+        variable. Computing a mean avoids potential issues with overflow during
+        accumulation especially when using float16, but gives smaller gradients
+        and might require adjusting the learning-rate accordingly.
         Defaults to `GradientAccumulationReductionMethod.SUM`
         (see :class:`~tensorflow.python.ipu.optimizers.GradientAccumulationReductionMethod`) # pylint: disable=line-too-long
       pipelining_kwargs: All remaining keyword arguments are forwarded to

tensorflow/python/ipu/keras/extensions/model_extensions.py (7 additions, 7 deletions)

@@ -486,13 +486,13 @@ def set_pipelining_options(
         before being added to the gradient accumulation buffer. Note that this
         option is experimental and the behavior might change in future releases.
         This value is saved/loaded when the model is saved/loaded.
-      reduction_method: Reduction method to use when accumulating gradients.
-        During the iterations in each optimizer step, the computed gradients
-        can either be directly summed up or scaled such that we compute a mean
-        of all gradients for each variable. Computing a mean avoids potential
-        issues with overflow during accumulation especially when using
-        float16, but gives smaller gradients and might require adjusting
-        the learning-rate accordingly.
+      gradient_accumulation_reduction_method: (Experimental) Reduction method
+        to use when accumulating gradients. During the iterations in each
+        optimizer step, the computed gradients can either be directly summed up
+        or scaled such that we compute a mean of all gradients for each
+        variable. Computing a mean avoids potential issues with overflow during
+        accumulation especially when using float16, but gives smaller gradients
+        and might require adjusting the learning-rate accordingly.
         Defaults to `GradientAccumulationReductionMethod.SUM`
         (see :class:`~tensorflow.python.ipu.optimizers.GradientAccumulationReductionMethod`) # pylint: disable=line-too-long
       pipelining_kwargs: All remaining keyword arguments are forwarded to

tensorflow/python/ipu/keras/extensions/sequential_extensions.py (7 additions, 7 deletions)

@@ -317,13 +317,13 @@ def set_pipelining_options(
         before being added to the gradient accumulation buffer. Note that this
         option is experimental and the behavior might change in future releases.
         This value is saved/loaded when the model is saved/loaded.
-      reduction_method: Reduction method to use when accumulating gradients.
-        During the iterations in each optimizer step, the computed gradients
-        can either be directly summed up or scaled such that we compute a mean
-        of all gradients for each variable. Computing a mean avoids potential
-        issues with overflow during accumulation especially when using
-        float16, but gives smaller gradients and might require adjusting
-        the learning-rate accordingly.
+      gradient_accumulation_reduction_method: (Experimental) Reduction method
+        to use when accumulating gradients. During the iterations in each
+        optimizer step, the computed gradients can either be directly summed up
+        or scaled such that we compute a mean of all gradients for each
+        variable. Computing a mean avoids potential issues with overflow during
+        accumulation especially when using float16, but gives smaller gradients
+        and might require adjusting the learning-rate accordingly.
         Defaults to `GradientAccumulationReductionMethod.SUM`
         (see :class:`~tensorflow.python.ipu.optimizers.GradientAccumulationReductionMethod`) # pylint: disable=line-too-long
       pipelining_kwargs: All remaining keyword arguments are forwarded to

tensorflow/python/ipu/ops/pipelining_ops.py (5 additions, 5 deletions)

@@ -693,11 +693,11 @@ def model(lr):
       outfeed queue, and if it is set to `True` it is not enqueued. Cannot be
       set when `outfeed_loss` is set. Can only be used when `optimizer_function`
       has been set.
-      reduction_method: Reduction method to use when accumulating gradients.
-        During the iterations in each optimizer step, the computed gradients
-        can either be directly summed up or scaled such that we compute a mean
-        of all gradients for each variable. Computing a mean avoids potential
-        issues with overflow during accumulation especially when using
+      reduction_method: (Experimental) Reduction method to use when accumulating
+        gradients. During the iterations in each optimizer step, the computed
+        gradients can either be directly summed up or scaled such that we compute
+        a mean of all gradients for each variable. Computing a mean avoids
+        potential issues with overflow during accumulation especially when using
         float16, but gives smaller gradients and might require adjusting
         the learning-rate accordingly.
         Defaults to `GradientAccumulationReductionMethod.SUM`

tensorflow/python/ipu/optimizers/gradient_accumulation_optimizer.py (14 additions, 14 deletions)

@@ -118,13 +118,13 @@ def __init__(self,
         a cast is needed at some point to make them compatible. If you want
         to cast the gradients immediately, you can wrap your optimizer in the
         `MapGradientOptimizer` with a `tf.cast`.
-      reduction_method: Reduction method to use when accumulating gradients.
-        During the iterations in each optimizer step, the computed gradients
-        can either be directly summed up or scaled such that we compute a mean
-        of all gradients for each variable. Computing a mean avoids potential
-        issues with overflow during accumulation especially when using
-        float16, but gives smaller gradients and might require adjusting
-        the learning-rate accordingly.
+      reduction_method: (Experimental) Reduction method to use when accumulating
+        gradients. During the iterations in each optimizer step, the computed
+        gradients can either be directly summed up or scaled such that we
+        compute a mean of all gradients for each variable. Computing a mean
+        avoids potential issues with overflow during accumulation especially
+        when using float16, but gives smaller gradients and might require
+        adjusting the learning-rate accordingly.
         Defaults to `GradientAccumulationReductionMethod.SUM`
         (see :class:`~tensorflow.python.ipu.optimizers.GradientAccumulationReductionMethod`) # pylint: disable=line-too-long
       name: Optional name prefix for the operations created when applying

@@ -258,13 +258,13 @@ def __init__(self,
         a cast is needed at some point to make them compatible. If you want
         to cast the gradients immediately, you can wrap your optimizer in the
         `MapGradientOptimizer` with a `tf.cast`.
-      reduction_method: Reduction method to use when accumulating gradients.
-        During the iterations in each optimizer step, the computed gradients
-        can either be directly summed up or scaled such that we compute a mean
-        of all gradients for each variable. Computing a mean avoids potential
-        issues with overflow during accumulation especially when using
-        float16, but gives smaller gradients and might require adjusting
-        the learning-rate accordingly.
+      reduction_method: (Experimental) Reduction method to use when accumulating
+        gradients. During the iterations in each optimizer step, the computed
+        gradients can either be directly summed up or scaled such that we
+        compute a mean of all gradients for each variable. Computing a mean
+        avoids potential issues with overflow during accumulation especially
+        when using float16, but gives smaller gradients and might require
+        adjusting the learning-rate accordingly.
         Defaults to `GradientAccumulationReductionMethod.SUM`
         (see :class:`~tensorflow.python.ipu.optimizers.GradientAccumulationReductionMethod`) # pylint: disable=line-too-long
       name: Optional name prefix for the operations created when applying
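The optimizer docstrings describe accumulating gradients over several mini-batches before an apply step, with the reduction method deciding whether the buffer holds a sum or a mean. A minimal pure-Python sketch of that bookkeeping (hypothetical, not the IPU `GradientAccumulationOptimizerV2` implementation) looks like this:

```python
class GradientAccumulator:
    """Illustrative sketch: buffer gradients over `num_mini_batches` steps,
    reducing with either "SUM" or "MEAN" before they are applied.

    With "MEAN", each incoming gradient is pre-scaled by 1/num_mini_batches,
    so the buffer never grows past the typical per-batch gradient magnitude
    (the overflow-avoidance property the docstrings describe for float16).
    """

    def __init__(self, num_mini_batches, reduction="SUM"):
        self.num_mini_batches = num_mini_batches
        self.reduction = reduction
        self.buffer = 0.0
        self.step = 0

    def accumulate(self, grad):
        # Scale before adding when computing a mean; add directly for a sum.
        if self.reduction == "MEAN":
            grad = grad / self.num_mini_batches
        self.buffer += grad
        self.step += 1
        # Signal when the accumulated gradient is ready to be applied.
        return self.step == self.num_mini_batches

    def reset(self):
        self.buffer, self.step = 0.0, 0
```

Feeding four gradients of 8.0 leaves a buffer of 32.0 under "SUM" but 8.0 under "MEAN", which is the "smaller gradients, may need a different learning rate" effect the diff warns about.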
