Commit efef03e

Author: Gautham Ganapathy
Add documentation on Replicated Tensor Sharding to the user guide
Summary: REF T25545
Test Plan: !docs-only
Reviewers: jamiep, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, #tech_docs, alfiee
Reviewed By: jamiep, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, #tech_docs, alfiee
Maniphest Tasks: T25545
Differential Revision: https://phabricator.sourcevertex.net/D77856
1 parent 0fc66a4 commit efef03e

File tree

1 file changed: +21 −0 lines


tensorflow/compiler/plugin/poplar/docs/perf_training.rst

Lines changed: 21 additions & 0 deletions
@@ -523,6 +523,27 @@ Offloading variables into remote memory can reduce maximum memory liveness, but
 it can also increase the computation time of the weight update as more time is
 spent communicating with the host.
 
+.. _replicated-tensor-sharding:
+
+Replicated tensor sharding
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Replicated tensor sharding (RTS) attempts to reduce memory usage in replicated
+models by partitioning applicable tensors and distributing them across replicas.
+RTS identifies tensors that share the same value across replicas and partitions
+each tensor into evenly sized shards such that each replica stores one shard.
+If an operation requires the full tensor, the shards can be broadcast to all
+replicas.
+
+RTS is used to save memory when using stateful optimizers, such as Adam or LAMB,
+with replicas. In Keras, it can be enabled by setting the
+``replicated_optimizer_state_sharding`` argument to True in the
+:py:func:`~keras.ipu.extensions.FunctionalExtension.set_gradient_accumulation_options`
+method for non-pipelined models and the
+:py:func:`~keras.ipu.extensions.FunctionalExtension.set_pipelining_options` method
+for pipelined models.
+
 Dataset benchmarking
 ~~~~~~~~~~~~~~~~~~~~
 In order to fully utilise the potential of the IPU, the ``tf.data.Dataset`` used
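
For context, below is a minimal sketch of enabling RTS for a non-pipelined
Keras model. This example is not part of the commit: the IPU configuration,
the model definition, and the ``gradient_accumulation_steps_per_replica``
value are illustrative assumptions to be checked against the Graphcore
documentation; only the ``replicated_optimizer_state_sharding`` argument
comes from the diff above.

.. code-block:: python

    import tensorflow as tf
    from tensorflow.python import ipu

    # Configure an IPU system with two IPUs so the model runs replicated.
    config = ipu.config.IPUConfig()
    config.auto_select_ipus = 2
    config.configure_ipu_system()

    strategy = ipu.ipu_strategy.IPUStrategy()
    with strategy.scope():
        # A Functional model, so the FunctionalExtension methods referenced
        # in the diff are available on it.
        inputs = tf.keras.Input(shape=(32,))
        x = tf.keras.layers.Dense(128, activation="relu")(inputs)
        outputs = tf.keras.layers.Dense(10)(x)
        model = tf.keras.Model(inputs, outputs)

        # Enable replicated tensor sharding of the optimizer state for a
        # non-pipelined model. The accumulation step count is an assumed
        # example value, not taken from the commit.
        model.set_gradient_accumulation_options(
            gradient_accumulation_steps_per_replica=8,
            replicated_optimizer_state_sharding=True)

        # Adam holds per-variable state, so its state is a candidate for
        # sharding across the replicas.
        model.compile(optimizer="adam", loss="mse", steps_per_execution=16)

For a pipelined model, the same flag would instead be passed to
``set_pipelining_options``, as the added documentation describes.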
