
Commit 8cbb17b

caandewiel authored and georgepaw committed
Update documentation regarding Keras API update
Summary: TF2.5 Only
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jamiep, alexc, jackh
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jamiep, alexc, jackh
Subscribers: georgep, jackh, alfiee, grahamh, alexc, jamiep
Maniphest Tasks: T56523
Differential Revision: https://phabricator.sourcevertex.net/D62197
1 parent f6f9304 commit 8cbb17b

File tree

4 files changed: 46 additions, 13 deletions


tensorflow/compiler/plugin/poplar/docs/api-changes.rst

Lines changed: 8 additions & 0 deletions
@@ -14,6 +14,14 @@ ________________
 
 These will require changes to any code that uses them.
 
+IPU Keras changes
+'''''''''''''''''
+
+- The argument ``steps_per_execution`` in ``model.compile()`` now reflects
+  the number of steps to process per execution *per replica*, whereas
+  previously it reflected the number of steps to process per execution for
+  all replicas combined.
+
 Removal of deprecated APIs
 ''''''''''''''''''''''''''
 
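For code written against the old semantics, the usual migration is to divide the previous combined value by the replication factor. A minimal sketch, assuming a hypothetical replication factor of 4 and a hypothetical old value of 400:

import tensorflow as tf

num_replicas = 4               # assumed replication factor
old_steps_per_execution = 400  # old semantics: steps for all replicas combined

model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    optimizer=tf.keras.optimizers.RMSprop(),
    # New semantics: steps processed per execution *per replica*, so the old
    # combined value is divided by the number of replicas.
    steps_per_execution=old_steps_per_execution // num_replicas,  # 100
)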

tensorflow/compiler/plugin/poplar/docs/keras_tf2.rst

Lines changed: 13 additions & 6 deletions
@@ -24,10 +24,18 @@ inside the scope of an ``IPUStrategy``:
 
 Using steps_per_execution
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
-To reduce Python overhead and maximize the performance of your model, pass in
-the ``steps_per_execution`` argument to the compile method. This argument sets
-the number of batches to process sequentially in a single execution. You should
-increase this number to improve accelerator utilization.
+To reduce Python overhead and maximize the performance of your model, pass the
+``steps_per_execution`` argument to the compile method. This argument sets the
+number of batches processed sequentially by one replica in a single execution,
+which can greatly improve performance because any overhead between steps is
+removed, thus increasing IPU utilization.
+
+Ideally, ``steps_per_execution`` is equal to the number of steps your model
+needs to run per replica in order to complete one epoch. Note that it is not
+possible to fetch intermediate results within an execution: model weights are
+read on the Python host only after all steps have been executed on the IPU.
+If you need to access model weights during an epoch (for example, to save a
+checkpoint), you must set ``steps_per_execution`` accordingly.
 
 .. note::
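For example, to keep a per-epoch checkpoint callback working, ``steps_per_execution`` can be set to exactly one epoch's worth of steps, so weights return to the host at every epoch boundary. A minimal sketch, with assumed dataset and batch sizes:

import tensorflow as tf

dataset_size = 60000  # assumed number of training samples
batch_size = 32       # assumed batch size
steps_per_epoch = dataset_size // batch_size  # 1875 steps per replica

model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    optimizer=tf.keras.optimizers.RMSprop(),
    # One execution per epoch: weights are readable on the host at every
    # epoch boundary; use a smaller value to read them more often.
    steps_per_execution=steps_per_epoch,
)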

@@ -69,8 +77,7 @@ for more details.
 
 When using data-parallelism, the ``steps_per_execution`` value the model was
 compiled with must be an integer multiple of
-``gradient_accumulation_steps_per_replica`` multiplied by the number of
-replicas in the model. Data parallelism is discussed in
+``gradient_accumulation_steps_per_replica``. Data parallelism is discussed in
 :numref:`automatic-data-parallelism`.
 
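A quick sanity check of this constraint, with hypothetical values:

gradient_accumulation_steps_per_replica = 8  # hypothetical
steps_per_execution = 64                     # hypothetical

# 64 is an integer multiple of 8, so this combination is valid; a value such
# as 60 would not be.
assert steps_per_execution % gradient_accumulation_steps_per_replica == 0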

tensorflow/compiler/plugin/poplar/docs/keras_tf2_example2.py

Lines changed: 6 additions & 4 deletions
@@ -29,7 +29,7 @@ def create_dataset():
   train_ds = train_ds.map(lambda d, l:
                           (tf.cast(d, tf.float32), tf.cast(l, tf.int32)))
 
-  return train_ds.repeat().prefetch(16)
+  return train_ds.prefetch(16)
 
 
 dataset = create_dataset()
@@ -45,8 +45,10 @@ def create_dataset():
     loss=tf.keras.losses.SparseCategoricalCrossentropy(),
     optimizer=tf.keras.optimizers.RMSprop(),
     metrics=["accuracy"],
-    # Anything between 2 and `steps_per_epoch` could help here.
-    steps_per_execution=50,
+    # Anything between 2 and the length of the dataset would work, but the
+    # greater `steps_per_execution`, the greater the performance gains.
+    steps_per_execution=dataset.cardinality(),
 )
 
-model.fit(dataset, epochs=2, steps_per_epoch=100)
+model.fit(dataset, epochs=2)
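The removal of ``repeat()`` matters here: ``dataset.cardinality()`` only yields a usable value for a finite dataset. A minimal sketch, using a hypothetical toy dataset:

import tensorflow as tf

# Hypothetical toy dataset: 100 samples in batches of 4.
dataset = tf.data.Dataset.from_tensor_slices(tf.zeros([100, 8])).batch(4)
print(int(dataset.cardinality()))  # 25 batches per epoch

# With `repeat()` the dataset becomes infinite, and its cardinality can no
# longer be passed as `steps_per_execution`.
repeated = dataset.repeat()
assert repeated.cardinality() == tf.data.experimental.INFINITE_CARDINALITY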

tensorflow/compiler/plugin/poplar/docs/keras_tf2_example3.py

Lines changed: 19 additions & 3 deletions
@@ -29,7 +29,7 @@ def create_dataset():
   train_ds = train_ds.map(lambda d, l:
                           (tf.cast(d, tf.float32), tf.cast(l, tf.int32)))
 
-  return train_ds.repeat().prefetch(16)
+  return train_ds.prefetch(16)
 
 
 dataset = create_dataset()
@@ -40,15 +40,31 @@ def create_dataset():
 # Create a Keras model inside the strategy.
 model = create_model()
 
+# `steps_per_execution` must be divisible by
+# `gradient_accumulation_steps_per_replica`. Say we want to accumulate 10
+# steps before doing a weight update; then we end up with these values.
+gradient_accumulation_steps_per_replica = 10
+number_of_accumulated_steps = (dataset.cardinality() //
+                               gradient_accumulation_steps_per_replica)
+
+# To get a valid `steps_per_execution` value, multiply
+# `number_of_accumulated_steps` by `gradient_accumulation_steps_per_replica`.
+steps_per_execution = (number_of_accumulated_steps *
+                       gradient_accumulation_steps_per_replica)
+
+# Truncate the dataset so Keras will not try to take more data from the
+# dataset than is available.
+dataset = dataset.take(steps_per_execution)
+
 # Compile the model for training.
 model.compile(
     loss=tf.keras.losses.SparseCategoricalCrossentropy(),
     optimizer=tf.keras.optimizers.RMSprop(),
     metrics=["accuracy"],
-    steps_per_execution=50,
+    steps_per_execution=steps_per_execution,
 )
 
 model.set_gradient_accumulation_options(
     gradient_accumulation_steps_per_replica=10)
 
-model.fit(dataset, epochs=2, steps_per_epoch=100)
+model.fit(dataset, epochs=2)
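With assumed concrete numbers, the arithmetic above works out as follows (1875 batches per epoch would correspond to, for example, 60000 samples at batch size 32):

gradient_accumulation_steps_per_replica = 10
batches_per_epoch = 1875  # assumed dataset cardinality

number_of_accumulated_steps = (batches_per_epoch //
                               gradient_accumulation_steps_per_replica)  # 187
steps_per_execution = (number_of_accumulated_steps *
                       gradient_accumulation_steps_per_replica)  # 1870

# The 5 leftover batches are dropped by `dataset.take(1870)`, so every
# execution contains a whole number of weight updates.
assert steps_per_execution % gradient_accumulation_steps_per_replica == 0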
