The following steps show you how to convert a PyTorch training script to
utilize SageMaker's distributed data parallel library.

The distributed data parallel library APIs are designed to be close to PyTorch Distributed Data
Parallel (DDP) APIs.
See `SageMaker distributed data parallel PyTorch examples <https://sagemaker-examples.readthedocs.io/en/latest/training/distributed_training/index.html#pytorch-distributed>`__ for additional details on how to implement the data parallel library
API offered for PyTorch.
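
For orientation, the conversion is largely a matter of swapping module paths.
The following is a minimal sketch of the import-level correspondence; the
native PyTorch lines are shown only for comparison and are not taken from
this guide.

.. code:: python

    # Native PyTorch DDP imports (shown for comparison only):
    # import torch.distributed as dist
    # from torch.nn.parallel import DistributedDataParallel as DDP

    # SageMaker distributed data parallel drop-in equivalents used in this guide:
    import smdistributed.dataparallel.torch.distributed as dist
    from smdistributed.dataparallel.torch.parallel.distributed import DistributedDataParallel as DDP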

- First import the distributed data parallel library’s PyTorch client and initialize it. You also import
  the distributed data parallel library module for distributed training.

.. code:: python

    import smdistributed.dataparallel.torch.distributed as dist
    from smdistributed.dataparallel.torch.parallel.distributed import DistributedDataParallel as DDP

    dist.init_process_group()

- Pin each GPU to a single distributed data parallel library process with ``local_rank`` - this
  refers to the relative rank of the process within a given node. The
  ``smdistributed.dataparallel.torch.distributed.get_local_rank()`` API provides
  the local rank of the device. The leader node will be rank 0, and the worker
  nodes will be rank 1, 2, 3, and so on.

.. code:: python

    torch.cuda.set_device(dist.get_local_rank())
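
For example, each process can print how it was placed. This is an illustrative
sketch, not part of the original script, and it assumes ``dist.get_rank()`` is
available alongside the calls already used in this guide.

.. code:: python

    # Illustrative sketch (assumption): report this process's placement.
    # `dist` is smdistributed.dataparallel.torch.distributed, imported as above.
    print(
        f"global rank {dist.get_rank()} of {dist.get_world_size()}, "
        f"local rank {dist.get_local_rank()}"
    )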

- Then wrap the PyTorch model with the distributed data parallel library’s DDP.

.. code:: python

    model = ...

    # Wrap model with SageMaker's DistributedDataParallel
    model = DDP(model)
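
Because every process now holds a replica of the model, a common follow-on
pattern is to write checkpoints from a single process only. This is a sketch of
general DDP practice, not text from this guide, and it assumes ``dist.get_rank()``
is available and that checkpoints are written to the usual SageMaker model
directory.

.. code:: python

    # Sketch of a common DDP checkpointing pattern (assumption, not from this guide):
    # save from the leader process only, so replicas do not write the same file.
    if dist.get_rank() == 0:
        torch.save(model.state_dict(), "/opt/ml/model/checkpoint.pt")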

All put together, the following is an example PyTorch training script
you will have for distributed training with the distributed data parallel library:

.. code:: python

    # Import distributed data parallel library PyTorch API
    import smdistributed.dataparallel.torch.distributed as dist

    # Import distributed data parallel library PyTorch DDP
    from smdistributed.dataparallel.torch.parallel.distributed import DistributedDataParallel as DDP

    # Initialize distributed data parallel library
    dist.init_process_group()

    class Net(nn.Module):
        ...

    def main():
        # Scale batch size by world size
        batch_size //= dist.get_world_size() // 8
        batch_size = max(batch_size, 1)

        # Prepare dataset
        train_dataset = torchvision.datasets.MNIST(...)

        # Set num_replicas and rank in DistributedSampler
        train_sampler = torch.utils.data.distributed.DistributedSampler(
            train_dataset,
            num_replicas=dist.get_world_size(),
            rank=dist.get_rank())
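
With the script converted, the training job is launched through the SageMaker
Python SDK by enabling the ``dataparallel`` distribution option on the
``PyTorch`` estimator. The following is a minimal sketch; the entry point,
role, framework and Python versions, and instance settings are placeholder
assumptions, not values taken from this guide.

.. code:: python

    from sagemaker.pytorch import PyTorch

    # Placeholder values below (entry_point, role, versions, instances) are
    # assumptions for illustration only.
    estimator = PyTorch(
        entry_point="train.py",
        role="<your-iam-role-arn>",
        framework_version="1.8.1",
        py_version="py36",
        instance_count=2,
        instance_type="ml.p3.16xlarge",
        # Enable SageMaker's distributed data parallel library
        distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    )
    estimator.fit()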

The following steps show you how to convert a TensorFlow 2.x training
script to utilize the distributed data parallel library.

The distributed data parallel library APIs are designed to be close to Horovod APIs.
See `SageMaker distributed data parallel TensorFlow examples <https://sagemaker-examples.readthedocs.io/en/latest/training/distributed_training/index.html#tensorflow-distributed>`__ for additional details on how to implement the data parallel library
API offered for TensorFlow.
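
As an orientation sketch, the conversion largely amounts to swapping the
Horovod import for the library's module. The Horovod line below is shown only
for comparison and is an assumption about a typical Horovod script, not text
from this guide.

.. code:: python

    # Typical Horovod import the library is modeled after (for comparison only):
    # import horovod.tensorflow as hvd

    # SageMaker distributed data parallel equivalent used throughout this guide:
    import smdistributed.dataparallel.tensorflow as sdp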

- First import the distributed data parallel library’s TensorFlow client and initialize it:

.. code:: python

    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()

- Scale the learning rate by the total number of GPU workers, ``sdp.size()``:

.. code:: python

    learning_rate = learning_rate * sdp.size()

- Use the library’s ``DistributedGradientTape`` to optimize AllReduce
  operations during training. This wraps ``tf.GradientTape``.

.. code:: python

    with tf.GradientTape() as tape:
        output = model(input)
        loss_value = loss(label, output)

    # Wrap tf.GradientTape with the library's DistributedGradientTape
    tape = sdp.DistributedGradientTape(tape)
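
The wrapped tape is then used like a regular ``tf.GradientTape`` to compute and
apply gradients. A minimal sketch, assuming an optimizer ``opt`` has been
created elsewhere in the script:

.. code:: python

    # Compute gradients with the wrapped tape and apply them with the
    # optimizer `opt` (assumed to be defined elsewhere in the script).
    grads = tape.gradient(loss_value, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))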

All put together, the following is an example TensorFlow 2 training
script you will have for distributed training with the library.

.. code:: python

    import tensorflow as tf

    # Import the library's TF API
    import smdistributed.dataparallel.tensorflow as sdp