| Pytorch | NLP | BERT | WIKI-103 | Next sentence prediction, Masked language modelling, Question/Answering | ✅ | ✅ |[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805v2)|
## Instructions summary
### Employing automatic loss scaling (ALS) for half precision training

ALS is a feature in the Poplar SDK which brings stability to training large models in half precision, especially when gradient accumulation and reduction across replicas also happen in half precision.

NB. This feature expects the `poptorch` training option `accumulationAndReplicationReductionType` to be set to `poptorch.ReductionType.Mean`, and accumulation by the optimizer to be done in half precision (using `accum_type=torch.float16` when instantiating the optimizer); otherwise it may lead to unexpected behaviour.
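
The sketch below illustrates how these options might be wired together. It is a minimal, hypothetical example rather than this repository's actual training script: the toy model, accumulation factor, learning rate, and the `setAutomaticLossScaling` call are assumptions that should be checked against the PopTorch documentation for your Poplar SDK version.

```python
# Minimal sketch: half-precision training with ALS and mean gradient reduction.
# The model, accumulation factor and learning rate are placeholders; verify the
# setAutomaticLossScaling option against your PopTorch/Poplar SDK version.
import torch
import poptorch

opts = poptorch.Options()
opts.Training.gradientAccumulation(16)  # assumed accumulation factor
# Required by the note above: reduce accumulated/replicated gradients by mean.
opts.Training.accumulationAndReplicationReductionType(poptorch.ReductionType.Mean)
# Assumed name of the ALS switch; check availability in your SDK release.
opts.Training.setAutomaticLossScaling(True)

# Toy stand-in for the real model; when wrapped with poptorch.trainingModel,
# the forward pass is expected to compute and return the loss.
class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 2)

    def forward(self, x, labels):
        logits = self.linear(x)
        return logits, torch.nn.functional.cross_entropy(logits, labels)

model = ToyModel().half()

# Accumulate optimizer state in half precision, as the note above requires.
optimizer = poptorch.optim.AdamW(
    model.parameters(),
    lr=1e-4,  # assumed learning rate
    accum_type=torch.float16,
)

training_model = poptorch.trainingModel(model, options=opts, optimizer=optimizer)
```

The resulting `training_model` would then be called with half-precision inputs inside the usual training loop.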