week05_large_models/practice_part1.ipynb
+13 -9 (13 additions & 9 deletions)
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": null,
    "metadata": {
     "id": "0TH9Am-9ztHB"
    },
@@ -42,7 +42,7 @@
    },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": null,
    "metadata": {
     "colab": {
      "base_uri": "https://localhost:8080/"
@@ -81,7 +81,7 @@
    },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": null,
    "metadata": {
     "id": "sTuoIY_tNSVk",
     "colab": {
@@ -279,7 +279,7 @@
    },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": null,
    "metadata": {
     "colab": {
      "base_uri": "https://localhost:8080/"
@@ -305,7 +305,7 @@
    },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": null,
    "metadata": {
     "colab": {
      "base_uri": "https://localhost:8080/"
@@ -354,7 +354,7 @@
    },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": null,
    "metadata": {
     "colab": {
      "base_uri": "https://localhost:8080/",
@@ -564,7 +564,7 @@
    },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": null,
    "metadata": {
     "colab": {
      "base_uri": "https://localhost:8080/"
@@ -660,6 +660,7 @@
     "\n",
     "__Task 1.2:__ generate a short sequence given a prefix. You may choose any generation task that requires generating at least 25 consecutive tokens. Here's one example from the NLP course (the generated code is in blue)\n",
     "\n",
+    "\n",
     "\n",
     "You may use model.generate (if your code is compatible with that) or write your own inference loop. If you choose to write your own loop, you are free to use sampling, greedy, top-p, top-k or any other [inference mode supported by HF transformers](https://huggingface.co/docs/transformers/main_classes/text_generation).\n",
     "\n",
@@ -672,7 +673,10 @@
     "- __+1 point__ you can perform forward pass on 128x1024 tokens of actual text data (e.g. the sample data above)\n",
     "- __+1 point__ you can compute gradients with offloading on the same 128x1024 tokens from the real text data\n",
     "- __+1 point__ you can inference the model - and it generates some human-readable text\n",
-    "- __bonus points__ optimize your code so that it would pre-load the next offloaded layer in background\n",
+    "- __bonus points:__ we offer two optional assignments:\n",
+    " - **Selective activation checkpointing (2pt):** there is a gentler version of gradient checkpointing where you don't just remember the layer inputs, but also some activations that are easier to compute - compared to their size. For instance, MLP linear layers are compute-heavy, but the nonlinearity is relatively compute-light for the same amount of memory. You can re-compute only the compute-light operations and keep the compute-heavy ones in memory. There's [a paper](https://arxiv.org/pdf/2205.05198) that describes such an approach in detail (see 'Selective activation checkpointing').\n",
+    " - **Prefetch offloaded layers (2pt):** optimize your code so that it begins pre-loading the next offloaded layer in the background, while computing the current layer. It can be done with a copy with non_blocking=True, or, for fine-grained control, CUDA streams. To get the full grade for this assignment, please demonstrate that your approach is faster than naive offloading, at least during large batch forward/backward pass. This can be done using a profiler.\n",
+    " - Please note that the maximum points for this week are **capped at 14**.\n",
     "\n",
     "__Conditions:__\n",
     "- using more than 10GiB of GPU memory at any point is forbidden (check with [`torch.cuda.max_memory_allocated()`](https://pytorch.org/docs/stable/generated/torch.cuda.max_memory_allocated.html))\n",