docs/tutorials/sft_on_multi_host.md
Lines changed: 21 additions & 3 deletions
@@ -50,7 +50,7 @@ The `docker_upload_runner.sh` script uploads your Docker image to Artifact Regis
Install XPK by following the instructions in the [official documentation](https://github.com/AI-Hypercomputer/xpk?tab=readme-ov-file#installation-via-pip).

## 3. Create GKE cluster
-If you don't already have a GKE cluster, create one by following the [XPK cluster creation guide](https://github.com/AI-Hypercomputer/xpk?tab=readme-ov-file#cluster-create).
+If you don't already have a GKE cluster, create one by following the [XPK cluster creation guide](https://github.com/AI-Hypercomputer/xpk?tab=readme-ov-file#cluster-create). Ensure the cluster is Pathways-compatible when running SFT with Pathways.

## 4. Environment configuration
```bash
@@ -89,6 +89,9 @@ If you already have a MaxText-compatible model checkpoint, simply set the follow
```bash
export MODEL_CHECKPOINT_PATH=<gcs path for MaxText checkpoint> # e.g., gs://my-bucket/my-model-checkpoint/0/items
```
+**Note:** Make sure the checkpoint at `MODEL_CHECKPOINT_PATH` was created with the correct storage flags:
+* **For SFT with McJAX:** `checkpoint_storage_use_zarr3=True` and `checkpoint_storage_use_ocdbt=True`.
+* **For SFT with Pathways:** `checkpoint_storage_use_zarr3=False` and `checkpoint_storage_use_ocdbt=False`.

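The storage-flag rule in the note can be captured in a small helper. The following is a minimal sketch, not part of the tutorial or of MaxText: the `SFT_RUNTIME` variable and the `set_storage_flags` function are hypothetical names introduced for illustration, and the resulting `USE_ZARR3` and `USE_OCDBT` values are meant to feed the conversion step.

```bash
#!/usr/bin/env bash
# Sketch: derive the checkpoint storage flags from the chosen SFT runtime.
# SFT_RUNTIME and set_storage_flags are hypothetical names, not MaxText APIs.

set_storage_flags() {
  local runtime="$1"
  case "${runtime}" in
    mcjax)
      # McJAX expects checkpoints written with zarr3 and OCDBT enabled.
      USE_ZARR3=True
      USE_OCDBT=True
      ;;
    pathways)
      # Pathways expects checkpoints written with both disabled.
      USE_ZARR3=False
      USE_OCDBT=False
      ;;
    *)
      echo "Unknown runtime: ${runtime} (expected mcjax or pathways)" >&2
      return 1
      ;;
  esac
}

# Default to McJAX when SFT_RUNTIME is unset (an assumption of this sketch).
set_storage_flags "${SFT_RUNTIME:-mcjax}"
echo "checkpoint_storage_use_zarr3=${USE_ZARR3} checkpoint_storage_use_ocdbt=${USE_OCDBT}"
```

Setting both flags from a single switch keeps them in sync, so the checkpoint cannot be written with a mixed configuration that matches neither runtime.
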
### Option 2: Converting a Hugging Face checkpoint
If your model checkpoint is from Hugging Face, you need to run a conversion script to make it MaxText-compatible.
2. **Run the Conversion Script:** Run the following command, which downloads the specified Hugging Face model and converts its weights into the MaxText format. The conversion script only supports official versions of models from Hugging Face. To see which models and versions are currently supported for conversion, refer to the `HF_IDS` dictionary in the [MaxText checkpoint conversion utility file](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/utils/ckpt_conversion/utils/utils.py).

```bash
+USE_ZARR3=<Flag to use zarr3> # True to run SFT with McJAX, False to run SFT with Pathways
+USE_OCDBT=<Flag to use ocdbt> # True to run SFT with McJAX, False to run SFT with Pathways