Commit 8d9588a

Merge pull request #2361 from SamuelMarks:v2-layout2
PiperOrigin-RevId: 828168144
2 parents: d29bb51 + b4e08bd

40 files changed: +118 -54 lines

.readthedocs.yml

Lines changed: 1 addition & 1 deletion
@@ -21,4 +21,4 @@ sphinx:
   # See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
 python:
   install:
-    - requirements: requirements_docs.txt
+    - requirements: dependencies/requirements/requirements_docs.txt

PREFLIGHT.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ bash preflight.sh PLATFORM=GCE && numactl --membind 0 --cpunodebind=0 python3 -m
 ```
 
 For GKE,
-`numactl` should be built into your docker image from [maxtext_dependencies.Dockerfile](https://github.com/google/maxtext/blob/main/maxtext_dependencies.Dockerfile), so you can use it directly if you built the maxtext docker image. Here is an example
+`numactl` should be built into your docker image from [maxtext_dependencies.Dockerfile](https://github.com/google/maxtext/blob/main/dependencies/dockerfiles/maxtext_dependencies.Dockerfile), so you can use it directly if you built the maxtext docker image. Here is an example
 
 ```
 bash preflight.sh PLATFORM=GKE && numactl --membind 0 --cpunodebind=0 python3 -m MaxText.train src/MaxText/configs/base.yml run_name=$YOUR_JOB_NAME

docs/development.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ If you are writing documentation for MaxText, you may want to preview the docume
 First, make sure you install the necessary dependencies. You can do this by navigating to your local clone of the MaxText repo and running:
 
 ```bash
-pip install -r requirements_docs.txt
+pip install -r dependencies/requirements/requirements_docs.txt
 ```
 
 Once the dependencies are installed, you can navigate to the `docs/` folder and run:

docs/guides/data_input_pipeline/data_input_grain.md

Lines changed: 6 additions & 6 deletions
@@ -29,17 +29,17 @@ Grain ensures determinism in data input pipelines by saving the pipeline's state
 
 ## Using Grain
 1. Grain currently supports two data formats: [ArrayRecord](https://github.com/google/array_record) (random access) and [Parquet](https://arrow.apache.org/docs/python/parquet.html) (partial random-access through row groups). Only the ArrayRecord format supports the global shuffle mentioned above. For converting a dataset into ArrayRecord, see [Apache Beam Integration for ArrayRecord](https://github.com/google/array_record/tree/main/beam). Additionally, other random access data sources can be supported via a custom [data source](https://google-grain.readthedocs.io/en/latest/data_sources.html) class.
-2. When the dataset is hosted on a Cloud Storage bucket, Grain can read it through [Cloud Storage FUSE](https://cloud.google.com/storage/docs/gcs-fuse). The installation of Cloud Storage FUSE is included in [setup.sh](https://github.com/google/maxtext/blob/main/setup.sh). The user then needs to mount the Cloud Storage bucket to a local path for each worker, using the script [setup_gcsfuse.sh](https://github.com/google/maxtext/blob/main/setup_gcsfuse.sh). The script configures some parameters for the mount.
-```
-bash setup_gcsfuse.sh \
+2. When the dataset is hosted on a Cloud Storage bucket, Grain can read it through [Cloud Storage FUSE](https://cloud.google.com/storage/docs/gcs-fuse). The installation of Cloud Storage FUSE is included in [setup.sh](https://github.com/google/maxtext/blob/main/tools/setup/setup.sh). The user then needs to mount the Cloud Storage bucket to a local path for each worker, using the script [setup_gcsfuse.sh](https://github.com/google/maxtext/blob/main/tools/setup/setup_gcsfuse.sh). The script configures some parameters for the mount.
+```sh
+bash tools/setup/setup_gcsfuse.sh \
 DATASET_GCS_BUCKET=$BUCKET_NAME \
 MOUNT_PATH=$MOUNT_PATH \
 [FILE_PATH=$MOUNT_PATH/my_dataset]
 # FILE_PATH is optional, when provided, the script runs "ls -R" for pre-filling the metadata cache
 # https://cloud.google.com/storage/docs/cloud-storage-fuse/performance#improve-first-time-reads
 ```
 3. Set `dataset_type=grain`, `grain_file_type={arrayrecord|parquet}`, `grain_train_files` to match the file pattern on the mounted local path.
-4. Tune `grain_worker_count` for performance. This parameter controls the number of child processes used by Grain (more details in [behind_the_scenes](https://google-grain.readthedocs.io/en/latest/behind_the_scenes.html), [grain_pool.py](https://github.com/google/grain/blob/main/grain/_src/python/grain_pool.py)). If you use a large number of workers, check your config for gcsfuse in [setup_gcsfuse.sh](https://github.com/google/maxtext/blob/main/setup_gcsfuse.sh) to avoid gcsfuse throttling.
+4. Tune `grain_worker_count` for performance. This parameter controls the number of child processes used by Grain (more details in [behind_the_scenes](https://google-grain.readthedocs.io/en/latest/behind_the_scenes.html), [grain_pool.py](https://github.com/google/grain/blob/main/grain/_src/python/grain_pool.py)). If you use a large number of workers, check your config for gcsfuse in [setup_gcsfuse.sh](https://github.com/google/maxtext/blob/main/tools/setup/setup_gcsfuse.sh) to avoid gcsfuse throttling.
 
 5. For multi-source blending, you can specify multiple data sources with their respective weights using semicolon (;) as a separator and colon (:) for weights. The weights will be automatically normalized to sum to 1.0. For example:
 ```
@@ -52,8 +52,8 @@ grain_train_files=/tmp/gcsfuse/dataset1.array_record*:1;/tmp/gcsfuse/dataset2.ar
 Note: When using multiple data sources, only the ArrayRecord format is supported.
 
 6. Example command:
-```
-bash setup_gcsfuse.sh \
+```sh
+bash tools/setup/setup_gcsfuse.sh \
 DATASET_GCS_BUCKET=maxtext-dataset \
 MOUNT_PATH=/tmp/gcsfuse && \
 python3 -m MaxText.train src/MaxText/configs/base.yml \
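As a rough illustration of the normalization described in step 5 above, the weights after each colon are rescaled so the blend sums to 1.0. This is a hypothetical sketch of that arithmetic, not MaxText's actual parser; the `spec`, `total`, and `normalized` names are invented for the example.

```sh
# Hypothetical sketch of multi-source weight normalization (step 5):
# sources are separated by ';' and each weight follows ':'.
spec="dataset1.array_record*:2;dataset2.array_record*:3"
# Sum the raw weights across all sources.
total=$(printf '%s\n' "$spec" | tr ';' '\n' | awk -F: '{s+=$NF} END{print s}')
# Rescale each weight by the total so the blend sums to 1.0.
normalized=$(printf '%s\n' "$spec" | tr ';' '\n' |
  awk -F: -v t="$total" '{printf "%s:%.2f;", $1, $NF/t}')
echo "$normalized"  # dataset1.array_record*:0.40;dataset2.array_record*:0.60;
```

So a spec weighted 2:3 blends the two sources at 0.40 and 0.60 of the mixture, which is why unnormalized weights like `:1` and `:4` are acceptable inputs.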

docs/guides/data_input_pipeline/data_input_tfds.md

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 # TFDS pipeline
 
 1. Download the Allenai C4 dataset in TFRecord format to a Cloud Storage bucket. For information about cost, see [this discussion](https://github.com/allenai/allennlp/discussions/5056)
-```
-bash download_dataset.sh {GCS_PROJECT} {GCS_BUCKET_NAME}
+```sh
+bash tools/data_generation/download_dataset.sh ${GCS_PROJECT} ${GCS_BUCKET_NAME}
 ```
 2. In `src/MaxText/configs/base.yml` or through command line, set the following parameters:
 ```yaml

docs/guides/knowledge_distillation.md

Lines changed: 3 additions & 2 deletions
@@ -47,12 +47,13 @@ export RUN_NAME = <unique name for the run>
 
 #### b. Install dependencies
 
-```
+```sh
 git clone https://github.com/AI-Hypercomputer/maxtext.git
 python3 -m venv ~/venv-maxtext
 source ~/venv-maxtext/bin/activate
+python3 -m pip install uv
 cd maxtext
-uv pip install -r requirements.txt
+uv pip install -r dependencies/requirements/requirements.txt
 ```
 
 ### 1. Obtain and prepare the teacher model

docs/guides/run_maxtext/run_maxtext_localhost.md

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ Within the root directory of the cloned repo, create a virtual environment and i
 ```bash
 python3.12 -m venv ~/venv-maxtext
 source ~/venv-maxtext/bin/activate
-bash setup.sh DEVICE={tpu|gpu}
+bash tools/setup/setup.sh DEVICE={tpu|gpu}
 ```
 
 #### Run a Test Training Job

docs/guides/run_maxtext/run_maxtext_via_multihost_job.md

Lines changed: 3 additions & 3 deletions
@@ -59,12 +59,12 @@ either be a TPUVM or not. If your runner machine is a TPUVM, it needs service ac
 ```
 
 Choose the number of nodes (we use 2 below, but you may customize this and other features of your TPU(s))
-```
+```sh
 NODE_COUNT=2
 ```
-```
+```sh
 RUN_NAME=$YOUR_JOB_NAME # You may set this to any unique name for a fresh run.
-python3 multihost_job.py --NUM_SLICES=$NODE_COUNT --RUN_NAME=$RUN_NAME --BUCKET_NAME=$BUCKET_NAME --CQR_EXTRA_ARGS="--reserved" --COMMAND="bash setup.sh && python3 -m MaxText.train src/MaxText/configs/base.yml run_name=$RUN_NAME"
+python3 multihost_job.py --NUM_SLICES=$NODE_COUNT --RUN_NAME=$RUN_NAME --BUCKET_NAME=$BUCKET_NAME --CQR_EXTRA_ARGS="--reserved" --COMMAND="bash tools/setup/setup.sh && python3 -m MaxText.train src/MaxText/configs/base.yml run_name=$RUN_NAME"
 ```
 
 We tell `multihost_job` to target the `reserved` pool by including `--reserved` as extra arguments to the CQR request, but you may instead target the `on-demand` pool by removing the `--CQR_EXTRA_ARGS` flag (on-demand is default), or the pre-emptible pool with `--CQR_EXTRA_ARGS="--best-effort"`, which may be necessary if your reservation is full.
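The pool choice described in the paragraph above can be captured in a small wrapper. This is an illustrative sketch only: the `POOL` and `EXTRA` variable names are hypothetical, and only the `--reserved` / `--best-effort` flag values come from the text.

```sh
# Hypothetical helper mapping the capacity pool to the multihost_job.py flag.
# on-demand is the default, so it needs no --CQR_EXTRA_ARGS at all.
POOL="reserved"   # one of: reserved | on-demand | best-effort
case "$POOL" in
  reserved)    EXTRA='--CQR_EXTRA_ARGS=--reserved' ;;
  best-effort) EXTRA='--CQR_EXTRA_ARGS=--best-effort' ;;
  on-demand)   EXTRA='' ;;
esac
echo "python3 multihost_job.py $EXTRA ..."
```

Falling back from `reserved` to `best-effort` is a common workaround when the reservation is full, at the cost of pre-emptible capacity.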

docs/guides/run_maxtext/run_maxtext_via_multihost_runner.md

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ either be a TPUVM or not, but it cannot be one of the workers. If your runner ma
 4. **Install dependencies.**
 Install the dependencies of `train.py` on each worker using `multihost_runner.py`:
 ```
-python3 multihost_runner.py --TPU_PREFIX=$TPU_PREFIX --COMMAND="bash setup.sh"
+python3 multihost_runner.py --TPU_PREFIX=$TPU_PREFIX --COMMAND="bash tools/setup/setup.sh"
 ```
 If you are running the `multihost_runner.py` script from a TPUVM, you will need to set `--INTERNAL_IP=true`.
 

docs/guides/run_maxtext/run_maxtext_via_pathways.md

Lines changed: 2 additions & 2 deletions
@@ -36,14 +36,14 @@ Before you can run a MaxText workload, you must complete the following setup ste
 ```bash
 # Step 1: Build the Docker image for a TPU device
 # This image contains MaxText and its dependencies.
-bash docker_build_dependency_image.sh DEVICE=tpu MODE=jax_ai_image BASEIMAGE=us-docker.pkg.dev/cloud-tpu-images/jax-ai-image/tpu:latest
+bash dependencies/scripts/docker_build_dependency_image.sh DEVICE=tpu MODE=jax_ai_image BASEIMAGE=us-docker.pkg.dev/cloud-tpu-images/jax-ai-image/tpu:latest
 
 # Step 2: Configure Docker to authenticate with Google Cloud
 gcloud auth configure-docker
 
 # Step 3: Upload the image to your project's registry
 # Replace `$USER_runner` with your desired image name.
-bash docker_upload_runner.sh CLOUD_IMAGE_NAME=$USER_runner
+bash dependencies/scripts/docker_upload_runner.sh CLOUD_IMAGE_NAME=$USER_runner
 ```
 
 ## 2. Environment configuration
