Commit 8d9588a

Merge pull request #2361 from SamuelMarks:v2-layout2
PiperOrigin-RevId: 828168144
2 parents: d29bb51 + b4e08bd

40 files changed: +118 -54 lines

.readthedocs.yml

Lines changed: 1 addition & 1 deletion
@@ -21,4 +21,4 @@ sphinx:
   # See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
 python:
   install:
-    - requirements: requirements_docs.txt
+    - requirements: dependencies/requirements/requirements_docs.txt

PREFLIGHT.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ bash preflight.sh PLATFORM=GCE && numactl --membind 0 --cpunodebind=0 python3 -m
 ```
 
 For GKE,
-`numactl` should be built into your docker image from [maxtext_dependencies.Dockerfile](https://github.com/google/maxtext/blob/main/maxtext_dependencies.Dockerfile), so you can use it directly if you built the maxtext docker image. Here is an example
+`numactl` should be built into your docker image from [maxtext_dependencies.Dockerfile](https://github.com/google/maxtext/blob/main/dependencies/dockerfiles/maxtext_dependencies.Dockerfile), so you can use it directly if you built the maxtext docker image. Here is an example
 
 ```
 bash preflight.sh PLATFORM=GKE && numactl --membind 0 --cpunodebind=0 python3 -m MaxText.train src/MaxText/configs/base.yml run_name=$YOUR_JOB_NAME

docs/development.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ If you are writing documentation for MaxText, you may want to preview the docume
 First, make sure you install the necessary dependencies. You can do this by navigating to your local clone of the MaxText repo and running:
 
 ```bash
-pip install -r requirements_docs.txt
+pip install -r dependencies/requirements/requirements_docs.txt
 ```
 
 Once the dependencies are installed, you can navigate to the `docs/` folder and run:

docs/guides/data_input_pipeline/data_input_grain.md

Lines changed: 6 additions & 6 deletions
@@ -29,17 +29,17 @@ Grain ensures determinism in data input pipelines by saving the pipeline's state
 
 ## Using Grain
 1. Grain currently supports two data formats: [ArrayRecord](https://github.com/google/array_record) (random access) and [Parquet](https://arrow.apache.org/docs/python/parquet.html) (partial random-access through row groups). Only the ArrayRecord format supports the global shuffle mentioned above. For converting a dataset into ArrayRecord, see [Apache Beam Integration for ArrayRecord](https://github.com/google/array_record/tree/main/beam). Additionally, other random access data sources can be supported via a custom [data source](https://google-grain.readthedocs.io/en/latest/data_sources.html) class.
-2. When the dataset is hosted on a Cloud Storage bucket, Grain can read it through [Cloud Storage FUSE](https://cloud.google.com/storage/docs/gcs-fuse). The installation of Cloud Storage FUSE is included in [setup.sh](https://github.com/google/maxtext/blob/main/setup.sh). The user then needs to mount the Cloud Storage bucket to a local path for each worker, using the script [setup_gcsfuse.sh](https://github.com/google/maxtext/blob/main/setup_gcsfuse.sh). The script configures some parameters for the mount.
-```
-bash setup_gcsfuse.sh \
+2. When the dataset is hosted on a Cloud Storage bucket, Grain can read it through [Cloud Storage FUSE](https://cloud.google.com/storage/docs/gcs-fuse). The installation of Cloud Storage FUSE is included in [setup.sh](https://github.com/google/maxtext/blob/main/tools/setup/setup.sh). The user then needs to mount the Cloud Storage bucket to a local path for each worker, using the script [setup_gcsfuse.sh](https://github.com/google/maxtext/blob/main/tools/setup/setup_gcsfuse.sh). The script configures some parameters for the mount.
+```sh
+bash tools/setup/setup_gcsfuse.sh \
 DATASET_GCS_BUCKET=$BUCKET_NAME \
 MOUNT_PATH=$MOUNT_PATH \
 [FILE_PATH=$MOUNT_PATH/my_dataset]
 # FILE_PATH is optional, when provided, the script runs "ls -R" for pre-filling the metadata cache
 # https://cloud.google.com/storage/docs/cloud-storage-fuse/performance#improve-first-time-reads
 ```
 3. Set `dataset_type=grain`, `grain_file_type={arrayrecord|parquet}`, `grain_train_files` to match the file pattern on the mounted local path.
-4. Tune `grain_worker_count` for performance. This parameter controls the number of child processes used by Grain (more details in [behind_the_scenes](https://google-grain.readthedocs.io/en/latest/behind_the_scenes.html), [grain_pool.py](https://github.com/google/grain/blob/main/grain/_src/python/grain_pool.py)). If you use a large number of workers, check your config for gcsfuse in [setup_gcsfuse.sh](https://github.com/google/maxtext/blob/main/setup_gcsfuse.sh) to avoid gcsfuse throttling.
+4. Tune `grain_worker_count` for performance. This parameter controls the number of child processes used by Grain (more details in [behind_the_scenes](https://google-grain.readthedocs.io/en/latest/behind_the_scenes.html), [grain_pool.py](https://github.com/google/grain/blob/main/grain/_src/python/grain_pool.py)). If you use a large number of workers, check your config for gcsfuse in [setup_gcsfuse.sh](https://github.com/google/maxtext/blob/main/tools/setup/setup_gcsfuse.sh) to avoid gcsfuse throttling.
 
 5. For multi-source blending, you can specify multiple data sources with their respective weights using semicolon (;) as a separator and colon (:) for weights. The weights will be automatically normalized to sum to 1.0. For example:
 ```
@@ -52,8 +52,8 @@ grain_train_files=/tmp/gcsfuse/dataset1.array_record*:1;/tmp/gcsfuse/dataset2.ar
 Note: When using multiple data sources, only the ArrayRecord format is supported.
 
 6. Example command:
-```
-bash setup_gcsfuse.sh \
+```sh
+bash tools/setup/setup_gcsfuse.sh \
 DATASET_GCS_BUCKET=maxtext-dataset \
 MOUNT_PATH=/tmp/gcsfuse && \
 python3 -m MaxText.train src/MaxText/configs/base.yml \
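As a rough illustration of the normalization described in step 5 above, the weights after each colon are rescaled so the blend sums to 1.0. This is a hypothetical sketch of that arithmetic, not MaxText's actual parser; the `spec`, `total`, and `normalized` names are invented for the example.

```sh
# Hypothetical sketch of multi-source weight normalization (step 5):
# sources are separated by ';' and each weight follows ':'.
spec="dataset1.array_record*:2;dataset2.array_record*:3"
# Sum the raw weights across all sources.
total=$(printf '%s\n' "$spec" | tr ';' '\n' | awk -F: '{s+=$NF} END{print s}')
# Rescale each weight by the total so the blend sums to 1.0.
normalized=$(printf '%s\n' "$spec" | tr ';' '\n' |
  awk -F: -v t="$total" '{printf "%s:%.2f;", $1, $NF/t}')
echo "$normalized"  # dataset1.array_record*:0.40;dataset2.array_record*:0.60;
```

So a spec weighted 2:3 blends the two sources at 0.40 and 0.60 of the mixture, which is why unnormalized weights like `:1` and `:4` are acceptable inputs.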

docs/guides/data_input_pipeline/data_input_tfds.md

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 # TFDS pipeline
 
 1. Download the Allenai C4 dataset in TFRecord format to a Cloud Storage bucket. For information about cost, see [this discussion](https://github.com/allenai/allennlp/discussions/5056)
-```
-bash download_dataset.sh {GCS_PROJECT} {GCS_BUCKET_NAME}
+```sh
+bash tools/data_generation/download_dataset.sh ${GCS_PROJECT} ${GCS_BUCKET_NAME}
 ```
 2. In `src/MaxText/configs/base.yml` or through command line, set the following parameters:
 ```yaml

docs/guides/knowledge_distillation.md

Lines changed: 3 additions & 2 deletions
@@ -47,12 +47,13 @@ export RUN_NAME = <unique name for the run>
 
 #### b. Install dependencies
 
-```
+```sh
 git clone https://github.com/AI-Hypercomputer/maxtext.git
 python3 -m venv ~/venv-maxtext
 source ~/venv-maxtext/bin/activate
+python3 -m pip install uv
 cd maxtext
-uv pip install -r requirements.txt
+uv pip install -r dependencies/requirements/requirements.txt
 ```
 
 ### 1. Obtain and prepare the teacher model

docs/guides/run_maxtext/run_maxtext_localhost.md

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ Within the root directory of the cloned repo, create a virtual environment and i
 ```bash
 python3.12 -m venv ~/venv-maxtext
 source ~/venv-maxtext/bin/activate
-bash setup.sh DEVICE={tpu|gpu}
+bash tools/setup/setup.sh DEVICE={tpu|gpu}
 ```
 
 #### Run a Test Training Job

docs/guides/run_maxtext/run_maxtext_via_multihost_job.md

Lines changed: 3 additions & 3 deletions
@@ -59,12 +59,12 @@ either be a TPUVM or not. If your runner machine is a TPUVM, it needs service ac
 ```
 
 Choose the number of nodes (we use 2 below, but you may customize this and other features of your TPU(s))
-```
+```sh
 NODE_COUNT=2
 ```
-```
+```sh
 RUN_NAME=$YOUR_JOB_NAME # You may set this to any unique name for a fresh run.
-python3 multihost_job.py --NUM_SLICES=$NODE_COUNT --RUN_NAME=$RUN_NAME --BUCKET_NAME=$BUCKET_NAME --CQR_EXTRA_ARGS="--reserved" --COMMAND="bash setup.sh && python3 -m MaxText.train src/MaxText/configs/base.yml run_name=$RUN_NAME"
+python3 multihost_job.py --NUM_SLICES=$NODE_COUNT --RUN_NAME=$RUN_NAME --BUCKET_NAME=$BUCKET_NAME --CQR_EXTRA_ARGS="--reserved" --COMMAND="bash tools/setup/setup.sh && python3 -m MaxText.train src/MaxText/configs/base.yml run_name=$RUN_NAME"
 ```
 
 We tell `multihost_job` to target the `reserved` pool by including `--reserved` as extra arguments to the CQR request, but you may instead target the `on-demand` pool by removing the `--CQR_EXTRA_ARGS` flag (on-demand is default), or the pre-emptible pool with `--CQR_EXTRA_ARGS="--best-effort"`, which may be necessary if your reservation is full.
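The pool choice described in the paragraph above can be captured in a small wrapper. This is an illustrative sketch only: the `POOL` and `EXTRA` variable names are hypothetical, and only the `--reserved` / `--best-effort` flag values come from the text.

```sh
# Hypothetical helper mapping the capacity pool to the multihost_job.py flag.
# on-demand is the default, so it needs no --CQR_EXTRA_ARGS at all.
POOL="reserved"   # one of: reserved | on-demand | best-effort
case "$POOL" in
  reserved)    EXTRA='--CQR_EXTRA_ARGS=--reserved' ;;
  best-effort) EXTRA='--CQR_EXTRA_ARGS=--best-effort' ;;
  on-demand)   EXTRA='' ;;
esac
echo "python3 multihost_job.py $EXTRA ..."
```

Falling back from `reserved` to `best-effort` is a common workaround when the reservation is full, at the cost of pre-emptible capacity.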

docs/guides/run_maxtext/run_maxtext_via_multihost_runner.md

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ either be a TPUVM or not, but it cannot be one of the workers. If your runner ma
 4. **Install dependencies.**
 Install the dependencies of `train.py` on each worker using `multihost_runner.py`:
 ```
-python3 multihost_runner.py --TPU_PREFIX=$TPU_PREFIX --COMMAND="bash setup.sh"
+python3 multihost_runner.py --TPU_PREFIX=$TPU_PREFIX --COMMAND="bash tools/setup/setup.sh"
 ```
 If you are running the `multihost_runner.py` script from a TPUVM, you will need to set `--INTERNAL_IP=true`.
 

docs/guides/run_maxtext/run_maxtext_via_pathways.md

Lines changed: 2 additions & 2 deletions
@@ -36,14 +36,14 @@ Before you can run a MaxText workload, you must complete the following setup ste
 ```bash
 # Step 1: Build the Docker image for a TPU device
 # This image contains MaxText and its dependencies.
-bash docker_build_dependency_image.sh DEVICE=tpu MODE=jax_ai_image BASEIMAGE=us-docker.pkg.dev/cloud-tpu-images/jax-ai-image/tpu:latest
+bash dependencies/scripts/docker_build_dependency_image.sh DEVICE=tpu MODE=jax_ai_image BASEIMAGE=us-docker.pkg.dev/cloud-tpu-images/jax-ai-image/tpu:latest
 
 # Step 2: Configure Docker to authenticate with Google Cloud
 gcloud auth configure-docker
 
 # Step 3: Upload the image to your project's registry
 # Replace `$USER_runner` with your desired image name.
-bash docker_upload_runner.sh CLOUD_IMAGE_NAME=$USER_runner
+bash dependencies/scripts/docker_upload_runner.sh CLOUD_IMAGE_NAME=$USER_runner
 ```
 
 ## 2. Environment configuration
