
Commit ae2d32a

Beef up documentation (#134)
1 parent 93393ff commit ae2d32a

File tree

3 files changed: +168 −2 lines changed


CONTRIBUTING.MD

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
Thank you for your interest in contributing to Optimum ExecuTorch!

## Developing Optimum ExecuTorch

### Setting up the development environment
To install Optimum ExecuTorch for development:
```
python install_dev.py
```

### Testing local changes
Optimum ExecuTorch does not have an editable install at the moment, so to test your local changes you will need to reinstall.
To prevent the reinstall from overwriting other dependencies, some of which you may have modified, run the following ahead of your test:
```
pip install --no-deps --no-build-isolation .
```

An example command for testing local changes to Gemma3:
```
pip install --no-deps --no-build-isolation .
RUN_SLOW=1 python -m pytest tests/models/test_modeling_gemma3.py -s -k test_gemma3_image_vision_with_custom_sdpa_kv_cache_8da4w_8we --log-cli-level=INFO
```

To run tests marked with `@slow`, set `RUN_SLOW=1`.

## Enabling a new model on Optimum

Our design philosophy is to have as little model-specific code as possible, which means all optimizations, export code, etc. are model-agnostic.
This allows us to theoretically export any new model straight from the source, with a few caveats explained later.
For example, most Large Language Models should be exportable with this library.

### 💡 How to "enable" a model on Optimum
❓ Currently, the [homepage README](README.md?tab=readme-ov-file#-supported-models) lists all of the "supported" models. What does this mean, and what about models not on this list?

👉 These supported models all have a test file associated with them, such as [Gemma3](https://github.com/huggingface/optimum-executorch/blob/main/tests/models/test_modeling_gemma3.py), which has been used to validate the model end to end (export + running a generation loop on the exported artifact).
The test file is then used in CI to guard against potential regressions.
Once you have a PR up that adds the test to the repo, feel free to edit the homepage README to include the new model.

As an example, in the Gemma3 test file we have validated that the model exports and returns correct output for a test prompt across different export configurations. Other users now know that Gemma3 works and can export the model like so:
```
optimum-cli export executorch \
  --model google/gemma-3-1b-it \
  --task text-generation \
  --recipe xnnpack \
  --use_custom_sdpa \
  --use_custom_kv_cache \
  --qlinear 8da4w \
  --qembedding 8w
```

However, there are many models without test files in Optimum that probably still work - it's just that no one has gone through the trouble of validating them.
This is where you come in - feel free to contribute if there is a model you are interested in that does not yet have a test file!

If you run into any issues, they will most likely stem from the following:
- ❓ How much model-specific code is in Transformers for this model?
- ❓ Do we already have the model type supported in Optimum?
- ❓ Is the model itself torch.exportable?

### ❌ Model-specific code is in Transformers
To address this issue, we will need to upstream changes to the Transformers library, or update our code to match.
For instance, if Transformers hypothetically introduced a new type of cache and that cache were used in a new LLM, we would need to handle the new cache type in Optimum.
Or, if we expect a certain attribute in a Transformers model and it exists under a slightly different name, this may be an opportunity to upstream some naming standardization changes to Transformers.
[Here](https://github.com/huggingface/transformers/pull/40919) is an example of one such standardization.

### ❌ Model type is not supported in Optimum
All of the supported model types live in [integrations.py](https://github.com/huggingface/optimum-executorch/blob/main/optimum/exporters/executorch/integrations.py), which contains wrapper classes that facilitate torch.exporting a model:
- `CausalLMExportableModule` - LLMs (Large Language Models)
- `MultiModalTextToTextExportableModule` - Multimodal LLMs (Large Language Models with support for audio/image input)
- `VisionEncoderExportableModule` - Vision Encoder backbones (such as DiT or MobileViT)
- `MaskedLMExportableModule` - Masked language models (for predicting masked tokens)
- `Seq2SeqLMExportableModule` - General Seq2Seq encoder-decoder models (such as T5 and Whisper)

This is where most of the complexity around "enabling" a model on Optimum comes from, since after torch.export() every model follows the same per-backend flow for transforming the torch.export() artifact into an ExecuTorch `.pte` artifact.
If the model type doesn't exist in Optimum, we will need to write a new wrapper class for it.
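
For illustration, here is a minimal, hypothetical sketch of what such a wrapper class could look like. The class name, task, and forward signature are invented for this example; the real classes in integrations.py additionally handle caches, configs, metadata, and dynamic shapes:
```
import torch


class DepthEstimationExportableModule(torch.nn.Module):
    """Hypothetical wrapper for a new model type, mirroring the pattern in integrations.py."""

    def __init__(self, model):
        super().__init__()
        self.model = model
        self.config = model.config

    def forward(self, pixel_values):
        # Return plain tensors so the traced graph stays export-friendly.
        return self.model(pixel_values=pixel_values).predicted_depth

    def export(self) -> torch.export.ExportedProgram:
        # Example inputs define the traced shapes; real wrappers also register dynamic shapes.
        example_pixel_values = torch.zeros((1, 3, 224, 224), dtype=torch.float32)
        with torch.no_grad():
            return torch.export.export(self, (example_pixel_values,))
```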

### ❌ Model is not torch.exportable
To address this issue, we will need to upstream changes to the model's modeling file in Transformers to make the model exportable.
After doing this, it's a good idea to add a torch.export test to guard against future regressions (which tend to happen frequently, since Transformers moves fast).
[Here](https://github.com/huggingface/transformers/blob/87f38dbfcec48027d4bf2ea7ec8b8eecd5a7bc85/tests/models/smollm3/test_modeling_smollm3.py#L175) is an example.
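
A minimal sketch of what such a regression test could look like, with an illustrative model id and inputs; the real tests typically go through helper wrappers rather than exporting the raw model directly:
```
import torch
from transformers import AutoModelForCausalLM


def test_model_is_torch_exportable():
    # Illustrative model id; substitute the model you are enabling.
    model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M")
    model.eval()

    input_ids = torch.tensor([[1, 2, 3, 4]], dtype=torch.long)
    attention_mask = torch.ones_like(input_ids)

    # If this raises, the model has an export-blocking graph break
    # or data-dependent control flow.
    exported = torch.export.export(
        model,
        args=(),
        kwargs={"input_ids": input_ids, "attention_mask": attention_mask},
        strict=False,
    )
    assert exported is not None
```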

README.md

Lines changed: 4 additions & 2 deletions
@@ -6,7 +6,7 @@

**Optimize and deploy Hugging Face models with ExecuTorch**

-[Documentation](https://huggingface.co/docs/optimum/index) | [ExecuTorch](https://github.com/pytorch/executorch) | [Hugging Face](https://huggingface.co/)
+[Documentation](https://huggingface.co/docs/optimum-executorch/en/index) | [ExecuTorch](https://github.com/pytorch/executorch) | [Hugging Face](https://huggingface.co/)

</div>

@@ -94,7 +94,8 @@ optimum-cli export executorch \
  --qembedding 8w \
  --output_dir="hf_smollm2"
```
-Explore the various export options by running the command: `optimum-cli export executorch --help`
+Explore the various export options by running the command: `optimum-cli export executorch --help`.
+To read more about how to export different types of models on Optimum ExecuTorch, please refer to the export [README](optimum/exporters/executorch/README.md).

#### Step 2: Validate the Exported Model on Host Using the Python API
Use the exported model for text generation:

@@ -187,6 +188,7 @@ We currently support a wide range of popular transformer models, including encod
- [Whisper](https://huggingface.co/openai/whisper-tiny): OpenAI's `Whisper` and its variants

#### Speech text-to-text (Automatic Speech Recognition)
+- 💡[**NEW**] [Granite Speech](https://huggingface.co/ibm-granite/granite-speech-3.3-2b): `granite-speech-3.3-2b` and its variants
- 💡[**NEW**] [Voxtral](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507): Mistral's newest speech/text-to-text model

*📌 Note: This list is continuously expanding; as support grows, more models will be added.*

optimum/exporters/executorch/README.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
# Exporting Transformers Models to ExecuTorch

Optimum ExecuTorch enables exporting models from Transformers to ExecuTorch.
The models supported by Optimum ExecuTorch are listed [here](../../../README.md#-supported-models).

### LLMs (Large Language Models)
LLMs can be exported using the `text-generation` task like so:
```
optimum-cli export executorch \
  --model <model-id> \
  --task text-generation \
  --recipe xnnpack \
  --use_custom_sdpa \
  --use_custom_kv_cache \
  --qlinear 8da4w \
  --qembedding 8w
  ...etc...
```

The export will produce a `.pte` with a single forward method for the decoder: `model`.

Note that most of the arguments here are only applicable to LLMs (multimodal included):
```
--use_custom_sdpa \
--use_custom_kv_cache \
--qlinear 8da4w \
--qembedding 8w
```
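
Once exported, the artifact can be sanity-checked on host with the Python API. A minimal sketch, assuming the export above was written to a local directory; the directory, model id, and prompt are illustrative:
```
from transformers import AutoTokenizer

from optimum.executorch import ExecuTorchModelForCausalLM

# Illustrative paths/ids; point these at your own export output and tokenizer.
model = ExecuTorchModelForCausalLM.from_pretrained("./exported_llm")
tokenizer = AutoTokenizer.from_pretrained("<model-id>")

generated_text = model.text_generation(
    tokenizer=tokenizer,
    prompt="Give me a short introduction to large language models.",
    max_seq_len=128,
)
print(generated_text)
```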

### Multimodal LLMs
Multimodal LLMs can be exported using the `multimodal-text-to-text` task like so:
```
optimum-cli export executorch \
  --model mistralai/Voxtral-Mini-3B-2507 \
  --task multimodal-text-to-text \
  --recipe xnnpack \
  --use_custom_sdpa \
  --use_custom_kv_cache \
  --qlinear 8da4w \
  --qembedding 8w
  ...etc...
```

The export will produce a `.pte` with the following methods:
- `text_decoder`: the text decoder or language model backbone
- `audio_encoder` or `vision_encoder`: the encoder which feeds into the decoder
- `token_embedding`: the embedding layer of the language model backbone
  - This is needed in order to cleanly separate the entire multimodal model into subgraphs. The text decoder subgraph will take in token embeddings, so multimodal input will be processed into embeddings by the encoder while text input will be processed into embeddings by this method.
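
To confirm which methods ended up in the artifact, you can list them with the ExecuTorch Python runtime. A minimal sketch; the file name is illustrative:
```
from executorch.runtime import Runtime

# Illustrative path to the artifact produced by the export command above.
runtime = Runtime.get()
program = runtime.load_program("model.pte")

# For a multimodal export this is expected to include
# "text_decoder", "audio_encoder" (or "vision_encoder"), and "token_embedding".
print(program.method_names)
```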

### Seq2Seq
Seq2Seq models can be exported using the `text2text-generation` task like so:
```
optimum-cli export executorch \
  --model google-t5/t5-small \
  --task text2text-generation \
  --recipe xnnpack
```

The export will produce a `.pte` with the following methods:
- `text_decoder`: the decoder half of the Seq2Seq model
- `encoder`: the encoder half of the Seq2Seq model. This encoder can support a variety of modalities, such as text for T5 and audio for Whisper.

### Image classification
Image classification models can be exported using the `image-classification` task like so:
```
optimum-cli export executorch \
  --model google/vit-base-patch16-224 \
  --task image-classification \
  --recipe xnnpack
```

The export will produce a `.pte` with a single forward method: `model`.
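
As a quick host-side smoke test, the exported method can be executed directly with the ExecuTorch Python runtime. A minimal sketch with a random input; the file name and input shape are assumptions based on the ViT checkpoint above:
```
import torch
from executorch.runtime import Runtime

# Illustrative path to the artifact produced by the export command above.
program = Runtime.get().load_program("model.pte")
method = program.load_method("model")

# ViT-base expects 224x224 RGB input; a random tensor is enough to smoke-test the graph.
pixel_values = torch.randn(1, 3, 224, 224)
(logits,) = method.execute([pixel_values])
print(logits.shape)  # expected: (1, num_labels) for this checkpoint
```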

### ASR (Automatic speech recognition)
ASR is a special case of Seq2Seq that uses the base Seq2Seq exportable modules. It can be exported using the `automatic-speech-recognition` task like so:
```
optimum-cli export executorch \
  --model openai/whisper-tiny \
  --task automatic-speech-recognition \
  --recipe xnnpack
```

The export will produce a `.pte` with the following methods:
- `text_decoder`: the decoder half of the Seq2Seq model
- `encoder`: the encoder half of the Seq2Seq model
