
Commit ae2d32a

Beef up documentation (#134)
1 parent 93393ff commit ae2d32a

File tree

3 files changed: +168 −2 lines changed


CONTRIBUTING.MD

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
Thank you for your interest in contributing to Optimum ExecuTorch!

## Developing Optimum ExecuTorch

### Setting up the development environment
To install Optimum ExecuTorch for development:
```
python install_dev.py
```

### Testing local changes
Optimum ExecuTorch does not have an editable install at the moment, so to test your local changes you will need to reinstall.
To prevent the reinstall from overwriting other dependencies, some of which you may have modified, run the following ahead of your test:
```
pip install --no-deps --no-build-isolation .
```

An example command for testing local changes to Gemma3:
```
pip install --no-deps --no-build-isolation .
RUN_SLOW=1 python -m pytest tests/models/test_modeling_gemma3.py -s -k test_gemma3_image_vision_with_custom_sdpa_kv_cache_8da4w_8we --log-cli-level=INFO
```

To run tests marked with `@slow`, set `RUN_SLOW=1`.

## Enabling a new model on Optimum

Our design philosophy is to have as little model-specific code as possible, which means all optimizations, export code, etc. are model-agnostic.
This allows us to theoretically export any new model straight from the source, with a few caveats explained later.
For example, most Large Language Models should be exportable with this library.

### 💡 How to "enable" a model on Optimum
❓ Currently, the [homepage README](README.md?tab=readme-ov-file#-supported-models) lists all of the "supported" models. What does this mean, and what about models not on this list?

👉 These supported models all have a test file associated with them, such as [Gemma3](https://github.com/huggingface/optimum-executorch/blob/main/tests/models/test_modeling_gemma3.py), which has been used to validate the model end to end (export + running a generation loop on the exported artifact).
The test file is then used in CI to guard against potential regressions.
Once you have a PR up that adds the test to the repo, feel free to edit the homepage README to include the new model.

As an example, in the Gemma3 test file we have validated that the model exports and returns correct output for a test prompt across different export configurations. Other users now know that Gemma3 works and can export the model like so:
```
optimum-cli export executorch \
  --model google/gemma-3-1b-it \
  --task text-generation \
  --recipe xnnpack \
  --use_custom_sdpa \
  --use_custom_kv_cache \
  --qlinear 8da4w \
  --qembedding 8w
```

However, there are many models without test files in Optimum that probably still work - it's just that no one has gone through the trouble of validating them.
This is where you come in - feel free to contribute if there is a model you are interested in that does not yet have a test file!

If you run into any issues, they will most likely stem from the following:
- ❓ How much model-specific code is in Transformers for this model?
- ❓ Do we already have the model type supported in Optimum?
- ❓ Is the model itself torch.exportable?

### ❌ Model-specific code is in Transformers
To address this issue, we will need to upstream changes to the Transformers library, or update our code to match.
For instance, if Transformers hypothetically introduced a new type of cache and that cache were used in a new LLM, we would need to handle the new cache type in Optimum.
Or, if we expect a certain attribute in a Transformers model and it exists under a slightly different name, this may be an opportunity to upstream some naming standardization changes to Transformers.
[Here](https://github.com/huggingface/transformers/pull/40919) is an example of one such standardization.

### ❌ Model type is not supported in Optimum
All of the supported model types live in [integrations.py](https://github.com/huggingface/optimum-executorch/blob/main/optimum/exporters/executorch/integrations.py), which contains wrapper classes that facilitate torch.exporting a model:
- `CausalLMExportableModule` - LLMs (Large Language Models)
- `MultiModalTextToTextExportableModule` - Multimodal LLMs (Large Language Models with support for audio/image input)
- `VisionEncoderExportableModule` - Vision Encoder backbones (such as DiT or MobileViT)
- `MaskedLMExportableModule` - Masked language models (for predicting masked tokens)
- `Seq2SeqLMExportableModule` - General Seq2Seq encoder-decoder models (such as T5 and Whisper)

This is where most of the complexity around "enabling" a model on Optimum comes from, since after torch.export() every model follows the same per-backend flow for transforming the torch.export() artifact into an ExecuTorch `.pte` artifact.
If the model type doesn't exist in Optimum, we will need to write a new wrapper class for it.
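
For illustration, here is a minimal, hypothetical sketch of what such a wrapper class could look like. The class name, task, and forward signature are invented for this example; the real classes in integrations.py additionally handle caches, configs, metadata, and dynamic shapes:
```
import torch


class DepthEstimationExportableModule(torch.nn.Module):
    """Hypothetical wrapper for a new model type, mirroring the pattern in integrations.py."""

    def __init__(self, model):
        super().__init__()
        self.model = model
        self.config = model.config

    def forward(self, pixel_values):
        # Return plain tensors so the traced graph stays export-friendly.
        return self.model(pixel_values=pixel_values).predicted_depth

    def export(self) -> torch.export.ExportedProgram:
        # Example inputs define the traced shapes; real wrappers also register dynamic shapes.
        example_pixel_values = torch.zeros((1, 3, 224, 224), dtype=torch.float32)
        with torch.no_grad():
            return torch.export.export(self, (example_pixel_values,))
```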

### ❌ Model is not torch.exportable
To address this issue, we will need to upstream changes to the model's modeling file in Transformers to make the model exportable.
After doing this, it's a good idea to add a torch.export test to guard against future regressions (which tend to happen frequently, since Transformers moves fast).
[Here](https://github.com/huggingface/transformers/blob/87f38dbfcec48027d4bf2ea7ec8b8eecd5a7bc85/tests/models/smollm3/test_modeling_smollm3.py#L175) is an example.
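
A minimal sketch of what such a regression test could look like, with an illustrative model id and inputs; the real tests typically go through helper wrappers rather than exporting the raw model directly:
```
import torch
from transformers import AutoModelForCausalLM


def test_model_is_torch_exportable():
    # Illustrative model id; substitute the model you are enabling.
    model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M")
    model.eval()

    input_ids = torch.tensor([[1, 2, 3, 4]], dtype=torch.long)
    attention_mask = torch.ones_like(input_ids)

    # If this raises, the model has an export-blocking graph break
    # or data-dependent control flow.
    exported = torch.export.export(
        model,
        args=(),
        kwargs={"input_ids": input_ids, "attention_mask": attention_mask},
        strict=False,
    )
    assert exported is not None
```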

README.md

Lines changed: 4 additions & 2 deletions
@@ -6,7 +6,7 @@

**Optimize and deploy Hugging Face models with ExecuTorch**

-[Documentation](https://huggingface.co/docs/optimum/index) | [ExecuTorch](https://github.com/pytorch/executorch) | [Hugging Face](https://huggingface.co/)
+[Documentation](https://huggingface.co/docs/optimum-executorch/en/index) | [ExecuTorch](https://github.com/pytorch/executorch) | [Hugging Face](https://huggingface.co/)

</div>

@@ -94,7 +94,8 @@ optimum-cli export executorch \
  --qembedding 8w \
  --output_dir="hf_smollm2"
```
-Explore the various export options by running the command: `optimum-cli export executorch --help`
+Explore the various export options by running the command: `optimum-cli export executorch --help`.
+To read more about how to export different types of models on Optimum ExecuTorch, please refer to the export [README](optimum/exporters/executorch/README.md).

#### Step 2: Validate the Exported Model on Host Using the Python API
Use the exported model for text generation:

@@ -187,6 +188,7 @@ We currently support a wide range of popular transformer models, including encod
- [Whisper](https://huggingface.co/openai/whisper-tiny): OpenAI's `Whisper` and its variants

#### Speech text-to-text (Automatic Speech Recognition)
+- 💡[**NEW**] [Granite Speech](https://huggingface.co/ibm-granite/granite-speech-3.3-2b): `granite-speech-3.3-2b` and its variants
- 💡[**NEW**] [Voxtral](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507): Mistral's newest speech/text-to-text model

*📌 Note: This list is continuously expanding; as support grows, more models will be added.*

optimum/exporters/executorch/README.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
# Exporting Transformers Models to ExecuTorch

Optimum ExecuTorch enables exporting models from Transformers to ExecuTorch.
The models supported by Optimum ExecuTorch are listed [here](../../../README.md#-supported-models).

### LLMs (Large Language Models)
LLMs can be exported using the `text-generation` task like so:
```
optimum-cli export executorch \
  --model <model-id> \
  --task text-generation \
  --recipe xnnpack \
  --use_custom_sdpa \
  --use_custom_kv_cache \
  --qlinear 8da4w \
  --qembedding 8w
  ...etc...
```

The export will produce a `.pte` with a single forward method for the decoder: `model`.

Note that most of the arguments here are only applicable to LLMs (multimodal included):
```
--use_custom_sdpa \
--use_custom_kv_cache \
--qlinear 8da4w \
--qembedding 8w
```
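
Once exported, the artifact can be sanity-checked on host with the Python API. A minimal sketch, assuming the export above was written to a local directory; the directory, model id, and prompt are illustrative:
```
from transformers import AutoTokenizer

from optimum.executorch import ExecuTorchModelForCausalLM

# Illustrative paths/ids; point these at your own export output and tokenizer.
model = ExecuTorchModelForCausalLM.from_pretrained("./exported_llm")
tokenizer = AutoTokenizer.from_pretrained("<model-id>")

generated_text = model.text_generation(
    tokenizer=tokenizer,
    prompt="Give me a short introduction to large language models.",
    max_seq_len=128,
)
print(generated_text)
```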

### Multimodal LLMs
Multimodal LLMs can be exported using the `multimodal-text-to-text` task like so:
```
optimum-cli export executorch \
  --model mistralai/Voxtral-Mini-3B-2507 \
  --task multimodal-text-to-text \
  --recipe xnnpack \
  --use_custom_sdpa \
  --use_custom_kv_cache \
  --qlinear 8da4w \
  --qembedding 8w
  ...etc...
```

The export will produce a `.pte` with the following methods:
- `text_decoder`: the text decoder or language model backbone
- `audio_encoder` or `vision_encoder`: the encoder which feeds into the decoder
- `token_embedding`: the embedding layer of the language model backbone
  - This is needed in order to cleanly separate the entire multimodal model into subgraphs. The text decoder subgraph will take in token embeddings, so multimodal input will be processed into embeddings by the encoder while text input will be processed into embeddings by this method.
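
To confirm which methods ended up in the artifact, you can list them with the ExecuTorch Python runtime. A minimal sketch; the file name is illustrative:
```
from executorch.runtime import Runtime

# Illustrative path to the artifact produced by the export command above.
runtime = Runtime.get()
program = runtime.load_program("model.pte")

# For a multimodal export this is expected to include
# "text_decoder", "audio_encoder" (or "vision_encoder"), and "token_embedding".
print(program.method_names)
```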

### Seq2Seq
Seq2Seq models can be exported using the `text2text-generation` task like so:
```
optimum-cli export executorch \
  --model google-t5/t5-small \
  --task text2text-generation \
  --recipe xnnpack
```

The export will produce a `.pte` with the following methods:
- `text_decoder`: the decoder half of the Seq2Seq model
- `encoder`: the encoder half of the Seq2Seq model. This encoder can support a variety of modalities, such as text for T5 and audio for Whisper.

### Image classification
Image classification models can be exported using the `image-classification` task like so:
```
optimum-cli export executorch \
  --model google/vit-base-patch16-224 \
  --task image-classification \
  --recipe xnnpack
```

The export will produce a `.pte` with a single forward method: `model`.
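
As a quick host-side smoke test, the exported method can be executed directly with the ExecuTorch Python runtime. A minimal sketch with a random input; the file name and input shape are assumptions based on the ViT checkpoint above:
```
import torch
from executorch.runtime import Runtime

# Illustrative path to the artifact produced by the export command above.
program = Runtime.get().load_program("model.pte")
method = program.load_method("model")

# ViT-base expects 224x224 RGB input; a random tensor is enough to smoke-test the graph.
pixel_values = torch.randn(1, 3, 224, 224)
(logits,) = method.execute([pixel_values])
print(logits.shape)  # expected: (1, num_labels) for this checkpoint
```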

### ASR (Automatic speech recognition)
ASR is a special case of Seq2Seq that uses the base Seq2Seq exportable modules. It can be exported using the `automatic-speech-recognition` task like so:
```
optimum-cli export executorch \
  --model openai/whisper-tiny \
  --task automatic-speech-recognition \
  --recipe xnnpack
```

The export will produce a `.pte` with the following methods:
- `text_decoder`: the decoder half of the Seq2Seq model
- `encoder`: the encoder half of the Seq2Seq model
