Commit 1f50960

Stable Cascade converter

1 parent 3901935 commit 1f50960

10 files changed: +1115 -0 lines changed
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
/footprints/
/result_*.png

Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
# Stable Diffusion Optimization

This folder contains sample use cases of Olive with ONNX Runtime and OpenVINO to optimize:
- Stable Diffusion: [Stable Diffusion v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4), [Stable Diffusion v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5), [Stable Diffusion v2](https://huggingface.co/stabilityai/stable-diffusion-2)
- Stable Diffusion XL: [Stable Diffusion XL Base](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), [Stable Diffusion XL Refiner](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0)

Stable Diffusion comprises multiple PyTorch models tied together into a *pipeline*.

The ONNX Runtime optimization sample will convert each PyTorch model to ONNX, and then run the converted ONNX models through the `OrtTransformersOptimization` pass. The transformer optimization pass performs several time-consuming graph transformations that make the models more efficient for inference at runtime.

The OpenVINO optimization sample will convert each PyTorch model to an OpenVINO IR model with the `OpenVINOConversion` pass, and create an `OpenVINOStableDiffusionPipeline` for inference.

- ONNX Runtime with
  - [CUDA EP](#stable-diffusion-and-stable-diffusion-xl-optimization-with-onnx-runtime-cuda-ep)
  - DirectML EP: see the examples for [Stable Diffusion](../directml/stable_diffusion/README.md) and [Stable Diffusion XL](../directml/stable_diffusion_xl/README.md)
- [OpenVINO](#stable-diffusion-optimization-with-openvino)

## Stable Diffusion and Stable Diffusion XL Optimization with ONNX Runtime CUDA EP

This sample performs the following optimization workflow for each model in the Stable Diffusion pipeline:
- *PyTorch Model -> Onnx Model -> Transformers Optimized Onnx Model fp16*
<br/><br/>

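The two steps in this workflow correspond to a pair of chained Olive passes: an `OnnxConversion` pass followed by an `OrtTransformersOptimization` pass (the Stable Cascade decoder config added later in this commit uses exactly this chain). Below is a minimal sketch of that pass chain, written as a Python dict purely for illustration; the authoritative settings live in the `config_<model_name>.json` files.

```python
# Illustrative sketch of the convert -> optimize pass chain; values shown are
# representative, not the exact settings from this sample's JSON configs.
passes = {
    "convert": {
        "type": "OnnxConversion",
        "config": {"target_opset": 14, "save_as_external_data": True},
    },
    "optimize": {
        "type": "OrtTransformersOptimization",
        "config": {"model_type": "unet", "opt_level": 0, "float16": True, "use_gpu": True},
    },
}
pass_flows = [["convert", "optimize"]]
```
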
Transformers optimization uses the following optimizations to speed up Stable Diffusion on CUDA:
* [Flash Attention](https://arxiv.org/abs/2205.14135) for float16 precision. Flash Attention uses tiling to reduce the number of GPU memory reads/writes, and improves performance with less memory for long sequence lengths. The kernel requires GPUs of Compute Capability >= 7.5 (like T4, A100, and RTX 2060~4090). Only available on Linux.
* [Memory Efficient Attention](https://arxiv.org/abs/2112.05682v2) for float32 precision or older GPUs (like V100). We used the fused multi-head attention kernel in CUTLASS, which was contributed by xFormers.
* Channels-last (NHWC) convolution. For NVIDIA GPUs with Tensor Core support, the NHWC tensor layout is recommended for convolution. See [Tensor Layouts In Memory: NCHW vs NHWC](https://docs.nvidia.com/deeplearning/performance/dl-performance-convolutional/index.html#tensor-layout).
* GroupNorm for the NHWC tensor layout, and SkipGroupNorm fusion, which fuses GroupNorm with Add bias and residual inputs.
* SkipLayerNormalization, which fuses LayerNormalization with Add bias and residual inputs.
* BiasSplitGelu is a fusion of Add bias with SplitGelu activation.
* BiasAdd fuses Add bias and residual.
* Reduction of Transpose nodes by graph transformation.

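`OrtTransformersOptimization` is built on the `onnxruntime.transformers` optimizer, and the fusions listed above map onto its fusion options. The following is a rough, standalone sketch of applying that optimizer to a converted UNet model; the paths and toggles are illustrative, and in this sample Olive drives the optimizer from the JSON configs instead.

```python
# Rough sketch only: Olive's OrtTransformersOptimization pass normally drives this for you.
from onnxruntime.transformers import optimizer
from onnxruntime.transformers.fusion_options import FusionOptions

fusion_options = FusionOptions("unet")   # the fusions listed above are toggled here
fusion_options.enable_packed_kv = True   # e.g. packed KV for cross attention

opt_model = optimizer.optimize_model(
    "models/unoptimized/unet/model.onnx",  # hypothetical path to a converted model
    model_type="unet",
    opt_level=0,
    optimization_options=fusion_options,
    use_gpu=True,
)
opt_model.convert_float_to_float16(keep_io_types=True)
opt_model.save_model_to_file("models/optimized-cuda/unet/model.onnx")
```
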
#### Prerequisites
##### Clone the repository and install Olive

Refer to the instructions in the [examples README](../README.md) to clone the repository and install Olive.

We use the same Olive workflow config files and scripts as the DirectML examples. The only difference is the `--provider cuda` option provided to the `stable_diffusion.py` and `stable_diffusion_xl.py` scripts.

So, cd into the corresponding example folder from the root of the cloned repository:

**_Stable Diffusion_**
```bash
cd examples/stable_diffusion
```

**_Stable Diffusion XL_**
```bash
cd examples/directml/stable_diffusion_xl
```

##### Install onnxruntime

This example requires the latest onnxruntime-gpu code, which can either be built from source or installed from the nightly builds. The following command can be used to install the latest nightly build of onnxruntime-gpu:

```bash
# uninstall any pre-existing onnxruntime packages
pip uninstall -y onnxruntime onnxruntime-gpu onnxruntime-directml ort-nightly ort-nightly-gpu ort-nightly-directml

# install onnxruntime-gpu nightly build
pip install ort-nightly-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/
```

##### Install other dependencies

Install the necessary Python packages:

```bash
python -m pip install -r requirements-common.txt
```

#### Conversion to ONNX and Latency Optimization

The easiest way to optimize the pipeline is with the `stable_diffusion.py` and `stable_diffusion_xl.py` scripts. These scripts will enumerate the `config_<model_name>.json` files and optimize each with Olive, then gather the optimized models into a directory structure suitable for testing inference.

**_Stable Diffusion_**
```bash
# default model_id is "runwayml/stable-diffusion-v1-5"
python stable_diffusion.py --provider cuda --optimize
```

**_Stable Diffusion XL_**
```bash
# default model_id is "stabilityai/stable-diffusion-xl-base-1.0"
python stable_diffusion_xl.py --provider cuda --optimize [--use_fp16_fixed_vae]

# or specify a different model_id
python stable_diffusion_xl.py --provider cuda --model_id stabilityai/stable-diffusion-xl-refiner-1.0 --optimize [--use_fp16_fixed_vae]
```

`--use_fp16_fixed_vae` is optional. If provided, the script will use [madebyollin/sdxl-vae-fp16-fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) for the vae models, and all sub-models will be entirely in fp16.
Otherwise, the vae models (vae-decoder for base and both vae-decoder and vae-encoder for refiner) will be in fp32 and all other sub-models will be in fp16 with fp32 input/output.

Once the script successfully completes:
- The optimized ONNX pipeline will be stored under `models/optimized-cuda/[model_id]` (for example `models/optimized-cuda/runwayml/stable-diffusion-v1-5` or `models/optimized-cuda/stabilityai/stable-diffusion-xl-base-1.0`).
- The unoptimized ONNX pipeline (models converted to ONNX, but not run through the transformer optimization pass) will be stored under `models/unoptimized/[model_id]` (for example `models/unoptimized/runwayml/stable-diffusion-v1-5` or `models/unoptimized/stabilityai/stable-diffusion-xl-base-1.0`).

Re-running the script with `--optimize` will delete the output models, but it will *not* delete the Olive cache. Subsequent runs will complete much faster since they simply copy previously optimized models; you may use the `--clean_cache` option to start from scratch (not typically needed unless you are modifying the scripts, for example).

### Test Inference with CUDA

Test ONNX Runtime inference with the optimized models using `OnnxStableDiffusionPipeline`:

**_Stable Diffusion_**
```bash
python stable_diffusion.py --provider cuda --num_images 2
```
Inference will loop until the generated image passes the safety checker (otherwise you would see black images). The result will be saved as `result_<i>.png` on disk.
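
Under the hood, the test script loads the optimized ONNX models with the `diffusers` `OnnxStableDiffusionPipeline` mentioned above. A simplified sketch of that loading step, assuming the optimized output directory from the previous section (the prompt and output file name are illustrative):

```python
# Simplified sketch of loading the optimized pipeline with diffusers + ONNX Runtime.
# The stable_diffusion.py script in this example handles this (and more) for you.
from diffusers import OnnxStableDiffusionPipeline

model_dir = "models/optimized-cuda/runwayml/stable-diffusion-v1-5"  # output of --optimize
pipe = OnnxStableDiffusionPipeline.from_pretrained(
    model_dir,
    provider="CUDAExecutionProvider",  # the CUDA EP this README targets
)
image = pipe("a photo of an astronaut riding a horse on mars", num_inference_steps=50).images[0]
image.save("result_0.png")
```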

**_Stable Diffusion XL_**
```bash
python stable_diffusion_xl.py --provider cuda --num_images 2
```
The result will be saved as `result_<i>.png` on disk.

Refer to the corresponding section in the DirectML READMEs for more details on the test inference options:
- [Stable Diffusion](../directml/stable_diffusion/README.md#test-inference)
- [Stable Diffusion XL](../directml/stable_diffusion_xl/README.md#test-inference)

## Stable Diffusion Optimization with OpenVINO

**Contents**:
- [Setup](#setup)
- [Conversion to OpenVINO IR model](#convert-to-openvino-ir-model)
- [Test Inference](#test-inference-with-openvino)

### Setup

Olive is currently in pre-release, with constant updates and improvements to its functions and usage. This sample code will be frequently updated as Olive evolves, so it is important to install Olive from source when checking out this code from the main branch. See the [README for examples](https://github.com/microsoft/Olive/blob/main/examples/README.md#important) for detailed instructions on how to do this.

**Alternatively**, you may install a stable release that we have validated. For example:

```
# Install Olive from main branch
pip install git+https://github.com/microsoft/Olive#egg=olive-ai[openvino]

# Clone Olive repo to access sample code
git clone https://github.com/microsoft/olive
```

Once you've installed Olive, install the requirements for this sample matching the version of the library you are using:
```
cd olive/examples/stable_diffusion
pip install -r requirements-ov.txt
```

### Convert to OpenVINO IR model

The easiest way to optimize the pipeline is with the `stable_diffusion.py` helper script:

```
python stable_diffusion.py --optimize
```

The above command will enumerate the `config_<model_name>.json` files and optimize each with Olive, then gather the optimized models into a directory structure suitable for testing inference.

The stable diffusion models are large, and the optimization process is resource intensive. It is recommended to run optimization on a system with a minimum of 16GB of memory (preferably 32GB). Expect optimization to take several minutes (especially the U-Net model).

Once the script successfully completes:
- The converted OpenVINO IR model will be stored under `models/optimized-openvino/[model_id]` (for example `models/optimized-openvino/runwayml/stable-diffusion-v1-5`).

Re-running the script with `--optimize` will delete the output models, but it will *not* delete the Olive cache. Subsequent runs will complete much faster since they simply copy previously optimized models; you may use the `--clean_cache` option to start from scratch (not typically needed unless you are modifying the scripts, for example).

### Test Inference with OpenVINO

This sample code is primarily intended to illustrate model optimization with Olive, but it also provides a simple interface for testing inference with the OpenVINO models. Inference is done by creating an `OVStableDiffusionPipeline` from the saved models.

```
python stable_diffusion.py --inference --provider openvino
```
Inference will loop until the generated image passes the safety checker (otherwise you would see black images). The result will be saved as `result_<i>.png` on disk.
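
For reference, a minimal sketch of loading the converted models and generating one image is shown below. It assumes the `OVStableDiffusionPipeline` in question is Optimum Intel's OpenVINO pipeline class (installable via `pip install optimum[openvino]`); if this sample wires up its own pipeline class, treat this as an analogous illustration rather than the sample's actual code path.

```python
# Hedged sketch: load the converted OpenVINO models and run a single prompt.
# The stable_diffusion.py script in this example wraps the equivalent steps.
from optimum.intel import OVStableDiffusionPipeline

model_dir = "models/optimized-openvino/runwayml/stable-diffusion-v1-5"  # output of --optimize
pipe = OVStableDiffusionPipeline.from_pretrained(model_dir)
image = pipe("a photo of an astronaut riding a horse on mars", num_inference_steps=50).images[0]
image.save("result_0.png")
```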

Run `python stable_diffusion.py --help` for additional options. A few particularly relevant ones:
- `--image_path <str>`: the input image path for image-to-image inference.
- `--img_to_img_example`: runs an image-to-image example. The default input image is `assets/dog.png`, and the default prompt is `amazing watercolor painting`.
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------

vae_sample_size = 512
unet_sample_size = 24
cross_attention_dim = 1280
Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
{
    "input_model": {
        "type": "PyTorchModel",
        "config": {
            "model_path": "stabilityai/stable-cascade",
            "model_loader": "decoder_load",
            "model_script": "models.py",
            "io_config": {
                "input_names": [ "sample", "timestep_ratio", "clip_text_pooled", "effnet", "return_dict" ],
                "output_names": [ "out_sample" ],
                "dynamic_axes": {
                    "sample": {"0": "unet_sample_batch", "1": "unet_sample_channels", "2": "unet_sample_height", "3": "unet_sample_width"},
                    "timestep_ratio": {"0": "unet_timestep_ratio"},
                    "clip_text_pooled": {"0": "unet_clip_text_pooled_batch", "1": "unet_clip_text_pooled_size"},
                    "effnet": {"0": "unet_hidden_batch", "1": "unet_hidden_size"}
                }
            },
            "dummy_inputs_func": "decoder_conversion_inputs"
        }
    },
    "systems": {
        "local_system": {
            "type": "LocalSystem",
            "config": {
                "accelerators": [
                    {
                        "device": "gpu",
                        "execution_providers": [
                            "DmlExecutionProvider"
                        ]
                    }
                ]
            }
        }
    },
    "evaluators": {
        "common_evaluator": {
            "metrics": [
                {
                    "name": "latency",
                    "type": "latency",
                    "sub_types": [{"name": "avg"}],
                    "user_config": {
                        "user_script": "models.py",
                        "dataloader_func": "decoder_data_loader",
                        "batch_size": 2
                    }
                }
            ]
        }
    },
    "passes": {
        "convert": {
            "type": "OnnxConversion",
            "config": {
                "target_opset": 14,
                "save_as_external_data": true,
                "all_tensors_to_one_file": true,
                "external_data_name": "weights.pb"
            }
        },
        "optimize": {
            "type": "OrtTransformersOptimization",
            "config": {
                "model_type": "unet",
                "opt_level": 0,
                "float16": true,
                "use_gpu": true,
                "keep_io_types": false,
                "optimization_options": {
                    "enable_gelu": true,
                    "enable_layer_norm": true,
                    "enable_attention": true,
                    "use_multi_head_attention": true,
                    "enable_skip_layer_norm": false,
                    "enable_embed_layer_norm": true,
                    "enable_bias_skip_layer_norm": false,
                    "enable_bias_gelu": true,
                    "enable_gelu_approximation": false,
                    "enable_qordered_matmul": false,
                    "enable_shape_inference": true,
                    "enable_gemm_fast_gelu": false,
                    "enable_nhwc_conv": false,
                    "enable_group_norm": true,
                    "enable_bias_splitgelu": false,
                    "enable_packed_qkv": true,
                    "enable_packed_kv": true,
                    "enable_bias_add": false,
                    "group_norm_channels_last": false
                },
                "force_fp32_ops": ["RandomNormalLike"],
                "force_fp16_inputs": {
                    "GroupNorm": [0, 1, 2]
                }
            }
        },
        "optimize_cuda": {
            "type": "OrtTransformersOptimization",
            "config": {
                "model_type": "decoder",
                "opt_level": 0,
                "float16": true,
                "use_gpu": true,
                "keep_io_types": false
            }
        }
    },
    "pass_flows": [
        ["convert", "optimize"]
    ],
    "engine": {
        "log_severity_level": 0,
        "evaluator": "common_evaluator",
        "evaluate_input_model": false,
        "host": "local_system",
        "target": "local_system",
        "cache_dir": "cache",
        "output_name": "decoder",
        "output_dir": "footprints"
    }
}
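
This workflow config (one of the Stable Cascade converter configs added by this commit; its exact filename is not shown in this view) can be run programmatically with Olive's Python API as well as from the command line. A minimal sketch, assuming the config is saved as `config_decoder.json` next to `models.py`:

```python
# Minimal sketch of running the Olive workflow defined by the config above.
# "config_decoder.json" is an assumed filename; use whatever name the config
# file actually has in this commit.
from olive.workflows import run as olive_run

footprints = olive_run("config_decoder.json")  # runs the convert -> optimize pass flow
print(footprints)
```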
