
Commit 8482770

Small updates to main torchao README.md (#3160)
1 parent fe9be99 commit 8482770

File tree: 1 file changed (+14, -11 lines)

README.md

Lines changed: 14 additions & 11 deletions
@@ -28,13 +28,13 @@
 - [May 25] QAT is now integrated into [Axolotl](https://github.com/axolotl-ai-cloud/axolotl) for fine-tuning ([docs](https://docs.axolotl.ai/docs/qat.html))!
 - [Apr 25] Float8 rowwise training yielded [1.34-1.43x training speedup](https://pytorch.org/blog/accelerating-large-scale-training-and-convergence-with-pytorch-float8-rowwise-on-crusoe-2k-h200s/) at 2k H100 GPU scale
 - [Apr 25] TorchAO is added as a [quantization backend to vLLM](https://docs.vllm.ai/en/latest/features/quantization/torchao.html) ([docs](https://docs.vllm.ai/en/latest/features/quantization/torchao.html))!
-- [Mar 25] Our [2:4 Sparsity paper](https://openreview.net/pdf?id=O5feVk7p6Y) was accepted to SLLM @ ICLR 2025!
-- [Jan 25] Our [integration with GemLite and SGLang](https://pytorch.org/blog/accelerating-llm-inference/) yielded 1.1-2x faster inference with int4 and float8 quantization across different batch sizes and tensor parallel sizes
-- [Jan 25] We added [1-8 bit ARM CPU kernels](https://pytorch.org/blog/hi-po-low-bit-operators/) for linear and embedding ops
 
 <details>
 <summary>Older news</summary>
 
+- [Mar 25] Our [2:4 Sparsity paper](https://openreview.net/pdf?id=O5feVk7p6Y) was accepted to SLLM @ ICLR 2025!
+- [Jan 25] Our [integration with GemLite and SGLang](https://pytorch.org/blog/accelerating-llm-inference/) yielded 1.1-2x faster inference with int4 and float8 quantization across different batch sizes and tensor parallel sizes
+- [Jan 25] We added [1-8 bit ARM CPU kernels](https://pytorch.org/blog/hi-po-low-bit-operators/) for linear and embedding ops
 - [Nov 24] We achieved [1.43-1.51x faster pre-training](https://pytorch.org/blog/training-using-float8-fsdp2/) on Llama-3.1-70B and 405B using float8 training
 - [Oct 24] TorchAO is added as a quantization backend to HF Transformers!
 - [Sep 24] We officially launched TorchAO. Check out our blog [here](https://pytorch.org/blog/pytorch-native-architecture-optimization/)!
@@ -47,8 +47,7 @@
 
 ## 🌅 Overview
 
-TorchAO is a PyTorch-native model optimization framework leveraging quantization and sparsity to provide an end-to-end, training-to-serving workflow
-for AI models. TorchAO works out-of-the-box with `torch.compile()` and `FSDP2` across most HuggingFace PyTorch models. Key features include:
+TorchAO is an easy to use quantization library for native PyTorch. TorchAO works out-of-the-box with `torch.compile()` and `FSDP2` across most HuggingFace PyTorch models. Key features include:
 * Float8 [training](torchao/float8/README.md) and [inference](https://docs.pytorch.org/ao/main/generated/torchao.quantization.Float8DynamicActivationFloat8WeightConfig.html) for speedups without compromising accuracy
 * [MX training and inference](torchao/prototype/mx_formats/README.md), provides MX tensor formats based on native PyTorch MX dtypes (prototype)
 * [Quantization-Aware Training (QAT)](torchao/quantization/qat/README.md) for mitigating quantization degradation
@@ -67,17 +66,17 @@ From the team that brought you the fast series:
 ## 🚀 Quick Start
 
 First, install TorchAO. We recommend installing the latest stable version:
-```
+```bash
 pip install torchao
 ```
 
 Quantize your model weights to int4!
-```
+```python
 from torchao.quantization import Int4WeightOnlyConfig, quantize_
 quantize_(model, Int4WeightOnlyConfig(group_size=32, version=1))
 ```
 Compared to a `torch.compiled` bf16 baseline, your quantized model should be significantly smaller and faster on a single A100 GPU:
-```
+```bash
 int4 model size: 1.25 MB
 bfloat16 model size: 4.00 MB
 compression ratio: 3.2
@@ -86,13 +85,13 @@ bf16 mean time: 30.393 ms
 int4 mean time: 4.410 ms
 speedup: 6.9x
 ```
-For the full model setup and benchmark details, check out our [quick start guide](https://docs.pytorch.org/ao/stable/quick_start.html). Alternatively, try quantizing your favorite model using our [HuggingFace space](https://huggingface.co/spaces/pytorch/torchao-my-repo)!
+See our [quick start guide](https://docs.pytorch.org/ao/stable/quick_start.html) for more details. Alternatively, try quantizing your favorite model using our [HuggingFace space](https://huggingface.co/spaces/pytorch/torchao-my-repo)!
 
 
 ## 🛠 Installation
 
 To install the latest stable version:
-```
+```bash
 pip install torchao
 ```
 
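The quick start shown above reports model size and mean latency for a toy model. As a rough sketch of how such a comparison could be measured, the snippet below quantizes a made-up two-layer bf16 model to int4 and times compiled forward passes; it assumes a CUDA GPU and is not the exact setup from the quick start guide.

```python
# Illustrative only: size measured by serializing the state dict,
# latency measured as a mean over compiled forward passes.
import copy
import io
import time

import torch
from torchao.quantization import Int4WeightOnlyConfig, quantize_

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 1024)
).to(torch.bfloat16).cuda()
x = torch.randn(1, 1024, dtype=torch.bfloat16, device="cuda")

def size_mb(m):
    # Serialize the state dict to an in-memory buffer and report its size
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.tell() / 1e6

def mean_ms(m, iters=100):
    m = torch.compile(m)
    for _ in range(10):  # warmup, including compilation
        m(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        m(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e3

int4_model = copy.deepcopy(model)
quantize_(int4_model, Int4WeightOnlyConfig(group_size=32, version=1))

print(f"bfloat16 model size: {size_mb(model):.2f} MB, mean time: {mean_ms(model):.3f} ms")
print(f"int4 model size: {size_mb(int4_model):.2f} MB, mean time: {mean_ms(int4_model):.3f} ms")
```

The exact numbers depend on the GPU and model shapes; the point is the relative size and latency gap between the bf16 baseline and the int4 model.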
@@ -196,7 +195,7 @@ quantize_(my_model, QATConfig(base_config, step="convert"))
 Users can also combine LoRA + QAT to speed up training by [1.89x](https://dev-discuss.pytorch.org/t/speeding-up-qat-by-1-89x-with-lora/2700) compared to vanilla QAT using this [fine-tuning recipe](https://github.com/pytorch/torchtune/blob/main/recipes/qat_lora_finetune_distributed.py).
 
 
-### Float8
+### Quantized training
 
 [torchao.float8](torchao/float8) implements training recipes with the scaled float8 dtypes, as laid out in https://arxiv.org/abs/2209.05433. With ``torch.compile`` on, current results show throughput speedups of up to **1.5x on up to 512 GPU / 405B parameter count scale** ([details](https://pytorch.org/blog/training-using-float8-fsdp2/)):
 
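For readers unfamiliar with `torchao.float8`, a minimal single-GPU sketch of the training conversion follows. The model, shapes, and training loop are placeholders (not taken from the README), and float8-capable hardware such as an H100 is assumed.

```python
# Minimal sketch: swap nn.Linear layers for float8 training variants, then compile.
import torch
from torchao.float8 import convert_to_float8_training

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 4096)
).to(torch.bfloat16).cuda()

# Replace eligible nn.Linear modules with float8 training variants (in place)
convert_to_float8_training(model)

# torch.compile is required to realize the reported throughput speedups
model = torch.compile(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
for _ in range(10):
    x = torch.randn(16, 4096, dtype=torch.bfloat16, device="cuda")
    loss = model(x).float().pow(2).mean()  # dummy loss for illustration
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```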
@@ -211,6 +210,8 @@ Our float8 training is integrated into [TorchTitan's pre-training flows](https:/
 * [Efficient Pre-training of Llama 3-like model architectures using torchtitan on Amazon SageMaker](https://aws.amazon.com/blogs/machine-learning/efficient-pre-training-of-llama-3-like-model-architectures-using-torchtitan-on-amazon-sagemaker/)
 * [Float8 in PyTorch](https://dev-discuss.pytorch.org/t/float8-in-pytorch-1-x/1815)
 
+<details>
+<summary>Other features (sparse training, memory efficient optimizers)</summary>
 
 ### Sparse Training
 
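The body of the Sparse Training section is not part of this diff; as a hedged sketch of what 2:4 sparse training with `torchao.sparsity.training` might look like, under the assumption that selected linear layers are swapped for semi-sparse variants (the module names, shapes, and layer choices below are illustrative, not from the README):

```python
# Hedged sketch: opt specific linear layers into runtime 2:4 semi-structured
# sparsity during training (assumes a CUDA GPU and fp16/bf16 weights).
import torch
from torchao.sparsity.training import (
    SemiSparseLinear,
    swap_linear_with_semi_sparse_linear,
)

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).half().cuda()

# Map fully-qualified module names to SemiSparseLinear for the layers to sparsify
swap_linear_with_semi_sparse_linear(model, {"0": SemiSparseLinear, "2": SemiSparseLinear})

x = torch.randn(128, 1024, dtype=torch.float16, device="cuda")
model(x).sum().backward()  # forward/backward now use sparse kernels for those layers
```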
@@ -242,6 +243,8 @@ optim = CPUOffloadOptimizer(model.parameters(), torch.optim.AdamW, fused=True)
 optim.load_state_dict(ckpt["optim"])
 ```
 
+</details>
+
 <!--
 ## For Developers
 
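The `CPUOffloadOptimizer` lines visible in this hunk are only a fragment of the README's memory-efficient optimizer example. A minimal end-to-end sketch is below; the import path, model, and data are assumptions (the optimizer has lived under `torchao.optim` in recent releases) and not part of this commit.

```python
import torch
from torchao.optim import CPUOffloadOptimizer  # import path is an assumption

model = torch.nn.Linear(4096, 4096).cuda()

# Keep optimizer state in CPU RAM; extra kwargs (e.g. fused=True) are forwarded to AdamW
optim = CPUOffloadOptimizer(model.parameters(), torch.optim.AdamW, fused=True)

model(torch.randn(8, 4096, device="cuda")).sum().backward()
optim.step()

# Checkpointing works like a regular optimizer, matching the diff lines above
ckpt = {"model": model.state_dict(), "optim": optim.state_dict()}
optim.load_state_dict(ckpt["optim"])
```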