
Commit 9f2d566

docs: update bitsandbytes platform support (#41266)
1 parent 9d8f693 commit 9f2d566

3 files changed (+19, -17 lines changed)

docs/source/en/quantization/bitsandbytes.md

Lines changed: 17 additions & 15 deletions
@@ -16,7 +16,7 @@ rendered properly in your Markdown viewer.
  # Bitsandbytes

- The [bitsandbytes](https://github.com/bitsandbytes-foundation/bitsandbytes) library provides quantization tools for LLMs through a lightweight Python wrapper around CUDA functions. It enables working with large models using limited computational resources by reducing their memory footprint.
+ The [bitsandbytes](https://github.com/bitsandbytes-foundation/bitsandbytes) library provides quantization tools for LLMs through a lightweight Python wrapper around hardware accelerator functions. It enables working with large models using limited computational resources by reducing their memory footprint.

  At its core, bitsandbytes provides:
@@ -41,27 +41,29 @@ pip install --upgrade transformers accelerate bitsandbytes
  To compile from source, follow the instructions in the [bitsandbytes installation guide](https://huggingface.co/docs/bitsandbytes/main/en/installation).

  ## Hardware Compatibility

- bitsandbytes is currently only supported on CUDA GPUs for CUDA versions 11.0 - 12.8. However, there's an ongoing multi-backend effort under development, which is currently in alpha. If you're interested in providing feedback or testing, check out the [bitsandbytes repository](https://github.com/bitsandbytes-foundation/bitsandbytes) for more information.
+ bitsandbytes is supported on NVIDIA GPUs for CUDA versions 11.8 - 13.0, Intel XPU, Intel Gaudi (HPU), and CPU. There is an ongoing effort to support additional platforms. If you're interested in providing feedback or testing, check out the [bitsandbytes repository](https://github.com/bitsandbytes-foundation/bitsandbytes) for more information.

- ### CUDA
+ ### NVIDIA GPUs (CUDA)
+
+ This backend is supported on Linux x86-64, Linux aarch64, and Windows platforms.

  | Feature | Minimum Hardware Requirement |
  |---------|-------------------------------|
- | 8-bit optimizers | NVIDIA Maxwell (GTX 900 series, TITAN X, M40) or newer GPUs * |
- | LLM.int8() | NVIDIA Turing (RTX 20 series, T4) or newer GPUs |
- | NF4/FP4 quantization | NVIDIA Maxwell (GTX 900 series, TITAN X, M40) or newer GPUs * |
+ | 8-bit optimizers | NVIDIA Pascal (GTX 10X0 series, P100) or newer GPUs * |
+ | LLM.int8() | NVIDIA Turing (RTX 20X0 series, T4) or newer GPUs |
+ | NF4/FP4 quantization | NVIDIA Pascal (GTX 10X0 series, P100) or newer GPUs * |
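
The 8-bit optimizers in the table above are drop-in replacements for their full-precision PyTorch counterparts, keeping optimizer states in 8-bit to reduce memory. A minimal sketch of the idea (the linear layer, learning rate, and random batch are placeholders; a CUDA-capable GPU is assumed):

```python
import torch
import bitsandbytes as bnb

# Toy model standing in for a transformer; requires a CUDA device.
model = torch.nn.Linear(1024, 1024).cuda()

# Drop-in 8-bit replacement for torch.optim.AdamW; optimizer states are
# stored in 8-bit, shrinking their memory footprint.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4)

loss = model(torch.randn(8, 1024, device="cuda")).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```
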
+ ### Intel GPUs (XPU)
+
+ This backend is supported on Linux x86-64 and Windows x86-64 platforms.
+
+ ### Intel Gaudi (HPU)

- ### Multi-backend
+ This backend is supported on Linux x86-64 for Gaudi2 and Gaudi3.

- | Backend | Supported Versions | Python versions | Architecture Support | Status |
- |---------|-------------------|----------------|---------------------|---------|
- | AMD ROCm | 6.1+ | 3.10+ | minimum CDNA - gfx90a, RDNA - gfx1100 | Alpha |
- | Apple Silicon (MPS) | WIP | 3.10+ | M1/M2 chips | Planned |
- | Intel CPU | v2.4.0+ (ipex) | 3.10+ | Intel CPU | Alpha |
- | Intel GPU | v2.4.0+ (ipex) | 3.10+ | Intel GPU | Experimental |
- | Ascend NPU | 2.1.0+ (torch_npu) | 3.10+ | Ascend NPU | Experimental |
+ ### CPU

- > **Note:** Bitsandbytes is moving away from the multi-backend approach towards using [Pytorch Custom Operators](https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html), as the main mechanism for supporting new hardware, and dispatching to the correct backend.
+ This backend is supported on Linux x86-64, Linux aarch64, and Windows x86-64 platforms.
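
Which of these backends gets used follows from the accelerator PyTorch detects at runtime. A rough way to check what a given machine exposes (a minimal sketch; `torch.xpu` is only present in PyTorch builds with Intel XPU support, and Gaudi/HPU additionally requires `habana_frameworks.torch`):

```python
import torch

# Report the accelerator that quantized models would most likely land on.
if torch.cuda.is_available():
    device = "cuda"  # NVIDIA GPUs
elif hasattr(torch, "xpu") and torch.xpu.is_available():
    device = "xpu"   # Intel GPUs
else:
    device = "cpu"   # CPU fallback
print(f"Available accelerator: {device}")
```
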

  ## Quantization Examples
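
For example, loading a model in 4-bit NF4 looks roughly like this (a minimal sketch; the model id is only an example, and `bitsandbytes` plus `accelerate` must be installed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute, the common QLoRA-style setup.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_id = "facebook/opt-350m"  # example model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place weights on the available accelerator
)

inputs = tokenizer("Quantization reduces memory by", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```
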

docs/source/en/quantization/overview.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ Use the Space below to help you pick a quantization method depending on your har
  | [AQLM](./aqlm) | 🔴 | 🟢 | 🟢 | 🔴 | 🔴 | 🟢 | 🟢 | 1/2 | 🟢 | 🟢 | 🟢 | https://github.com/Vahe1994/AQLM |
  | [AutoRound](./auto_round) | 🔴 | 🟢 | 🟢 | 🔴 | 🔴 | 🟢 | 🔴 | 2/3/4/8 | 🔴 | 🟢 | 🟢 | https://github.com/intel/auto-round |
  | [AWQ](./awq) | 🔴 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | ? | 4 | 🟢 | 🟢 | 🟢 | https://github.com/casper-hansen/AutoAWQ |
- | [bitsandbytes](./bitsandbytes) | 🟢 | 🟡 | 🟢 | 🟡 | 🔴 | 🟡 | 🟢 | 4/8 | 🟢 | 🟢 | 🟢 | https://github.com/bitsandbytes-foundation/bitsandbytes |
+ | [bitsandbytes](./bitsandbytes) | 🟢 | 🟢 | 🟢 | 🟡 | 🟡 | 🟢 | 🟢 | 4/8 | 🟢 | 🟢 | 🟢 | https://github.com/bitsandbytes-foundation/bitsandbytes |
  | [compressed-tensors](./compressed_tensors) | 🔴 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 1/8 | 🟢 | 🟢 | 🟢 | https://github.com/neuralmagic/compressed-tensors |
  | [EETQ](./eetq) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | ? | 8 | 🟢 | 🟢 | 🟢 | https://github.com/NetEase-FuXi/EETQ |
  | [FP-Quant](./fp_quant) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🟢 | 4 | 🔴 | 🟢 | 🟢 | https://github.com/IST-DASLab/FP-Quant |

docs/source/en/quantization/selecting.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ Consider the quantization methods below for inference.
  | quantization method | use case |
  |---|---|
- | bitsandbytes | ease of use and QLoRA fine-tuning on NVIDIA GPUs |
+ | bitsandbytes | ease of use and QLoRA fine-tuning on NVIDIA and Intel GPUs |
  | compressed-tensors | loading specific quantized formats (FP8, Sparse) |
  | GPTQModel or AWQ | good 4-bit accuracy with upfront calibration |
  | HQQ | fast on the fly quantization without calibration |
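
For the QLoRA use case listed for bitsandbytes, the usual pairing is a 4-bit base model plus LoRA adapters from [peft](https://github.com/huggingface/peft). A minimal sketch (assuming `peft` and `bitsandbytes` are installed; the model id and LoRA hyperparameters are illustrative):

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the frozen base model in 4-bit NF4.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # example model id
    quantization_config=quant_config,
    device_map="auto",
)

# Prepare the quantized model for training (e.g. cast norms to fp32,
# enable input gradients for gradient checkpointing).
model = prepare_model_for_kbit_training(model)

# Attach low-rank adapters; only these small matrices are trained.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```
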
