
Commit 9f2d566

docs: update bitsandbytes platform support (#41266)
1 parent 9d8f693 commit 9f2d566

3 files changed (+19, -17 lines changed)

docs/source/en/quantization/bitsandbytes.md

Lines changed: 17 additions & 15 deletions
@@ -16,7 +16,7 @@ rendered properly in your Markdown viewer.
  # Bitsandbytes

- The [bitsandbytes](https://github.com/bitsandbytes-foundation/bitsandbytes) library provides quantization tools for LLMs through a lightweight Python wrapper around CUDA functions. It enables working with large models using limited computational resources by reducing their memory footprint.
+ The [bitsandbytes](https://github.com/bitsandbytes-foundation/bitsandbytes) library provides quantization tools for LLMs through a lightweight Python wrapper around hardware accelerator functions. It enables working with large models using limited computational resources by reducing their memory footprint.

  At its core, bitsandbytes provides:
@@ -41,27 +41,29 @@ pip install --upgrade transformers accelerate bitsandbytes
  To compile from source, follow the instructions in the [bitsandbytes installation guide](https://huggingface.co/docs/bitsandbytes/main/en/installation).

  ## Hardware Compatibility

- bitsandbytes is currently only supported on CUDA GPUs for CUDA versions 11.0 - 12.8. However, there's an ongoing multi-backend effort under development, which is currently in alpha. If you're interested in providing feedback or testing, check out the [bitsandbytes repository](https://github.com/bitsandbytes-foundation/bitsandbytes) for more information.
+ bitsandbytes is supported on NVIDIA GPUs for CUDA versions 11.8 - 13.0, Intel XPU, Intel Gaudi (HPU), and CPU. There is an ongoing effort to support additional platforms. If you're interested in providing feedback or testing, check out the [bitsandbytes repository](https://github.com/bitsandbytes-foundation/bitsandbytes) for more information.

- ### CUDA
+ ### NVIDIA GPUs (CUDA)
+
+ This backend is supported on Linux x86-64, Linux aarch64, and Windows platforms.

  | Feature | Minimum Hardware Requirement |
  |---------|-------------------------------|
- | 8-bit optimizers | NVIDIA Maxwell (GTX 900 series, TITAN X, M40) or newer GPUs * |
- | LLM.int8() | NVIDIA Turing (RTX 20 series, T4) or newer GPUs |
- | NF4/FP4 quantization | NVIDIA Maxwell (GTX 900 series, TITAN X, M40) or newer GPUs * |
+ | 8-bit optimizers | NVIDIA Pascal (GTX 10X0 series, P100) or newer GPUs * |
+ | LLM.int8() | NVIDIA Turing (RTX 20X0 series, T4) or newer GPUs |
+ | NF4/FP4 quantization | NVIDIA Pascal (GTX 10X0 series, P100) or newer GPUs * |
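
The 8-bit optimizers in the table above are drop-in replacements for their full-precision PyTorch counterparts, keeping optimizer states in 8-bit to reduce memory. A minimal sketch of the idea (the linear layer, learning rate, and random batch are placeholders; a CUDA-capable GPU is assumed):

```python
import torch
import bitsandbytes as bnb

# Toy model standing in for a transformer; requires a CUDA device.
model = torch.nn.Linear(1024, 1024).cuda()

# Drop-in 8-bit replacement for torch.optim.AdamW; optimizer states are
# stored in 8-bit, shrinking their memory footprint.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4)

loss = model(torch.randn(8, 1024, device="cuda")).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```
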
+ ### Intel GPUs (XPU)
+
+ This backend is supported on Linux x86-64 and Windows x86-64 platforms.
+
+ ### Intel Gaudi (HPU)

- ### Multi-backend
+ This backend is supported on Linux x86-64 for Gaudi2 and Gaudi3.

- | Backend | Supported Versions | Python versions | Architecture Support | Status |
- |---------|-------------------|----------------|---------------------|---------|
- | AMD ROCm | 6.1+ | 3.10+ | minimum CDNA - gfx90a, RDNA - gfx1100 | Alpha |
- | Apple Silicon (MPS) | WIP | 3.10+ | M1/M2 chips | Planned |
- | Intel CPU | v2.4.0+ (ipex) | 3.10+ | Intel CPU | Alpha |
- | Intel GPU | v2.4.0+ (ipex) | 3.10+ | Intel GPU | Experimental |
- | Ascend NPU | 2.1.0+ (torch_npu) | 3.10+ | Ascend NPU | Experimental |
+ ### CPU

- > **Note:** Bitsandbytes is moving away from the multi-backend approach towards using [Pytorch Custom Operators](https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html), as the main mechanism for supporting new hardware, and dispatching to the correct backend.
+ This backend is supported on Linux x86-64, Linux aarch64, and Windows x86-64 platforms.
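
Which of these backends gets used follows from the accelerator PyTorch detects at runtime. A rough way to check what a given machine exposes (a minimal sketch; `torch.xpu` is only present in PyTorch builds with Intel XPU support, and Gaudi/HPU additionally requires `habana_frameworks.torch`):

```python
import torch

# Report the accelerator that quantized models would most likely land on.
if torch.cuda.is_available():
    device = "cuda"  # NVIDIA GPUs
elif hasattr(torch, "xpu") and torch.xpu.is_available():
    device = "xpu"   # Intel GPUs
else:
    device = "cpu"   # CPU fallback
print(f"Available accelerator: {device}")
```
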

  ## Quantization Examples
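
For example, loading a model in 4-bit NF4 looks roughly like this (a minimal sketch; the model id is only an example, and `bitsandbytes` plus `accelerate` must be installed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute, the common QLoRA-style setup.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_id = "facebook/opt-350m"  # example model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place weights on the available accelerator
)

inputs = tokenizer("Quantization reduces memory by", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```
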

docs/source/en/quantization/overview.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ Use the Space below to help you pick a quantization method depending on your har
  | [AQLM](./aqlm) | 🔴 | 🟢 | 🟢 | 🔴 | 🔴 | 🟢 | 🟢 | 1/2 | 🟢 | 🟢 | 🟢 | https://github.com/Vahe1994/AQLM |
  | [AutoRound](./auto_round) | 🔴 | 🟢 | 🟢 | 🔴 | 🔴 | 🟢 | 🔴 | 2/3/4/8 | 🔴 | 🟢 | 🟢 | https://github.com/intel/auto-round |
  | [AWQ](./awq) | 🔴 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | ? | 4 | 🟢 | 🟢 | 🟢 | https://github.com/casper-hansen/AutoAWQ |
- | [bitsandbytes](./bitsandbytes) | 🟢 | 🟡 | 🟢 | 🟡 | 🔴 | 🟡 | 🟢 | 4/8 | 🟢 | 🟢 | 🟢 | https://github.com/bitsandbytes-foundation/bitsandbytes |
+ | [bitsandbytes](./bitsandbytes) | 🟢 | 🟢 | 🟢 | 🟡 | 🟡 | 🟢 | 🟢 | 4/8 | 🟢 | 🟢 | 🟢 | https://github.com/bitsandbytes-foundation/bitsandbytes |
  | [compressed-tensors](./compressed_tensors) | 🔴 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 1/8 | 🟢 | 🟢 | 🟢 | https://github.com/neuralmagic/compressed-tensors |
  | [EETQ](./eetq) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | ? | 8 | 🟢 | 🟢 | 🟢 | https://github.com/NetEase-FuXi/EETQ |
  | [FP-Quant](./fp_quant) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🟢 | 4 | 🔴 | 🟢 | 🟢 | https://github.com/IST-DASLab/FP-Quant |

docs/source/en/quantization/selecting.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ Consider the quantization methods below for inference.
  | quantization method | use case |
  |---|---|
- | bitsandbytes | ease of use and QLoRA fine-tuning on NVIDIA GPUs |
+ | bitsandbytes | ease of use and QLoRA fine-tuning on NVIDIA and Intel GPUs |
  | compressed-tensors | loading specific quantized formats (FP8, Sparse) |
  | GPTQModel or AWQ | good 4-bit accuracy with upfront calibration |
  | HQQ | fast on the fly quantization without calibration |
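
For the QLoRA use case listed for bitsandbytes, the usual pairing is a 4-bit base model plus LoRA adapters from [peft](https://github.com/huggingface/peft). A minimal sketch (assuming `peft` and `bitsandbytes` are installed; the model id and LoRA hyperparameters are illustrative):

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the frozen base model in 4-bit NF4.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # example model id
    quantization_config=quant_config,
    device_map="auto",
)

# Prepare the quantized model for training (e.g. cast norms to fp32,
# enable input gradients for gradient checkpointing).
model = prepare_model_for_kbit_training(model)

# Attach low-rank adapters; only these small matrices are trained.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```
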
