Commit d7d1b45
[Observers] Refactor for better FP4 support, static and memoryless observers (#1903)
## Purpose ##
* FP4
  * Fix the bug discovered [here](#1830 (comment)), where `dynamic="local"` NVFP4 calculations would increment the observer twice as fast as normal
  * Enable the MSE observer to be used with FP4
```pseudocode
mse_quant_error := mean((x - fake_quant(x))**2)
global_scale <- min[min_vals, max_vals, global_scale](mse_quant_error(x))
scale, zp <- min[min_vals, max_vals](mse_quant_error(x, global_scale))
```
* Simplification
  * Make supporting attention calibration easier by separating out weight/activation/attention reshaping
  * Improve readability of observer code by removing many levels of function indirection
  * Drop support for calibration with non-divisible group sizes. This is not really a loss, since [forward passes](https://github.com/neuralmagic/compressed-tensors/blob/main/src/compressed_tensors/quantization/lifecycle/forward.py#L279) also make this assumption
* New observers
  * `memoryless_minmax` computes min and max values on the fly in a dynamic-quantization style. This observer is useful for PTQ weight quantization
  * `static_minmax` computes absolute min and max values across all observations. This observer is useful for PTQ activation quantization
  * `memoryless_mse` computes the best qparams w.r.t. MSE loss for each observation. This observer is useful for PTQ weight quantization
* Memory improvements
  * Observers no longer store copies of scales and zero points, reducing the amount of required memory
  * Newly introduced "memoryless" observers do not store any quantization parameters, which greatly reduces the memory requirements for PTQ weight quantization of very large models
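
To illustrate the difference between the memoryless and static min/max behaviors described above, here is a minimal pure-Python sketch. The class names `MemorylessMinMax` and `StaticMinMax` and the `observe` method are hypothetical stand-ins, not the classes added in this commit, and plain lists replace tensors for brevity.

```python
from typing import List, Optional, Tuple


def minmax(values: List[float]) -> Tuple[float, float]:
    """Return (min, max) of a single observation."""
    return min(values), max(values)


class MemorylessMinMax:
    """Hypothetical stand-in: uses only the current observation, stores nothing."""

    def observe(self, values: List[float]) -> Tuple[float, float]:
        return minmax(values)


class StaticMinMax:
    """Hypothetical stand-in: tracks the absolute min/max across all observations."""

    def __init__(self) -> None:
        self.min_val: Optional[float] = None
        self.max_val: Optional[float] = None

    def observe(self, values: List[float]) -> Tuple[float, float]:
        lo, hi = minmax(values)
        # Widen the running range; it never shrinks.
        self.min_val = lo if self.min_val is None else min(self.min_val, lo)
        self.max_val = hi if self.max_val is None else max(self.max_val, hi)
        return self.min_val, self.max_val


batches = [[-1.0, 2.0], [-0.5, 3.0], [-4.0, 1.0]]
memoryless = MemorylessMinMax()
static = StaticMinMax()
memoryless_results = [memoryless.observe(b) for b in batches]
static_results = [static.observe(b) for b in batches]
```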
| Diagrams |
| - |
| Before |
| <img width="886" height="595" alt="before" src="https://github.com/user-attachments/assets/660d94c2-3ac8-4e05-9e9b-53d21145abac" /> |
| After |
| <img width="1527" height="595" alt="after" src="https://github.com/user-attachments/assets/51a0107e-3fbd-413c-a7a6-03ddc3612169" /> |
## Changes ##
* Standardize reshaping using `flatten_for_calibration`
  * This function reshapes all observed values to `(num_observations, *qparams_shape, group_size)`
  * This function removes the complexity associated with passing "reduce dims" and trying to handle weights, activations, and attention states all in the same function
  * In the future, this function could be applied to the quantization forward pass, although there is probably no need for that beyond standardization
* Implement `get_global_scale` on `Observer` base
  * This function decouples minmax calculations from regular qparam calculations (avoiding the double increment bug)
  * This function enables the MSE observer to be used with FP4 global scales
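
As a rough illustration of the `(num_observations, *qparams_shape, group_size)` layout described above, the following pure-Python sketch groups a 2-D weight along its last dimension. The helper names `flatten_weight_for_groups` and `group_minmax` are hypothetical, and the assumption that a weight counts as a single observation with `qparams_shape == (rows, cols // group_size)` is an illustration, not a description of the actual `flatten_for_calibration` code.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def flatten_weight_for_groups(weight: Matrix, group_size: int):
    """Reshape (rows, cols) into (1, rows, cols // group_size, group_size)."""
    for row in weight:
        # Non-divisible group sizes are rejected, matching the dropped support.
        if len(row) % group_size != 0:
            raise ValueError("group_size must evenly divide the number of columns")
    grouped = [
        [row[i : i + group_size] for i in range(0, len(row), group_size)]
        for row in weight
    ]
    return [grouped]  # leading axis: one observation (assumption for weights)


def group_minmax(flattened) -> List[List[Tuple[float, float]]]:
    """Reduce over the observation and group_size axes, leaving qparams_shape."""
    num_rows = len(flattened[0])
    num_groups = len(flattened[0][0])
    stats = []
    for r in range(num_rows):
        row_stats = []
        for g in range(num_groups):
            vals = [v for obs in flattened for v in obs[r][g]]
            row_stats.append((min(vals), max(vals)))
        stats.append(row_stats)
    return stats


weight = [[1.0, -2.0, 3.0, 4.0], [0.5, 0.25, -8.0, 6.0]]
flat = flatten_weight_for_groups(weight, group_size=2)
stats = group_minmax(flat)
```

With a fixed layout like this, the reduction always runs over the first and last axes, so no per-strategy "reduce dims" argument is needed in the sketch.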
## Testing ##
* Added additional minmax tests which check exact values of scales. These tests pass on both main and this branch, demonstrating that minmax observer behavior remains unchanged
* Added additional MSE tests which check exact values of MSE losses. These tests pass on both main and this branch, demonstrating that MSE observer behavior remains unchanged
* Added FP4 MSE test
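
For readers unfamiliar with exact-value MSE checks, the sketch below implements the `mse_quant_error` from the pseudocode in the Purpose section and a simple scale search. It is only an illustration under stated assumptions: a symmetric signed 4-bit integer grid stands in for FP4, the search shrinks the absolute maximum over a fixed number of steps, and `fake_quant` and `search_scale` are hypothetical helpers rather than the functions used in this commit.

```python
from typing import List, Tuple


def fake_quant(x: float, scale: float, qmin: int = -8, qmax: int = 7) -> float:
    """Quantize to a signed integer grid (stand-in for FP4), then dequantize."""
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale


def mse_quant_error(values: List[float], scale: float) -> float:
    """mean((x - fake_quant(x)) ** 2) over one observation."""
    return sum((v - fake_quant(v, scale)) ** 2 for v in values) / len(values)


def search_scale(values: List[float], num_steps: int = 20, qmax: int = 7) -> Tuple[float, float]:
    """Try shrunken versions of the min/max range; keep the lowest-MSE scale."""
    absmax = max(abs(v) for v in values)
    if absmax == 0.0:
        return 1.0, 0.0  # all zeros: any scale reproduces the input exactly
    best_scale, best_err = absmax / qmax, float("inf")
    for step in range(1, num_steps + 1):
        scale = (absmax * step / num_steps) / qmax
        err = mse_quant_error(values, scale)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err


values = [0.1, -0.2, 0.3, 0.25, -0.15, 4.0]
minmax_err = mse_quant_error(values, 4.0 / 7)  # plain min/max scale
best_scale, best_err = search_scale(values)
```

Because the full min/max range is one of the candidates, the searched error in this sketch can never be worse than the plain min/max error.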
## Evaluation ##
```
nvfp4-static-minmax
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|--------|------:|------|-----:|--------|---|-----:|---|------|
|mmmu_val| 0|none | 0|mmmu_acc|↑ |0.6167|± | N/A|
```
```
nvfp4-minmax
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|--------|------:|------|-----:|--------|---|-----:|---|------|
|mmmu_val| 0|none | 0|mmmu_acc|↑ |0.6011|± | N/A|
```
---------
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Dan Huang <dan.huang@neuralmagic.com>
Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com>

1 parent 2a6a0a3 commit d7d1b45
File tree: 17 files changed, +1219 -820 lines changed
- docs
- src/llmcompressor
  - modifiers/quantization
    - gptq
  - observers
- tests/llmcompressor
  - modifiers/calibration
  - observers