docs/source/user-guide/sparse-attention/kvstar.md: 9 additions & 8 deletions
@@ -67,16 +67,17 @@ ktc = KVTransferConfig(
-
-
 ## 🔥 Results
 The following results were obtained using `Qwen2.5-14B-Instruct` under the hyperparameters in `examples/offline_inference_kvstar.py`.

+### 📈 Accuracy
+We use [LongBench](https://huggingface.co/datasets/zai-org/LongBench) to evaluate the accuracy (F1-score) of the KVstar algorithm on an H20 GPU and an Ascend 910B NPU. The model is `Qwen2.5-14B-Instruct`.
+
+| Device | Dataset | Full attention | KVstar (25% KV cache on GPU) |
+|--------|---------|----------------|------------------------------|
+| H20 GPU | dureader | 32.20 | 29.93 |
+| Ascend 910B NPU | dureader | 32.46 | 31.08 |
+
 ### 🏆 Performance

-### 📈 Accuracy
-We use [LongBench](https://huggingface.co/datasets/zai-org/LongBench) to evaluate the accuracy of the ESA algorithm.
-| Dataset | F1-Score for full Attention | F1-Score for KVstar |
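The hunk above sits directly below the guide's `ktc = KVTransferConfig(` example, and the accuracy numbers come from `examples/offline_inference_kvstar.py`. Below is a minimal sketch of the kind of offline vLLM run such a script performs, assuming a recent vLLM where `KVTransferConfig` is importable from `vllm.config`; the connector shown (`SharedStorageConnector`, a stock vLLM connector) and its extra config are stand-ins, not the KVstar connector or the actual hyperparameters behind the table.

```python
# Sketch only: the connector and extra config below are stock-vLLM stand-ins,
# NOT the KVstar connector; see examples/offline_inference_kvstar.py for the
# real setup and hyperparameters.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# Describe how KV cache is transferred/offloaded; KVstar would plug in its
# own connector and extra config here.
ktc = KVTransferConfig(
    kv_connector="SharedStorageConnector",  # stand-in connector
    kv_role="kv_both",
    kv_connector_extra_config={"shared_storage_path": "local_storage"},
)

llm = LLM(
    model="Qwen/Qwen2.5-14B-Instruct",
    kv_transfer_config=ktc,
    max_model_len=32768,  # generous window for long-context inputs such as LongBench
)

outputs = llm.generate(
    ["Summarize the following passage: ..."],
    SamplingParams(temperature=0.0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```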
0 commit comments