Commit 9ec5136

Update gsa.md
1 parent 18ca373 commit 9ec5136

File tree

1 file changed (+1, -1 lines changed)
  • docs/source/user-guide/sparse-attention


docs/source/user-guide/sparse-attention/gsa.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ GSA (Geometric Sparse Attention) simultaneously tackles the high computational c
 
 
 ## 🔥 Key Results
-In both performance and accuracy evaluations, we employed the DeepSeek-R1-Distill-Qwen-32B model deployed on two H20 GPUs.
+In both performance and accuracy evaluations, we deployed the DeepSeek-R1-Distill-Qwen-32B model on two H20 GPUs.
 ## 🏆 Performance Highlights
 ### End-to-End Performance with 80 % Prefix-Cache Hit Ratio
 Below are the end-to-end throughput results for inference scenarios without KVCache offloading. PC Baseline refers to the full attention method with an 80% prefix cache hit rate. The GSA method sparsifies each input request to 6K tokens, and in the experiments, each request generates 4K tokens of output.
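For intuition on what "sparsifies each input request to 6K tokens" means in the diffed passage, here is a minimal, hypothetical sketch of budgeted KV-cache selection. The generic top-k scoring, the function name `sparsify_kv`, and the tensor shapes are all assumptions for illustration; they are not GSA's actual geometric selection rule, which is described in gsa.md itself.

```python
import torch

def sparsify_kv(keys, values, queries, budget=6 * 1024):
    """Illustrative sketch: keep only the `budget` KV positions with the
    highest aggregate attention score. NOT GSA's geometric selection."""
    seq_len = keys.shape[0]
    if seq_len <= budget:
        return keys, values
    # Score each cached position by its mean attention logit across
    # all query tokens and heads.
    logits = torch.einsum("qhd,khd->qhk", queries, keys)  # [q, heads, seq]
    scores = logits.mean(dim=(0, 1))                      # [seq]
    # Keep the top-scoring positions, restored to their original order.
    idx = scores.topk(budget).indices.sort().values
    return keys[idx], values[idx]

# Toy usage: a 32K-token cache reduced to the 6K budget from the doc.
k = torch.randn(32_768, 8, 128)
v = torch.randn(32_768, 8, 128)
q = torch.randn(16, 8, 128)
k_s, v_s = sparsify_kv(k, v, q)
print(k_s.shape)  # torch.Size([6144, 8, 128])
```

Whatever the selection rule, the effect measured in the benchmark is the same: attention during decode runs over at most 6K cached tokens per request instead of the full prompt, while the PC Baseline attends over everything not already covered by the 80% prefix-cache hit.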

0 commit comments
