Commit 264f2cc
HaoLi980405, HelenJia98, zbb200819, and yxkyong authored
[Feat] add cuda topk and gsa descriptions (#198)
* cuda_topk
* adapt to kv_block_size and IOsize
* clean code
* merge bug
* multi-bs bug
* open_gsa deal
* add GSA description framework
* multi bs deal
* clean code
* clean code
* gsa status deal
* add init file

---------

Co-authored-by: xujia <42216276@qq.com>
Co-authored-by: zbb200819 <1130072360@qq.com>
Co-authored-by: yxkyong <1033480555@qq.com>
1 parent d5b735c commit 264f2cc
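
The commit title mentions a CUDA top-k kernel, but its source is not among the files shown on this page. As a generic illustration only, and not GSA's actual algorithm, top-k selection for block-sparse attention can be sketched in plain Python: each KV block is summarised by one representative vector, blocks are scored against the query by dot product, and only the `k` best-scoring blocks are kept. Every name below (`select_topk_blocks`, `block_reprs`) is hypothetical and does not come from UCM.

```python
import heapq


def select_topk_blocks(query, block_reprs, k):
    """Pick the k blocks whose representative vector best matches the query.

    Hypothetical helper for illustration; scoring is a plain dot product.
    Returns the selected block indices in ascending order.
    """
    # One score per block: dot product between the query and the block summary.
    scores = [sum(q * r for q, r in zip(query, rep)) for rep in block_reprs]
    # Indices of the k largest scores.
    top = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return sorted(top)


query = [1.0, 0.0]
block_reprs = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.9, 0.1]]
selected = select_topk_blocks(query, block_reprs, k=2)
print(selected)
```

A GPU kernel would compute the scores and the selection in parallel across requests and heads; the sketch only shows the selection logic for a single query.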

6 files changed, with 240 additions and 106 deletions
Lines changed: 63 additions & 1 deletion
@@ -1 +1,63 @@
-# Gsa
+# 🌟 GSA: Geometric Sparse Attention for Efficient Inference of Large Models
+
+## 🔍 Overview
+
+GSA (Geometric Sparse Attention) simultaneously tackles the high computational complexity of long sequences and the concurrency limitations imposed by the HBM capacity wall.
+
+
+## 🎯 Key Innovations
+
+- Representation-based Sparse Selection
+
+- Efficient KV Transition
+
+- Cross-hardware Support
+
+- Request-level Sparse Strategy
+
+- P+D Multi-stage Sparsity
+
+
+## 🔥 Key Results
+
+### 🏆 Performance Highlights
+
+### 📈 Accuracy Benchmarks
+
+## 🧠 How It Works
+
+### Core Algorithm
+
+## 🚦 Quick Start
+
+
+### Basic Usage
+Similar to UCM's `offline_inference_esa.py` example, we only need to set `ucm_sparse_method` to `GSA`, as shown below.
+
+
+```python
+...
+ktc = KVTransferConfig(
+    kv_connector=name,
+    kv_connector_module_path=module_path,
+    kv_role="kv_both",
+    kv_connector_extra_config={
+        "ucm_connector_name": "UcmDram",
+        "ucm_connector_config": {
+            "max_cache_size": 5368709120,
+            "kv_block_size": 262144,
+        },
+        "ucm_sparse_method": "GSA",
+    },
+)
+...
+```
+
+
+## 📊 Supported Models
+
+| Model | Size | Support |
+|-------|------|-----------|
+| Qwen3-32B | 32B | |
+| QwQ-32B | 32B | |
+| DeepSeek-R1 | 671B | |
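
The `ucm_connector_config` values in the Basic Usage snippet are raw integers. As a quick sanity check, and assuming both `max_cache_size` and `kv_block_size` are byte counts (the README does not state the units), the sketch below converts them to binary units and computes how many blocks of that size would fit in the cache. The helper `describe_cache` is hypothetical and not part of UCM.

```python
def describe_cache(max_cache_size: int, kv_block_size: int) -> dict:
    """Summarise a cache configuration, assuming both values are in bytes."""
    if kv_block_size <= 0:
        raise ValueError("kv_block_size must be positive")
    return {
        "cache_gib": max_cache_size / 2**30,  # bytes -> GiB
        "block_kib": kv_block_size / 2**10,   # bytes -> KiB
        "blocks_that_fit": max_cache_size // kv_block_size,
    }


# Values copied from the README's example configuration.
summary = describe_cache(max_cache_size=5368709120, kv_block_size=262144)
print(summary)
```

Under the bytes assumption, the example configuration describes a 5 GiB cache divided into 256 KiB blocks, so 20480 blocks fit.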

ucm/integration/vllm/ucm_sparse/__init__.py

Whitespace-only changes.

ucm/ucm_sparse/__init__.py

Whitespace-only changes.