# 🌟 GSA: Geometric Sparse Attention for Efficient Inference of Large Models
## 🔍 Overview
GSA (Geometric Sparse Attention) simultaneously tackles the high computational complexity of long sequences and the concurrency limitations imposed by the HBM capacity wall.
## 🎯 Key Innovations
- Representation-based Sparse Selection
- Efficient KV Transition
- Cross-hardware Support
- Request-level Sparse Strategy
- P+D Multi-stage Sparsity
## 🔥 Key Results
### 🏆 Performance Highlights
### 📈 Accuracy Benchmarks
## 🧠 How It Works
### Core Algorithm
## 🚦 Quick Start
### Basic Usage
Usage mirrors UCM's `offline_inference_esa.py` example; the only change needed is setting `ucm_sparse_method` to `GSA`.
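As a minimal sketch of that configuration: following the pattern of UCM's `offline_inference_esa.py`, the GSA-specific part is the single `ucm_sparse_method` entry. The surrounding key name (`ucm_connector_name`) and its value below are illustrative assumptions, not confirmed API; consult the UCM example script for the exact fields it passes to the engine.

```python
# Hypothetical UCM configuration sketch: only `ucm_sparse_method` is
# stated by this README; the other key/value is an assumed placeholder
# copied in spirit from UCM's ESA example, not a confirmed interface.
ucm_config = {
    "ucm_connector_name": "UcmNfsStore",  # assumption: reuse the ESA example's connector
    "ucm_sparse_method": "GSA",           # the one GSA-specific setting (from this README)
}

print(ucm_config["ucm_sparse_method"])  # → GSA
```

This dictionary would then be passed wherever `offline_inference_esa.py` supplies its UCM settings to the inference engine; everything else in that script stays unchanged.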