Commit 5447a03

[Feature][main]reconstruction kvpool connector to ascend connector (#4438)
### What this PR does / why we need it?
1. In short, we renamed the existing MooncakeStoreConnector to AscendStoreConnector and extracted the storage engine interaction logic into a new Backend class. Associated RFC: #4329
2. Fixed the issue where the number of input parameters for the connector was incorrect, introduced in vllm 0.11.2.

### Does this PR introduce _any_ user-facing change?
Change MooncakeStoreConnector to AscendStoreConnector.

### How was this patch tested?
- vLLM version: v0.11.2

---------

Signed-off-by: fems14 <1804143737@qq.com>
1 parent 554f16a commit 5447a03

25 files changed, +1490 -1512 lines changed

docs/source/user_guide/feature_guide/index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,6 @@ lora
 eplb_swift_balancer
 netloader
 dynamic_batch
-kv_pool_mooncake
+kv_pool
 external_dp
 :::

docs/source/user_guide/feature_guide/kv_pool_mooncake.md renamed to docs/source/user_guide/feature_guide/kv_pool.md

Lines changed: 36 additions & 33 deletions
@@ -1,4 +1,4 @@
-# Mooncacke Store Deployment Guide
+# Ascend Store Deployment Guide

 ## Environmental Dependencies

@@ -8,27 +8,30 @@
 * PyTorch >= 2.7.1, torch-npu >= 2.7.1.dev20250724
 * vLLM:main branch
 * vLLM-Ascend:main branch
-* Mooncake:main branch
-
-Installation and Compilation Guide:https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries
-
-Make sure to build with `-DUSE_ASCEND_DIRECT` to enable ADXL engine.
-
-An example command for compiling ADXL:
-
-`rm -rf build && mkdir -p build && cd build \ && cmake .. -DCMAKE_INSTALL_PREFIX=/opt/transfer-engine/ -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DUSE_ASCEND_DIRECT=ON -DBUILD_SHARED_LIBS=ON -DBUILD_UNIT_TESTS=OFF \ && make -j \ && make install`
-
-Also, you need to set environment variables to point to them `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64/python3.11/site-packages/mooncake`, or copy the .so files to the `/usr/local/lib64` directory after compilation

 ### KV Pooling Parameter Description
 **kv_connector_extra_config**: Additional Configurable Parameters for Pooling.
-**mooncake_rpc_port**: Port for RPC Communication Between Pooling Scheduler Process and Worker Process: Each Instance Requires a Unique Port Configuration.
+**lookup_rpc_port**: Port for RPC Communication Between Pooling Scheduler Process and Worker Process: Each Instance Requires a Unique Port Configuration.
 **load_async**: Whether to Enable Asynchronous Loading. The default value is false.
-**register_buffer**: Whether to Register Video Memory with the Backend. Registration is Not Required When Used with MooncakeConnectorV1; It is Required in All Other Cases. The Default Value is false.
+**backend**: Set the storage backend for kvpool, with the default being mooncake.
+
+## Example of using Mooncake as a KVCache pooling backend
+* Software:
+* Mooncake:main branch
+
+Installation and Compilation Guide:https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries

-## Run Mooncake Master
+Make sure to build with `-DUSE_ASCEND_DIRECT` to enable ADXL engine.

-### 1.Configure mooncake.json
+An example command for compiling ADXL:
+
+`rm -rf build && mkdir -p build && cd build \ && cmake .. -DCMAKE_INSTALL_PREFIX=/opt/transfer-engine/ -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DUSE_ASCEND_DIRECT=ON -DBUILD_SHARED_LIBS=ON -DBUILD_UNIT_TESTS=OFF \ && make -j \ && make install`
+
+Also, you need to set environment variables to point to them `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64/python3.11/site-packages/mooncake`, or copy the .so files to the `/usr/local/lib64` directory after compilation
+
+### run mooncake master
+
+#### 1.Configure mooncake.json

 The environment variable **MOONCAKE_CONFIG_PATH** is configured to the full path where mooncake.json is located.

@@ -54,7 +57,7 @@ The environment variable **MOONCAKE_CONFIG_PATH** is configured to the full path
 **master_server_address**: Configured with the IP and port of the master service.
 **global_segment_size**: Expands the kvcache size registered by the PD node to the master.

-### 2. Start mooncake_master
+#### 2. Start mooncake_master

 Under the mooncake folder:

@@ -64,9 +67,9 @@ mooncake_master --port 50088 --eviction_high_watermark_ratio 0.95 --eviction_rat

 `eviction_high_watermark_ratio` determines the watermark where Mooncake Store will perform eviction,and `eviction_ratio` determines the portion of stored objects that would be evicted.

-## Pooling and Prefill Decode Disaggregate Scenario
+### Pooling and Prefill Decode Disaggregate Scenario

-### 1.Run `prefill` Node and `decode` Node
+#### 1.Run `prefill` Node and `decode` Node

 Using MultiConnector to simultaneously utilize both p2p connectors and pooled connectors. P2P performs kv_transfer, while pooling creates a larger prefix-cache.

@@ -123,9 +126,10 @@ python3 -m vllm.entrypoints.openai.api_server \
 }
 },
 {
-"kv_connector": "MooncakeConnectorStoreV1",
+"kv_connector": "AscendStoreConnector",
 "kv_role": "kv_producer",
-"mooncake_rpc_port":"0"
+"lookup_rpc_port":"0",
+"backend": "mooncake"
 }
 ]
 }
@@ -185,16 +189,17 @@ python3 -m vllm.entrypoints.openai.api_server \
 }
 },
 {
-"kv_connector": "MooncakeConnectorStoreV1",
+"kv_connector": "AscendStoreConnector",
 "kv_role": "kv_consumer",
-"mooncake_rpc_port":"1"
+"lookup_rpc_port":"1",
+"backend": "mooncake"
 }
 ]
 }
 }' > d.log 2>&1
 ```

-### 2、Start proxy_server.
+#### 2、Start proxy_server.

 ```
 bash proxy.sh
@@ -212,7 +217,7 @@ python vllm-ascend/examples/disaggregated_prefill_v1/load_balance_proxy_server_e
 --decoder-ports 8200 \
 ```

-### 3. Run Inference
+#### 3. Run Inference

 Configure the localhost, port, and model weight path in the command to your own settings.

@@ -228,9 +233,9 @@ Long question:
curl -s http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{ "model": "/xxxxx/Qwen2.5-7B-Instruct", "prompt": "Given the accelerating impacts of climate change—including rising sea levels, increasing frequency of extreme weather events, loss of biodiversity, and adverse effects on agriculture and human health—there is an urgent need for a robust, globally coordinated response. However, international efforts are complicated by a range of factors: economic disparities between high-income and low-income countries, differing levels of industrialization, varying access to clean energy technologies, and divergent political systems that influence climate policy implementation. In this context, how can global agreements like the Paris Accord be redesigned or strengthened to not only encourage but effectively enforce emission reduction targets? Furthermore, what mechanisms can be introduced to promote fair and transparent technology transfer, provide adequate financial support for climate adaptation in vulnerable regions, and hold nations accountable without exacerbating existing geopolitical tensions or disproportionately burdening those with historically lower emissions?", "max_tokens": 256, "temperature":0.0 }'
 ```

-## Pooling and Mixed Deployment Scenario
+### Pooling and Mixed Deployment Scenario

-### 1、Run Mixed Department Script
+#### 1、Run Mixed Department Script

 The mixed script is essentially a pure pooling scenario for the P node.

@@ -263,19 +268,17 @@ python3 -m vllm.entrypoints.openai.api_server \
 --max-num-batched-tokens 4096 \
 --kv-transfer-config \
 '{
-"kv_connector": "MooncakeConnectorStoreV1",
+"kv_connector": "AscendStoreConnector",
 "kv_role": "kv_both",
 "kv_connector_extra_config": {
-"register_buffer": true,
 "use_layerwise": false,
-"mooncake_rpc_port":"0"
+"lookup_rpc_port":"1",
+"backend": "mooncake"
 }
 }' > mix.log 2>&1
 ```

-`register_buffer` is set to `false` by default and need to be set to `true` only in PD-mixed scenario.
-
-### 2. Run Inference
+#### 2. Run Inference

 Configure the localhost, port, and model weight path in the command to your own settings. The requests sent will only go to the port where the mixed deployment script is located, and there is no need to start a separate proxy.
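
The mixed-deployment `--kv-transfer-config` value shown in this file's diff can also be assembled programmatically instead of hand-writing the JSON. A minimal sketch, assuming only the keys that appear in the updated guide (`kv_connector`, `kv_role`, `kv_connector_extra_config`, `use_layerwise`, `lookup_rpc_port`, `backend`); the helper name `build_store_connector_config` is hypothetical and is not part of vLLM or vLLM-Ascend:

```python
import json


def build_store_connector_config(role: str, lookup_rpc_port: str,
                                 backend: str = "mooncake") -> str:
    """Return a JSON string for --kv-transfer-config (hypothetical helper)."""
    config = {
        "kv_connector": "AscendStoreConnector",
        "kv_role": role,
        "kv_connector_extra_config": {
            "use_layerwise": False,
            # The guide passes the port as a string; each instance needs
            # its own value.
            "lookup_rpc_port": lookup_rpc_port,
            "backend": backend,
        },
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Mirrors the mixed-deployment example: kv_both, port "1", mooncake.
    print(build_store_connector_config("kv_both", "1"))
```

The sketch follows the nested layout of the mixed-deployment example. The MultiConnector examples in the same guide place `lookup_rpc_port` and `backend` directly in the connector entry, so the layout would need adjusting for that case. Note that `register_buffer` is no longer emitted, matching its removal from the guide.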

tests/ut/distributed/mooncake/test_config_data.py

Lines changed: 8 additions & 1 deletion
@@ -1,6 +1,13 @@
+import sys
+import types
 import unittest
+from unittest.mock import MagicMock

-from vllm_ascend.distributed.mooncake.config_data import (
+fake_engine = types.ModuleType("mooncake.engine")
+fake_engine.TransferEngine = MagicMock()  # type: ignore[attr-defined]
+sys.modules["mooncake.engine"] = fake_engine
+
+from vllm_ascend.distributed.kvpool.backend.mooncake_backend import (  # noqa: E402
     _convert_to_bytes, _parse_global_segment_size)

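
The test change above stubs `mooncake.engine` in `sys.modules` before importing the backend, so the test can run where Mooncake is not installed. A minimal, self-contained sketch of the same technique follows. The package name `hypo_native` and the `initialize` method are made up for illustration, and registering the parent package as well is a defensive addition of this sketch rather than something the commit does:

```python
import sys
import types
from unittest.mock import MagicMock

# "hypo_native" is a made-up package name standing in for a dependency
# that is not installed in the test environment.
fake_pkg = types.ModuleType("hypo_native")
fake_engine = types.ModuleType("hypo_native.engine")
fake_engine.TransferEngine = MagicMock(name="TransferEngine")
fake_pkg.engine = fake_engine

# Register the stubs before any code tries to import the real modules.
sys.modules["hypo_native"] = fake_pkg
sys.modules["hypo_native.engine"] = fake_engine

# This import now resolves to the stub instead of raising ImportError.
from hypo_native.engine import TransferEngine  # noqa: E402

engine = TransferEngine()
# Hypothetical method name; a MagicMock accepts and records any call.
engine.initialize("localhost")
```

Because the stub must be in place before the import statement runs, the import has to come after executable code, which is why the commit marks its import with `# noqa: E402`.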

tests/ut/kv_connector/test_mooncake_connector.py

Lines changed: 1 addition & 1 deletion
@@ -1051,7 +1051,7 @@ def setUp(self):
 'vllm_ascend.distributed.mooncake_connector.string_to_int64_hash',
 mock_string_to_int64_hash),
 patch(
-'vllm_ascend.distributed.mooncake.transfer_engine.TransferEngine',
+'vllm_ascend.distributed.mooncake_transfer_engine.TransferEngine',
 return_value=self.mock_transfer_engine),
 patch(
 'vllm_ascend.distributed.mooncake_connector.KVCacheSendingThread',
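
The one-line change above is needed because `unittest.mock.patch` takes its target as a dotted-path string and imports that path when the patch starts, so a moved or renamed module requires the string to be updated by hand. A minimal sketch of string-target patching, using `os.getcwd` from the standard library as a stand-in target (the path value is arbitrary):

```python
import os
from unittest.mock import patch

# The target is resolved from the string when the context manager is
# entered; a stale module path would fail at that point.
with patch("os.getcwd", return_value="/hypothetical/patched/dir") as mocked:
    inside = os.getcwd()

# The original function is restored once the block exits.
outside = os.getcwd()
```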

vllm_ascend/distributed/__init__.py

Lines changed: 7 additions & 2 deletions
@@ -31,8 +31,13 @@ def register_connector():

     KVConnectorFactory.register_connector(
         "MooncakeConnectorStoreV1",
-        "vllm_ascend.distributed.mooncake.mooncake_store_connector_v1",
-        "MooncakeConnectorV1")
+        "vllm_ascend.distributed.kvpool.ascend_store_connector",
+        "AscendStoreConnector")
+
+    KVConnectorFactory.register_connector(
+        "AscendStoreConnector",
+        "vllm_ascend.distributed.kvpool.ascend_store_connector",
+        "AscendStoreConnector")

     KVConnectorFactory.register_connector(
         "MooncakeLayerwiseConnector",

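In the hunk above, the old name `MooncakeConnectorStoreV1` stays registered but now points at the same module and class as the new `AscendStoreConnector` name, so existing configurations keep resolving. A minimal sketch of that aliasing pattern, using a toy registry that is not vLLM's `KVConnectorFactory`; the names `ConnectorRegistry`, `LegacyName`, and `NewName` are hypothetical, and a standard-library class stands in for a connector class:

```python
import importlib
from typing import Dict, Tuple


class ConnectorRegistry:
    """Toy stand-in for a connector factory; not vLLM's implementation."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, str]] = {}

    def register(self, name: str, module_path: str, class_name: str) -> None:
        # Store only strings so the module is imported lazily, on first use.
        self._entries[name] = (module_path, class_name)

    def resolve(self, name: str) -> type:
        module_path, class_name = self._entries[name]
        module = importlib.import_module(module_path)
        return getattr(module, class_name)


registry = ConnectorRegistry()
# Two names, one target: the legacy name acts as an alias for the new one.
registry.register("LegacyName", "collections", "OrderedDict")
registry.register("NewName", "collections", "OrderedDict")

legacy_cls = registry.resolve("LegacyName")
new_cls = registry.resolve("NewName")
```

Both lookups return the same class object, which is the property the double registration relies on.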
vllm_ascend/distributed/cpu_offload_connector.py

Lines changed: 5 additions & 1 deletion
@@ -29,6 +29,7 @@
 from vllm.attention.backends.abstract import AttentionMetadata
 from vllm.forward_context import ForwardContext
 from vllm.v1.core.kv_cache_manager import KVCacheBlocks
+from vllm.v1.kv_cache_interface import KVCacheConfig
 from vllm.v1.request import Request


@@ -58,7 +59,10 @@ class CPUOffloadingConnectorMetadata(KVConnectorMetadata):

 class CPUOffloadingConnector(KVConnectorBase_V1):

-    def __init__(self, vllm_config: "VllmConfig", role: KVConnectorRole):
+    def __init__(self,
+                 vllm_config: VllmConfig,
+                 role: KVConnectorRole,
+                 kv_cache_config: Optional[KVCacheConfig] = None):
         if not vllm_config.cache_config.enable_prefix_caching:
             self.connector_scheduler: Optional[
                 CPUOffloadingConnectorScheduler] = None
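
The commit message attributes this signature change to a connector argument-count mismatch introduced in vllm 0.11.2. A minimal sketch of why a defaulted third parameter accepts both call shapes; `SketchConnector` and `create` are hypothetical, and the sketch makes no claim about how vLLM actually constructs connectors:

```python
from typing import Any, Optional


class SketchConnector:
    """Hypothetical connector; mirrors only the shape of the new signature."""

    def __init__(self,
                 vllm_config: Any,
                 role: str,
                 kv_cache_config: Optional[Any] = None):
        self.vllm_config = vllm_config
        self.role = role
        # None means the caller used the older two-argument form.
        self.kv_cache_config = kv_cache_config


def create(connector_cls, vllm_config, role, kv_cache_config=None):
    """Hypothetical factory that always passes three arguments."""
    return connector_cls(vllm_config, role, kv_cache_config)


# Older call shape: two positional arguments.
old_style = SketchConnector({"model": "demo"}, "worker")
# Newer call shape: the factory forwards a third argument.
new_style = create(SketchConnector, {"model": "demo"}, "worker",
                   {"num_blocks": 8})
```

With the two-parameter signature, the second call would raise a `TypeError`; the default value keeps the first call working as well.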
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+
