[Feature][main]reconstruction kvpool connector to ascend connector (#4438)
### What this PR does / why we need it?
1. In short, we renamed the existing MooncakeStoreConnector to
AscendStoreConnector and extracted the storage engine interaction logic
into a new Backend class.
Associated RFC: #4329
2. Fixed the incorrect number of input parameters for the connector,
an issue introduced in vllm 0.11.2.
### Does this PR introduce _any_ user-facing change?
The connector name changes from MooncakeStoreConnector to AscendStoreConnector.
### How was this patch tested?
- vLLM version: v0.11.2
---------
Signed-off-by: fems14 <1804143737@qq.com>
Installation and Compilation Guide:https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries
-
-Make sure to build with `-DUSE_ASCEND_DIRECT` to enable ADXL engine.
-
-An example command for compiling ADXL:
-
-`rm -rf build && mkdir -p build && cd build \ && cmake .. -DCMAKE_INSTALL_PREFIX=/opt/transfer-engine/ -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DUSE_ASCEND_DIRECT=ON -DBUILD_SHARED_LIBS=ON -DBUILD_UNIT_TESTS=OFF \ && make -j \ && make install`
-
-Also, you need to set environment variables to point to them `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64/python3.11/site-packages/mooncake`, or copy the .so files to the `/usr/local/lib64` directory after compilation
 
 ### KV Pooling Parameter Description
 **kv_connector_extra_config**: Additional Configurable Parameters for Pooling.
-**mooncake_rpc_port**: Port for RPC Communication Between Pooling Scheduler Process and Worker Process: Each Instance Requires a Unique Port Configuration.
+**lookup_rpc_port**: Port for RPC Communication Between Pooling Scheduler Process and Worker Process: Each Instance Requires a Unique Port Configuration.
 **load_async**: Whether to Enable Asynchronous Loading. The default value is false.
-**register_buffer**: Whether to Register Video Memory with the Backend. Registration is Not Required When Used with MooncakeConnectorV1; It is Required in All Other Cases. The Default Value is false.
+**backend**: Set the storage backend for kvpool, with the default being mooncake.
+
+## Example of using Mooncake as a KVCache pooling backend
+* Software:
+* Mooncake:main branch
+
+Installation and Compilation Guide:https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries
 
-## Run Mooncake Master
+Make sure to build with `-DUSE_ASCEND_DIRECT` to enable ADXL engine.
 
-### 1.Configure mooncake.json
+An example command for compiling ADXL:
+
+`rm -rf build && mkdir -p build && cd build \ && cmake .. -DCMAKE_INSTALL_PREFIX=/opt/transfer-engine/ -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DUSE_ASCEND_DIRECT=ON -DBUILD_SHARED_LIBS=ON -DBUILD_UNIT_TESTS=OFF \ && make -j \ && make install`
+
+Also, you need to set environment variables to point to them `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64/python3.11/site-packages/mooncake`, or copy the .so files to the `/usr/local/lib64` directory after compilation
+
+### run mooncake master
+
+#### 1.Configure mooncake.json
 
 The environment variable **MOONCAKE_CONFIG_PATH** is configured to the full path where mooncake.json is located.
 
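The parameter descriptions in this part of the diff can be illustrated with a small sketch. It collects the pooling parameters into a `kv_connector_extra_config` dictionary, checks that each instance has its own `lookup_rpc_port`, and resolves `MOONCAKE_CONFIG_PATH`. Only the key names, the stated defaults, and the environment variable come from the diff. The helper names, the port values, and the choice to store the port as an integer are assumptions made for illustration.

```python
import os
from pathlib import Path


def build_extra_config(lookup_rpc_port, load_async=False, backend="mooncake"):
    """Hypothetical helper: collect the pooling parameters into one dict."""
    port = int(lookup_rpc_port)
    if not 0 < port < 65536:
        raise ValueError(f"lookup_rpc_port out of range: {port}")
    return {
        "lookup_rpc_port": port,   # renamed from mooncake_rpc_port in this PR
        "load_async": load_async,  # documented default: false
        "backend": backend,        # documented default: mooncake
    }


def check_unique_ports(configs):
    """Each instance needs its own lookup_rpc_port; raise on duplicates."""
    ports = [cfg["lookup_rpc_port"] for cfg in configs]
    duplicates = sorted({p for p in ports if ports.count(p) > 1})
    if duplicates:
        raise ValueError(f"lookup_rpc_port reused across instances: {duplicates}")


def resolve_mooncake_config(env=None):
    """Return the mooncake.json path taken from MOONCAKE_CONFIG_PATH."""
    env = os.environ if env is None else env
    raw = env.get("MOONCAKE_CONFIG_PATH")
    if not raw:
        raise RuntimeError("MOONCAKE_CONFIG_PATH is not set")
    path = Path(raw)
    if not path.is_absolute():
        raise RuntimeError("MOONCAKE_CONFIG_PATH should be the full path to mooncake.json")
    return path
```

Calling `check_unique_ports` on the configurations of all instances before launch catches a reused port early, which is the failure the "unique port" note warns about.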
@@ -54,7 +57,7 @@ The environment variable **MOONCAKE_CONFIG_PATH** is configured to the full path
 **master_server_address**: Configured with the IP and port of the master service.
 **global_segment_size**: Expands the kvcache size registered by the PD node to the master.
`eviction_high_watermark_ratio` determines the watermark where Mooncake Store will perform eviction,and `eviction_ratio` determines the portion of stored objects that would be evicted.
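To make the relationship between the two eviction settings concrete, here is a toy model of a single eviction pass. It is not Mooncake Store's implementation. The function name, the use of `>=` at the watermark, and rounding down are all assumptions; only the two parameter names and their described roles come from the text above.

```python
def objects_to_evict(used_bytes, capacity_bytes, num_objects,
                     eviction_high_watermark_ratio, eviction_ratio):
    """Toy model: how many stored objects one eviction pass would drop."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    usage = used_bytes / capacity_bytes
    # Below the watermark nothing is evicted (assumed comparison: >=).
    if usage < eviction_high_watermark_ratio:
        return 0
    # At or above the watermark, a fixed portion of the objects is evicted
    # (assumed rounding: down).
    return int(num_objects * eviction_ratio)
```

Under these assumptions, a higher watermark delays eviction until the store is nearly full, while a larger `eviction_ratio` frees more space per pass at the cost of dropping more cached entries.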
 
-## Pooling and Prefill Decode Disaggregate Scenario
+###Pooling and Prefill Decode Disaggregate Scenario
 
-### 1.Run `prefill` Node and `decode` Node
+####1.Run `prefill` Node and `decode` Node
 
 Using MultiConnector to simultaneously utilize both p2p connectors and pooled connectors. P2P performs kv_transfer, while pooling creates a larger prefix-cache.
Configure the localhost, port, and model weight path in the command to your own settings.
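The MultiConnector arrangement described above, one p2p connector plus one pooled connector, might be expressed as a nested configuration along the following lines. This is a sketch under assumptions rather than the configuration from the guide, whose launch script is not shown in this diff. The `connectors` key and the `kv_producer` role value are assumed from vLLM's KV-transfer configuration, and the helper name is hypothetical. The connector names `MultiConnector`, `MooncakeConnectorV1`, and `AscendStoreConnector` appear in the diff.

```python
import json


def build_multi_connector_config(kv_role, p2p_connector, pool_connector):
    """Hypothetical helper: wrap a p2p and a pooled connector in MultiConnector."""
    return {
        "kv_connector": "MultiConnector",
        "kv_role": kv_role,
        "kv_connector_extra_config": {
            # Assumed key: the list of child connectors MultiConnector fans out to.
            "connectors": [p2p_connector, pool_connector],
        },
    }


# P2P connector: performs the kv_transfer between prefill and decode nodes.
p2p = {"kv_connector": "MooncakeConnectorV1", "kv_role": "kv_producer"}

# Pooled connector: provides the larger prefix cache via the store backend.
pool = {
    "kv_connector": "AscendStoreConnector",
    "kv_role": "kv_producer",
    "kv_connector_extra_config": {"backend": "mooncake"},
}

prefill_config = build_multi_connector_config("kv_producer", p2p, pool)
prefill_json = json.dumps(prefill_config, indent=2)
```

The resulting JSON string is the kind of value that would be passed as the KV-transfer configuration of a prefill node. A decode node would use the same shape with its own role values.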
 
@@ -228,9 +233,9 @@ Long question:
 curl -s http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{ "model": "/xxxxx/Qwen2.5-7B-Instruct", "prompt": "Given the accelerating impacts of climate change—including rising sea levels, increasing frequency of extreme weather events, loss of biodiversity, and adverse effects on agriculture and human health—there is an urgent need for a robust, globally coordinated response. However, international efforts are complicated by a range of factors: economic disparities between high-income and low-income countries, differing levels of industrialization, varying access to clean energy technologies, and divergent political systems that influence climate policy implementation. In this context, how can global agreements like the Paris Accord be redesigned or strengthened to not only encourage but effectively enforce emission reduction targets? Furthermore, what mechanisms can be introduced to promote fair and transparent technology transfer, provide adequate financial support for climate adaptation in vulnerable regions, and hold nations accountable without exacerbating existing geopolitical tensions or disproportionately burdening those with historically lower emissions?", "max_tokens": 256, "temperature":0.0 }'
 ```
 
-
## Pooling and Mixed Deployment Scenario
236
+
###Pooling and Mixed Deployment Scenario
232
237
233
-
### 1、Run Mixed Department Script
238
+
####1、Run Mixed Department Script
234
239
235
240
The mixed script is essentially a pure pooling scenario for the P node.
`register_buffer` is set to `false` by default and need to be set to `true` only in PD-mixed scenario.
-
-### 2. Run Inference
+#### 2. Run Inference
 
 Configure the localhost, port, and model weight path in the command to your own settings. The requests sent will only go to the port where the mixed deployment script is located, and there is no need to start a separate proxy.
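As a rough Python counterpart to the curl command in the guide, the sketch below builds the same kind of completions request for the mixed deployment port. The endpoint path, header, and payload fields mirror the curl example. The helper name is hypothetical, and the host, port, and model path are placeholders to replace with your own settings.

```python
import json
import urllib.request


def build_completion_request(host, port, model, prompt,
                             max_tokens=256, temperature=0.0):
    """Hypothetical helper: build (but do not send) a /v1/completions request."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; the model path is elided in the guide and stays elided here.
request = build_completion_request(
    "localhost", 8000, "/xxxxx/Qwen2.5-7B-Instruct", "Hello, my name is"
)
```

Passing `request` to `urllib.request.urlopen` would send it directly to the mixed deployment instance, with no proxy in between.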