
Commit 919547d

Update documentation from main repository
1 parent c8106fc commit 919547d


9 files changed (+187, -320 lines)


docs/api/cli.md

Lines changed: 148 additions & 281 deletions
Large diffs are not rendered by default.

docs/api/intro.md

Lines changed: 1 addition & 1 deletion
@@ -6,4 +6,4 @@ sidebar_position: 1
 
 Welcome to the ServerlessLLM API documentation. This section contains detailed information about the various APIs provided by ServerlessLLM:
 
-- [CLI API](./cli.md) - Documentation for the `sllm-cli` command-line interface
+- [CLI API](./cli.md) - Documentation for the `sllm` command-line interface

docs/stable/deployment/multi_machine.md

Lines changed: 6 additions & 6 deletions
@@ -186,22 +186,22 @@ This output confirms that both the head node and worker node are properly connec
 3. Ensure each worker has its own `RAY_NODE_IP` set correctly
 :::
 
-### Step 3: Use `sllm-cli` to manage models
+### Step 3: Use `sllm` to manage models
 
 #### Configure the Environment
 
-**On any machine with `sllm-cli` installed, set the `LLM_SERVER_URL` environment variable:**
+**On any machine with `sllm` installed, set the `LLM_SERVER_URL` environment variable:**
 
 > Replace `<HEAD_NODE_IP>` with the actual IP address of the head node.
 
 ```bash
 export LLM_SERVER_URL=http://<HEAD_NODE_IP>:8343
 ```
 
-#### Deploy a Model Using `sllm-cli`
+#### Deploy a Model Using `sllm`
 
 ```bash
-sllm-cli deploy --model facebook/opt-1.3b
+sllm deploy --model facebook/opt-1.3b
 ```
 
 > Note: This command will spend some time downloading the model from the Hugging Face Model Hub. You can use any model from the [Hugging Face Model Hub](https://huggingface.co/models) by specifying the model name in the `--model` argument.
@@ -238,12 +238,12 @@ Expected output:
 {"id":"chatcmpl-23d3c0e5-70a0-4771-acaf-bcb2851c6ea6","object":"chat.completion","created":1721706121,"model":"facebook/opt-1.3b","choices":[{"index":0,"message":{"role":"assistant","content":"system: You are a helpful assistant.\nuser: What is your name?\nsystem: I am a helpful assistant.\n"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":26,"total_tokens":42}}
 ```
 
-#### Delete a Deployed Model Using `sllm-cli`
+#### Delete a Deployed Model Using `sllm`
 
 When you're done using a model, you can delete it:
 
 ```bash
-sllm-cli delete facebook/opt-1.3b
+sllm delete facebook/opt-1.3b
 ```
 
 This will remove the specified model from the ServerlessLLM server.
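The "Expected output" above is the response to an OpenAI-style chat-completion request against the deployed model. A minimal sketch of such a request, assuming the `LLM_SERVER_URL` exported earlier and the `/v1/chat/completions` route used elsewhere in these docs (the request body is illustrative, not taken from this diff):

```bash
# Query the deployed model through the OpenAI-compatible endpoint.
curl "$LLM_SERVER_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "facebook/opt-1.3b",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "What is your name?"}
        ]
      }'
```

Any other OpenAI-compatible client pointed at the same base URL should behave the same way.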

docs/stable/deployment/single_machine.md

Lines changed: 7 additions & 7 deletions
@@ -158,18 +158,18 @@ I20241231 17:13:25.557906 2165054 checkpoint_store.cpp:83] Memory pool created w
 INFO 12-31 17:13:25 server.py:243] Starting gRPC server on 0.0.0.0:8073
 ```
 
-### 3. Start ServerlessLLM Serve
+### 3. Start ServerlessLLM
+
+Now, start the ServerlessLLM service process using `sllm start`.
 
-Now, start the ServerlessLLM Serve process ( `sllm-serve`).
 
 Open a new terminal and run:
 
 ```bash
-conda activate sllm
-sllm-serve start
+sllm start
 ```
 
-At this point, you should have four terminals open: one for the Ray head node, one for the Ray worker node, one for the ServerlessLLM Store server, and one for ServerlessLLM Serve.
+At this point, you should have four terminals open: one for the Ray head node, one for the Ray worker node, one for the ServerlessLLM Store server, and one for the ServerlessLLM service (started via `sllm start`).
 
 ### 4. Deploy a Model
 
@@ -179,7 +179,7 @@ Open a new terminal and run:
 
 ```bash
 conda activate sllm
-sllm-cli deploy --model facebook/opt-1.3b
+sllm deploy --model facebook/opt-1.3b
 ```
 
 This command downloads the specified model from Hugging Face Hub. To load a model from a local path, you can use a `config.json` file. Refer to the [CLI API documentation](../../api/cli.md#example-configuration-file-configjson) for details.
@@ -211,7 +211,7 @@ Expected output:
 To delete a deployed model, use the following command:
 
 ```bash
-sllm-cli delete facebook/opt-1.3b
+sllm delete facebook/opt-1.3b
 ```
 
 This command removes the specified model from the ServerlessLLM server.
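For orientation, the four terminals referred to above map roughly to the following commands; the flags are illustrative placeholders (the full single-machine guide lists the exact port, storage, and Ray resource options):

```bash
# Terminal 1: Ray head node (port shown is a common default, not prescriptive)
ray start --head --port=6379

# Terminal 2: Ray worker node, joining the head
ray start --address=127.0.0.1:6379

# Terminal 3: ServerlessLLM Store server
# (the "Starting gRPC server on 0.0.0.0:8073" log above comes from this process)
sllm-store start

# Terminal 4: ServerlessLLM service
conda activate sllm
sllm start
```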

docs/stable/deployment/slurm_cluster.md

Lines changed: 9 additions & 9 deletions
@@ -67,15 +67,15 @@ compute up 2 down infinite JobNode[16-17]
 Only one JobNode is enough.
 
 **`sbatch` Node Selection**
+Let's start a head on the main job node (`JobNode01`) and add the worker on other job node (`JobNode02`). The head and the worker should be on different job nodes to avoid resource contention. The `sllm-store` should be started on the job node that runs worker (`JobNode02`), for passing the model weights, and the `sllm start` should be started on the main job node (`JobNode01`), finally you can use `sllm` to manage the models on the login node.
 
-Let's start a head on the main job node (`JobNode01`) and add the worker on other job node (`JobNode02`). The head and the worker should be on different job nodes to avoid resource contention. The `sllm-store` should be started on the job node that runs worker (`JobNode02`), for passing the model weights, and the `sllm-serve` should be started on the main job node (`JobNode01`), finally you can use `sllm-cli` to manage the models on the login node.
 
 Note: `JobNode02` requires GPU, but `JobNode01` does not.
 - **Head**: JobNode01
 - **Worker**: JobNode02
 - **sllm-store**: JobNode02
 - **sllm-serve**: JobNode01
-- **sllm-cli**: Login Node
+- **sllm**: Login Node
 
 ---
 ## SRUN
@@ -167,7 +167,7 @@ In the 5th window, let's deploy a model to the ServerlessLLM server. You can dep
 ```shell
 source /opt/conda/bin/activate
 conda activate sllm
-sllm-cli deploy --model facebook/opt-1.3b --backend transformers
+sllm deploy --model facebook/opt-1.3b --backend transformers
 ```
 This will download the model from HuggingFace transformers. After deploying, you can query the model by any OpenAI API client. For example, you can use the following Python code to query the model:
 ```shell
@@ -189,7 +189,7 @@ Expected output:
 ### Step 5: Clean up
 To delete a deployed model, use the following command:
 ```shell
-sllm-cli delete facebook/opt-1.3b
+sllm delete facebook/opt-1.3b
 ```
 This will remove the specified model from the ServerlessLLM server.
 
@@ -350,8 +350,8 @@ We will start the worker node and store in the same script. Because the server l
 
 conda activate sllm
 
-sllm-serve start --host <HEAD_NODE_IP>
-# sllm-serve start --host <HEAD_NODE_IP> --port <avail_port> # if you have changed the port
+sllm start --host <HEAD_NODE_IP>
+# sllm start --host <HEAD_NODE_IP> --port <avail_port> # if you have changed the port
 ```
 - Replace `your_partition` in the script as before.
 - Replace `/path/to/ServerlessLLM` as before.
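The fragment above is the tail of the sbatch script submitted to the head job node. A rough sketch of the kind of wrapper it sits in; the `#SBATCH` directive values are placeholders, not the repository's actual script:

```bash
#!/bin/bash
#SBATCH --partition=your_partition   # replace as described above
#SBATCH --nodelist=JobNode01         # head job node; no GPU needed here
#SBATCH --job-name=sllm-head
#SBATCH --output=sllm-head.%j.log

source /opt/conda/bin/activate
conda activate sllm

# Start the ServerlessLLM service, bound to the head node's IP
sllm start --host <HEAD_NODE_IP>
```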
@@ -369,17 +369,17 @@ We will start the worker node and store in the same script. Because the server l
 INFO: Application startup complete.
 INFO: Uvicorn running on http://xxx.xxx.xx.xx:8343 (Press CTRL+C to quit)
 ```
-### Step 4: Use sllm-cli to manage models
+### Step 4: Use sllm to manage models
 1. **You can do this step on login node, and set the ```LLM_SERVER_URL``` environment variable:**
 ```shell
 $ conda activate sllm
 (sllm)$ export LLM_SERVER_URL=http://<HEAD_NODE_IP>:8343
 ```
 - Replace `<HEAD_NODE_IP>` with the actual IP address of the head node.
 - Replace ```8343``` with the actual port number (`<avail_port>` in Step1) if you have changed it.
-2. **Deploy a Model Using ```sllm-cli```**
+2. **Deploy a Model Using ```sllm```**
 ```shell
-(sllm)$ sllm-cli deploy --model facebook/opt-1.3b
+(sllm)$ sllm deploy --model facebook/opt-1.3b
 ```
 ### Step 5: Query the Model Using OpenAI API Client
 **You can use the following command to query the model:**

docs/stable/features/live_migration.md

Lines changed: 4 additions & 4 deletions
@@ -72,8 +72,8 @@ export LLM_SERVER_URL=http://127.0.0.1:8343
 
 Deploy the models:
 ```bash
-sllm-cli deploy --config config-qwen-1.5b.json
-sllm-cli deploy --config config-qwen-3b.json
+sllm deploy --config config-qwen-1.5b.json
+sllm deploy --config config-qwen-3b.json
 ```
 
 3. **Verify the Deployment**
@@ -139,8 +139,8 @@ export LLM_SERVER_URL=http://127.0.0.1:8343
 Deploy the models:
 
 ```bash
-sllm-cli deploy --config config-qwen-1.5b.json
-sllm-cli deploy --config config-qwen-3b.json
+sllm deploy --config config-qwen-1.5b.json
+sllm deploy --config config-qwen-3b.json
 ```
 
 3. **Verify the Deployment**

docs/stable/features/peft_lora_serving.md

Lines changed: 4 additions & 4 deletions
@@ -71,7 +71,7 @@ export LLM_SERVER_URL=http://127.0.0.1:8343
 ```
 2. Deploy models with specified lora adapters.
 ```bash
-sllm-cli deploy --model facebook/opt-125m --backend transformers --enable-lora --lora-adapters demo_lora1=peft-internal-testing/opt-125m-dummy-lora demo_lora2=monsterapi/opt125M_alpaca
+sllm deploy --model facebook/opt-125m --backend transformers --enable-lora --lora-adapters "demo_lora1=peft-internal-testing/opt-125m-dummy-lora demo_lora2=monsterapi/opt125M_alpaca"
 ```
 3. Verify the deployment.
 ```bash
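The added line quotes the adapter list, so the space-separated `<adapter-name>=<HuggingFace-repo>` pairs reach the CLI as a single argument (this reading of the quoting is an inference from the diff, not stated in it). Written across lines for readability:

```bash
# Adapter mapping passed as one quoted string of <name>=<repo> pairs
sllm deploy --model facebook/opt-125m --backend transformers \
  --enable-lora \
  --lora-adapters "demo_lora1=peft-internal-testing/opt-125m-dummy-lora demo_lora2=monsterapi/opt125M_alpaca"
```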
@@ -99,16 +99,16 @@ curl $LLM_SERVER_URL/v1/chat/completions \
 }'
 ```
 ### Step 5: Update LoRA Adapters
-If you wish to switch to a different set of LoRA adapters, you can still use `sllm-cli deploy` command with updated adapter configurations. ServerlessLLM will automatically reload the new adapters without restarting the backend.
+If you wish to switch to a different set of LoRA adapters, you can still use `sllm deploy` command with updated adapter configurations. ServerlessLLM will automatically reload the new adapters without restarting the backend.
 ```bash
-sllm-cli deploy --model facebook/opt-125m --backend transformers --enable-lora --lora-adapters demo-lora1=edbeeching/opt-125m-lora demo-lora2=Hagatiana/opt-125m-lora
+sllm deploy --model facebook/opt-125m --backend transformers --enable-lora --lora-adapters "demo-lora1=edbeeching/opt-125m-lora demo-lora2=Hagatiana/opt-125m-lora"
 ```
 
 ### Step 6: Clean Up
 
 Delete the lora adapters by running the following command (this command will only delete lora adapters, the base model won't be deleted):
 ```bash
-sllm-cli delete facebook/opt-125m --lora-adapters demo-lora1 demo-lora2
+sllm delete facebook/opt-125m --lora-adapters "demo-lora1 demo-lora2"
 ```
 If you need to stop and remove the containers, you can use the following commands:
 ```bash

docs/stable/features/storage_aware_scheduling.md

Lines changed: 3 additions & 3 deletions
@@ -70,8 +70,8 @@ In the `examples/storage_aware_scheduling` directory, the example configuration
 conda activate sllm
 export LLM_SERVER_URL=http://127.0.0.1:8343
 
-sllm-cli deploy --config config-opt-2.7b.json
-sllm-cli deploy --config config-opt-1.3b.json
+sllm deploy --config config-opt-2.7b.json
+sllm deploy --config config-opt-1.3b.json
 ```
 
 3. Verify the deployment.
@@ -112,7 +112,7 @@ As shown in the log message, the model "facebook/opt-2.7b" is scheduled on serve
 Delete the model deployment by running the following command:
 
 ```bash
-sllm-cli delete facebook/opt-1.3b facebook/opt-2.7b
+sllm delete facebook/opt-1.3b facebook/opt-2.7b
 ```
 
 If you need to stop and remove the containers, you can use the following commands:

docs/stable/getting_started.md

Lines changed: 5 additions & 5 deletions
@@ -4,7 +4,7 @@ sidebar_position: 1
 
 # Getting Started
 
-This guide demonstrates how to quickly set up a local ServerlessLLM cluster using Docker Compose on a single machine. We will initialize a minimal cluster, consisting of a head node and a single worker node. Then, we'll deploy a model using the `sllm-cli` and query the deployment through an OpenAI-compatible API.
+This guide demonstrates how to quickly set up a local ServerlessLLM cluster using Docker Compose on a single machine. We will initialize a minimal cluster, consisting of a head node and a single worker node. Then, we'll deploy a model using the `sllm` and query the deployment through an OpenAI-compatible API.
 
 :::note
 We strongly recommend using Docker (Compose) to manage your ServerlessLLM cluster, whether you are using ServerlessLLM for testing or development. However, if Docker is not a viable option for you, please refer to the [deploy from scratch guide](./deployment/single_machine.md).
@@ -77,18 +77,18 @@ INFO: Uvicorn running on http://0.0.0.0:8343 (Press CTRL+C to quit)
 (FcfsScheduler pid=1604) INFO 05-26 15:40:49 fcfs_scheduler.py:111] Starting control loop
 ```
 
-## Deploy a Model Using sllm-cli
+## Deploy a Model Using sllm
 
 Set the `LLM_SERVER_URL` environment variable:
 
 ```bash
 export LLM_SERVER_URL=http://127.0.0.1:8343
 ```
 
-Deploy a model to the ServerlessLLM cluster using the `sllm-cli`:
+Deploy a model to the ServerlessLLM cluster using the `sllm`:
 
 ```bash
-sllm-cli deploy --model facebook/opt-1.3b
+sllm deploy --model facebook/opt-1.3b
 ```
 > Note: This command will take some time to download the model from the Hugging Face Model Hub.
 > You can use any model from the [Hugging Face Model Hub](https://huggingface.co/models) by specifying its name in the `--model` argument.
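As the note says, any Hugging Face Hub model ID can be passed to `--model`; for example (the model below is only an illustration, not one used elsewhere in this guide):

```bash
# Deploy a different Hugging Face model by its hub ID
sllm deploy --model Qwen/Qwen2.5-0.5B-Instruct
```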
@@ -125,7 +125,7 @@ Expected output:
 To delete a deployed model, execute the following command:
 
 ```bash
-sllm-cli delete facebook/opt-1.3b
+sllm delete facebook/opt-1.3b
 ```
 
 This command removes the specified model from the ServerlessLLM server.
