**docs/stable/deployment/multi_machine.md** (6 additions, 6 deletions)
````diff
@@ -186,22 +186,22 @@ This output confirms that both the head node and worker node are properly connec
 3. Ensure each worker has its own `RAY_NODE_IP` set correctly
 :::

-### Step 3: Use `sllm-cli` to manage models
+### Step 3: Use `sllm` to manage models

 #### Configure the Environment

-**On any machine with `sllm-cli` installed, set the `LLM_SERVER_URL` environment variable:**
+**On any machine with `sllm` installed, set the `LLM_SERVER_URL` environment variable:**

 > Replace `<HEAD_NODE_IP>` with the actual IP address of the head node.

 ```bash
 export LLM_SERVER_URL=http://<HEAD_NODE_IP>:8343
 ```

-#### Deploy a Model Using `sllm-cli`
+#### Deploy a Model Using `sllm`

 ```bash
-sllm-cli deploy --model facebook/opt-1.3b
+sllm deploy --model facebook/opt-1.3b
 ```

 > Note: This command will spend some time downloading the model from the Hugging Face Model Hub. You can use any model from the [Hugging Face Model Hub](https://huggingface.co/models) by specifying the model name in the `--model` argument.
@@ -238,12 +238,12 @@ Expected output:
 {"id":"chatcmpl-23d3c0e5-70a0-4771-acaf-bcb2851c6ea6","object":"chat.completion","created":1721706121,"model":"facebook/opt-1.3b","choices":[{"index":0,"message":{"role":"assistant","content":"system: You are a helpful assistant.\nuser: What is your name?\nsystem: I am a helpful assistant.\n"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":26,"total_tokens":42}}
 ```

-#### Delete a Deployed Model Using `sllm-cli`
+#### Delete a Deployed Model Using `sllm`

 When you're done using a model, you can delete it:

 ```bash
-sllm-cli delete facebook/opt-1.3b
+sllm delete facebook/opt-1.3b
 ```

 This will remove the specified model from the ServerlessLLM server.
````
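After the deploy step above, the deployment can be queried through the OpenAI-compatible API on the head node. The hunk only shows the expected response, so the request below is a sketch: it assumes the standard `/v1/chat/completions` route and reuses the `LLM_SERVER_URL` exported earlier.

```bash
# Sketch of a chat-completion request; the /v1/chat/completions path is assumed,
# not shown in the diff above. LLM_SERVER_URL points at the head node (port 8343).
curl "$LLM_SERVER_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "facebook/opt-1.3b",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "What is your name?"}
        ]
      }'
```

A successful call returns a `chat.completion` object like the expected output quoted in the hunk.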
**docs/stable/deployment/single_machine.md** (7 additions, 7 deletions)
````diff
@@ -158,18 +158,18 @@ I20241231 17:13:25.557906 2165054 checkpoint_store.cpp:83] Memory pool created w
 INFO 12-31 17:13:25 server.py:243] Starting gRPC server on 0.0.0.0:8073
 ```

-### 3. Start ServerlessLLM Serve
+### 3. Start ServerlessLLM
+
+Now, start the ServerlessLLM service process using `sllm start`.

-Now, start the ServerlessLLM Serve process ( `sllm-serve`).

 Open a new terminal and run:

 ```bash
-conda activate sllm
-sllm-serve start
+sllm start
 ```

-At this point, you should have four terminals open: one for the Ray head node, one for the Ray worker node, one for the ServerlessLLM Store server, and one for ServerlessLLM Serve.
+At this point, you should have four terminals open: one for the Ray head node, one for the Ray worker node, one for the ServerlessLLM Store server, and one for the ServerlessLLM service (started via `sllm start`).

 ### 4. Deploy a Model
````
````diff
@@ -179,7 +179,7 @@ Open a new terminal and run:

 ```bash
 conda activate sllm
-sllm-cli deploy --model facebook/opt-1.3b
+sllm deploy --model facebook/opt-1.3b
 ```

 This command downloads the specified model from Hugging Face Hub. To load a model from a local path, you can use a `config.json` file. Refer to the [CLI API documentation](../../api/cli.md#example-configuration-file-configjson) for details.
@@ -211,7 +211,7 @@ Expected output:
 To delete a deployed model, use the following command:

 ```bash
-sllm-cli delete facebook/opt-1.3b
+sllm delete facebook/opt-1.3b
 ```

 This command removes the specified model from the ServerlessLLM server.
````
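Across this file the change is a straight rename of the CLI entry points. Summarized as a sketch, with an end-to-end example using the new names (the default port 8343 is taken from the Getting Started guide, not from these hunks):

```bash
# Renamed entry points (old -> new), as shown in the hunks above:
#   sllm-serve start             ->  sllm start
#   sllm-cli deploy --model ...  ->  sllm deploy --model ...
#   sllm-cli delete <model>      ->  sllm delete <model>

# End-to-end with the new names, assuming the service listens on port 8343 locally:
export LLM_SERVER_URL=http://127.0.0.1:8343
sllm deploy --model facebook/opt-1.3b
sllm delete facebook/opt-1.3b
```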
**docs/stable/deployment/slurm_cluster.md** (9 additions, 9 deletions)
````diff
@@ -67,15 +67,15 @@ compute up 2 down infinite JobNode[16-17]
 Only one JobNode is enough.

 **`sbatch` Node Selection**
+Let's start the head on the main job node (`JobNode01`) and add the worker on another job node (`JobNode02`). The head and the worker should be on different job nodes to avoid resource contention. `sllm-store` should be started on the job node that runs the worker (`JobNode02`) to provide the model weights, and `sllm start` should be run on the main job node (`JobNode01`). Finally, you can use `sllm` from the login node to manage the models.

-Let's start a head on the main job node (`JobNode01`) and add the worker on other job node (`JobNode02`). The head and the worker should be on different job nodes to avoid resource contention. The `sllm-store` should be started on the job node that runs worker (`JobNode02`), for passing the model weights, and the `sllm-serve` should be started on the main job node (`JobNode01`), finally you can use `sllm-cli` to manage the models on the login node.

 Note: `JobNode02` requires GPU, but `JobNode01` does not.
 -**Head**: JobNode01
 -**Worker**: JobNode02
 -**sllm-store**: JobNode02
 -**sllm-serve**: JobNode01
--**sllm-cli**: Login Node
+-**sllm**: Login Node

 ---
 ## SRUN
````
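Before choosing nodes as described above, it can help to confirm which job nodes are up and which have GPUs. A small sketch using standard Slurm commands (the node names are the guide's examples):

```bash
# Show each node with its partition, state, and generic resources (GRES),
# so GPU-equipped job nodes are easy to spot.
sinfo -N -o "%N %P %t %G"

# Per the note above: JobNode02 (worker + sllm-store) needs a GPU;
# JobNode01 (Ray head + `sllm start`) does not.
```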
````diff
@@ -167,7 +167,7 @@ In the 5th window, let's deploy a model to the ServerlessLLM server. You can dep
 This will download the model from HuggingFace transformers. After deploying, you can query the model by any OpenAI API client. For example, you can use the following Python code to query the model:
 ```shell
@@ -189,7 +189,7 @@ Expected output:
 ### Step 5: Clean up
 To delete a deployed model, use the following command:
 ```shell
-sllm-cli delete facebook/opt-1.3b
+sllm delete facebook/opt-1.3b
 ```
 This will remove the specified model from the ServerlessLLM server.

@@ -350,8 +350,8 @@ We will start the worker node and store in the same script. Because the server l

 conda activate sllm

-sllm-serve start --host <HEAD_NODE_IP>
-# sllm-serve start --host <HEAD_NODE_IP> --port <avail_port> # if you have changed the port
+sllm start --host <HEAD_NODE_IP>
+# sllm start --host <HEAD_NODE_IP> --port <avail_port> # if you have changed the port
 ```
 - Replace `your_partition`in the script as before.
 - Replace `/path/to/ServerlessLLM` as before.
````
````diff
@@ -369,17 +369,17 @@ We will start the worker node and store in the same script. Because the server l
 INFO: Application startup complete.
 INFO: Uvicorn running on http://xxx.xxx.xx.xx:8343 (Press CTRL+C to quit)
 ```
-### Step 4: Use sllm-cli to manage models
+### Step 4: Use sllm to manage models
 1. **You can do this step on login node, and set the ```LLM_SERVER_URL``` environment variable:**
````
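The remainder of this step is cut off in the view above. A minimal sketch of what it presumably looks like, assuming the same URL pattern as the multi-machine guide (head node IP, default port 8343 unless you changed it):

```bash
# Run on the login node; replace <HEAD_NODE_IP> with the head job node's address.
# If you started `sllm start` with a custom --port, use that port instead of 8343.
export LLM_SERVER_URL=http://<HEAD_NODE_IP>:8343

# The login node can then manage models with the renamed CLI:
sllm deploy --model facebook/opt-1.3b
```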
From another file in this change (its name is not shown in this view):

````diff
-If you wish to switch to a different set of LoRA adapters, you can still use `sllm-cli deploy` command with updated adapter configurations. ServerlessLLM will automatically reload the new adapters without restarting the backend.
+If you wish to switch to a different set of LoRA adapters, you can still use the `sllm deploy` command with updated adapter configurations. ServerlessLLM will automatically reload the new adapters without restarting the backend.
````
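No concrete command appears in this excerpt, so the following is illustration only: the flag names (`--enable-lora`, `--lora-adapters`), the backend, and the adapter name and path are assumptions to be checked against the LoRA serving guide, not options confirmed by this diff.

```bash
# Hypothetical sketch of switching adapter sets by redeploying.
# Flag names and the adapter name/path below are assumptions.
sllm deploy --model facebook/opt-1.3b --backend vllm \
  --enable-lora \
  --lora-adapters demo_lora=/path/to/new_adapter
```

If the redeploy succeeds, the new adapters are picked up without restarting the backend, as the sentence above states.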
**docs/stable/getting_started.md** (5 additions, 5 deletions)
````diff
@@ -4,7 +4,7 @@ sidebar_position: 1

 # Getting Started

-This guide demonstrates how to quickly set up a local ServerlessLLM cluster using Docker Compose on a single machine. We will initialize a minimal cluster, consisting of a head node and a single worker node. Then, we'll deploy a model using the `sllm-cli` and query the deployment through an OpenAI-compatible API.
+This guide demonstrates how to quickly set up a local ServerlessLLM cluster using Docker Compose on a single machine. We will initialize a minimal cluster, consisting of a head node and a single worker node. Then, we'll deploy a model using `sllm` and query the deployment through an OpenAI-compatible API.

 :::note
 We strongly recommend using Docker (Compose) to manage your ServerlessLLM cluster, whether you are using ServerlessLLM for testing or development. However, if Docker is not a viable option for you, please refer to the [deploy from scratch guide](./deployment/single_machine.md).
````
````diff
@@ -77,18 +77,18 @@ INFO: Uvicorn running on http://0.0.0.0:8343 (Press CTRL+C to quit)
 (FcfsScheduler pid=1604) INFO 05-26 15:40:49 fcfs_scheduler.py:111] Starting control loop
 ```

-## Deploy a Model Using sllm-cli
+## Deploy a Model Using sllm

 Set the `LLM_SERVER_URL` environment variable:

 ```bash
 export LLM_SERVER_URL=http://127.0.0.1:8343
 ```

-Deploy a model to the ServerlessLLM cluster using the `sllm-cli`:
+Deploy a model to the ServerlessLLM cluster using `sllm`:

 ```bash
-sllm-cli deploy --model facebook/opt-1.3b
+sllm deploy --model facebook/opt-1.3b
 ```
 > Note: This command will take some time to download the model from the Hugging Face Model Hub.
 > You can use any model from the [Hugging Face Model Hub](https://huggingface.co/models) by specifying its name in the `--model` argument.
````
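As the note above says, any model name from the Hub can be substituted for `facebook/opt-1.3b`. For example (an illustrative sketch; larger models take proportionally longer to download):

```bash
# Deploy a different Hub model by changing the --model argument.
sllm deploy --model facebook/opt-2.7b
```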
````diff
@@ -125,7 +125,7 @@ Expected output:
 To delete a deployed model, execute the following command:

 ```bash
-sllm-cli delete facebook/opt-1.3b
+sllm delete facebook/opt-1.3b
 ```

 This command removes the specified model from the ServerlessLLM server.
````