Commit f91c57d

Merge pull request #625 from oracle-samples/mayoor-patch-4

Update deploy-openai-llm-byoc.md

2 parents f5b7963 + fc21ce3

1 file changed: LLM/deploy-openai-llm-byoc.md (+78 −86 lines)
@@ -1,147 +1,141 @@
# Deploy OpenAI open-source models

This guide demonstrates how to deploy and perform inference using OCI Data Science Service. In this example, we will use a model downloaded from Hugging Face, specifically [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) from OpenAI.

## Required IAM Policies

Add these [policies](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2#required-iam-policies) to grant access to OCI services.

## Setup

Create a Data Science notebook session with at least 400 GB of storage. We will use the notebook session to:
1. Download the model weights
2. Create a Model Catalog entry
3. Deploy the model

To prepare the inference container, we will use a local laptop, since this step requires Docker commands. The notebook session does not come with Docker tooling.

# Prepare Inference container

vLLM is an easy-to-use library for LLM inference and serving. You can get the container image from [DockerHub](https://hub.docker.com/r/vllm/vllm-openai/tags).

Run the following commands on your laptop:

```shell
docker pull --platform linux/amd64 vllm/vllm-openai:gptoss
```

Currently, OCI Data Science Model Deployment only supports container images residing in the OCI Registry. Before we can push the pulled vLLM container, make sure you have created a repository in your tenancy (console steps below; a CLI sketch follows the list):
- Go to your tenancy Container Registry
- Click on the Create repository button
- Select Private under Access types
- Set a name for the repository. We are using "vllm-odsc" in the example.
- Click on the Create button
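
If you prefer the CLI, a minimal sketch of the same step (the compartment OCID placeholder is an assumption to fill in for your tenancy):

```shell
# Create a private container repository named vllm-odsc
oci artifacts container repository create \
    --compartment-id <compartment_ocid> \
    --display-name vllm-odsc \
    --is-public false
```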

You may need to `docker login` to the Oracle Cloud Container Registry (OCIR) first, if you haven't done so before, in order to push the image. To log in, use your API Auth Token, which can be created under your Oracle Cloud Account -> Auth Token. You only need to log in once. Replace `<region>` with the OCI region you are using.

```shell
docker login -u '<tenant-namespace>/<username>' <region>.ocir.io
```

If your tenancy is federated with Oracle Identity Cloud Service, use the format `<tenancy-namespace>/oracleidentitycloudservice/<username>`. You can then push the container image to the OCI Registry:

```shell
docker tag vllm/vllm-openai:gptoss <region>.ocir.io/<tenancy>/vllm-odsc/vllm-openai:gptoss
docker push <region>.ocir.io/<tenancy>/vllm-odsc/vllm-openai:gptoss
```

# Deployment

The following steps are to be performed in an OCI Notebook Session:

## Prepare The Model Artifacts

To prepare the model artifacts for LLM model deployment:

- Download the model files from Hugging Face to a local directory.
- Upload the model folder to a [versioned bucket](https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/usingversioning.htm) in Oracle Object Storage. If you don't have an Object Storage bucket, create one using the OCI SDK or the Console. Make a note of the `namespace`, `compartment`, and `bucketname`. Configure the policies to allow the Data Science service to read and write the model artifact to the Object Storage bucket in your tenancy. An administrator must configure the policies in IAM in the Console.
- Create a Model Catalog entry for the model using the Object Storage path.

### Model Download from HuggingFace Model Hub

[This documentation](https://huggingface.co/docs/huggingface_hub/en/guides/cli#download-an-entire-repository) provides more information on using `huggingface-cli` to download an entire repository at a given revision. Models in the HuggingFace hub are stored in their own repository.

```shell
# Select the model that you want to deploy.
huggingface-cli download openai/gpt-oss-120b --local-dir models/gpt-oss-120b --exclude "metal/*"
```

Download the tiktoken file:

```shell
wget -P models/gpt-oss-120b https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken
```
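
Serving uses this encoding file through the `TIKTOKEN_RS_CACHE_DIR` environment variable set later in this guide, so the tokenizer can load it without internet access.
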
## Upload Model to OCI Object Storage

**Note**: The bucket has to be a versioned bucket.

```shell
oci os object bulk-upload --src-dir models/gpt-oss-120b --prefix gpt-oss-120b/ -bn <bucket_name> -ns <bucket_namespace> --auth "resource_principal"
```
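
If you still need a versioned bucket, a minimal sketch with the OCI CLI (placeholders are assumptions to fill in for your tenancy):

```shell
# Create an Object Storage bucket with versioning enabled
oci os bucket create --name <bucket_name> --compartment-id <compartment_ocid> --versioning Enabled
```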

## Create Model by Reference using ADS

```python
# Uncomment this code and set the correct proxy links if you have to set up a proxy for internet access
# import os
# os.environ['http_proxy']="http://myproxy"
# os.environ['https_proxy']="http://myproxy"

# Use os.environ['no_proxy'] to route traffic directly
```
```python
import ads
import os

ads.set_auth("resource_principal")

# Extract region information from the Notebook environment variables and signer.
ads.common.utils.extract_region()
```

```python
# change as required for your environment
compartment_id = os.environ["PROJECT_COMPARTMENT_OCID"]
project_id = os.environ["PROJECT_OCID"]

log_group_id = "ocid1.loggroup.oc1.xxx.xxxxx"
log_id = "ocid1.log.oc1.xxx.xxxxx"

instance_shape = "BM.GPU.H100.8"

region = ads.common.utils.extract_region()
```

```python
from ads.model.datascience_model import DataScienceModel

bucket = "<bucket-name>"
namespace = "<namespace>"

artifact_path = f"oci://{bucket}@{namespace}/gpt-oss-120b"

model = (
    DataScienceModel()
    .with_compartment_id(compartment_id)
    .with_project_id(project_id)
    .with_display_name("gpt-oss-120b")
    .with_artifact(artifact_path)
)

model.create(model_by_reference=True)
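
A successful `model.create()` call registers the model in the Model Catalog; the returned object's OCID (`model.id`) is what the deployment configuration below typically references.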
### Import Model Deployment Modules
@@ -157,10 +151,8 @@ from ads.model.deployment import (
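
The import hunk is elided from this diff; a typical import set for this flow, based on the public ADS API (an assumption, shown for readability):

```python
from ads.model.deployment import (
    ModelDeployment,
    ModelDeploymentContainerRuntime,
    ModelDeploymentInfrastructure,
    ModelDeploymentMode,
)
```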

## Setup Model Deployment Infrastructure

```python
container_image = "<region>.ocir.io/<tenancy>/vllm-odsc/vllm-openai:gptoss"  # name given to vllm image pushed to Oracle container registry
```

```python
@@ -190,11 +182,13 @@ infrastructure = (
```python
env_var = {
    "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/chat/completions",
    "SHM_SIZE": "10g",  # shared memory size for the container, needed by multi-GPU inference
    "TIKTOKEN_RS_CACHE_DIR": "/opt/ds/model/gpt-oss-120b",
}

cmd_var = [
    "--model",
    "/opt/ds/model/deployed_model/gpt-oss-120b",
    "--tensor-parallel-size",
    "8",  # one shard per GPU on the BM.GPU.H100.8 shape
    "--port",
@@ -228,8 +222,8 @@ container_runtime = (
```python
deployment = (
    ModelDeployment()
    .with_display_name("gpt-oss-120b MD with BYOC")
    .with_description("Deployment of gpt-oss-120b MD with vLLM BYOC container")
    .with_infrastructure(infrastructure)
    .with_runtime(container_runtime)
).deploy(wait_for_completion=False)
@@ -255,8 +249,6 @@ endpoint = f"https://modeldeployment.us-ashburn-1.oci.customer-oci.com/{deployme

current_date = datetime.now().strftime("%d %B %Y")

body = {
    "model": "openai/gpt-oss-120b",  # this is a constant
    "messages": [
