LLM/deploy-openai-llm-byoc.md
# Deploy OpenAI open-source models
This guide demonstrates how to deploy and perform inference using the OCI Data Science Service. In this example, we will use a model downloaded from Hugging Face, specifically [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) from OpenAI.
## Required IAM Policies
Add these [policies](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2#required-iam-policies) to grant access to OCI services.
## Setup
Create a Data Science notebook session with at least 400GB of storage. We will use the notebook session to:

1. Download the model weights
2. Create a Model Catalog entry
3. Deploy the model
To prepare the inference container, we will use a local laptop, since this step requires Docker commands. The notebook session does not come with Docker tooling.
# Prepare Inference container
vLLM is an easy-to-use library for LLM inference and serving. You can get the container image from [DockerHub](https://hub.docker.com/r/vllm/vllm-openai/tags).
The following commands are to be run on your laptop:
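First, pull the vLLM serving image locally. The `gptoss` tag is the one this guide tags and pushes later; check DockerHub for the tag that matches your target model:

```shell
# Pull the OpenAI-compatible vLLM serving image from DockerHub
docker pull vllm/vllm-openai:gptoss
```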
Currently, OCI Data Science Model Deployment only supports container images residing in the OCI Registry. Before we can push the pulled vLLM container, make sure you have created a repository in your tenancy.
- Go to your tenancy Container Registry
- Click on the Create repository button
- Select Private under Access types
- Set a name for the repository. We are using "vllm-odsc" in the example.
- Click on the Create button
You may need to `docker login` to the Oracle Cloud Container Registry (OCIR) first, if you haven't done so before, in order to push the image. To log in, use the API Auth Token that can be created under your Oracle Cloud Account -> Auth Token. You only need to log in once. Replace `<region>` with the OCI region you are using.
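As a sketch, the login command has this shape; the username format shown is an assumption for non-federated tenancies (see the note on federated tenancies below):

```shell
# Log in to OCIR; paste your API Auth Token when prompted for a password
docker login -u '<tenancy-namespace>/<username>' <region>.ocir.io
```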
If your tenancy is federated with Oracle Identity Cloud Service, use the format `<tenancy-namespace>/oracleidentitycloudservice/<username>`. You can then push the container image to the OCI Registry:
```shell
docker tag vllm/vllm-openai:gptoss <region>.ocir.io/<tenancy>/vllm-odsc/vllm-openai:gptoss
docker push <region>.ocir.io/<tenancy>/vllm-odsc/vllm-openai:gptoss
```

The `/v1/completions` endpoint is for interacting with non-chat base models or the instruction-trained chat model. It provides the completion for a single prompt and takes a single string as input, whereas the `/v1/chat/completions` endpoint provides responses for a given dialog and requires the input in a specific format corresponding to the message history. This guide uses the `/v1/chat/completions` endpoint.
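To illustrate the difference, a `/v1/chat/completions` request body takes a message history rather than a single string. The endpoint URL below is a placeholder, not a value from a real deployment:

```python
import json

# Hypothetical invoke URL; substitute your model deployment's endpoint.
endpoint = "https://<model-deployment-url>/predict"

# /v1/chat/completions takes a list of role-tagged messages, not a bare prompt.
payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize vLLM in one sentence."},
    ],
    "max_tokens": 128,
}

# The request body is JSON; send it with any HTTP client of your choice.
body = json.dumps(payload)
```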
# Deployment

The following steps are to be performed in an OCI notebook session:
## Prepare The Model Artifacts
To prepare Model artifacts for LLM model deployment:
- Download the model files from the HuggingFace hub to a local directory.
- Upload the model folder to a [versioned bucket](https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/usingversioning.htm) in Oracle Object Storage. If you don't have an Object Storage bucket, create one using the OCI SDK or the Console. Make a note of the `namespace`, `compartment`, and `bucketname`. Configure the policies to allow the Data Science service to read and write the model artifact to the Object Storage bucket in your tenancy. An administrator must configure the policies in IAM in the Console.
- Create a Model Catalog entry for the model using the Object Storage path
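The upload step above can be sketched with the OCI CLI; the bucket name and directory names are placeholders:

```shell
# Recursively upload the downloaded model folder to the versioned bucket
oci os object bulk-upload \
  --bucket-name <bucket-name> \
  --src-dir gpt-oss-120b \
  --object-prefix gpt-oss-120b/
```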
### Model Download from HuggingFace Model Hub
[This documentation](https://huggingface.co/docs/huggingface_hub/en/guides/cli#download-an-entire-repository) provides more information on using `huggingface-cli` to download an entire repository at a given revision. Models in the HuggingFace hub are stored in their own repository.
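A minimal download invocation for this guide's model might look like the following; the local directory name is an assumption:

```shell
# Download the full openai/gpt-oss-120b repository into a local folder
huggingface-cli download openai/gpt-oss-120b --local-dir gpt-oss-120b
```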
Create the Model Catalog entry for the uploaded artifacts using ADS:

```python
from ads.model.datascience_model import DataScienceModel

# compartment_id, project_id, and artifact_path (the Object Storage path of the
# uploaded model folder) are assumed to be defined earlier in the notebook.
model = (
    DataScienceModel()
    .with_compartment_id(compartment_id)
    .with_project_id(project_id)
    .with_display_name("gpt-oss-120b")
    .with_artifact(artifact_path)
)
```