
Commit 72ee1a2 — Update deploy-openai-llm-byoc.md (1 parent 0bb013c)

1 file changed: LLM/deploy-openai-llm-byoc.md (+33 −43 lines)
# Deploy OpenAI open-source models

This guide demonstrates how to deploy and perform inference using OCI Data Science Service. In this example, we will use a model downloaded from Hugging Face, specifically [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) from OpenAI.

## Required IAM Policies

Add these [policies](https://github.com/oracle-samples/oci-data-science-ai-sampl…
## Setup

Create a data science notebook session with at least 400 GB of storage. We will use the notebook session to:

1. Download the model weights
2. Create a Model Catalog entry
3. Deploy the model

To prepare the inference container, we will use a local laptop, since this step requires Docker commands. The notebook session does not come with Docker tooling.

# Prepare Inference container

vLLM is an easy-to-use library for LLM inference and serving. You can get the container image from [DockerHub](https://hub.docker.com/r/vllm/vllm-openai/tags).

Run the following commands on your laptop:

```shell
docker pull --platform linux/amd64 vllm/vllm-openai:gptoss
```

Currently, OCI Data Science Model Deployment only supports container images residing in the OCI Registry. Before pushing the pulled vLLM container, make sure you have created a repository in your tenancy:

- Go to your tenancy Container Registry
- Click the Create repository button
- Select Private under Access types
- Set the repository name. We use `vllm-odsc` in this example
- Click the Create button

You may need to `docker login` to the Oracle Cloud Container Registry (OCIR) before you can push the image, if you haven't done so already. To log in, use the API Auth Token that can be created under your Oracle Cloud Account -> Auth Token. You only need to log in once. Replace `<region>` with the OCI region you are using.

```shell
docker login -u '<tenant-namespace>/<username>' <region>.ocir.io
```

If your tenancy is federated with Oracle Identity Cloud Service, use the format `<tenancy-namespace>/oracleidentitycloudservice/<username>`. You can then push the container image to the OCI Registry:

```shell
docker tag vllm/vllm-openai:gptoss <region>.ocir.io/<tenancy>/vllm-odsc/vllm-openai:gptoss
docker push <region>.ocir.io/<tenancy>/vllm-odsc/vllm-openai:gptoss
```
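
For reference, the fully qualified image name above follows the pattern `<region>.ocir.io/<tenancy-namespace>/<repo>/<image>:<tag>`. A minimal sketch of how the pieces fit together (the `ocir_image` helper is hypothetical, for illustration only):

```python
def ocir_image(region: str, tenancy_namespace: str, repo: str,
               image: str, tag: str) -> str:
    """Build a fully qualified OCIR image reference of the form
    <region>.ocir.io/<tenancy-namespace>/<repo>/<image>:<tag>."""
    return f"{region}.ocir.io/{tenancy_namespace}/{repo}/{image}:{tag}"

# Example with placeholder values for region and tenancy namespace:
print(ocir_image("us-ashburn-1", "mytenancy", "vllm-odsc", "vllm-openai", "gptoss"))
# → us-ashburn-1.ocir.io/mytenancy/vllm-odsc/vllm-openai:gptoss
```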

# Deployment

The following steps are performed in an OCI Notebook Session:

## Prepare The Model Artifacts

To prepare model artifacts for LLM model deployment:

- Download the model files from Hugging Face to a local directory.
- Upload the model folder to a [versioned bucket](https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/usingversioning.htm) in Oracle Object Storage. If you don't have an Object Storage bucket, create one using the OCI SDK or the Console. Make a note of the `namespace`, `compartment`, and `bucketname`. Configure the policies to allow the Data Science service to read and write the model artifact to the Object Storage bucket in your tenancy. An administrator must configure the policies in IAM in the Console.
- Create a Model Catalog entry for the model using the Object Storage path.
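
The upload step can be scripted with the OCI CLI's bulk-upload command. A hedged sketch, assuming the OCI CLI is installed and configured; `<bucket-name>`, `<namespace>`, and the local directory are placeholders you replace with your own values:

```shell
# Upload the downloaded model folder to the versioned Object Storage bucket.
# <bucket-name> and <namespace> are placeholders for your tenancy's values.
oci os object bulk-upload \
  --bucket-name <bucket-name> \
  --namespace-name <namespace> \
  --src-dir ./gpt-oss-120b \
  --object-prefix gpt-oss-120b/
```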

### Model Download from HuggingFace Model Hub

[This documentation](https://huggingface.co/docs/huggingface_hub/en/guides/cli#download-an-entire-repository) provides more information on using `huggingface-cli` to download an entire repository at a given revision. Models in the HuggingFace hub are stored in their own repository.
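
Sketched under the assumption that the `huggingface-cli` shipped with the `huggingface_hub` package is installed, the download for this model might look like:

```shell
# Install the Hugging Face hub client, then download the full
# model repository into a local directory.
pip install huggingface_hub
huggingface-cli download openai/gpt-oss-120b --local-dir ./gpt-oss-120b
```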

```python
model = (
    ...  # model definition elided in this diff
)
model.create(model_by_reference=True)
```

### Import Model Deployment Modules

```python
from ads.model.deployment import (
    ...
)
```

## Setup Model Deployment Infrastructure

```python
container_image = "<region>.ocir.io/<tenancy>/vllm-odsc/vllm-openai:gptoss"  # name given to the vLLM image pushed to Oracle Container Registry
```
