# Deploy OpenAI open-source models

This guide demonstrates how to deploy and run inference using the OCI Data Science service. In this example, we use a model downloaded from Hugging Face, specifically [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) from OpenAI.

## Required IAM Policies

Add these [policies](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2#required-iam-policies) to grant access to OCI services.

## Setup

Create a Data Science notebook session with at least 400 GB of storage. We will use the notebook session to:
1. Download the model weights
2. Create a Model Catalog entry
3. Deploy the model

To prepare the inference container, we will use a local machine, since this step requires Docker commands; the notebook session does not come with Docker tooling.

# Prepare the Inference Container

vLLM is an easy-to-use library for LLM inference and serving. You can get the container image from [DockerHub](https://hub.docker.com/r/vllm/vllm-openai/tags).

Run the following commands on your local machine:

```shell
docker pull --platform linux/amd64 vllm/vllm-openai:gptoss
```

Currently, OCI Data Science Model Deployment only supports container images residing in the OCI Registry. Before we can push the pulled vLLM container, make sure you have created a repository in your tenancy.
- Go to your tenancy Container Registry
- Click on the Create repository button
- Select Private under Access types
- Set a name for Repository name. We are using "vllm-odsc" in the example.
- Click on Create button
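
Alternatively, the repository can be created with the OCI CLI. The snippet below is a minimal sketch, assuming the OCI CLI is installed and configured on your machine; the compartment OCID is a placeholder and the repository name matches the one used above.

```shell
# Create a private container repository named "vllm-odsc" (adjust the OCID for your tenancy)
oci artifacts container repository create \
    --compartment-id <compartment-ocid> \
    --display-name vllm-odsc \
    --is-public false
```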

You may need to `docker login` to the Oracle Cloud Container Registry (OCIR) first, if you haven't done so before, in order to push the image. To log in, use an Auth Token, which can be created under your Oracle Cloud account (Profile -> Auth Tokens). You need to log in only once. Replace `<region>` with the OCI region you are using.

```shell
docker login -u '<tenant-namespace>/<username>' <region>.ocir.io
```

If your tenancy is federated with Oracle Identity Cloud Service, use the format `<tenancy-namespace>/oracleidentitycloudservice/<username>`. You can then push the container image to the OCI Registry:

```shell
docker tag vllm/vllm-openai:gptoss <region>.ocir.io/<tenancy>/vllm-odsc/vllm-openai:gptoss
docker push <region>.ocir.io/<tenancy>/vllm-odsc/vllm-openai:gptoss
```

# Deployment

The following steps are performed in the OCI notebook session:

## Prepare the Model Artifacts

To prepare model artifacts for LLM model deployment:

- Download the model files from Hugging Face to a local directory in the notebook session.
- Upload the model folder to a [versioned bucket](https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/usingversioning.htm) in Oracle Object Storage. If you don't have an Object Storage bucket, create one using the OCI CLI, SDK, or the Console (see the sketch after this list). Make a note of the `namespace`, `compartment`, and `bucketname`. Configure the policies to allow the Data Science service to read and write the model artifact to the Object Storage bucket in your tenancy. An administrator must configure the policies in IAM in the Console.
- Create a Model Catalog entry for the model using the Object Storage path.
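
For reference, a versioned bucket can be created with the OCI CLI along the following lines. This is only a sketch; the bucket name and compartment OCID are placeholders, and the command assumes the CLI is authenticated (for example with resource principal inside the notebook session).

```shell
# Create a versioned Object Storage bucket (adjust the name and compartment OCID)
oci os bucket create \
    --name <bucket_name> \
    --compartment-id <compartment-ocid> \
    --versioning Enabled \
    --auth resource_principal
```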

### Model Download from HuggingFace Model Hub

[This documentation](https://huggingface.co/docs/huggingface_hub/en/guides/cli#download-an-entire-repository) provides more information on using `huggingface-cli` to download an entire repository at a given revision. Models in the HuggingFace hub are stored in their own repository.
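
If `huggingface-cli` is not already available in the notebook session, it can typically be installed with pip:

```shell
pip install -U "huggingface_hub[cli]"
```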

```shell
# Select the model that you want to deploy.

huggingface-cli download openai/gpt-oss-120b --local-dir models/gpt-oss-120b --exclude "metal/*"
```

Download the tiktoken encoding file:

```shell
wget -P models/gpt-oss-120b https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken
```

## Upload Model to OCI Object Storage

**Note:** The bucket must be a versioned bucket.

```shell
oci os object bulk-upload --src-dir models/gpt-oss-120b --prefix gpt-oss-120b/ -bn <bucket_name> -ns <bucket_namespace> --auth "resource_principal"
```

## Create Model by Reference using ADS

```python
# Uncomment this code and set the correct proxy URLs if you need to set up a proxy for internet access
# import os
# os.environ['http_proxy']="http://myproxy"
# os.environ['https_proxy']="http://myproxy"

# Use os.environ['no_proxy'] to route traffic directly
```

```python
import ads
import os

ads.set_auth("resource_principal")

# Extract region information from the Notebook environment variables and signer.
ads.common.utils.extract_region()
```

```python
# change as required for your environment
compartment_id = os.environ["PROJECT_COMPARTMENT_OCID"]
project_id = os.environ["PROJECT_OCID"]

log_group_id = "ocid1.loggroup.oc1.xxx.xxxxx"
log_id = "ocid1.log.oc1.xxx.xxxxx"

instance_shape = "BM.GPU.H100.8"

region = ads.common.utils.extract_region()
```

```python
from ads.model.datascience_model import DataScienceModel

bucket = "<bucket-name>"      # bucket where the model files were uploaded
namespace = "<namespace>"     # Object Storage namespace of your tenancy

artifact_path = f"oci://{bucket}@{namespace}/gpt-oss-120b"

model = (
    DataScienceModel()
    .with_compartment_id(compartment_id)
    .with_project_id(project_id)
    .with_display_name("gpt-oss-120b")
    .with_artifact(artifact_path)
)

model.create(model_by_reference=True)
```

### Import Model Deployment Modules

```python
from ads.model.deployment import (
    ModelDeployment,
    ModelDeploymentContainerRuntime,
    ModelDeploymentInfrastructure,
    ModelDeploymentMode,
)
```

## Setup Model Deployment Infrastructure

```python
container_image = "<region>.ocir.io/<tenancy>/vllm-odsc/vllm-openai:gptoss"  # name given to vllm image pushed to Oracle container registry
```

```python
infrastructure = (
    ModelDeploymentInfrastructure()
    .with_project_id(project_id)
    .with_compartment_id(compartment_id)
    .with_shape_name(instance_shape)
    .with_bandwidth_mbps(10)
    .with_replica(1)
    .with_web_concurrency(1)
    .with_access_log(
        log_group_id=log_group_id,
        log_id=log_id,
    )
    .with_predict_log(
        log_group_id=log_group_id,
        log_id=log_id,
    )
)
```

## Configure Model Deployment Runtime

```python
env_var = {
    "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/chat/completions",
    "SHM_SIZE": "10g",
    "TIKTOKEN_RS_CACHE_DIR": "/opt/ds/model/gpt-oss-120b",
}

# vLLM server arguments passed to the container
cmd_var = [
    "--model",
    "/opt/ds/model/deployed_model/gpt-oss-120b",
    "--tensor-parallel-size",
    "8",
    "--port",
    "8080",
    "--served-model-name",
    "openai/gpt-oss-120b",
    "--host",
    "0.0.0.0",
    "--trust-remote-code",
    "--quantization",
    "mxfp4",
]

container_runtime = (
    ModelDeploymentContainerRuntime()
    .with_image(container_image)
    .with_server_port(8080)
    .with_health_check_port(8080)
    .with_env(env_var)
    .with_cmd(cmd_var)
    .with_deployment_mode(ModelDeploymentMode.HTTPS)
    .with_model_uri(model.id)
    .with_region(region)
)
```

## Deploy Model using Container Runtime

```python
deployment = (
    ModelDeployment()
    .with_display_name("gpt-oss-120b MD with BYOC")
    .with_description("Deployment of gpt-oss-120b MD with vLLM BYOC container")
    .with_infrastructure(infrastructure)
    .with_runtime(container_runtime)
).deploy(wait_for_completion=False)
```

```python
deployment.watch()
```

## Inference

```python
import requests

auth = ads.common.auth.default_signer()["signer"]
prompt = "What amateur radio bands are best to use when there are solar flares? Keep your response to 100 words."
# Adjust the host to the region where the model deployment was created
endpoint = f"https://modeldeployment.us-ashburn-1.oci.customer-oci.com/{deployment.model_deployment_id}/predict"

body = {
    "model": "openai/gpt-oss-120b",  # must match the --served-model-name passed to vLLM
    "messages": [
        {
            "role": "user",
            "content": prompt,
        }
    ],
}
requests.post(endpoint, json=body, auth=auth, headers={}).json()
```
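
The call returns an OpenAI-compatible chat completion payload. Assuming the standard response schema returned by vLLM, and reusing the `endpoint`, `body`, and `auth` from the cell above, the generated text can be extracted like this:

```python
response = requests.post(endpoint, json=body, auth=auth, headers={}).json()

# The assistant's reply is in the first choice of the chat completion response
print(response["choices"][0]["message"]["content"])
```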

#### Output:

During solar flares the ionospheric D‑layer becomes heavily ionized, causing severe absorption of lower HF (3–10 MHz). The most reliable amateur bands are therefore the higher HF bands that are less affected—particularly 15 m (21 MHz), 12 m (24 MHz), 10 m (28 MHz) and the VHF/UHF “line‑of‑sight” bands (50 MHz, 70 MHz, 144 MHz, 432 MHz) which can still work via sporadic E or auroral propagation. If you must use lower HF, stick to the 20 m (14 MHz) band during the flare’s peak, as it often remains usable. Keep power modest and monitor real‑time solar flux indices.
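
The endpoint can also be exercised without Python, for example with the OCI CLI's `raw-request` command. The sketch below assumes the CLI is configured on the calling machine and uses a placeholder model deployment OCID and region.

```shell
# Invoke the model deployment predict endpoint directly from the OCI CLI
oci raw-request \
    --http-method POST \
    --target-uri https://modeldeployment.us-ashburn-1.oci.customer-oci.com/<model-deployment-ocid>/predict \
    --request-body '{"model": "openai/gpt-oss-120b", "messages": [{"role": "user", "content": "Hello!"}]}'
```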