| 1 | +# Deploy LLM Models using BYOC |
| 2 | + |
| 3 | +This guide demonstrates how to deploy a large language model on OCI Data Science Model Deployment using a Bring Your Own Container (BYOC) setup powered by vLLM, and how to run inference against it. In this example, we use a model downloaded from Hugging Face, specifically [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) from OpenAI. |
| 4 | + |
| 5 | + |
| 6 | +## Required IAM Policies |
| 7 | + |
| 8 | +Add these [policies](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2#required-iam-policies) to grant access to OCI services. |
| 9 | + |
| 10 | +## Setup |
| 11 | + |
| 12 | + |
| 13 | + |
| 14 | +```python |
| 15 | +# Install required python packages |
| 16 | + |
| 17 | +!pip install oracle-ads |
| 18 | +!pip install oci |
| 19 | +!pip install huggingface_hub |
| 20 | +``` |
| 21 | + |
| 22 | + |
| 23 | +```python |
| 24 | +# Uncomment this code and set the correct proxy URLs if you need a proxy for internet access |
| 25 | +# import os |
| 26 | +# os.environ['http_proxy']="http://myproxy" |
| 27 | +# os.environ['https_proxy']="http://myproxy" |
| 28 | + |
| 29 | +# Use os.environ['no_proxy'] to list hosts whose traffic should bypass the proxy |
| 30 | +``` |
| 31 | + |
| 32 | + |
| 33 | +```python |
| 34 | +import ads |
| 35 | +import os |
| 36 | + |
| 37 | +ads.set_auth("resource_principal") |
| 38 | +``` |
| 39 | + |
| 40 | + |
| 41 | +```python |
| 42 | +# Extract region information from the Notebook environment variables and signer. |
| 43 | +ads.common.utils.extract_region() |
| 44 | +``` |
| 45 | + |
| 46 | +### Common variables |
| 47 | + |
| 48 | + |
| 49 | +```python |
| 50 | +# change as required for your environment |
| 51 | +compartment_id = os.environ["PROJECT_COMPARTMENT_OCID"] |
| 52 | +project_id = os.environ["PROJECT_OCID"] |
| 53 | + |
| 54 | +log_group_id = "ocid1.loggroup.oc1.xxx.xxxxx" |
| 55 | +log_id = "ocid1.log.oc1.xxx.xxxxx" |
| 56 | + |
| 57 | +instance_shape = "BM.GPU.H100.8" |
| 58 | + |
| 59 | +region = "<your-region>" |
| 60 | +``` |
| 61 | + |
| 62 | +## API Endpoint Usage |
| 63 | + |
| 64 | +The `/v1/completions` endpoint returns a completion for a single prompt and takes one string as input; it can be used with base (non-chat) models as well as instruction-tuned chat models. The `/v1/chat/completions` endpoint returns the next response for a dialog and requires the input as a list of messages representing the conversation history. This guide uses the `/v1/chat/completions` endpoint. |
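
As a rough illustration of the difference, the two endpoints expect differently shaped request bodies. The sketch below builds both payloads as plain Python dictionaries. The helper names and the `max_tokens` value are illustrative assumptions and are not used elsewhere in this guide.

```python
import json

MODEL_NAME = "openai/gpt-oss-120b"  # served model name used in this guide


def completion_payload(prompt: str, max_tokens: int = 256) -> dict:
    """Body for /v1/completions: a single prompt string."""
    return {"model": MODEL_NAME, "prompt": prompt, "max_tokens": max_tokens}


def chat_payload(messages: list[dict], max_tokens: int = 256) -> dict:
    """Body for /v1/chat/completions: a list of role/content messages."""
    return {"model": MODEL_NAME, "messages": messages, "max_tokens": max_tokens}


completion_body = completion_payload("Say hello in one sentence.")
chat_body = chat_payload([{"role": "user", "content": "Say hello in one sentence."}])

print(json.dumps(completion_body, indent=2))
print(json.dumps(chat_body, indent=2))
```

Only the chat payload carries a `messages` list, which is why the inference section of this guide sends its prompt wrapped in a `{"role": "user", ...}` entry.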
| 65 | + |
| 66 | + |
| 67 | +## Prepare The Model Artifacts |
| 68 | + |
| 69 | +To prepare the model artifacts for LLM deployment: |
| 70 | + |
| 71 | +- Download the model files from Hugging Face to a local directory. A Hugging Face token is needed only for gated models; if you don't have one, see the [user access tokens documentation](https://huggingface.co/docs/hub/en/security-tokens) to generate it. |
| 72 | +- Upload the model folder to a [versioned bucket](https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/usingversioning.htm) in Oracle Object Storage. If you don’t have an Object Storage bucket, create one using the OCI SDK or the Console. Make a note of the `namespace`, `compartment`, and `bucketname`. An administrator must configure IAM policies in the Console that allow the Data Science service to read and write the model artifact to the Object Storage bucket in your tenancy. |
| 73 | +- Create a model catalog entry for the model using the Object Storage path. |
| 74 | + |
| 75 | +### Model Download from HuggingFace Model Hub |
| 76 | + |
| 77 | + |
| 78 | +```shell |
| 79 | +# Log in to Hugging Face with your access token (needed only for gated models) |
| 80 | +huggingface-cli login --token <HUGGINGFACE_TOKEN> |
| 81 | +``` |
| 82 | + |
| 83 | +The [`huggingface-cli` guide](https://huggingface.co/docs/huggingface_hub/en/guides/cli#download-an-entire-repository) provides more information on downloading an entire repository at a given revision. Each model on the Hugging Face Hub is stored in its own repository. |
| 84 | + |
| 85 | + |
| 86 | +```shell |
| 87 | +# Select the model that you want to deploy. |
| 88 | + |
| 89 | +huggingface-cli download openai/gpt-oss-120b --local-dir models/gpt-oss-120b |
| 90 | +``` |
| 91 | + |
| 92 | +## Upload Model to OCI Object Storage |
| 93 | + |
| 94 | + |
| 95 | +```shell |
| 96 | +oci os object bulk-upload --src-dir models/gpt-oss-120b --prefix gpt-oss-120b/ -bn <bucket_name> -ns <bucket_namespace> --auth "resource_principal" |
| 97 | +``` |
| 98 | + |
| 99 | +## Create Model by Reference using ADS |
| 100 | + |
| 101 | + |
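
The cell below refers to `bucket`, `namespace`, and `model_prefix`, which are not defined earlier in this guide. A minimal sketch of how they might be set follows. The values are placeholders (assumptions) and should match the bucket and prefix used in the upload step.

```python
# Placeholder values (assumptions): replace with your own bucket details.
bucket = "<bucket_name>"
namespace = "<bucket_namespace>"
model_prefix = "gpt-oss-120b"  # prefix passed to `oci os object bulk-upload`

# Object Storage URI in the oci://<bucket>@<namespace>/<prefix> form
example_uri = f"oci://{bucket}@{namespace}/{model_prefix}"
print(example_uri)
```
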
| 102 | + |
| 103 | +```python |
| 104 | +from ads.model.datascience_model import DataScienceModel |
| 105 | + |
| 106 | +artifact_path = f"oci://{bucket}@{namespace}/{model_prefix}" |
| 107 | + |
| 108 | +model = ( |
| 109 | + DataScienceModel() |
| 110 | + .with_compartment_id(compartment_id) |
| 111 | + .with_project_id(project_id) |
| 112 | + .with_display_name("gpt-oss-120b") |
| 113 | + .with_artifact(artifact_path) |
| 114 | +) |
| 115 | + |
| 116 | +model.create(model_by_reference=True) |
| 117 | +``` |
| 118 | + |
| 119 | +## Inference container |
| 120 | + |
| 121 | +vLLM is an easy-to-use library for LLM inference and serving. You can get the container image from [DockerHub](https://hub.docker.com/r/vllm/vllm-openai/tags). |
| 122 | + |
| 123 | +```shell |
| 124 | +docker pull --platform linux/amd64 vllm/vllm-openai:gptoss |
| 125 | +``` |
| 126 | + |
| 127 | +Currently, OCI Data Science Model Deployment only supports container images residing in the OCI Registry. Before pushing the pulled vLLM container, make sure you have created a repository in your tenancy: |
| 128 | +- Go to your tenancy's Container Registry. |
| 129 | +- Click the Create repository button. |
| 130 | +- Select Private under Access types. |
| 131 | +- Set a name for Repository name. We are using "vllm-odsc" in the example. |
| 132 | +- Click the Create button. |
| 133 | + |
| 134 | +To push the image, you may first need to log in to the Oracle Cloud Container Registry (OCIR) with `docker login`, if you haven't done so before. Log in with an API Auth Token, which you can create under your Oracle Cloud Account -> Auth Token. You need to log in only once. Replace `<region>` with the OCI region you are using. |
| 135 | + |
| 136 | +```shell |
| 137 | +docker login -u '<tenancy-namespace>/<username>' <region>.ocir.io |
| 138 | +``` |
| 139 | + |
| 140 | +If your tenancy is federated with Oracle Identity Cloud Service, use the format `<tenancy-namespace>/oracleidentitycloudservice/<username>`. You can then tag the container image and push it to the OCI Registry: |
| 141 | + |
| 142 | +```shell |
| 143 | +docker tag vllm/vllm-openai:gptoss <region>.ocir.io/<tenancy>/vllm-odsc/vllm-openai:gptoss |
| 144 | +docker push <region>.ocir.io/<tenancy>/vllm-odsc/vllm-openai:gptoss |
| 145 | +``` |
| 146 | + |
| 147 | +### Import Model Deployment Modules |
| 148 | + |
| 149 | +```python |
| 150 | +from ads.model.deployment import ( |
| 151 | + ModelDeployment, |
| 152 | + ModelDeploymentContainerRuntime, |
| 153 | + ModelDeploymentInfrastructure, |
| 154 | + ModelDeploymentMode, |
| 155 | +) |
| 156 | +``` |
| 157 | + |
| 158 | +## Setup Model Deployment Infrastructure |
| 159 | + |
| 160 | + |
| 161 | + |
| 162 | +```python |
| 163 | +container_image = "<region>.ocir.io/<tenancy>/vllm-odsc/vllm-openai:gptoss" # name given to vllm image pushed to oracle container registry |
| 164 | +``` |
| 165 | + |
| 166 | +```python |
| 167 | +infrastructure = ( |
| 168 | + ModelDeploymentInfrastructure() |
| 169 | + .with_project_id(project_id) |
| 170 | + .with_compartment_id(compartment_id) |
| 171 | + .with_shape_name(instance_shape) |
| 172 | + .with_bandwidth_mbps(10) |
| 173 | + .with_replica(1) |
| 174 | + .with_web_concurrency(1) |
| 175 | + .with_access_log( |
| 176 | + log_group_id=log_group_id, |
| 177 | + log_id=log_id, |
| 178 | + ) |
| 179 | + .with_predict_log( |
| 180 | + log_group_id=log_group_id, |
| 181 | + log_id=log_id, |
| 182 | + ) |
| 183 | +) |
| 184 | +``` |
| 185 | + |
| 186 | +## Configure Model Deployment Runtime |
| 187 | + |
| 188 | + |
| 189 | + |
| 190 | +```python |
| 191 | +env_var = { |
| 192 | + "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/chat/completions", |
| 193 | +} |
| 194 | +
|
| 195 | +cmd_var = [ |
| 196 | + "--model", |
| 197 | + f"/opt/ds/model/deployed_model/{model_prefix}", |
| 198 | + "--tensor-parallel-size", |
| 199 | + "2", |
| 200 | + "--port", |
| 201 | + "8080", |
| 202 | + "--served-model-name", |
| 203 | + "openai/gpt-oss-120b", |
| 204 | + "--host", |
| 205 | + "0.0.0.0", |
| 206 | + "--trust-remote-code", |
| 207 | +] |
| 208 | +
|
| 209 | +container_runtime = ( |
| 210 | + ModelDeploymentContainerRuntime() |
| 211 | + .with_image(container_image) |
| 212 | + .with_server_port(8080) |
| 213 | + .with_health_check_port(8080) |
| 214 | + .with_env(env_var) |
| 215 | + .with_cmd(cmd_var) |
| 216 | + .with_deployment_mode(ModelDeploymentMode.HTTPS) |
| 217 | + .with_model_uri(model.id) |
| 218 | + .with_region(region) |
| 219 | +) |
| 220 | +``` |
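
Flags and their values are easy to mis-order in a flat list such as `cmd_var`. One possible way to keep them paired is a small helper that expands a dictionary into the argument list. `build_vllm_args` is a hypothetical name introduced for this sketch, only flags already used above appear in it, and the model path assumes `model_prefix` is `gpt-oss-120b`.

```python
def build_vllm_args(options: dict) -> list[str]:
    """Expand {flag: value} pairs into a flat CLI argument list.

    A value of True emits the bare flag (e.g. --trust-remote-code);
    False or None skips the flag entirely.
    """
    args: list[str] = []
    for flag, value in options.items():
        if value is True:
            args.append(flag)
        elif value is False or value is None:
            continue
        else:
            args.extend([flag, str(value)])
    return args


example_cmd = build_vllm_args(
    {
        # Assumed path: model_prefix is "gpt-oss-120b"
        "--model": "/opt/ds/model/deployed_model/gpt-oss-120b",
        "--tensor-parallel-size": 2,
        "--port": 8080,
        "--served-model-name": "openai/gpt-oss-120b",
        "--host": "0.0.0.0",
        "--trust-remote-code": True,
    }
)
print(example_cmd)
```

The resulting list can be passed to `.with_cmd(...)` in place of a hand-written one.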
| 221 | + |
| 222 | +## Deploy Model using Container Runtime |
| 223 | + |
| 224 | + |
| 225 | + |
| 226 | +```python |
| 227 | +deployment = ( |
| 228 | + ModelDeployment() |
| 229 | + .with_display_name(f"{model_prefix} MD with BYOC") |
| 230 | + .with_description(f"Deployment of {model_prefix} MD with vLLM BYOC container") |
| 231 | + .with_infrastructure(infrastructure) |
| 232 | + .with_runtime(container_runtime) |
| 233 | +).deploy(wait_for_completion=False) |
| 234 | +``` |
| 235 | + |
| 236 | + |
| 237 | +```python |
| 238 | +deployment.watch() |
| 239 | +``` |
| 240 | + |
| 241 | +## Inference |
| 242 | + |
| 243 | + |
| 244 | +```python |
| 245 | +import requests |
| 246 | + |
| 247 | +# Sign requests with the signer configured by ads.set_auth() above |
| 248 | +auth = ads.common.auth.default_signer()["signer"] |
| 249 | + |
| 250 | +# Predict URL of the deployment; `region` is set in the common variables above |
| 251 | +endpoint = f"https://modeldeployment.{region}.oci.customer-oci.com/{deployment.model_deployment_id}/predict" |
| 252 | + |
| 253 | +prompt = "What amateur radio bands are best to use when there are solar flares?" |
| 254 | + |
| 255 | +# Request body in the chat-completions format |
| 256 | +body = { |
| 257 | + "model": "openai/gpt-oss-120b", # must match --served-model-name |
| 258 | + "messages": [ |
| 259 | + {"role": "user", "content": prompt}, |
| 260 | + ], |
| 261 | +} |
| 262 | + |
| 263 | +response = requests.post(endpoint, json=body, auth=auth) |
| 264 | +response.raise_for_status() # fail early on HTTP errors |
| 265 | +response.json() |
| 266 | +``` |
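
The call above returns the full JSON response. Assuming the server replies in the OpenAI-compatible chat-completions format (a `choices` list whose entries carry a `message`), the generated text can be pulled out as sketched below. `extract_reply` is a hypothetical helper, and `sample_response` is a hand-written stand-in, not real model output.

```python
def extract_reply(response_json: dict) -> str:
    """Return the assistant text from a chat-completions style response."""
    choices = response_json.get("choices") or []
    if not choices:
        raise ValueError(f"No choices in response: {response_json}")
    return choices[0]["message"]["content"]


# Hand-written stand-in for a response body (not real output)
sample_response = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hello!"}}
    ]
}
reply = extract_reply(sample_response)
print(reply)
```
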
| 267 | + |
| 268 | +#### Output: |
| 269 | + |
| 270 | + |
| 271 | +**Short answer:** |
| 272 | +During a solar flare the **higher HF bands (≈10 MHz and up)** tend to work best, while the **lower HF bands (≤ 15 MHz, especially 80 m/160 m)** are usually “blacked‑out” by D‑layer absorption. The most usable bands are generally **15 m, 12 m, 10 m and, for a short burst, 6 m** (and occasionally VHF/UHF if a sporadic‑E layer is present). |
| 273 | + |
| 274 | +Below is a practical guide that explains why, how to recognise the conditions, and what you can actually do on the air. |
| 275 | + |
| 276 | +--- |
| 277 | + |
| 278 | +## 1. What a solar flare does to the ionosphere |
| 279 | + |
| 280 | +| Phenomenon | How it affects propagation | Time scale | |
| 281 | +|------------|---------------------------|-----------| |
| 282 | +| **X‑ray & extreme‑UV burst** (seconds to minutes) | Sudden increase of ionisation in the **D‑layer (≈60‑90 km)** → **enhanced absorption** of HF signals, especially below ~15 MHz. | Immediate, lasts a few minutes (the “Sudden Ionospheric Disturbance”, SID). | |
| 283 | +| **UV/EUV hardening** (minutes) | Raises the **E‑layer MUF** (Maximum Usable Frequency) → higher‑frequency HF can travel farther. | 5–30 min after flare onset. | |
| 284 | +| **Cosmic‑ray induced ionisation (in the F‑layer)** | Slightly improves F‑layer density → modest long‑term HF enhancement on the high bands. | Hours‑to‑days after a large flare. | |
| 285 | +| **Associated CME & geomagnetic storm** (hours‑days) | If a coronal mass ejection follows, it can cause **geomagnetic disturbance** (Kp ↑) → spread‑F, auroral absorption, and HF degradation on the very high bands (often > 30 MHz). | Hours‑days later, a separate phenomenon from the prompt flare. | |
| 286 | + |
| 287 | +> **Rule of thumb:** |
| 288 | +> *If you see a sudden loss of signal on 80 m/40 m/20 m right after a flare, the D‑layer is “turned on”. Switch to a band above the current MUF (typically 15 m‑10 m) and you’ll often get a clear opening.* |
| 289 | + |
| 290 | +--- |
| 291 | + |
| 292 | +## 2. Which amateur bands survive – and why |
| 293 | + |
| 294 | +| Band (approx.) | Typical behavior during a flare | Why it works (or fails) | |
| 295 | +|----------------|--------------------------------|------------------------| |
| 296 | +| **160 m (1.8 – 2.0 MHz)** | Almost always **dead** during the X‑ray burst. | Deep D‑layer absorption; low MUF. | |
| 297 | +| **80 m (3.5 – 4.0 MHz)** | Heavy fade‑out; may recover only after the X‑ray flux drops. | Still within D‑layer absorption zone. | |
| 298 | +| **60 m (5.3 – 5.4 MHz, US only)** | Similar to 80 m; may see short “pings” when the flare decays. | Near the edge of D‑layer absorption. | |
| 299 | +| **40 m (7.0 – 7.3 MHz)** | Often dead for the first 5‑15 min; may recover later if the flare is modest (C‑class). | Still vulnerable; MUF may stay < 7 MHz. | |
| 300 | +| **30 m (10.1 – 10.15 MHz)** | **Best of the lower HF**; can survive a weak flare but usually fades with M‑class or stronger events. | Near the D‑layer limit; occasional openings. | |
| 301 | +| **20 m (14.0 – 14.35 MHz)** | **Usually usable**, especially on the rising edge of the flare when the MUF is driven up. | Above the D‑layer cut‑off, and the **MUF often rises to 18‑20 MHz**. | |
| 302 | +| **17 m (18.068 – 18.168 MHz)** | Good, often better than 20 m during the flare peak. | MUF can exceed 20 MHz. | |
| 303 | +| **15 m (21.0 – 21.45 MHz)** | **Very reliable** for a few minutes to an hour after the flare begins. | Well above the absorption region; the ionosphere is “pumped up”. | |
| 304 | +| **12 m (24.89 – 24.99 MHz)** | Excellent when the flare is strong (M‑ or X‑class). | High MUF, low absorption. | |
| 305 | +| **10 m (28.0 – 29.7 MHz)** | Often the **best** band during and immediately after a strong flare; can support worldwide contacts if the Sun is active. | MUF frequently > 30 MHz; propagation driven by the F‑layer. | |
| 306 | +| **6 m (50‑54 MHz)** | **Sporadic‑E openings** can appear for 5‑30 min after a strong flare, giving VHF‑range contacts. | The flare can trigger short‑lived enhancements of the E‑layer irregularities. | |
| 307 | +| **2 m/70 cm (144‑148 MHz, 430‑440 MHz)** | Mostly unaffected except during **auroral absorption** from a subsequent geomagnetic storm. | Propagation is line‑of‑sight; solar flare impact is minimal. | |
| 308 | + |
| 309 | +**Bottom line:** **15 m, 12 m and 10 m are the “go‑to” bands** when a flare erupts. If you have a VHF setup, keep an eye on **6 m** for a brief Sporadic‑E window. |
| 310 | + |
| 311 | +--- |
| 312 | + |
| 313 | +## 3. How to know a flare is occurring (real‑time tools) |
| 314 | + |
| 315 | +| Tool | What it shows | How to use it for band choice | |
| 316 | +|------|----------------|-------------------------------| |
| 317 | +| **NOAA Space Weather Prediction Center (SWPC) – X‑ray flux plot** | GOES satellite X‑ray flux (C, M, X class) in 0.1‑8 Å band, updated every minute. | When **≥ M‑class** appears, expect D‑layer absorption. Move to > 10‑MHz bands. | |
| 318 | +| **NOAA A‑index & K‑index** | Global geomagnetic activity. A‑index spikes during flare‑related ionospheric disturbances. | A‑index > 5 → D‑layer absorption heavy; stay on high HF. | |
| 319 | +| **NASA DSCOVR + ACE real‑time solar wind data** | Solar wind speed, density, Bz. Useful for upcoming CME (hours later). | If a CME is inbound, plan for later geomagnetic storm; may need to drop back to lower bands after the flare fades. | |
| 320 | +| **Ham‑radio specific sites** (e.g., *DXMaps*, *SolarHam*, *N4EP’s Solar Flare Alerts*) | Summarise current solar flux, sunspot number, and flare alerts. | Quick check before a night‑time contest or QSO. | |
| 321 | +| **Propagation prediction software** (VOACAP, Ham Radio Deluxe, Hamlib *propagation* tools) | Calculates MUF and expected signal‑to‑noise for given time, band, and solar conditions. | Input current solar flux (S‑index) and A‑index to see which bands will be above the MUF. | |
| 322 | + |
| 323 | +--- |
| 324 | + |
| 325 | +## 4. Practical operating tips |
| 326 | + |
| 327 | +1. **Listen first.** Tune a 20 m or 15 m receiver while the flare is active. If you can hear stations that were silent before, the MUF has risen. |
| 328 | +2. **Keep a “band‑switch” plan.** Have a preset list: |
| 329 | + - **Start:** 20 m (if you’re already there). |
| 330 | + - **If dead:** Jump to 15 m → 12 m → 10 m. |
| 331 | + - **If you have a 6 m rig:** Try a quick “Sporadic‑E” sweep (73 – 75 MHz) after the flare’s peak. |
| 332 | +3. **Power & antennas.** Higher frequencies need slightly more power for the same distance because free‑space loss rises with frequency, but the reduced absorption more than compensates. A simple half‑wave dipole on 10 m or a vertical with a good ground works well. |
| 333 | +4. **Log the time.** Note the exact UTC time of the flare onset (you can copy the GOES timestamp). This data is useful for later propagation analysis and for other hams. |
| 334 | +5. **Avoid the “N‑range” (near‑field) on VHF/UHF** during an accompanying geomagnetic storm, as auroral absorption can produce erratic signal fading. |
| 335 | +6. **Be ready for rapid fade‑out.** Flares can cause a *Sudden Ionospheric Disturbance* that lasts only a few minutes. If you’re on a low band, you may get a brief “ripple” of S‑SB on SSB or CW before the signal disappears; switch to a higher band immediately. |
| 336 | + |
| 337 | +--- |
| 338 | + |
| 339 | +## 5. Example scenario |
| 340 | + |
| 341 | +| Time (UTC) | Solar event | Ionospheric effect | Recommended band(s) | |
| 342 | +|------------|-------------|--------------------|----------------------| |
| 343 | +| 12:30 | **C‑class flare** peaks (1 × 10⁻⁶ W/m²) | Mild D‑layer absorption; MUF rises to ~13 MHz. | 20 m still usable, 15 m opens. | |
| 344 | +| 12:35 | **M‑class flare** peaks (5 × 10⁻⁵ W/m²) | Strong D‑layer absorption; MUF climbs to 18‑20 MHz. | Switch to 15 m, 12 m, 10 m. | |
| 345 | +| 12:40 | **X‑class flare** peaks (1 × 10⁻⁴ W/m²) | D‑layer blackout of < 15 MHz; MUF may exceed 25 MHz for ~10 min. | 10 m is best; try 6 m if you have a VHF rig. | |
| 346 | +| 12:55 | Flare decays, A‑index spikes to 7 | D‑layer recovers; MUF drops back to ~16‑18 MHz. | Return to 15 m/12 m; 20 m may become usable again after ~15 min. | |
| 347 | + |
| 348 | +--- |
| 349 | + |
| 350 | +## 6. Quick “cheat sheet” for the radio operator |
| 351 | + |
| 352 | +| Solar flare class | Expected absorption (low HF) | MUF trend | Best bands (immediate) | |
| 353 | +|-------------------|------------------------------|-----------|------------------------| |
| 354 | +| **C‑class** | Light | 10‑13 MHz | 20 m → 15 m | |
| 355 | +| **M‑class** | Moderate → heavy | 15‑20 MHz | 15 m, 12 m, 10 m | |
| 356 | +| **X‑class** | Very heavy (D‑layer blackout) | 20‑30 MHz+ (short‑lived) | 10 m, 12 m, 15 m, 6 m (if you have it) | |
| 357 | + |
| 358 | +**Remember:** |
| 359 | +- The **higher the class**, the *more* the low bands are suppressed, **but the *higher* the MUF becomes**. |
| 360 | +- The **effect lasts only a few minutes** (the X‑ray burst); the **enhanced propagation on high HF may linger for 30‑60 min** as the ionosphere “settles”. |
| 361 | + |
| 362 | +--- |
| 363 | + |
| 364 | +## 7. Resources you can bookmark |
| 365 | + |
| 366 | +| Resource | URL | What you get | |
| 367 | +|----------|-----|--------------| |
| 368 | +| NOAA Space Weather Prediction Center (SWPC) – X‑ray flux | <https://www.swpc.noaa.gov/online-data/goes-x-ray-flux> | Real‑time flare intensities. | |
| 369 | +| Space Weather Live – Solar Data | <https://www.spaceweatherlive.com/> | Solar flux (S‑index), sunspot number, flare alerts. | |
| 370 | +| VOACAP Online (propagation predictions) | <https://qsl.net/kb9v/voacap/> | MUF, signal‑to‑noise for any band/time. | |
| 371 | +| Ham Radio Deluxe “Propagation” window | (Desktop software) | Instant MUF for your location. | |
| 372 | +| N4EP Solar Flare Alerts (email) | <https://n4ep.com/> | Short‑msg alerts for strong flares. | |
| 373 | +| DXMaps – Current band conditions | <https://dxmaps.com/> | Crowd‑sourced band opening reports. | |
| 374 | + |
| 375 | +--- |
| 376 | + |
| 377 | +### Bottom line |
| 378 | + |
| 379 | +- **When a flare erupts, drop to the **higher HF** part of the spectrum (≥ 15 m, preferably 12 m‑10 m).** |
| 380 | +- **Avoid the lower HF bands** (80 m, 40 m, 30 m) while the X‑ray burst is on‑line. |
| 381 | +- **If you have a VHF kit, try 6 m** after the flare’s peak for a brief Sporadic‑E window. |
| 382 | +- **Monitor real‑time solar data** (GOES X‑ray, A‑index) and use a propagation tool (VOACAP) to confirm the current MUF before you switch. |
| 383 | + |
| 384 | +Happy DX’ing, and may the Sun be with you! |