Commit 1a9505e
Readme updates for diffusion model (#612)
* Readme updates for diffusion model
* Add NIM vanilla container solution
1 parent c9d97fb commit 1a9505e

File tree: 2 files changed, +112 -11 lines changed

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
# Overview

This README walks through how to use a NIM [Meta-Llama-3-8B-Instruct](https://huggingface.co/Undi95/Meta-Llama-3-8B-Instruct-hf) based container to deploy on OCI Data Science Model Deployment. This readme is applicable if you want to source NIM containers without any modifications and can accept a solution where the NGC key is visible at the model deployment level.

* [llama3](https://github.com/meta-llama/llama3) from Meta.
* [NIM](https://catalog.ngc.nvidia.com/orgs/nim/teams/meta/containers/llama3-8b-instruct) by Nvidia.

We describe two approaches to create this Model Deployment on OCI:

* Download the model using an API key from Nvidia NGC (described below)
* Use Object Storage to store the model and create a model catalog entry pointing to the Object Storage bucket ([refer here](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/nim/README-MODEL-CATALOG.md))

## Prerequisites

* Access the corresponding NIM container for the model. For example, for llama3, fetch the latest available image from the [NGC catalog](https://catalog.ngc.nvidia.com/orgs/nim/teams/meta/containers/llama3-8b-instruct/tags). If you are a first-time user, you need to sign up for a developer account and wait for access to be granted to the required container image.

  Click the Get Container button and click Request Access for NIM. At the time of writing, you need a business email address to get access to NIM.

* To download this image from the NGC catalog, you need to perform a `docker login` to nvcr.io. Login details are described in their [public doc](https://docs.nvidia.com/launchpad/ai/base-command-coe/latest/bc-coe-docker-basics-step-02.html).

  Once logged in, we can directly pull the image using:

  `docker pull nvcr.io/nim/meta/llama3-8b-instruct:latest`
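
  For reference, the nvcr.io login step typically looks like this (a sketch; assumes your NGC API key is exported as `NGC_API_KEY`; NGC uses the literal username `$oauthtoken`):

  ```bash
  # NGC expects the literal username "$oauthtoken"; the API key is the password.
  echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
  ```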

* Generate an API key to interact with NIM NGC APIs. [Reference document](https://org.ngc.nvidia.com/setup/api-key).

* Create a VCN with public connectivity, as the NIM container needs to call NGC's publicly exposed APIs. Please refer to the [public document](https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-create-cus-net.htm) for relevant information on custom networking.

* Once the image is successfully pulled to your workstation, we will bring this image to Oracle Cloud Infrastructure Registry (OCIR). The necessary policies and process for OCIR interaction are mentioned in our [public docs](https://docs.oracle.com/en-us/iaas/data-science/using/mod-dep-byoc.htm).

## OCI Logging

When experimenting with new frameworks and models, it is highly advisable to attach logs to the model deployment in order to enable self-service debugging. Follow the steps below to create log groups.

* Create logging for the model deployment (if you have already created it, you can skip this step)
* Go to the [OCI Logging Service](https://cloud.oracle.com/logging/log-groups) and select `Log Groups`
* Either select one of the existing Log Groups or create a new one
* In the log group, create ***two*** `Log`s, one predict log and one access log, like:
  * Click on `Create custom log`
  * Specify a name (predict|access) and select the log group you want to use
  * Under `Create agent configuration` select `Add configuration later`
  * Then click `Create agent configuration`
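
Alternatively, a sketch of the same setup with the OCI CLI (OCIDs are placeholders; adjust display names as you like):

```bash
# Create a log group, then one custom log each for predict and access.
oci logging log-group create --compartment-id <COMPARTMENT_OCID> --display-name "model-deploy-logs"
oci logging log create --log-group-id <LOG_GROUP_OCID> --display-name "predict" --log-type CUSTOM
oci logging log create --log-group-id <LOG_GROUP_OCID> --display-name "access" --log-type CUSTOM
```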

##### To directly get the image from the Nvidia NIM catalogue and upload it to OCIR, check: ```./README-SOURCE-NIM-TO-OCIR.MD```

## OCI Container Registry

* You need to `docker login` to the Oracle Cloud Container Registry (OCIR) first, if you haven't done so before, to be able to push the image. To log in, you have to use your [API Auth Token](https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrygettingauthtoken.htm), which can be created under your `Oracle Cloud Account->Auth Token`. You need to log in only once.

  ```bash
  docker login -u '<tenant-namespace>/<username>' <region>.ocir.io
  ```

  If your tenancy is **federated** with Oracle Identity Cloud Service, use the format `<tenancy-namespace>/oracleidentitycloudservice/<username>`

* Push the container image to OCIR. Note that the image must be tagged with the full OCIR path before it can be pushed:

  ```bash
  docker tag nvcr.io/nim/meta/llama3-8b-instruct:latest <region>.ocir.io/<tenancy-namespace>/odsc-nim-llama3:latest
  docker push <region>.ocir.io/<tenancy-namespace>/odsc-nim-llama3:latest
  ```
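  After pushing, you can optionally confirm that the image landed in the registry (a sketch; assumes the OCI CLI is configured):

  ```bash
  # List container images in the compartment and look for odsc-nim-llama3.
  oci artifacts container image list --compartment-id <COMPARTMENT_OCID> --display-name odsc-nim-llama3
  ```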

## Deploy on OCI Data Science Model Deployment

Once you have built and pushed the NIM container, you can use the `Bring Your Own Container` deployment in OCI Data Science to deploy the Llama3 model.

### Creating Model catalog

Use any zip file to create a dummy model artifact. As we will be downloading the model directly from NGC, we do not need to catalog the actual model. For the catalog-based solution, refer to the [Readme](README-MODEL-CATALOG.md).
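
For example, a minimal dummy artifact can be created like this (file names are arbitrary):

```bash
# The artifact content is never used; the model weights are pulled from NGC at runtime.
echo "placeholder" > README.txt
zip dummy_artifact.zip README.txt
```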

### Create Model deployment

* To deploy the model now in the console, navigate to your [OCI Data Science Project](https://cloud.oracle.com/data-science/project)
* Select the project created earlier and then select `Model Deployment`
* Click on `Create model deployment`
* Under `Default configuration` set the following custom environment variables
  * Key: `MODEL_DEPLOY_PREDICT_ENDPOINT`, Value: `/v1/completions`
  * Key: `MODEL_DEPLOY_HEALTH_ENDPOINT`, Value: `/v1/health/ready`
  * Key: `NIM_SERVER_PORT`, Value: `8080`
  * Key: `SHM_SIZE`, Value: `10g`
  * Key: `STORAGE_SIZE_IN_GB`, Value: `120`
  * Key: `NCCL_CUMEM_ENABLE`, Value: `0`
  * Key: `WEB_CONCURRENCY`, Value: `1`
* Under `Models` click on the `Select` button and select the Model Catalog entry we created earlier
* Under `Compute` and then `Specialty and previous generation` select the `VM.GPU.A10.2` instance
* Under `Networking` choose the `Custom Networking` option and select the VCN and subnet that allow internet access.
* Under `Logging` select the Log Group where you've created your predict and access logs, and select those correspondingly
* Select the custom container option `Use a Custom Container Image` and click `Select`
* Select the OCIR repository and image we pushed earlier
* Leave the ports as they are; the default port is 8080.
* Leave CMD as below and Entrypoint blank. Use `Add parameter` and populate each text field with the comma-separated values (the assembled command is shown after this list):

  `python3, -m, vllm_nvext.entrypoints.openai.api_server, --enforce-eager, --gpu-memory-utilization, 0.85, --max-model-len, 2048`
* Click on the `Create` button to create the model deployment

* Once the model is deployed and shown as `Active`, you can execute inference against it.
  * Go to the model deployment you've just created and click on it
  * On the left side, under `Resources`, select `Invoking your model`
  * You will see the model endpoint under `Your model HTTP endpoint`; copy it.
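
For reference, Model Deployment assembles the comma-separated CMD parameters above into the following container command:

```bash
python3 -m vllm_nvext.entrypoints.openai.api_server --enforce-eager --gpu-memory-utilization 0.85 --max-model-len 2048
```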

## Inference

```bash
oci raw-request \
  --http-method POST \
  --target-uri <MODEL-DEPLOY-ENDPOINT> \
  --request-body '{"model": "meta/llama3-8b-instruct", "messages": [ { "role":"user", "content":"Hello! How are you?" }, { "role":"assistant", "content":"Hi! I am quite well, how can I help you today?" }, { "role":"user", "content":"Can you write me a song?" } ], "top_p": 1, "n": 1, "max_tokens": 200, "stream": false, "frequency_penalty": 1.0, "stop": ["hello"] }' \
  --auth resource_principal
```
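
The same call from Python might look like the following sketch (assumes a local `~/.oci/config` profile with access to the deployment; the endpoint placeholder is the one copied above):

```python
# Sign the request with an OCI API-key signer and POST the chat payload.
import oci
import requests

endpoint = "<MODEL-DEPLOY-ENDPOINT>"
config = oci.config.from_file()
signer = oci.signer.Signer(
    tenancy=config["tenancy"],
    user=config["user"],
    fingerprint=config["fingerprint"],
    private_key_file_location=config["key_file"],
)
body = {
    "model": "meta/llama3-8b-instruct",
    "messages": [{"role": "user", "content": "Hello! How are you?"}],
    "max_tokens": 200,
}
response = requests.post(endpoint, json=body, auth=signer)
print(response.json())
```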

## Troubleshooting

[Reference](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2#troubleshooting)

Lines changed: 11 additions & 11 deletions

@@ -1,24 +1,24 @@

# Introduction
Diffusion models are a type of generative model that learns to create new data samples by reversing a gradual process of adding noise to an initial sample. They work by first adding noise to real data points, gradually transforming them into pure noise, and then training a neural network to reverse this process, effectively learning to generate data from noise.

[BentoML](https://github.com/bentoml/BentoML) is a Python library for building online serving systems optimized for AI apps and model inference. What sets it apart from other text generation frameworks is that it can also support image generation use cases with Stable Diffusion 3 Medium, Stable Video Diffusion, Stable Diffusion XL Turbo, ControlNet, and LCM LoRAs.
In this sample, we are going to deploy [Stable Diffusion 3 Medium](https://github.com/bentoml/BentoDiffusion/tree/main/sd3-medium) with BentoML.

# Steps

## Dockerize
First, let's dockerize our model serving framework using the [Dockerfile](./Dockerfile).
```
docker build -f Dockerfile -t bentoml:latest .
```
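
Optionally, you can smoke test the image locally before pushing (a sketch; assumes a GPU host and that the service listens on port 8080, adjust to your configuration):
```
# Run the freshly built image and expose the service port.
docker run --rm --gpus all -p 8080:8080 bentoml:latest
```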

## Create BentoML framework API code to serve Stable Diffusion 3 Medium
Refer to the code in the [directory](./sd3-medium).
Note the changes made in order to support it on OCI Data Science Model Deployment (a sketch follows the policy below):
* Add readiness logic, if needed, for checking the health of the model server.
* Add a route in the bentoml API to support the `predict` API endpoint for image generation.
* Check the OCI Buckets integration using resource principal, to put the generated images in the bucket of your choice.
NOTE - In order to allow the model deployment service to create objects in your bucket, add the below policy -
```
allow any-user to manage objects in compartment <compartment> where ALL { request.principal.type='datasciencemodeldeployment', target.bucket.name='<BUCKET_NAME>' }
```
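
For orientation, here is a minimal sketch (not the exact sample code) of what such a BentoML service might look like, with a custom `predict` route, a readiness route, and a resource-principal upload to Object Storage. The pipeline name, routes, and `<BUCKET_NAME>` are illustrative assumptions:
```
import io
import uuid

import bentoml
import oci
from diffusers import StableDiffusion3Pipeline


@bentoml.service(resources={"gpu": 1})
class SD3Medium:
    def __init__(self):
        # Load the diffusion pipeline onto the GPU.
        self.pipe = StableDiffusion3Pipeline.from_pretrained(
            "stabilityai/stable-diffusion-3-medium-diffusers"
        ).to("cuda")
        # Resource principal auth works inside Model Deployment without API keys.
        signer = oci.auth.signers.get_resource_principals_signer()
        self.os_client = oci.object_storage.ObjectStorageClient({}, signer=signer)
        self.namespace = self.os_client.get_namespace().data

    @bentoml.api(route="/predict")
    def predict(self, prompt: str, num_inference_steps: int = 10,
                guidance_scale: float = 7.0) -> str:
        # Generate an image and upload it to the configured bucket.
        image = self.pipe(prompt, num_inference_steps=num_inference_steps,
                          guidance_scale=guidance_scale).images[0]
        buffer = io.BytesIO()
        image.save(buffer, format="PNG")
        object_name = f"{uuid.uuid4()}.png"
        self.os_client.put_object(self.namespace, "<BUCKET_NAME>", object_name,
                                  buffer.getvalue())
        return object_name

    @bentoml.api(route="/health")
    def health(self) -> dict:
        # Readiness check used by the model server health probe.
        return {"status": "ok"}
```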
@@ -36,9 +36,9 @@ Note - Create a VCN, Subnet with internet connectivity in order to fetch the mod
Create the model deployment using the [file](./model-deployment.py) as reference.

## Prediction
Once the MD is active, use the below OCI CLI request to send an image generation call -
```
oci raw-request --http-method POST --target-uri <MODEL_DEPLOYMENT_ENDPOINT> --request-body '{ "prompt": "A cat holding a sign that says hello World", "num_inference_steps": 10,"guidance_scale": 7.0 }' --request-headers '{"Content-Type":"application/json"}'
```

The generated image will be placed in the chosen bucket.