Commit 00f0f6b

multi llm NIM
1 parent 983bd7e commit 00f0f6b

File tree

1 file changed: +9 -14 lines changed

model-deployment/containers/nim/README-multillm-nim-containers.md

Lines changed: 9 additions & 14 deletions
@@ -11,13 +11,11 @@ We describe two approaches to create this Model Deployment on OCI:
* Utilising Object storage to store the model and creating a model catalog pointing to Object storage bucket [Refer](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/nim/README-MODEL-CATALOG.md)

## Prerequisites
-* Access the corresponding NIM container for the model. For example for llama3, fetch the latest available image from [NGC catalog](https://catalog.ngc.nvidia.com/orgs/nim/teams/meta/containers/llama3-8b-instruct/tags). If you are a first time user, you need to sign up a developer account and wait for access to be granted to required container image.
-Click Get Container Button and click Request Access for NIM. At the time of writing this blog, you need a business email address to get access to NIM.
+* Access the corresponding NIM container for your LLM. Click the Get Container button and click Request Access for NIM. At the time of writing this blog, you need a business email address to get access to NIM.
* For downloading this image from the NGC catalog, you need to perform a docker login to nvcr.io. Details of the login procedure are given in their [public doc](https://docs.nvidia.com/launchpad/ai/base-command-coe/latest/bc-coe-docker-basics-step-02.html).
Once logged in, we can directly pull the image (see the sketch after this list):
`docker pull nvcr.io/nim/nvidia/llm-nim:1.12.0`
* Generate an API key to interact with NIM NGC APIs. [Reference document](https://org.ngc.nvidia.com/setup/api-key).
-* Create a VCN with public connectivity as NIM container needs to call NGC publicaly exposed APIs. Please refer [public document](https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-create-cus-net.htm) for relevant information on custom networking.
* Once the image is successfully pulled on your workstation, we will bring this image to Oracle Cloud Infrastructure Registry (OCIR). The necessary policies and process for OCIR interaction are mentioned in our [public docs](https://docs.oracle.com/en-us/iaas/data-science/using/mod-dep-byoc.htm).
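A minimal sketch of the NGC login and pull flow described above, assuming the API key from the previous step is exported as `NGC_API_KEY` (that variable name is our choice, not part of the README):

```bash
# NGC uses the literal string $oauthtoken as the username; the password
# is the API key generated from the NGC setup page.
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin

# Pull the NIM image referenced in this README
docker pull nvcr.io/nim/nvidia/llm-nim:1.12.0
```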

## OCI Logging
@@ -48,15 +46,18 @@ When experimenting with new frameworks and models, it is highly advisable to att
* Push the container image to the OCIR

```bash
-docker push `odsc-llm-nim:1.12.0`
+docker push <region>.ocir.io/<tenant-namespace>/llm-nim:1.12.0
```
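The push only succeeds once the local image is tagged with the OCIR path and the Docker client is logged in to OCIR. A hypothetical sketch of those two steps; the region key, tenancy namespace, and username are placeholders:

```bash
# Log in to OCIR; the password is an OCI auth token, not your console password
docker login <region>.ocir.io --username '<tenant-namespace>/<oci-username>'

# Tag the NIM image pulled from NGC with the OCIR repository path
docker tag nvcr.io/nim/nvidia/llm-nim:1.12.0 <region>.ocir.io/<tenant-namespace>/llm-nim:1.12.0

# Push the tagged image
docker push <region>.ocir.io/<tenant-namespace>/llm-nim:1.12.0
```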

## Deploy on OCI Data Science Model Deployment

Once you have built and pushed the NIM container, you can now use the `Bring Your Own Container` deployment in OCI Data Science to deploy the Llama3 model.
### Creating Model catalog
-Use any zip file to create a dummy model artifact. As we will be downloading model directly from NGC, we do not need to catalog the model. For catalogued based solution, refer [Readme](README-MODEL-CATALOG.md).
+Follow the steps mentioned [here](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/model-deployment/containers/llama2/README.md#model-store-export-api-for-creating-model-artifacts-greater-than-6-gb-in-size); refer to the section "One time download to OCI Model Catalog".
+
+We will use the model created above in the next steps to create the Model Deployment.
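For orientation, a hypothetical sketch of the underlying catalog calls with the OCI CLI; every OCID and the artifact path are placeholders, and artifacts above 6 GB should go through the export flow in the linked README:

```bash
# Create the model catalog entry (metadata only)
oci data-science model create \
  --compartment-id ocid1.compartment.oc1..<unique-id> \
  --project-id ocid1.datascienceproject.oc1..<unique-id> \
  --display-name "multi-llm-nim"

# Attach the artifact to the entry; for artifacts larger than 6 GB,
# use the Model Store export API flow from the linked README instead
oci data-science model create-model-artifact \
  --model-id ocid1.datasciencemodel.oc1..<unique-id> \
  --model-artifact-file model_artifact.zip
```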

### Create Model deploy

@@ -68,17 +69,11 @@ Use any zip file to create a dummy model artifact. As we will be downloading mod
* Key: `MODEL_DEPLOY_HEALTH_ENDPOINT`, Value: `/v1/health/ready`
* Key: `NIM_SERVER_PORT`, Value: `8080`
* Key: `SHM_SIZE`, Value: `10g`
-* Key: `STORAGE_SIZE_IN_GB`, Value: `120`
-* Key: `NCCL_CUMEM_ENABLE`, Value: `0`
-* Key: `WEB_CONCURRENCY`, Value: `1`
-* Key: `NGC_API_KEY`, Value: `<KEY_GENERATED_FROM_NGC>`
+* Key: `NIM_MODEL_NAME`, Value: `/opt/ds/model/deployed_model`
* Key: `OPENSSL_FORCE_FIPS_MODE`, Value: `0`
-* Key: `NIM_MODEL_NAME`, Value: `hf://meta-llama/Meta-Llama-3-8B`
-* Key: `NIM_SERVED_MODEL_NAME`, Value: `meta/llama3-8b`
-* Key: `HF_TOKEN`, Value: `<HF_GENERATED_TOKEN>`
* Under `Models`, click the `Select` button and select the Model Catalog entry we created earlier
* Under `Compute`, then `Specialty and previous generation`, select the `VM.GPU.A10.2` instance
-* Under `Networking` choose the `Custom Networking` option and bring the VCN and subnet, which allows Internet access.
+* Under `Networking`, choose the `Default Networking` option.
* Under `Logging`, select the Log Group where you created your predict and access logs, and select those logs correspondingly
* Select the custom container option `Use a Custom Container Image` and click `Select`
* Select the OCIR repository and image we pushed earlier (the resulting container configuration is sketched below)
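The console choices above map onto the Model Deployment BYOC API. A hypothetical sketch of the container environment block they imply, assuming the `OCIR_CONTAINER` configuration type and our placeholder image path from earlier:

```bash
# Hypothetical sketch: the BYOC environment configuration implied by the
# console settings above, written out for use with API-based tooling.
cat > environment_config.json <<'EOF'
{
  "environmentConfigurationType": "OCIR_CONTAINER",
  "image": "<region>.ocir.io/<tenant-namespace>/llm-nim:1.12.0",
  "serverPort": 8080,
  "healthCheckPort": 8080,
  "environmentVariables": {
    "MODEL_DEPLOY_HEALTH_ENDPOINT": "/v1/health/ready",
    "NIM_SERVER_PORT": "8080",
    "SHM_SIZE": "10g",
    "NIM_MODEL_NAME": "/opt/ds/model/deployed_model",
    "OPENSSL_FORCE_FIPS_MODE": "0"
  }
}
EOF
```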
@@ -97,7 +92,7 @@ Use any zip file to create a dummy model artifact. As we will be downloading mod
oci raw-request \
--http-method POST \
--target-uri <MODEL-DEPLOY-ENDPOINT> \
---request-body '{"model": "meta/llama3-8b-instruct", "messages": [ { "role":"user", "content":"Hello! How are you?" }, { "role":"assistant", "content":"Hi! I am quite well, how can I help you today?" }, { "role":"user", "content":"Can you write me a song?" } ], "top_p": 1, "n": 1, "max_tokens": 200, "stream": false, "frequency_penalty": 1.0, "stop": ["hello"] }' \
+--request-body '{"model": "/opt/ds/model/deployed_model", "messages": [ { "role":"user", "content":"Hello! How are you?" }, { "role":"assistant", "content":"Hi! I am quite well, how can I help you today?" }, { "role":"user", "content":"Can you write me a song?" } ], "top_p": 1, "n": 1, "max_tokens": 200, "stream": false, "frequency_penalty": 1.0, "stop": ["hello"] }' \
--auth resource_principal
```
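NIM serves an OpenAI-compatible chat completions schema, and `oci raw-request` wraps the HTTP response in a `data` field, so the generated text can be extracted directly; a small sketch, assuming `jq` is available on the caller's machine:

```bash
# Extract only the assistant's reply from the wrapped response
oci raw-request \
  --http-method POST \
  --target-uri <MODEL-DEPLOY-ENDPOINT> \
  --request-body '{"model": "/opt/ds/model/deployed_model", "messages": [{"role": "user", "content": "Can you write me a song?"}], "max_tokens": 200}' \
  --auth resource_principal | jq -r '.data.choices[0].message.content'
```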