Commit 9feef48 (parent f64e0a0): Update working configuration

1 file changed: model-deployment/containers/nim/README-Nemotron.md (7 additions, 8 deletions)
```diff
@@ -3,10 +3,6 @@
 The NVIDIA Nemotron family of multimodal models provides state-of-the-art agentic reasoning for graduate-level scientific reasoning, advanced math, coding, instruction following, tool calling, and visual reasoning. Nemotron models excel in vision for enterprise optical character recognition (OCR) and in reasoning for building agentic AI.
 In this guide, we are going to deploy [Mistral-Nemo-12B-Instruct](https://catalog.ngc.nvidia.com/orgs/nim/teams/nv-mistralai/containers/mistral-nemo-12b-instruct) on OCI Data Science.
 
-We will describe an [AirGap solution](https://docs.nvidia.com/nim/large-language-models/latest/deploy-air-gap.html) where all prerequisites, like the model and inference engine, are first brought to OCI and then used for deployment. Hence, there is no need for any external connectivity.
-
-* Download the [nvidia/Mistral-NeMo-12B-Instruct model](https://huggingface.co/nvidia/Mistral-NeMo-12B-Instruct/tree/main) from Hugging Face.
-* Utilise Object Storage to store the model and create a model catalog entry pointing to the Object Storage bucket ([refer here](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/nim/README-MODEL-CATALOG.md)).
 
 ## Prerequisites
 * Access the corresponding NIM LLM container. Click the Get Container button and click Request Access for NIM. At the time of writing, you need a business email address to get access to NIM.
```
```diff
@@ -53,8 +49,7 @@ Once you built and pushed the NIM container, you can now use the `Bring Your Own
 
 ### Creating Model catalog
 
-Follow the steps mentioned [here](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/model-deployment/containers/llama2/README.md#model-store-export-api-for-creating-model-artifacts-greater-than-6-gb-in-size); refer to the section "One time download to OCI Model Catalog".
-
+Create a dummy model catalog entry to link with a model deployment, using any zip file.
 We would utilise the above created model in the next steps to create the Model Deployment.
 
 ### Create Model deploy
```
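The replacement line above registers a dummy catalog entry from any zip file, since the actual model is served by the NIM container rather than loaded from the artifact. A minimal sketch of producing such a placeholder zip with the Python standard library (file names are assumptions):

```python
import zipfile

def make_dummy_artifact(path: str = "dummy_artifact.zip") -> str:
    """Create a small placeholder zip to use as a model catalog artifact.

    The deployment only needs *a* catalog entry to attach to; the NIM
    container provides the real model, so the artifact content is irrelevant.
    """
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("README.txt", "Placeholder artifact for NIM model deployment.")
    return path
```

The resulting zip can then be uploaded as the model artifact when creating the catalog entry in the OCI console.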
```diff
@@ -69,9 +64,13 @@ We would utilise the above created model in the next steps to create the Model D
 * Key: `SHM_SIZE`, Value: `10g`
 * Key: `NIM_MODEL_NAME`, Value: `/opt/ds/model/deployed_model`
 * Key: `OPENSSL_FORCE_FIPS_MODE`, Value: `0`
+* Key: `NGC_API_KEY`, Value: `<NGC KEY>`
+* Key: `WEB_CONCURRENCY`, Value: `1`
+* Key: `NCCL_CUMEM_ENABLE`, Value: `0`
+* Key: `NIM_MAX_MODEL_LEN`, Value: `4000` (can be increased based on the instance shape used)
 * Under `Models`, click the `Select` button and select the Model Catalog entry we created earlier
 * Under `Compute`, then `Specialty and previous generation`, select the `VM.GPU.A10.2` instance
-* Under `Networking`, choose the `Default Networking` option.
+* Under `Networking`, choose the `Custom Networking` option and bring a VCN and subnet that allow Internet access.
 * Under `Logging`, select the Log Group where you've created your predict and access logs and select those correspondingly
 * Select the custom container option `Use a Custom Container Image` and click `Select`
 * Select the OCIR repository and image we pushed earlier
```
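Gathered in one place, the environment variables this commit settles on look roughly like the following sketch; `<NGC KEY>` remains a placeholder for a real NGC API key:

```python
# Environment variables for the NIM model deployment, as configured above.
# "<NGC KEY>" is a placeholder; supply your actual NGC API key.
nim_env = {
    "SHM_SIZE": "10g",                                # shared memory for the container
    "NIM_MODEL_NAME": "/opt/ds/model/deployed_model", # model path inside the container
    "OPENSSL_FORCE_FIPS_MODE": "0",
    "NGC_API_KEY": "<NGC KEY>",                       # needed to pull from NGC
    "WEB_CONCURRENCY": "1",                           # single worker process
    "NCCL_CUMEM_ENABLE": "0",
    "NIM_MAX_MODEL_LEN": "4000",                      # context length cap
}
```

`NIM_MAX_MODEL_LEN` caps the context length, which is why the note above says it can grow with larger instance shapes.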
````diff
@@ -90,7 +89,7 @@ We would utilise the above created model in the next steps to create the Model D
 oci raw-request \
 --http-method POST \
 --target-uri <MODEL-DEPLOY-ENDPOINT> \
---request-body '{"model": "/opt/ds/model/deployed_model", "messages": [ { "role":"user", "content":"Hello! How are you?" }, { "role":"assistant", "content":"Hi! I am quite well, how can I help you today?" }, { "role":"user", "content":"Can you write me a song?" } ], "top_p": 1, "n": 1, "max_tokens": 200, "stream": false, "frequency_penalty": 1.0, "stop": ["hello"] }' \
+--request-body '{"model": "mistral-nemo-12b-instruct", "messages": [ { "role":"user", "content":"Hello! How are you?" }, { "role":"assistant", "content":"Hi! I am quite well, how can I help you today?" }, { "role":"user", "content":"Can you write me a song?" } ], "top_p": 1, "n": 1, "max_tokens": 200, "stream": false, "frequency_penalty": 1.0, "stop": ["hello"] }' \
 --auth resource_principal
 ```
````
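The changed `--request-body` swaps the container-local model path for the served model name. For illustration, a sketch that builds the same chat payload in Python (the function name is an assumption, not part of any SDK):

```python
import json

def build_chat_payload(messages, model="mistral-nemo-12b-instruct",
                       max_tokens=200, stream=False):
    """Assemble the JSON body the diff passes via --request-body.

    The "model" field now names the served model rather than the
    container-local path /opt/ds/model/deployed_model.
    """
    return {
        "model": model,
        "messages": messages,
        "top_p": 1,
        "n": 1,
        "max_tokens": max_tokens,
        "stream": stream,
        "frequency_penalty": 1.0,
        "stop": ["hello"],
    }

body = json.dumps(build_chat_payload(
    [{"role": "user", "content": "Hello! How are you?"}]
))
```

From inside OCI (for example, a notebook session in the same tenancy), the same JSON could be POSTed with the Python SDK's resource-principal signer, matching the `--auth resource_principal` flag above.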
