Commit cf7a711

[Ready to Merge] Added Docs for AQUA Stacked Deployments (#663)

* Update model-deployment-tips.md
* Update multimodel-deployment-tips.md
* Create stacked-deployment-tips.md

1 parent 32f5501 commit cf7a711

9 files changed: +994 −4 lines changed
ai-quick-actions/model-deployment-tips.md

Lines changed: 3 additions & 1 deletion

@@ -9,6 +9,8 @@ Table of Contents:
  - [Model Evaluation](evaluation-tips.md)
  - [Model Registration](register-tips.md)
  - [Multi Modal Inferencing](multimodal-models-tips.md)
+ - [Multi Model Inferencing](multimodel-deployment-tips.md)
+ - [Stacked Model Inferencing](stacked-deployment-tips.md)
  - [Private_Endpoints](model-deployment-private-endpoint-tips.md)
  - [Tool Calling](model-deployment-tool-calling-tips.md)

@@ -918,4 +920,4 @@ Table of Contents:
  - [Model Registration](register-tips.md)
  - [Multi Modal Inferencing](multimodal-models-tips.md)
  - [Private_Endpoints](model-deployment-private-endpoint-tips.md)
- - [Tool Calling](model-deployment-tool-calling-tips.md)
+ - [Tool Calling](model-deployment-tool-calling-tips.md)

ai-quick-actions/multimodel-deployment-tips.md

Lines changed: 154 additions & 3 deletions
@@ -63,6 +63,8 @@ For fine-tuned models, requests specifying the base model name (ex. model: meta-
  - [CLI Output](#cli-output-3)
  - [Create Multi-Model (1 Embedding Model, 1 LLM) deployment with `/v1/completions`](#create-multi-model-1-embedding-model-1-llm-deployment-with-v1completions)
  - [Manage Multi-Model Deployments](#manage-multi-model-deployments)
+   - [List Multi-Model Deployments](#list-multi-model-deployments)
+   - [Edit Multi-Model Deployments](#edit-multi-model-deployments)
  - [Multi-Model Inferencing](#multi-model-inferencing)
  - [Using oci-cli](#using-oci-cli)
  - [Using Python SDK (without streaming)](#using-python-sdk-without-streaming)
@@ -101,16 +103,22 @@ Only Multi-Model Deployments with **base service LLM models (text-generation)**

### Select 'Deploy Multi Model'
  - Based on the 'models' field, a Compute Shape will be recommended to accommodate both models.
+ - Select the 'Fine Tuned Weights'.
+ - Only fine-tuned models with version `V2` can be deployed as weights in a Multi-Model Deployment. To deploy an older fine-tuned model weight, run the following command to convert it to version `V2`, then use the new fine-tuned model when creating the deployment. By default this command deletes the old fine-tuned model after conversion; add ``--delete_model False`` to keep it instead.
+
+   ```bash
+   ads aqua model convert_fine_tune --model_id [FT_OCID]
+   ```
  - Select logging and endpoints (/v1/completions | /v1/chat/completions).
  - Submit form via 'Deploy Button' at bottom.
- ![mmd-form](web_assets/deploy-mmd.png)
+ ![mmd-form](web_assets/deploy-multi.png)

### Inferencing with Multi-Model Deployment

There are two ways to send inference requests to models within a Multi-Model Deployment:

1. Python SDK (recommended) - see [here](#Multi-Model-Inferencing)
- 2. Using AQUA UI (see below, ok for testing)
+ 2. Using AQUA UI - see [here](#using-aqua-ui-interface-for-multi-model-deployment)

Once the Deployment is Active, view the model deployment details and inferencing form by clicking on the 'Deployments' Tab and selecting the model within the Model Deployment list.

@@ -472,8 +480,13 @@ ads aqua deployment get_multimodel_deployment_config --model_ids '["ocid1.datasc

## 3. Create Multi-Model Deployment

- Only **base service LLM models** are supported for MultiModel Deployment. All selected models will run on the same **GPU shape**, sharing the available compute resources. Make sure to choose a shape that meets the needs of all models in your deployment using [MultiModel Configuration command](#get-multimodel-configuration)
+ All selected models will run on the same **GPU shape**, sharing the available compute resources. Make sure to choose a shape that meets the needs of all models in your deployment using the [MultiModel Configuration command](#get-multimodel-configuration).
+
+ Only fine-tuned models with version `V2` can be deployed as weights in a Multi-Model Deployment. To deploy an older fine-tuned model weight, run the following command to convert it to version `V2`, then use the new fine-tuned model OCID when creating the deployment. By default this command deletes the old fine-tuned model after conversion; add ``--delete_model False`` to keep it instead.
+
+ ```bash
+ ads aqua model convert_fine_tune --model_id [FT_OCID]
+ ```

### Description
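If several older fine-tuned models need converting, the call above can be scripted. A minimal Python sketch of that idea — `build_convert_cmd` is an illustrative helper, not part of the AQUA CLI or the ADS SDK, and the OCID below is a placeholder:

```python
import shlex

def build_convert_cmd(ft_ocid: str, keep_old: bool = False) -> str:
    """Compose the `ads aqua model convert_fine_tune` CLI call for one model.

    The CLI deletes the old V1 fine-tuned model by default after conversion;
    keep_old=True appends `--delete_model False` so it is retained.
    """
    cmd = ["ads", "aqua", "model", "convert_fine_tune", "--model_id", ft_ocid]
    if keep_old:
        cmd += ["--delete_model", "False"]
    return shlex.join(cmd)

# Placeholder OCID for illustration only.
print(build_convert_cmd("ocid1.datasciencemodel.oc1.iad.example", keep_old=True))
```

The generated string can then be run in a shell (or passed to `subprocess.run` with `shlex.split`) once per fine-tuned model OCID.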

@@ -750,6 +763,144 @@ To list all AQUA deployments (both Multi-Model and single-model) within a specif

Note: Multi-Model deployments are identified by the tag `"aqua_multimodel": "true"` associated with them.
### Edit Multi-Model Deployments

An AQUA deployment must be in the `ACTIVE` state to be updated, and only one of the following option groups can be updated at a time. There are two ways to update a model deployment: `ZDT` and `LIVE`. The default update type for AQUA deployments is `ZDT`, but `LIVE` is adopted if `models` are changed in a multi-model deployment.

- `Name or description`: Change the name or description.
- `Default configuration`: Change or add freeform and defined tags.
- `Models`: Change the models.
- `Compute`: Change the number of CPUs or amount of memory for each CPU in gigabytes.
- `Logging`: Change the logging configuration for access and predict logs.
- `Load Balancer`: Change the load balancing bandwidth.
#### Usage

```bash
ads aqua deployment update [OPTIONS]
```

#### Required Parameters

`--model_deployment_id [str]`

The OCID of the model deployment to be updated.

#### Optional Parameters

`--models [str]`

A string representation of a JSON array, where each object defines a model's OCID and the number of GPUs assigned to it. The GPU count should always be a **power of two (e.g., 1, 2, 4, 8)**. <br>
Example: `'[{"model_id":"<model_ocid>", "gpu_count":1},{"model_id":"<model_ocid>", "gpu_count":1}]'` for the `VM.GPU.A10.2` shape. <br>
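As a sketch of how the `--models` value can be assembled and sanity-checked before invoking the CLI — `make_models_arg` is a hypothetical helper, not part of the ADS SDK, and the OCIDs are placeholders:

```python
import json

def make_models_arg(models):
    """Serialize [(model_ocid, gpu_count), ...] into the JSON string expected
    by --models, validating that each gpu_count is a power of two."""
    entries = []
    for ocid, gpus in models:
        # A positive power of two has exactly one bit set.
        if gpus < 1 or gpus & (gpus - 1):
            raise ValueError(f"gpu_count must be a power of two, got {gpus}")
        entries.append({"model_id": ocid, "gpu_count": gpus})
    return json.dumps(entries)

# Two models sharing a VM.GPU.A10.2 shape (placeholder OCIDs).
arg = make_models_arg([
    ("ocid1.datasciencemodel.oc1.iad.aaa", 1),
    ("ocid1.datasciencemodel.oc1.iad.bbb", 1),
])
```

The resulting string can be passed directly as the `--models` argument.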
`--display_name [str]`

The name of the model deployment.

`--description [str]`

The description of the model deployment. Defaults to None.

`--instance_count [int]`

The number of instances used for the model deployment. Defaults to 1.

`--log_group_id [str]`

The OCI logging group OCID. The access log and predict log share the same log group.

`--access_log_id [str]`

The access log OCID for the access logs. Check [model deployment logging](https://docs.oracle.com/en-us/iaas/data-science/using/model_dep_using_logging.htm) for more details.

`--predict_log_id [str]`

The predict log OCID for the predict logs. Check [model deployment logging](https://docs.oracle.com/en-us/iaas/data-science/using/model_dep_using_logging.htm) for more details.

`--web_concurrency [int]`

The number of worker processes/threads to handle incoming requests.

`--bandwidth_mbps [int]`

The bandwidth limit on the load balancer in Mbps.

`--memory_in_gbs [float]`

Memory (in GB) for the selected shape.

`--ocpus [float]`

OCPU count for the selected shape.

`--freeform_tags [dict]`

Freeform tags for the model deployment.

`--defined_tags [dict]`

Defined tags for the model deployment.
#### Example

##### Edit Multi-Model deployment with `/v1/completions`

```bash
ads aqua deployment update \
  --model_deployment_id "ocid1.datasciencemodeldeployment.oc1.iad.<ocid>" \
  --models '[{"model_id":"ocid1.datasciencemodel.oc1.iad.<ocid>", "model_name":"test_updated_model_name", "gpu_count":2}]' \
  --display_name "updated_modelDeployment_multmodel_model1_model2"
```
##### CLI Output

```json
{
  "id": "ocid1.datasciencemodeldeployment.oc1.iad.<ocid>",
  "display_name": "updated_modelDeployment_multmodel_model1_model2",
  "aqua_service_model": false,
  "model_id": "ocid1.datasciencemodelgroup.oc1.iad.<ocid>",
  "models": [
    {
      "model_id": "ocid1.datasciencemodel.oc1.iad.<ocid>",
      "model_name": "mistralai/Mistral-7B-v0.1",
      "gpu_count": 1,
      "env_var": {}
    },
    {
      "model_id": "ocid1.datasciencemodel.oc1.iad.<ocid>",
      "model_name": "tiiuae/falcon-7b",
      "gpu_count": 1,
      "env_var": {}
    }
  ],
  "aqua_model_name": "",
  "state": "UPDATING",
  "description": null,
  "created_on": "2025-03-10 19:09:40.793000+00:00",
  "created_by": "ocid1.user.oc1..<ocid>",
  "endpoint": "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.iad.<ocid>",
  "private_endpoint_id": null,
  "console_link": "https://cloud.oracle.com/data-science/model-deployments/ocid1.datasciencemodeldeployment.oc1.iad.<ocid>",
  "lifecycle_details": null,
  "shape_info": {
    "instance_shape": "VM.GPU.A10.2",
    "instance_count": 1,
    "ocpus": null,
    "memory_in_gbs": null
  },
  "tags": {
    "aqua_model_id": "ocid1.datasciencemodelgroup.oc1.<ocid>",
    "aqua_multimodel": "true",
    "OCI_AQUA": "active"
  },
  "environment_variables": {
    "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/chat/completions",
    "MODEL_DEPLOY_ENABLE_STREAMING": "true"
  }
}
```
# Multi-Model Inferencing

The only change required to infer against a specific model in a Multi-Model deployment is the value of the `"model"` parameter in the request payload. The values for this parameter can be found in the Model Deployment details, under the field `"model_name"`. This parameter segregates the request flow, ensuring that the inference request is directed to the correct model within the MultiModel deployment.
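As an illustrative sketch of that routing (a hypothetical helper; actually sending the request additionally requires OCI request signing, covered in the Python SDK examples), only the `"model"` field in the payload varies per target model:

```python
import json

def completion_request(model_name: str, prompt: str, max_tokens: int = 128) -> str:
    """Build a /v1/completions request body for a Multi-Model deployment.

    model_name must match the "model_name" field shown in the
    Model Deployment details; it routes the request to that model.
    """
    return json.dumps({
        "model": model_name,
        "prompt": prompt,
        "max_tokens": max_tokens,
    })

# Switching the target model changes nothing except the "model" value.
body = completion_request("mistralai/Mistral-7B-v0.1", "Hello")
```

The same body, with `"model"` set to e.g. `"tiiuae/falcon-7b"`, would be served by the other model in the deployment.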
