diff --git a/ai-quick-actions/model-deployment-tips.md b/ai-quick-actions/model-deployment-tips.md
index 90547d3e..05a6b875 100644
--- a/ai-quick-actions/model-deployment-tips.md
+++ b/ai-quick-actions/model-deployment-tips.md
@@ -9,6 +9,8 @@ Table of Contents:
- [Model Evaluation](evaluation-tips.md)
- [Model Registration](register-tips.md)
- [Multi Modal Inferencing](multimodal-models-tips.md)
+- [Multi Model Inferencing](multimodel-deployment-tips.md)
+- [Stacked Model Inferencing](stacked-deployment-tips.md)
- [Private_Endpoints](model-deployment-private-endpoint-tips.md)
- [Tool Calling](model-deployment-tool-calling-tips.md)
@@ -918,4 +920,4 @@ Table of Contents:
- [Model Registration](register-tips.md)
- [Multi Modal Inferencing](multimodal-models-tips.md)
- [Private_Endpoints](model-deployment-private-endpoint-tips.md)
-- [Tool Calling](model-deployment-tool-calling-tips.md)
\ No newline at end of file
+- [Tool Calling](model-deployment-tool-calling-tips.md)
diff --git a/ai-quick-actions/multimodel-deployment-tips.md b/ai-quick-actions/multimodel-deployment-tips.md
index 183afc03..38e415ad 100644
--- a/ai-quick-actions/multimodel-deployment-tips.md
+++ b/ai-quick-actions/multimodel-deployment-tips.md
@@ -63,6 +63,8 @@ For fine-tuned models, requests specifying the base model name (ex. model: meta-
- [CLI Output](#cli-output-3)
- [Create Multi-Model (1 Embedding Model, 1 LLM) deployment with `/v1/completions`](#create-multi-model-1-embedding-model-1-llm-deployment-with-v1completions)
- [Manage Multi-Model Deployments](#manage-multi-model-deployments)
+ - [List Multi-Model Deployments](#list-multi-model-deployments)
+ - [Edit Multi-Model Deployments](#edit-multi-model-deployments)
- [Multi-Model Inferencing](#multi-model-inferencing)
- [Using oci-cli](#using-oci-cli)
- [Using Python SDK (without streaming)](#using-python-sdk-without-streaming)
@@ -101,16 +103,22 @@ Only Multi-Model Deployments with **base service LLM models (text-generation)**
### Select 'Deploy Multi Model'
- Based on the 'models' field, a Compute Shape will be recommended to accommodate both models.
+- Select the 'Fine Tuned Weights'.
+  - Only fine tuned models with version `V2` can be deployed as weights in a Multi-Model Deployment. To deploy an older fine tuned model weight, run the following command to convert it to version `V2`, then use the new fine tuned model in the deployment creation. By default, this command deletes the old fine tuned model after conversion; add ``--delete_model False`` to keep it instead.
+
+ ```bash
+ ads aqua model convert_fine_tune --model_id [FT_OCID]
+ ```
- Select logging and endpoints (/v1/completions | /v1/chat/completions).
- Submit form via 'Deploy Button' at bottom.
-
+
### Inferencing with Multi-Model Deployment
There are two ways to send inference requests to models within a Multi-Model Deployment
1. Python SDK (recommended) - see [here](#Multi-Model-Inferencing)
-2. Using AQUA UI (see below, ok for testing)
+2. Using AQUA UI - see [here](#using-aqua-ui-interface-for-multi-model-deployment)
Once the Deployment is Active, view the model deployment details and inferencing form by clicking on the 'Deployments' Tab and selecting the model within the Model Deployment list.
@@ -472,8 +480,13 @@ ads aqua deployment get_multimodel_deployment_config --model_ids '["ocid1.datasc
## 3. Create Multi-Model Deployment
-Only **base service LLM models** are supported for MultiModel Deployment. All selected models will run on the same **GPU shape**, sharing the available compute resources. Make sure to choose a shape that meets the needs of all models in your deployment using [MultiModel Configuration command](#get-multimodel-configuration)
+All selected models will run on the same **GPU shape**, sharing the available compute resources. Make sure to choose a shape that meets the needs of all models in your deployment using the [MultiModel Configuration command](#get-multimodel-configuration).
+
+Only fine tuned models with version `V2` can be deployed as weights in a Multi-Model Deployment. To deploy an older fine tuned model weight, run the following command to convert it to version `V2`, then use the new fine tuned model OCID in the deployment creation. By default, this command deletes the old fine tuned model after conversion; add ``--delete_model False`` to keep it instead.
+
+```bash
+ads aqua model convert_fine_tune --model_id [FT_OCID]
+```
### Description
@@ -750,6 +763,144 @@ To list all AQUA deployments (both Multi-Model and single-model) within a specif
Note: Multi-Model deployments are identified by the tag `"aqua_multimodel": "true",` associated with them.
+### Edit Multi-Model Deployments
+
+An AQUA deployment must be in the `ACTIVE` state to be updated, and only one of the following option groups can be updated at a time. There are two ways to update a model deployment: `ZDT` (zero-downtime) and `LIVE`. The default update type for an AQUA deployment is `ZDT`, but `LIVE` will be adopted if `models` are changed in a multi-model deployment.
+
+ - `Name or description`: Change the name or description.
+ - `Default configuration`: Change or add freeform and defined tags.
+ - `Models`: Change the model.
+ - `Compute`: Change the number of OCPUs or the amount of memory in gigabytes.
+ - `Logging`: Change the logging configuration for access and predict logs.
+ - `Load Balancer`: Change the load balancing bandwidth.
+
+#### Usage
+
+```bash
+ads aqua deployment update [OPTIONS]
+```
+
+#### Required Parameters
+
+`--model_deployment_id [str]`
+
+The model deployment OCID to be updated.
+
+#### Optional Parameters
+
+`--models [str]`
+
+The string representation of a JSON array, where each object defines a model’s OCID and the number of GPUs assigned to it. The GPU count should always be a **power of two (e.g., 1, 2, 4, 8)**.
+Example: `'[{"model_id":"", "gpu_count":1},{"model_id":"", "gpu_count":1}]'` for `VM.GPU.A10.2` shape.
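+
+Where helpful, this value can be composed programmatically. A minimal Python sketch (the truncated OCIDs are placeholders):
+
+```python
+import json
+
+# Placeholder OCIDs -- substitute the model OCIDs from your tenancy.
+models = [
+    {"model_id": "ocid1.datasciencemodel.oc1.iad.", "gpu_count": 1},
+    {"model_id": "ocid1.datasciencemodel.oc1.iad.", "gpu_count": 1},
+]
+
+# Pass the resulting string as the value of --models.
+print(json.dumps(models))
+```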
+
+`--display_name [str]`
+
+The name of the model deployment.
+
+`--description [str]`
+
+The description of the model deployment. Defaults to None.
+
+`--instance_count [int]`
+
+The number of instances used for the model deployment. Defaults to 1.
+
+`--log_group_id [str]`
+
+The OCI logging group OCID. The access log and predict log share the same log group.
+
+`--access_log_id [str]`
+
+The access log OCID for the access logs. Check [model deployment logging](https://docs.oracle.com/en-us/iaas/data-science/using/model_dep_using_logging.htm) for more details.
+
+`--predict_log_id [str]`
+
+The predict log OCID for the predict logs. Check [model deployment logging](https://docs.oracle.com/en-us/iaas/data-science/using/model_dep_using_logging.htm) for more details.
+
+`--web_concurrency [int]`
+
+The number of worker processes/threads to handle incoming requests.
+
+`--bandwidth_mbps [int]`
+
+The bandwidth limit on the load balancer in Mbps.
+
+`--memory_in_gbs [float]`
+
+Memory (in GB) for the selected shape.
+
+`--ocpus [float]`
+
+OCPU count for the selected shape.
+
+`--freeform_tags [dict]`
+
+Freeform tags for model deployment.
+
+`--defined_tags [dict]`
+
+Defined tags for model deployment.
+
+#### Example
+
+##### Edit Multi-Model deployment with `/v1/completions`
+
+```bash
+ads aqua deployment update \
+ --model_deployment_id "ocid1.datasciencemodeldeployment.oc1.iad." \
+ --models '[{"model_id":"ocid1.datasciencemodel.oc1.iad.", "model_name":"test_updated_model_name", "gpu_count":2}]' \
+  --display_name "updated_modelDeployment_multimodel_model1_model2"
+
+```
+
+##### CLI Output
+
+```json
+{
+ "id": "ocid1.datasciencemodeldeployment.oc1.iad.",
+  "display_name": "updated_modelDeployment_multimodel_model1_model2",
+ "aqua_service_model": false,
+ "model_id": "ocid1.datasciencemodelgroup.oc1.iad.",
+ "models": [
+ {
+ "model_id": "ocid1.datasciencemodel.oc1.iad.",
+ "model_name": "mistralai/Mistral-7B-v0.1",
+ "gpu_count": 1,
+ "env_var": {}
+ },
+ {
+ "model_id": "ocid1.datasciencemodel.oc1.iad.",
+ "model_name": "tiiuae/falcon-7b",
+ "gpu_count": 1,
+ "env_var": {}
+ }
+ ],
+ "aqua_model_name": "",
+ "state": "UPDATING",
+ "description": null,
+ "created_on": "2025-03-10 19:09:40.793000+00:00",
+ "created_by": "ocid1.user.oc1..",
+ "endpoint": "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.iad.",
+ "private_endpoint_id": null,
+ "console_link": "https://cloud.oracle.com/data-science/model-deployments/ocid1.datasciencemodeldeployment.oc1.iad.",
+ "lifecycle_details": null,
+ "shape_info": {
+ "instance_shape": "VM.GPU.A10.2",
+ "instance_count": 1,
+ "ocpus": null,
+ "memory_in_gbs": null
+ },
+ "tags": {
+ "aqua_model_id": "ocid1.datasciencemodelgroup.oc1.",
+ "aqua_multimodel": "true",
+ "OCI_AQUA": "active"
+ },
+ "environment_variables": {
+ "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/chat/completions",
+    "MODEL_DEPLOY_ENABLE_STREAMING": "true"
+  }
+}
+```
+
# Multi-Model Inferencing
The only change required to infer a specific model from a Multi-Model deployment is to update the value of `"model"` parameter in the request payload. The values for this parameter can be found in the Model Deployment details, under the field name `"model_name"`. This parameter segregates the request flow, ensuring that the inference request is directed to the correct model within the MultiModel deployment.
diff --git a/ai-quick-actions/stacked-deployment-tips.md b/ai-quick-actions/stacked-deployment-tips.md
new file mode 100644
index 00000000..0e13ccee
--- /dev/null
+++ b/ai-quick-actions/stacked-deployment-tips.md
@@ -0,0 +1,837 @@
+# **AI Quick Actions Stacked Deployment**
+
+# Table of Contents
+- [Introduction to Stacked Deployment and Serving](#introduction-to-stacked-deployment-and-serving)
+- [Models](#models)
+ - [Fine Tuned Models](#fine-tuned-models)
+- [Stacked Deployment](#stacked-deployment)
+ - [Create Stacked Deployment via AQUA UI](#create-stacked-deployment-via-aqua-ui)
+ - [Create Stacked Deployment via ADS CLI](#create-stacked-deployment-via-ads-cli)
+ - [Manage Stacked Deployments](#manage-stacked-deployments)
+ - [List Stacked Deployments](#list-stacked-deployments)
+ - [Edit Stacked Deployments](#edit-stacked-deployments)
+- [Stacked Model Inferencing](#stacked-model-inferencing)
+- [Stacked Model Evaluation](#stacked-model-evaluation)
+ - [Create Model Evaluations](#create-model-evaluations)
+
+# Introduction to Stacked Deployment and Serving
+
+Stacked Model Deployment enables deploying a base model alongside multiple fine-tuned weights within the same deployment. During inference, responses can be generated using either the base model or the associated fine-tuned weights, depending on the request. The Data Science server has a prebuilt **vLLM service container** that makes deploying and serving stacked large language models very easy, simplifying the deployment process and reducing operational complexity. This container comes with **vLLM's native routing**, which routes requests to the appropriate model, ensuring seamless prediction.
+
+This document describes how to create stacked deployments using AI Quick Actions (AQUA) model deployments and how to evaluate the models.
+
+# Models
+
+The first step in the process is to get the OCIDs of the desired base service LLM AQUA models, which are required to initiate the stacked deployment process. Refer to [AQUA CLI tips](cli-tips.md) for detailed instructions on how to obtain them.
+
+You can also obtain the OCID from the AQUA user interface by clicking on the model card and selecting the `Copy OCID` button from the `More Options` dropdown in the top-right corner of the screen.
+
+## Fine Tuned Models
+
+Only fine tuned models with version `V2` can be deployed as weights in a Stacked Deployment. To deploy an older fine tuned model weight, run the following command to convert it to version `V2`, then use the new fine tuned model OCID in the deployment creation. By default, this command deletes the old fine tuned model after conversion; add ``--delete_model False`` to keep it instead.
+
+```bash
+ads aqua model convert_fine_tune --model_id [FT_OCID]
+```
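+
+For example, to keep the original fine tuned model after conversion:
+
+```bash
+ads aqua model convert_fine_tune --model_id [FT_OCID] --delete_model False
+```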
+
+If a `V2` fine tuned model is deployed as a single-model deployment, AQUA will fetch its base model, attach the fine tuned model as a weight, and deploy them as a stacked deployment instead.
+
+# Stacked Deployment
+
+## Create Stacked Deployment via AQUA UI
+
+### Create Stack Deployment
+
+Open the AQUA UI and navigate to the `Deployments` tab. Click `Create Deployment` in the upper right and you should see the following page. Select `Deploy Model Stack`, then choose the service model and its corresponding fine tuned weights. You can customize the inference keys for each service and fine tuned model.
+
+![Deploy Model Stack](web_assets/deploy-stack.png)
+
+### Compute Shape
+
+The compute shape selection is critical; the available list is limited to shapes suitable for the chosen model.
+
+- VM.GPU.A10.1 has 24GB of GPU memory and 240GB of CPU memory. The limiting factor is usually the
+GPU memory, which must be large enough to hold the model.
+- VM.GPU.A10.2 has 48GB of GPU memory.
+- BM.GPU.A10.4 has 96GB of GPU memory and runs on a bare metal machine, rather than a VM.
+
+For a full list of shapes and their definitions, see the [compute shape docs](https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm).
+
+The relationship between model parameter size and GPU memory is roughly 2x the parameter count in GB, so for example a model with 7B parameters will need a minimum of 14 GB for inference. At runtime the
+memory is used both for holding the weights and for the concurrent contexts of users' requests.
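+
+As a back-of-the-envelope sketch of this rule of thumb (assuming 16-bit weights at 2 bytes per parameter; the KV cache and activations for concurrent requests add further overhead):
+
+```python
+def min_weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
+    """Rough minimum GPU memory (GB) needed just to hold the model weights."""
+    return params_billions * bytes_per_param
+
+print(min_weight_memory_gb(7))   # 7B parameters -> ~14 GB
+print(min_weight_memory_gb(13))  # 13B parameters -> ~26 GB
+```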
+
+### Advanced Options
+
+You may click on "Show Advanced Options" to configure options for the inference container.
+
+![Advanced Options](web_assets/deploy-stack-model-advanced-options.png)
+
+### Inference Container Configuration
+
+The service allows the model deployment configuration to be overridden when creating a model deployment. Depending on
+the type of inference container used for deployment, such as vLLM or TGI, the parameters vary and need to be passed in the format
+`(--param-name, param-value)`.
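+
+For example, overrides in that format might look like the following (illustrative values, using flags from vLLM's command line):
+
+```
+(--max-model-len, 4096)
+(--seed, 42)
+```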
+
+For more details, please visit the [vLLM](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#command-line-arguments-for-the-server) documentation to learn more about the parameters accepted by the respective containers.
+
+## Create Stacked Deployment via ADS CLI
+
+### Description
+
+You'll need the latest version of ADS to create a new Aqua Stacked deployment. Installation instructions are available [here](https://accelerated-data-science.readthedocs.io/en/latest/user_guide/cli/quickstart.html).
+
+### Usage
+
+```bash
+ads aqua deployment create [OPTIONS]
+```
+
+### Required Parameters
+
+`--models [str]`
+
+The string representation of a JSON array, where each object defines a model OCID, model name, and its associated fine tuned weights. The model names are used to reference specific models during inference requests and support a [maximum length of 32 characters](https://docs.oracle.com/en-us/iaas/Content/data-science/using/models-mms-top.htm#models-mms-key-concepts). The model OCID will be used for inferencing if no model name is provided. Only **one** base model is allowed when creating a stacked deployment.
+Example: `'[{"model_id":"", "model_name":"", "fine_tune_weights": [{"model_id": "", "model_name":""},{"model_id":"", "model_name": ""}]}]'` for `VM.GPU.A10.2` shape.
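+
+Where helpful, this value can be composed programmatically. A minimal Python sketch (the truncated OCIDs and the names are placeholders):
+
+```python
+import json
+
+# Placeholder OCIDs and names -- substitute your own values.
+models = [
+    {
+        "model_id": "ocid1.datasciencemodel.oc1.iad.",  # base model OCID
+        "model_name": "base_model",
+        "fine_tune_weights": [
+            {"model_id": "ocid1.datasciencemodel.oc1.iad.", "model_name": "ft_weight_one"},
+            {"model_id": "ocid1.datasciencemodel.oc1.iad.", "model_name": "ft_weight_two"},
+        ],
+    }
+]
+
+# Pass the resulting string as the value of --models.
+print(json.dumps(models))
+```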
+
+
+`--instance_shape [str]`
+
+The shape (GPU) of the instance used for model deployment.
+Example: `VM.GPU.A10.2, BM.GPU.A10.4, BM.GPU4.8, BM.GPU.A100-v2.8`.
+
+`--display_name [str]`
+
+The name of the model deployment.
+
+`--container_image_uri [str]`
+
+The URI of the inference container associated with the model being registered. In the case of Stacked deployment, the value is the vLLM container URI.
+Example: `dsmc://odsc-vllm-serving:0.6.4.post1.2` or `dsmc://odsc-vllm-serving:0.8.1.2`
+
+`--deployment_type [str]`
+
+The deployment type for creating the model deployment. In the case of Stacked deployment, the value must be `STACKED`. Failing to provide `--deployment_type` will result in a multi-model deployment being created instead.
+
+### Optional Parameters
+
+`--compartment_id [str]`
+
+The compartment OCID where the model deployment is to be created. If not provided, it defaults to the user's compartment.
+
+`--project_id [str]`
+
+The project OCID where the model deployment is to be created. If not provided, it defaults to the user's project.
+
+`--description [str]`
+
+The description of the model deployment. Defaults to None.
+
+`--instance_count [int]`
+
+The number of instances used for the model deployment. Defaults to 1.
+
+`--log_group_id [str]`
+
+The OCI logging group OCID. The access log and predict log share the same log group.
+
+`--access_log_id [str]`
+
+The access log OCID for the access logs. Check [model deployment logging](https://docs.oracle.com/en-us/iaas/data-science/using/model_dep_using_logging.htm) for more details.
+
+`--predict_log_id [str]`
+
+The predict log OCID for the predict logs. Check [model deployment logging](https://docs.oracle.com/en-us/iaas/data-science/using/model_dep_using_logging.htm) for more details.
+
+`--web_concurrency [int]`
+
+The number of worker processes/threads to handle incoming requests.
+
+`--server_port [int]`
+
+The server port for docker container image. Defaults to 8080.
+
+`--health_check_port [int]`
+
+The health check port for docker container image. Defaults to 8080.
+
+`--env_var [dict]`
+
+Environment variables for the model deployment, defaults to None.
+
+`--private_endpoint_id [str]`
+
+The private endpoint id of model deployment.
+
+### Example
+
+#### Create Stacked deployment with `/v1/completions`
+
+```bash
+ads aqua deployment create \
+ --container_image_uri "dsmc://odsc-vllm-serving:0.6.4.post1.2" \
+ --models '[{"model_id":"ocid1.datasciencemodel.oc1.iad.", "model_name":"test_model_name", "fine_tune_weights": [{"model_id": "ocid1.datasciencemodel.oc1.iad.", "model_name":"test_ft_name_one"},{"model_id":"ocid1.datasciencemodel.oc1.iad.", "model_name": "test_ft_name_two"}]}]' \
+ --instance_shape "VM.GPU.A10.1" \
+  --display_name "modelDeployment_stacked_model" \
+  --deployment_type "STACKED"
+```
+
+##### CLI Output
+
+```json
+{
+ "id": "ocid1.datasciencemodeldeployment.oc1.iad.",
+ "display_name": "modelDeployment_stacked_model",
+ "aqua_service_model": false,
+ "model_id": "ocid1.datasciencemodelgroup.oc1.iad.",
+ "models": [],
+ "aqua_model_name": "meta-llama/Meta-Llama-3.1-8B-Instruct",
+ "state": "CREATING",
+ "description": null,
+ "created_on": "2025-10-13 17:48:53.416000+00:00",
+ "created_by": "ocid1.user.oc1..",
+ "endpoint": "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.iad.",
+ "private_endpoint_id": null,
+ "console_link": "https://cloud.oracle.com/data-science/model-deployments/ocid1.datasciencemodeldeployment.oc1.iad.",
+ "lifecycle_details": null,
+ "shape_info": {
+ "instance_shape": "VM.GPU.A10.1",
+ "instance_count": 1,
+ "ocpus": null,
+ "memory_in_gbs": null
+ },
+ "tags": {
+ "task": "text_generation",
+ "aqua_model_name": "meta-llama/Meta-Llama-3.1-8B-Instruct",
+ "OCI_AQUA": "active"
+ },
+ "environment_variables": {
+ "BASE_MODEL": "service_models/Meta-Llama-3.1-8B-Instruct/5206a32/artifact",
+ "VLLM_ALLOW_RUNTIME_LORA_UPDATING": "true",
+ "MODEL": "/opt/ds/model/deployed_model/ocid1.datasciencemodel.oc1.iad./",
+ "PARAMS": "--served-model-name test_model_name --disable-custom-all-reduce --seed 42 --max-model-len 4096 --max-lora-rank 32 --enable_lora",
+ "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/completions",
+ "MODEL_DEPLOY_ENABLE_STREAMING": "true",
+ "PORT": "8080",
+ "HEALTH_CHECK_PORT": "8080",
+ "AQUA_TELEMETRY_BUCKET_NS": "ociodscdev",
+ "AQUA_TELEMETRY_BUCKET": "service-managed-models"
+ },
+ "cmd": []
+}
+```
+
+#### Create Stacked deployment with `/v1/chat/completions`
+
+```bash
+ads aqua deployment create \
+ --container_image_uri "dsmc://odsc-vllm-serving:0.6.4.post1.2" \
+ --models '[{"model_id":"ocid1.datasciencemodel.oc1.iad.", "model_name":"test_model_name", "fine_tune_weights": [{"model_id": "ocid1.datasciencemodel.oc1.iad.", "model_name":"test_ft_name_one"},{"model_id":"ocid1.datasciencemodel.oc1.iad.", "model_name": "test_ft_name_two"}]}]' \
+ --env-var '{"MODEL_DEPLOY_PREDICT_ENDPOINT":"/v1/chat/completions"}' \
+ --instance_shape "VM.GPU.A10.1" \
+  --display_name "modelDeployment_stacked_model" \
+  --deployment_type "STACKED"
+```
+
+##### CLI Output
+
+```json
+{
+ "id": "ocid1.datasciencemodeldeployment.oc1.iad.",
+ "display_name": "modelDeployment_stacked_model",
+ "aqua_service_model": false,
+ "model_id": "ocid1.datasciencemodelgroup.oc1.iad.",
+ "models": [],
+ "aqua_model_name": "meta-llama/Meta-Llama-3.1-8B-Instruct",
+ "state": "CREATING",
+ "description": null,
+ "created_on": "2025-10-13 17:48:53.416000+00:00",
+ "created_by": "ocid1.user.oc1..",
+ "endpoint": "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.iad.",
+ "private_endpoint_id": null,
+ "console_link": "https://cloud.oracle.com/data-science/model-deployments/ocid1.datasciencemodeldeployment.oc1.iad.",
+ "lifecycle_details": null,
+ "shape_info": {
+ "instance_shape": "VM.GPU.A10.1",
+ "instance_count": 1,
+ "ocpus": null,
+ "memory_in_gbs": null
+ },
+ "tags": {
+ "task": "text_generation",
+ "aqua_model_name": "meta-llama/Meta-Llama-3.1-8B-Instruct",
+ "OCI_AQUA": "active"
+ },
+ "environment_variables": {
+ "BASE_MODEL": "service_models/Meta-Llama-3.1-8B-Instruct/5206a32/artifact",
+ "VLLM_ALLOW_RUNTIME_LORA_UPDATING": "true",
+ "MODEL": "/opt/ds/model/deployed_model/ocid1.datasciencemodel.oc1.iad./",
+ "PARAMS": "--served-model-name test_model_name --disable-custom-all-reduce --seed 42 --max-model-len 4096 --max-lora-rank 32 --enable_lora",
+ "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/chat/completions",
+ "MODEL_DEPLOY_ENABLE_STREAMING": "true",
+ "PORT": "8080",
+ "HEALTH_CHECK_PORT": "8080",
+ "AQUA_TELEMETRY_BUCKET_NS": "ociodscdev",
+ "AQUA_TELEMETRY_BUCKET": "service-managed-models"
+ },
+ "cmd": []
+}
+```
+
+## Manage Stacked Deployments
+
+### List Stacked Deployments
+
+To list all AQUA deployments (Stacked, Multi-Model, and single-model) within a specified compartment or project, or to get detailed information on a specific Stacked deployment, refer to the [AQUA CLI tips](cli-tips.md) documentation.
+
+Note: Stacked deployments are identified by the tag `"aqua_stacked_model": "true"` associated with them.
+
+### Edit Stacked Deployments
+
+An AQUA deployment must be in the `ACTIVE` state to be updated, and only one of the following option groups can be updated at a time. There are two ways to update a model deployment: `ZDT` (zero-downtime) and `LIVE`. The default update type for an AQUA deployment is `ZDT`, but `LIVE` will be adopted if `models` are changed in a stacked deployment.
+
+ - `Name or description`: Change the name or description.
+ - `Default configuration`: Change or add freeform and defined tags.
+ - `Models`: Change the model.
+ - `Compute`: Change the number of OCPUs or the amount of memory in gigabytes.
+ - `Logging`: Change the logging configuration for access and predict logs.
+ - `Load Balancer`: Change the load balancing bandwidth.
+
+#### Usage
+
+```bash
+ads aqua deployment update [OPTIONS]
+```
+
+#### Required Parameters
+
+`--model_deployment_id [str]`
+
+The model deployment OCID to be updated.
+
+#### Optional Parameters
+
+`--models [str]`
+
+The string representation of a JSON array, where each object defines a model OCID, model name, and its associated fine tuned weights. The model names are used to reference specific models during inference requests and support a [maximum length of 32 characters](https://docs.oracle.com/en-us/iaas/Content/data-science/using/models-mms-top.htm#models-mms-key-concepts). Only **one** base model is allowed when updating a stacked deployment.
+Example: `'[{"model_id":"", "model_name":"", "fine_tune_weights": [{"model_id": "", "model_name":""},{"model_id":"", "model_name": ""}]}]'` for `VM.GPU.A10.2` shape.
+
+`--display_name [str]`
+
+The name of the model deployment.
+
+`--description [str]`
+
+The description of the model deployment. Defaults to None.
+
+`--instance_count [int]`
+
+The number of instances used for the model deployment. Defaults to 1.
+
+`--log_group_id [str]`
+
+The OCI logging group OCID. The access log and predict log share the same log group.
+
+`--access_log_id [str]`
+
+The access log OCID for the access logs. Check [model deployment logging](https://docs.oracle.com/en-us/iaas/data-science/using/model_dep_using_logging.htm) for more details.
+
+`--predict_log_id [str]`
+
+The predict log OCID for the predict logs. Check [model deployment logging](https://docs.oracle.com/en-us/iaas/data-science/using/model_dep_using_logging.htm) for more details.
+
+`--web_concurrency [int]`
+
+The number of worker processes/threads to handle incoming requests.
+
+`--bandwidth_mbps [int]`
+
+The bandwidth limit on the load balancer in Mbps.
+
+`--memory_in_gbs [float]`
+
+Memory (in GB) for the selected shape.
+
+`--ocpus [float]`
+
+OCPU count for the selected shape.
+
+`--freeform_tags [dict]`
+
+Freeform tags for model deployment.
+
+`--defined_tags [dict]`
+
+Defined tags for model deployment.
+
+#### Example
+
+##### Edit Stacked deployment with `/v1/completions`
+
+```bash
+ads aqua deployment update \
+ --model_deployment_id "ocid1.datasciencemodeldeployment.oc1.iad." \
+ --models '[{"model_id":"ocid1.datasciencemodel.oc1.iad.", "model_name":"test_updated_model_name"}]' \
+ --display_name "updated_modelDeployment_stacked_model"
+
+```
+
+##### CLI Output
+
+```json
+{
+ "id": "ocid1.datasciencemodeldeployment.oc1.iad.",
+ "display_name": "updated_modelDeployment_stacked_model",
+ "aqua_service_model": false,
+ "model_id": "ocid1.datasciencemodelgroup.oc1.iad.",
+ "models": [],
+ "aqua_model_name": "meta-llama/Meta-Llama-3.1-8B-Instruct",
+ "state": "UPDATING",
+ "description": null,
+ "created_on": "2025-10-13 17:48:53.416000+00:00",
+ "created_by": "ocid1.user.oc1..",
+ "endpoint": "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.iad.",
+ "private_endpoint_id": null,
+ "console_link": "https://cloud.oracle.com/data-science/model-deployments/ocid1.datasciencemodeldeployment.oc1.iad.",
+ "lifecycle_details": null,
+ "shape_info": {
+ "instance_shape": "VM.GPU.A10.1",
+ "instance_count": 1,
+ "ocpus": null,
+ "memory_in_gbs": null
+ },
+ "tags": {
+ "task": "text_generation",
+ "aqua_model_name": "meta-llama/Meta-Llama-3.1-8B-Instruct",
+ "OCI_AQUA": "active"
+ },
+ "environment_variables": {
+ "BASE_MODEL": "service_models/Meta-Llama-3.1-8B-Instruct/5206a32/artifact",
+ "VLLM_ALLOW_RUNTIME_LORA_UPDATING": "true",
+ "MODEL": "/opt/ds/model/deployed_model/ocid1.datasciencemodel.oc1.iad./",
+ "PARAMS": "--served-model-name test_updated_model_name --disable-custom-all-reduce --seed 42 --max-model-len 4096 --max-lora-rank 32 --enable_lora",
+ "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/completions",
+ "MODEL_DEPLOY_ENABLE_STREAMING": "true",
+ "PORT": "8080",
+ "HEALTH_CHECK_PORT": "8080",
+ "AQUA_TELEMETRY_BUCKET_NS": "ociodscdev",
+ "AQUA_TELEMETRY_BUCKET": "service-managed-models"
+ },
+ "cmd": []
+}
+```
+
+# Stacked Model Inferencing
+
+The only change required to infer a specific model from a Stacked deployment is to update the value of the `"model"` parameter in the request payload. The values for this parameter can be found in the Model Deployment details, under the field name `"model_name"`. This parameter segregates the request flow, ensuring that the inference request is directed to the correct model within the Stacked deployment.
+
+## Using AQUA UI
+
+![Try Stack Model](web_assets/try-stack-model.png)
+
+## Using oci-cli
+
+```bash
+oci raw-request \
+ --http-method POST \
+ --target-uri /predict \
+ --request-body '{
+ "model": "",
+ "prompt": "what are activation functions?",
+ "max_tokens": 250,
+ "temperature": 0.7,
+ "top_p": 0.8
+ }' \
+ --auth
+
+```
+
+Note: Currently `oci-cli` does not support streaming responses; use the Python or Java SDK instead.
+
+## Using Python SDK (without streaming)
+
+```python
+# The OCI SDK must be installed for this example to function properly.
+# Installation instructions can be found here: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/pythonsdk.htm
+
+import requests
+import oci
+from oci.signer import Signer
+from oci.config import from_file
+
+config = from_file('~/.oci/config')
+auth = Signer(
+ tenancy=config['tenancy'],
+ user=config['user'],
+ fingerprint=config['fingerprint'],
+ private_key_file_location=config['key_file'],
+ pass_phrase=config['pass_phrase']
+)
+
+# For security token based authentication
+# token_file = config['security_token_file']
+# token = None
+# with open(token_file, 'r') as f:
+# token = f.read()
+# private_key = oci.signer.load_private_key_from_file(config['key_file'])
+# auth = oci.auth.signers.SecurityTokenSigner(token, private_key)
+
+model = ""
+
+endpoint = "https://modeldeployment.us-ashburn-1.oci.oc-test.com/ocid1.datasciencemodeldeployment.oc1.iad.xxxxxxxxx/predict"
+body = {
+    "model": model,  # the model name ("model_name") from the deployment details
+ "prompt": "what are activation functions?",
+ "max_tokens": 250,
+ "temperature": 0.7,
+ "top_p": 0.8,
+}
+
+res = requests.post(endpoint, json=body, auth=auth, headers={}).json()
+
+print(res)
+```
+
+## Using Python SDK (with streaming)
+
+To consume streaming Server-sent Events (SSE), install [sseclient-py](https://pypi.org/project/sseclient-py/) using `pip install sseclient-py`.
+
+```python
+# The OCI SDK must be installed for this example to function properly.
+# Installation instructions can be found here: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/pythonsdk.htm
+
+import requests
+import oci
+from oci.signer import Signer
+from oci.config import from_file
+import sseclient # pip install sseclient-py
+
+config = from_file('~/.oci/config')
+auth = Signer(
+ tenancy=config['tenancy'],
+ user=config['user'],
+ fingerprint=config['fingerprint'],
+ private_key_file_location=config['key_file'],
+ pass_phrase=config['pass_phrase']
+)
+
+# For security token based authentication
+# token_file = config['security_token_file']
+# token = None
+# with open(token_file, 'r') as f:
+# token = f.read()
+# private_key = oci.signer.load_private_key_from_file(config['key_file'])
+# auth = oci.auth.signers.SecurityTokenSigner(token, private_key)
+
+model = ""
+
+endpoint = "https://modeldeployment.us-ashburn-1.oci.oc-test.com/ocid1.datasciencemodeldeployment.oc1.iad.xxxxxxxxx/predict"
+body = {
+    "model": model,  # the model name ("model_name") from the deployment details
+ "prompt": "what are activation functions?",
+ "max_tokens": 250,
+ "temperature": 0.7,
+ "top_p": 0.8,
+ "stream": True,
+}
+
+headers={'Content-Type':'application/json','enable-streaming':'true', 'Accept': 'text/event-stream'}
+response = requests.post(endpoint, json=body, auth=auth, stream=True, headers=headers)
+
+print(response.headers)
+
+client = sseclient.SSEClient(response)
+for event in client.events():
+ print(event.data)
+
+# Alternatively, we can use the below code to print the response.
+# for line in response.iter_lines():
+# if line:
+# print(line)
+```
+
+## Using Python SDK for /v1/chat/completions endpoint
+
+To access the model deployed with `/v1/chat/completions` endpoint for inference, update the body and replace `prompt` field
+with `messages`.
+
+```python
+...
+body = {
+    "model": "", # set this to the target model's "model_name"
+ "messages":[{"role":"user","content":[{"type":"text","text":"Who wrote the book Harry Potter?"}]}],
+ "max_tokens": 250,
+ "temperature": 0.7,
+ "top_p": 0.8,
+}
+...
+```
+
+## Using Java (with streaming)
+
+```java
+/**
+ * The OCI SDK must be installed for this example to function properly.
+ * Installation instructions can be found here: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/javasdk.htm
+ */
+package org.example;
+
+import com.oracle.bmc.auth.AuthenticationDetailsProvider;
+import com.oracle.bmc.auth.SessionTokenAuthenticationDetailsProvider;
+import com.oracle.bmc.http.ClientConfigurator;
+import com.oracle.bmc.http.Priorities;
+import com.oracle.bmc.http.client.HttpClient;
+import com.oracle.bmc.http.client.HttpClientBuilder;
+import com.oracle.bmc.http.client.HttpRequest;
+import com.oracle.bmc.http.client.HttpResponse;
+import com.oracle.bmc.http.client.Method;
+import com.oracle.bmc.http.client.jersey.JerseyHttpProvider;
+import com.oracle.bmc.http.client.jersey.sse.SseSupport;
+import com.oracle.bmc.http.internal.ParamEncoder;
+import com.oracle.bmc.http.signing.RequestSigningFilter;
+
+import javax.ws.rs.core.MediaType;
+import java.io.BufferedReader;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.net.URI;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Function;
+
+public class RestExample {
+
+ public static void main(String[] args) throws Exception {
+ String configurationFilePath = "~/.oci/config";
+ String profile = "DEFAULT";
+
+ // Pre-Requirement: Allow setting of restricted headers. This is required to allow the SigningFilter
+ // to set the host header that gets computed during signing of the request.
+ System.setProperty("sun.net.http.allowRestrictedHeaders", "true");
+
+ final AuthenticationDetailsProvider provider =
+ new SessionTokenAuthenticationDetailsProvider(configurationFilePath, profile);
+
+ // 1) Create a request signing filter instance using SessionTokenAuth Provider.
+ RequestSigningFilter requestSigningFilter = RequestSigningFilter.fromAuthProvider(
+ provider);
+
+ // 1) Alternatively, RequestSigningFilter can be created from a config file.
+ // RequestSigningFilter requestSigningFilter = RequestSigningFilter.fromConfigFile(configurationFilePath, profile);
+
+ // 2) Create a Jersey client and register the request signing filter.
+ // Refer to this page https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/javasdkexamples.htm for information regarding the compatibility of the HTTP client(s) with OCI SDK version.
+
+ HttpClientBuilder builder = JerseyHttpProvider.getInstance()
+ .newBuilder()
+ .registerRequestInterceptor(Priorities.AUTHENTICATION, requestSigningFilter)
+ .baseUri(
+ URI.create(
+ "${modelDeployment.modelDeploymentUrl}/")
+ + ParamEncoder.encodePathParam("predict"));
+ // 3) Create a request and set the expected type header.
+
+        String jsonPayload = "{}"; // Add the request payload for your model here
+
+ // 4) Setup Streaming request
+        Function<InputStream, List<String>> generateTextResultReader = getInputStreamListFunction();
+ SseSupport sseSupport = new SseSupport(generateTextResultReader);
+ ClientConfigurator clientConfigurator = sseSupport.getClientConfigurator();
+ clientConfigurator.customizeClient(builder);
+
+ try (HttpClient client = builder.build()) {
+ HttpRequest request = client
+ .createRequest(Method.POST)
+ .header("accepts", MediaType.APPLICATION_JSON)
+ .header("content-type", MediaType.APPLICATION_JSON)
+ .header("enable-streaming", "true")
+ .body(jsonPayload);
+
+ // 5) Invoke the call and get the response.
+ HttpResponse response = request.execute().toCompletableFuture().get();
+
+ // 6) Print the response headers and body
+            Map<String, List<String>> responseHeaders = response.headers();
+ System.out.println("HTTP Headers " + responseHeaders);
+
+ InputStream responseBody = response.streamBody().toCompletableFuture().get();
+ try (
+ final BufferedReader reader = new BufferedReader(
+ new InputStreamReader(responseBody, StandardCharsets.UTF_8)
+ )
+ ) {
+ String line;
+ while ((line = reader.readLine()) != null) {
+ System.out.println(line);
+ }
+ }
+ } catch (Exception ex) {
+ throw ex;
+ }
+ }
+
+    private static Function<InputStream, List<String>> getInputStreamListFunction() {
+        Function<InputStream, List<String>> generateTextResultReader = entityStream -> {
+ try (BufferedReader reader =
+ new BufferedReader(new InputStreamReader(entityStream))) {
+ String line;
+                List<String> generatedTextList = new ArrayList<>();
+ while ((line = reader.readLine()) != null) {
+ if (line.isEmpty() || line.startsWith(":")) {
+ continue;
+ }
+ generatedTextList.add(line);
+ }
+ return generatedTextList;
+ } catch (Exception ex) {
+ throw new RuntimeException(ex);
+ }
+ };
+ return generateTextResultReader;
+ }
+}
+
+```
+
+# Stacked Model Evaluation
+
+## Create Model Evaluations
+
+### Description
+
+Creates a new evaluation model using an existing AQUA Stacked deployment. For Stacked deployments, evaluations must be created separately for each model using the same model deployment OCID.
+
+### Usage
+
+```bash
+ads aqua evaluation create [OPTIONS]
+```
+
+### Required Parameters
+
+`--evaluation_source_id [str]`
+
+The evaluation source id. Must be a Stacked deployment OCID.
+
+`--evaluation_name [str]`
+
+The name for the evaluation.
+
+`--dataset_path [str]`
+
+The dataset path for the evaluation. Must be an object storage path.
+Example: `oci://@/path/to/the/dataset.jsonl`
+
+`--report_path [str]`
+
+The report path for the evaluation. Must be an object storage path.
+Example: `oci://@/report/path/`
+
+`--model_parameters [str]`
+
+The parameters for the evaluation. The `"model"` parameter is required for evaluating a Stacked deployment.
+
+`--shape_name [str]`
+
+The shape name for the evaluation job infrastructure.
+Example: `VM.Standard.E3.Flex, VM.Standard.E4.Flex, VM.Standard3.Flex, VM.Optimized3.Flex`.
+
+`--block_storage_size [int]`
+
+The block storage size (in GBs) for the evaluation job infrastructure.
+
+### Optional Parameters
+
+`--compartment_id [str]`
+
+The compartment OCID where the evaluation is to be created. If not provided, it defaults to the user's compartment.
+
+`--project_id [str]`
+
+The project OCID where the evaluation is to be created. If not provided, it defaults to the user's project.
+
+`--evaluation_description [str]`
+
+The description of the evaluation. Defaults to None.
+
+`--memory_in_gbs [float]`
+
+The memory in GBs for the selected flexible shape.
+
+`--ocpus [float]`
+
+The OCPU count for the selected shape.
+
+`--experiment_id [str]`
+
+The evaluation model version set id. If provided, the evaluation model will be associated with it. Defaults to None.
+
+`--experiment_name [str]`
+
+The evaluation model version set name. If provided, the model version set with the same name will be used if it exists; otherwise a new model version set will be created with the name.
+
+`--experiment_description [str]`
+
+The description for the evaluation model version set.
+
+`--log_group_id [str]`
+
+The log group id for the evaluation job infrastructure. Defaults to None.
+
+`--log_id [str]`
+
+The log id for the evaluation job infrastructure. Defaults to None.
+
+`--metrics [list]`
+
+The metrics for the evaluation; currently BERTScore and ROUGE are supported.
+Example: `'[{"name": "bertscore", "args": {}}, {"name": "rouge", "args": {}}]'`
+
+`--force_overwrite [bool]`
+
+A flag to indicate whether to force overwrite the existing evaluation file in object storage if already present. Defaults to `False`.
+
+### Example
+
+```bash
+ads aqua evaluation create \
+ --evaluation_source_id "ocid1.datasciencemodeldeployment.oc1.iad." \
+ --evaluation_name "test_evaluation" \
+ --dataset_path "oci://@/path/to/the/dataset.jsonl" \
+ --report_path "oci://@/report/path/" \
+ --model_parameters '{"model":"","max_tokens": 500, "temperature": 0.7, "top_p": 1.0, "top_k": 50}' \
+ --shape_name "VM.Standard.E4.Flex" \
+ --block_storage_size 50 \
+ --metrics '[{"name": "bertscore", "args": {}}, {"name": "rouge", "args": {}}]'
+```
+
+#### CLI Output
+
+```json
+{
+ "id": "ocid1.datasciencemodeldeployment.oc1.iad.",
+ "name": "test_evaluation",
+ "aqua_service_model": true,
+ "state": "CREATING",
+ "description": null,
+ "created_on": "2024-02-03 21:21:31.952000+00:00",
+ "created_by": "ocid1.user.oc1..",
+ "endpoint": "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.iad.",
+ "console_link": "https://cloud.oracle.com/data-science/model-deployments/ocid1.datasciencemodeldeployment.oc1.iad.?region=us-ashburn-1",
+ "shape_info": {
+ "instance_shape": "VM.Standard.E4.Flex",
+ "instance_count": 1,
+ "ocpus": 1.0,
+ "memory_in_gbs": 16.0
+ },
+ "tags": {
+ "aqua_service_model": "ocid1.datasciencemodel.oc1.iad.#Mistral-7B-v0.1",
+ "OCI_AQUA": ""
+ }
+}
+```
+
+For other operations related to **Evaluation**, such as listing evaluations and retrieving evaluation details, please refer to [AQUA CLI tips](cli-tips.md).
diff --git a/ai-quick-actions/web_assets/deploy-multi-model-advanced-options.png b/ai-quick-actions/web_assets/deploy-multi-model-advanced-options.png
new file mode 100644
index 00000000..e7217ea7
Binary files /dev/null and b/ai-quick-actions/web_assets/deploy-multi-model-advanced-options.png differ
diff --git a/ai-quick-actions/web_assets/deploy-multi.png b/ai-quick-actions/web_assets/deploy-multi.png
new file mode 100644
index 00000000..3268dfd9
Binary files /dev/null and b/ai-quick-actions/web_assets/deploy-multi.png differ
diff --git a/ai-quick-actions/web_assets/deploy-stack-model-advanced-options.png b/ai-quick-actions/web_assets/deploy-stack-model-advanced-options.png
new file mode 100644
index 00000000..083bfab0
Binary files /dev/null and b/ai-quick-actions/web_assets/deploy-stack-model-advanced-options.png differ
diff --git a/ai-quick-actions/web_assets/deploy-stack.png b/ai-quick-actions/web_assets/deploy-stack.png
new file mode 100644
index 00000000..25093133
Binary files /dev/null and b/ai-quick-actions/web_assets/deploy-stack.png differ
diff --git a/ai-quick-actions/web_assets/try-multi-model.png b/ai-quick-actions/web_assets/try-multi-model.png
new file mode 100644
index 00000000..dd36f231
Binary files /dev/null and b/ai-quick-actions/web_assets/try-multi-model.png differ
diff --git a/ai-quick-actions/web_assets/try-stack-model.png b/ai-quick-actions/web_assets/try-stack-model.png
new file mode 100644
index 00000000..d5bff24a
Binary files /dev/null and b/ai-quick-actions/web_assets/try-stack-model.png differ