Jessli/convert Fix bug (#43557)

YoYoJa · ebwinters · rads-1996 · web-flow · commit e856faac5e86 · 2025-10-20T21:59:15.000-07:00
* add eval result converter * Add result converter * update converter params to optional * add eval meta data * fix type * remove useless file * get eval meta data as input * fix build errors * remove useless import * resolve comments * update * update comments * fix checker failure * add error msg and error code * Surface evaluator error msg * update UT * fix usage * make eval_meta_data optional * remove useless lines * update param name to add underscore * parse updated annotation results * update trace_id * expose sample data for sdk evaluators * update * Fix column mapping bug for AOAI evaluators with custom data mapping (#43429) * fix nesting bug for custom data mapping * address comments * remove extra code and fix test case * run formatter * use dumps * Modify logic for message body on Microsoft.ApplicationInsights.MessageData to include default message for messages with empty body and export logs (#43091) * Modify logic in PR (#43060) to include default message for messages with empty body and export logs * Update CHANGELOG * Update logic as per updated spec * Addressed comments * Set-VcpkgWriteModeCache -- add token timeout param for cmake generate's that exceed 1 hour (this can happen in C++ API View) (#43470) Co-authored-by: Daniel Jurek <djurek@microsoft.com> * update * fix UT * fix tests * Added Tests and Samples for Paginated Queries (#43472) * added tests and samples for paginated queries * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * added single partition pagination sample --------- Co-authored-by: Andrew Mathew <andrewmathew@microsoft.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [Test Proxy] Support AARCH64 platform (#43428) * Delete doc/dev/how_to_request_a_feature_in_sdk.md (#43415) this doc is outdated * fix test * [AutoRelease] t2-iothub-2025-10-03-03336(can only be merged by SDK owner) (#43230) * code and test * update pyproject.toml --------- Co-authored-by: azure-sdk <PythonSdkPipelines> Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> * [AutoRelease] t2-redisenterprise-2025-10-17-18412(can only be merged by SDK owner) (#43476) * code and test * update changelog * update changelog * Update CHANGELOG.md --------- Co-authored-by: azure-sdk <PythonSdkPipelines> Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com> * Extend basic test for "project_client.agents" to do more operations (#43516) * Sync eng/common directory with azure-sdk-tools for PR 12478 (#43457) * Updated validate pkg template to use packageInfo * Fixed typo * Fixed the right variable to use * output debug log * Fixed errors in expression evaluation * removed debug code * Fixed an issue in pipeline * Updated condition for variable setting step * Join paths of the script path * Use join-path * return from the function rather than exit --------- Co-authored-by: ray chen <raychen@microsoft.com> * Reorder error and warning log line processing (#43456) Co-authored-by: Wes Haggard <weshaggard@users.noreply.github.com> * [App Configuration] - Release 1.7.2 (#43520) * release 1.7.2 * update change log * Modify CODEOWNERS for Azure SDK ownership changes (#43524) Updated CODEOWNERS to reflect new ownership for Azure SDK components. * Migrate Confidential Ledger library from swagger to typespec codegen (#42664) * regen * add default cert endpoint with tsp * remove refs to old namespace * update async operation patch * fix operations patch * fix header impl * more header fixes * revert receipt directory removal * cspell * regen certificates under correct namespace * regen ledger client * update namespace name * revert certificate change * update shared files after regen * updates * delete extra files * cspell * match return type to current behavior * cspell * mypy * pylint * update docs * regen * regen * fix patch * Revert "mypy" This reverts commit 6351ead. * add info in tsp_location.yaml * regen * update patch files * update patch files * fix patch * update patch files * regen * update tsp-location.yaml * generate certificate client * update patch files * fixes * regen clients * update pyproject.toml deps * update assets * regen * revert test change * nit * fix test input * regen with new model * update tests * update tests * apiview props * regen * update tests * update assets * apiview props * temp relative package updates * fix name * fix ledger ci (#43181) * remove swagger * remove extra configs * wip revert package dep temporarily * update readme * fix config files * Revert "wip revert package dep temporarily" This reverts commit db553c4. * move tests * add identity samples --------- Co-authored-by: catalinaperalta <caperal@microsoft.com> * rm certificate files * update changelog * misc fixes * update shared reqs * test * pylint --------- Co-authored-by: catalinaperalta <caperal@microsoft.com> * update scripts (#43527) Co-authored-by: helen229 <gaoh@microsoft.com> * [AutoPR azure-mgmt-mongocluster]-generated-from-SDK Generation - Python-5459673 (#43448) * Configurations: 'specification/mongocluster/resource-manager/Microsoft.DocumentDB/MongoCluster/tspconfig.yaml', API Version: 2025-09-01, SDK Release Type: stable, and CommitSHA: 'c5601446fc65494f18157aecbcc79cebcfbab1fb' in SpecRepo: 'https://github.com/Azure/azure-rest-api-specs' Pipeline run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=5459673 Refer to https://eng.ms/docs/products/azure-developer-experience/develop/sdk-release/sdk-release-prerequisites to prepare for SDK release. * update changelog --------- Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> * App Configuration Provider - Key Vault Refresh (#41882) * Sync refresh changes * Key Vault Refresh * adding tests and fixing sync refresh * Updating Async * Fixed Async Tests * Updated tests and change log * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Fixing merge issue * Updating comments * Updating secret refresh * Update _azureappconfigurationproviderasync.py * Fixing Optional Endpoint * fix mypy issue * fixing async test * mixing merge * fixing test after merge * Update testcase.py * Secret Provider Base * removing unused imports * updating exception * updating resolve key vault references * Review comments * fixing tests * tox updates * Updating Tests * Updating Async to be the same as sync * Fixing formatting * fixing tox and unneeded "" * fixing tox items * fix cspell + tests recording * Update test_async_secret_provider.py * Post Merge updates * Move cache to shared code * removed unneeded disabled * Update Secret Provider * Updating usage * Update assets.json * Updated to make secret refresh update dictionary * removing _secret_version_cache * Update assets.json * Update _secret_provider_base.py --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Increment package version after release of azure-appconfiguration (#43531) * Patch `azure-template` back to `green` (#43533) * Update sdk/template/azure-template/pyproject.toml to use `repository` instead of `source` * added brackets for sql query keyword value (#43525) Co-authored-by: Andrew Mathew <andrewmathew@microsoft.com> * update changelog (#43532) Co-authored-by: catalinaperalta <caperal@microsoft.com> * App Config Provider - Provider Refactor (#43196) * Code Cleanup * Move validation to shared file * Updating Header Check * Update test_azureappconfigurationproviderbase.py * moved async tests to aio folder * post merge updates --------- Co-authored-by: Ethan Winters <etwinter@microsoft.com> Co-authored-by: rads-1996 <guptaradhika@microsoft.com> Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com> Co-authored-by: Daniel Jurek <djurek@microsoft.com> Co-authored-by: Andrew Mathew <80082032+andrewmathew1@users.noreply.github.com> Co-authored-by: Andrew Mathew <andrewmathew@microsoft.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: McCoy Patiño <39780829+mccoyp@users.noreply.github.com> Co-authored-by: Yuchao Yan <yuchaoyan@microsoft.com> Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com> Co-authored-by: Darren Cohen <39422044+dargilco@users.noreply.github.com> Co-authored-by: ray chen <raychen@microsoft.com> Co-authored-by: Wes Haggard <weshaggard@users.noreply.github.com> Co-authored-by: Zhiyuan Liang <141655842+zhiyuanliang-ms@users.noreply.github.com> Co-authored-by: Matthew Metcalf <mrm9084@gmail.com> Co-authored-by: catalinaperalta <9859037+catalinaperalta@users.noreply.github.com> Co-authored-by: catalinaperalta <caperal@microsoft.com> Co-authored-by: helen229 <gaoh@microsoft.com> Co-authored-by: Scott Beddall <45376673+scbedd@users.noreply.github.com>
diff --git a/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py b/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py
@@ -1736,6 +1736,8 @@ def _convert_results_to_aoai_evaluation_results(
             criteria_type = criteria_name_types_from_meta[criteria_name].get("type", None)
             evaluator_name = criteria_name_types_from_meta[criteria_name].get("evaluator_name", None)
             if evaluator_name:
+                if criteria_type=="azure_ai_evaluator" and evaluator_name.startswith("builtin."):
+                    evaluator_name = evaluator_name.replace("builtin.", "")
                 metrics_mapped = _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS.get(evaluator_name, [])
                 if metrics_mapped and len(metrics_mapped) > 0:
                     metrics.extend(metrics_mapped)
@@ -1798,6 +1800,9 @@ def _convert_results_to_aoai_evaluation_results(
                         result_per_metric[metric] = {"score": metric_value}
                     else:
                         result_per_metric[metric]["score"] = metric_value
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "score", metric, metric_value
+                    )
                 elif metric_key.endswith("_result") or metric_key == "result" or metric_key.endswith("_label"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     label = metric_value
@@ -1809,6 +1814,12 @@ def _convert_results_to_aoai_evaluation_results(
                     else:
                         result_per_metric[metric]["label"] = metric_value
                         result_per_metric[metric]["passed"] = passed
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "label", metric, label
+                    )
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "passed", metric, passed
+                    )
                 elif (
                     metric_key.endswith("_reason") and not metric_key.endswith("_finish_reason")
                 ) or metric_key == "reason":
@@ -1817,18 +1828,27 @@ def _convert_results_to_aoai_evaluation_results(
                         result_per_metric[metric] = {"reason": metric_value}
                     else:
                         result_per_metric[metric]["reason"] = metric_value
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "reason", metric, metric_value
+                    )
                 elif metric_key.endswith("_threshold") or metric_key == "threshold":
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     if metric not in result_per_metric:
                         result_per_metric[metric] = {"threshold": metric_value}
                     else:
                         result_per_metric[metric]["threshold"] = metric_value
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "threshold", metric, metric_value
+                    )
                 elif metric_key == "sample":
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     if metric not in result_per_metric:
                         result_per_metric[metric] = {"sample": metric_value}
                     else:
                         result_per_metric[metric]["sample"] = metric_value
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "sample", metric, metric_value
+                    )
                 elif metric_key.endswith("_finish_reason"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     if metric not in result_per_metric:
@@ -1841,6 +1861,9 @@ def _convert_results_to_aoai_evaluation_results(
                         and "finish_reason" not in result_per_metric[metric]["sample"]
                     ):
                         result_per_metric[metric]["sample"]["finish_reason"] = metric_value
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "sample", metric, metric_value, "finish_reason"
+                    )
                 elif metric_key.endswith("_model"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     if metric not in result_per_metric:
@@ -1853,6 +1876,9 @@ def _convert_results_to_aoai_evaluation_results(
                         and "model" not in result_per_metric[metric]["sample"]
                     ):
                         result_per_metric[metric]["sample"]["model"] = metric_value
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "sample", metric, metric_value, "model"
+                    )
                 elif metric_key.endswith("_sample_input"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     input_metric_val_json: Optional[List[Dict[str, Any]]] = []
@@ -1870,6 +1896,9 @@ def _convert_results_to_aoai_evaluation_results(
                         and "input" not in result_per_metric[metric]["sample"]
                     ):
                         result_per_metric[metric]["sample"]["input"] = input_metric_val_json
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "sample", metric, input_metric_val_json, "input"
+                    )
                 elif metric_key.endswith("_sample_output"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     output_metric_val_json: Optional[List[Dict[str, Any]]] = []
@@ -1887,6 +1916,9 @@ def _convert_results_to_aoai_evaluation_results(
                         and "output" not in result_per_metric[metric]["sample"]
                     ):
                         result_per_metric[metric]["sample"]["output"] = output_metric_val_json
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "sample", metric, output_metric_val_json, "output"
+                    )
                 elif metric_key.endswith("_total_tokens"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     if metric not in result_per_metric:
@@ -1901,6 +1933,9 @@ def _convert_results_to_aoai_evaluation_results(
                         result_per_metric[metric]["sample"]["usage"] = {"total_tokens": metric_value}
                     else:
                         result_per_metric[metric]["sample"]["usage"]["total_tokens"] = metric_value
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "sample", metric, metric_value, "usage", "total_tokens"
+                    )
                 elif metric_key.endswith("_prompt_tokens"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     if metric not in result_per_metric:
@@ -1915,6 +1950,9 @@ def _convert_results_to_aoai_evaluation_results(
                         result_per_metric[metric]["sample"]["usage"] = {"prompt_tokens": metric_value}
                     else:
                         result_per_metric[metric]["sample"]["usage"]["prompt_tokens"] = metric_value
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "sample", metric, metric_value, "usage", "prompt_tokens"
+                    )
                 elif metric_key.endswith("_completion_tokens"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     if metric not in result_per_metric:
@@ -1929,6 +1967,9 @@ def _convert_results_to_aoai_evaluation_results(
                         result_per_metric[metric]["sample"]["usage"] = {"completion_tokens": metric_value}
                     else:
                         result_per_metric[metric]["sample"]["usage"]["completion_tokens"] = metric_value
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "sample", metric, metric_value, "usage", "completion_tokens"
+                    )
                 elif not any(
                     metric_key.endswith(suffix)
                     for suffix in [
@@ -1970,6 +2011,18 @@ def _convert_results_to_aoai_evaluation_results(
                     "metric": metric if metric is not None else criteria_name,  # Use criteria name as metric
                 }
                 # Add optional fields
+                if(metric in _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS["indirect_attack"]
+                   or metric in _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS["code_vulnerability"]
+                   or metric in _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS["protected_material"]):
+                    copy_label = label
+                    if copy_label is not None and isinstance(copy_label, bool) and copy_label == True:
+                        label = "fail"
+                        score = 0.0
+                        passed = False
+                    else:
+                        label = "pass"
+                        score = 1.0
+                        passed = True
                 result_obj["score"] = score
                 result_obj["label"] = label
                 result_obj["reason"] = reason
@@ -2043,6 +2096,65 @@ def _convert_results_to_aoai_evaluation_results(
         f"Summary statistics calculated for {len(converted_rows)} rows, eval_id: {eval_id}, eval_run_id: {eval_run_id}"
     )
 
+def _append_indirect_attachments_to_results(current_result_dict: Dict[str, Any], 
+                                            result_name: str, 
+                                            metric: str, 
+                                            metric_value: Any, 
+                                            nested_result_name: Optional[str] = None, 
+                                            secondnested_result_name: Optional[str] = None) -> None:
+    """
+    Append indirect attachments to the current result dictionary.
+
+    :param current_result_dict: The current result dictionary to update
+    :type current_result_dict: Dict[str, Any]
+    :param result_name: The result name
+    :type result_name: str
+    :param metric: The metric name
+    :type metric: str
+    :param metric_value: The value of the metric
+    :type metric_value: Any
+    """
+    if metric == "xpia" and result_name:
+        for metric_extended in ["xpia_manipulated_content", "xpia_intrusion", "xpia_information_gathering"]:
+            if nested_result_name is None:
+                if metric_extended not in current_result_dict:
+                    current_result_dict[metric_extended] = { result_name: metric_value }
+                else:
+                    current_result_dict[metric_extended][result_name] = metric_value
+            elif nested_result_name is not None and secondnested_result_name is None:
+                if metric_extended not in current_result_dict:
+                    current_result_dict[metric_extended] = {result_name : {nested_result_name: metric_value}}
+                elif (metric_extended in current_result_dict 
+                      and result_name not in current_result_dict[metric_extended]
+                ):
+                    current_result_dict[metric_extended][result_name] = {nested_result_name: metric_value}
+                elif (
+                    metric_extended in current_result_dict
+                    and result_name in current_result_dict[metric_extended]
+                    and nested_result_name not in current_result_dict[metric_extended][result_name]
+                ):
+                    current_result_dict[metric_extended][result_name][nested_result_name] = metric_value
+            elif nested_result_name is not None and secondnested_result_name is not None:
+                if metric_extended not in current_result_dict:
+                    current_result_dict[metric_extended] = {
+                        result_name: {nested_result_name: {secondnested_result_name: metric_value}}
+                    }
+                elif (metric_extended in current_result_dict 
+                      and result_name not in current_result_dict[metric_extended]
+                ):
+                    current_result_dict[metric_extended][result_name] = {
+                        nested_result_name: {secondnested_result_name: metric_value}
+                    }
+                elif (
+                    metric_extended in current_result_dict
+                    and result_name in current_result_dict[metric_extended]
+                    and nested_result_name not in current_result_dict[metric_extended][result_name]
+                ):
+                    current_result_dict[metric_extended][result_name][nested_result_name] = {
+                        secondnested_result_name: metric_value
+                    }
+                else:
+                    current_result_dict[metric_extended][result_name][nested_result_name][secondnested_result_name] = metric_value
 
 def _get_metric_from_criteria(testing_criteria_name: str, metric_key: str, metric_list: List[str]) -> str:
     """
@@ -2058,6 +2170,16 @@ def _get_metric_from_criteria(testing_criteria_name: str, metric_key: str, metri
     :rtype: str
     """
     metric = None
+
+    if metric_key == "xpia_manipulated_content":
+        metric = "xpia_manipulated_content"
+        return metric
+    elif metric_key == "xpia_intrusion":
+        metric = "xpia_intrusion"
+        return metric
+    elif metric_key == "xpia_information_gathering":
+        metric = "xpia_information_gathering"
+        return metric
     for expected_metric in metric_list:
         if metric_key.startswith(expected_metric):
             metric = expected_metric
@@ -2124,9 +2246,12 @@ def _calculate_aoai_evaluation_summary(aoai_results: list, logger: logging.Logge
 
         # Extract usage statistics from aoai_result.sample
         sample_data_list = []
+        dup_usage_list = _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS["indirect_attack"].copy()
+        dup_usage_list.remove("xpia")
         if isinstance(aoai_result, dict) and aoai_result["results"] and isinstance(aoai_result["results"], list):
             for result_item in aoai_result["results"]:
-                if isinstance(result_item, dict) and "sample" in result_item and result_item["sample"]:
+                if (isinstance(result_item, dict) and "sample" in result_item and result_item["sample"]
+                    and result_item["metric"] not in dup_usage_list):
                     sample_data_list.append(result_item["sample"])
 
         for sample_data in sample_data_list: