Commit 0c9c911

YoYoJa, ebwinters, rads-1996, azure-sdk, and danieljurek authored
Jessli/convert1021 Fxi bug (#43563)
* add eval result converter * Add result converter * update converter params to optional * add eval meta data * fix type * remove useless file * get eval meta data as input * fix build errors * remove useless import * resolve comments * update * update comments * fix checker failure * add error msg and error code * Surface evaluator error msg * update UT * fix usage * make eval_meta_data optional * remove useless lines * update param name to add underscore * parse updated annotation results * update trace_id * expose sample data for sdk evaluators * update * update * fix UT * fix tests * fix test * Jessli/convert (#43556) merge main * add eval result converter * Add result converter * update converter params to optional * add eval meta data * fix type * remove useless file * get eval meta data as input * fix build errors * remove useless import * resolve comments * update * update comments * fix checker failure * add error msg and error code * Surface evaluator error msg * update UT * fix usage * make eval_meta_data optional * remove useless lines * update param name to add underscore * parse updated annotation results * update trace_id * expose sample data for sdk evaluators * update * Fix column mapping bug for AOAI evaluators with custom data mapping (#43429) * fix nesting bug for custom data mapping * address comments * remove extra code and fix test case * run formatter * use dumps * Modify logic for message body on Microsoft.ApplicationInsights.MessageData to include default message for messages with empty body and export logs (#43091) * Modify logic in PR (#43060) to include default message for messages with empty body and export logs * Update CHANGELOG * Update logic as per updated spec * Addressed comments * Set-VcpkgWriteModeCache -- add token timeout param for cmake generate's that exceed 1 hour (this can happen in C++ API View) (#43470) Co-authored-by: Daniel Jurek <djurek@microsoft.com> * update * fix UT * fix tests * Added Tests and Samples for 
Paginated Queries (#43472) * added tests and samples for paginated queries * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * added single partition pagination sample --------- Co-authored-by: Andrew Mathew <andrewmathew@microsoft.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [Test Proxy] Support AARCH64 platform (#43428) * Delete doc/dev/how_to_request_a_feature_in_sdk.md (#43415) this doc is outdated * fix test * [AutoRelease] t2-iothub-2025-10-03-03336(can only be merged by SDK owner) (#43230) * code and test * update pyproject.toml --------- Co-authored-by: azure-sdk <PythonSdkPipelines> Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> * [AutoRelease] t2-redisenterprise-2025-10-17-18412(can only be merged by SDK owner) (#43476) * code and test * update changelog * update changelog * Update CHANGELOG.md --------- Co-authored-by: azure-sdk <PythonSdkPipelines> Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com> * Extend basic test for "project_client.agents" to do more operations (#43516) * Sync eng/common directory with azure-sdk-tools for PR 12478 (#43457) * Updated validate pkg template to use packageInfo * Fixed typo * Fixed the right variable to use * output debug log * Fixed errors in expression evaluation * removed debug code * Fixed an issue in pipeline * Updated condition for variable setting step * Join paths of the script path * Use join-path * return from the function rather than exit --------- Co-authored-by: ray chen <raychen@microsoft.com> * Reorder error and warning log line processing (#43456) Co-authored-by: Wes Haggard <weshaggard@users.noreply.github.com> * [App Configuration] - Release 1.7.2 (#43520) * release 1.7.2 * update change log * Modify CODEOWNERS for Azure SDK ownership changes (#43524) Updated CODEOWNERS to reflect new ownership for Azure SDK 
components. * Migrate Confidential Ledger library from swagger to typespec codegen (#42664) * regen * add default cert endpoint with tsp * remove refs to old namespace * update async operation patch * fix operations patch * fix header impl * more header fixes * revert receipt directory removal * cspell * regen certificates under correct namespace * regen ledger client * update namespace name * revert certificate change * update shared files after regen * updates * delete extra files * cspell * match return type to current behavior * cspell * mypy * pylint * update docs * regen * regen * fix patch * Revert "mypy" This reverts commit 6351ead. * add info in tsp_location.yaml * regen * update patch files * update patch files * fix patch * update patch files * regen * update tsp-location.yaml * generate certificate client * update patch files * fixes * regen clients * update pyproject.toml deps * update assets * regen * revert test change * nit * fix test input * regen with new model * update tests * update tests * apiview props * regen * update tests * update assets * apiview props * temp relative package updates * fix name * fix ledger ci (#43181) * remove swagger * remove extra configs * wip revert package dep temporarily * update readme * fix config files * Revert "wip revert package dep temporarily" This reverts commit db553c4. 
* move tests * add identity samples --------- Co-authored-by: catalinaperalta <caperal@microsoft.com> * rm certificate files * update changelog * misc fixes * update shared reqs * test * pylint --------- Co-authored-by: catalinaperalta <caperal@microsoft.com> * update scripts (#43527) Co-authored-by: helen229 <gaoh@microsoft.com> * [AutoPR azure-mgmt-mongocluster]-generated-from-SDK Generation - Python-5459673 (#43448) * Configurations: 'specification/mongocluster/resource-manager/Microsoft.DocumentDB/MongoCluster/tspconfig.yaml', API Version: 2025-09-01, SDK Release Type: stable, and CommitSHA: 'c5601446fc65494f18157aecbcc79cebcfbab1fb' in SpecRepo: 'https://github.com/Azure/azure-rest-api-specs' Pipeline run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=5459673 Refer to https://eng.ms/docs/products/azure-developer-experience/develop/sdk-release/sdk-release-prerequisites to prepare for SDK release. * update changelog --------- Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> * App Configuration Provider - Key Vault Refresh (#41882) * Sync refresh changes * Key Vault Refresh * adding tests and fixing sync refresh * Updating Async * Fixed Async Tests * Updated tests and change log * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Fixing merge issue * Updating comments * Updating secret refresh * Update _azureappconfigurationproviderasync.py * Fixing Optional Endpoint * fix mypy issue * fixing async test * mixing merge * fixing test after merge * Update testcase.py * Secret Provider Base * removing unused imports * updating exception * updating resolve key vault references * Review comments * fixing tests * tox updates * Updating Tests * Updating Async to be the same as sync * Fixing formatting * fixing tox and unneeded "" * fixing tox items * fix cspell + tests recording * Update test_async_secret_provider.py * Post Merge updates * Move cache to shared code * removed unneeded 
disabled * Update Secret Provider * Updating usage * Update assets.json * Updated to make secret refresh update dictionary * removing _secret_version_cache * Update assets.json * Update _secret_provider_base.py --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Increment package version after release of azure-appconfiguration (#43531) * Patch `azure-template` back to `green` (#43533) * Update sdk/template/azure-template/pyproject.toml to use `repository` instead of `source` * added brackets for sql query keyword value (#43525) Co-authored-by: Andrew Mathew <andrewmathew@microsoft.com> * update changelog (#43532) Co-authored-by: catalinaperalta <caperal@microsoft.com> * App Config Provider - Provider Refactor (#43196) * Code Cleanup * Move validation to shared file * Updating Header Check * Update test_azureappconfigurationproviderbase.py * moved async tests to aio folder * post merge updates --------- Co-authored-by: Ethan Winters <etwinter@microsoft.com> Co-authored-by: rads-1996 <guptaradhika@microsoft.com> Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com> Co-authored-by: Daniel Jurek <djurek@microsoft.com> Co-authored-by: Andrew Mathew <80082032+andrewmathew1@users.noreply.github.com> Co-authored-by: Andrew Mathew <andrewmathew@microsoft.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: McCoy Patiño <39780829+mccoyp@users.noreply.github.com> Co-authored-by: Yuchao Yan <yuchaoyan@microsoft.com> Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com> Co-authored-by: Darren Cohen <39422044+dargilco@users.noreply.github.com> Co-authored-by: ray chen <raychen@microsoft.com> Co-authored-by: Wes Haggard <weshaggard@users.noreply.github.com> Co-authored-by: Zhiyuan Liang <141655842+zhiyuanliang-ms@users.noreply.github.com> Co-authored-by: Matthew Metcalf <mrm9084@gmail.com> Co-authored-by: 
catalinaperalta <9859037+catalinaperalta@users.noreply.github.com> Co-authored-by: catalinaperalta <caperal@microsoft.com> Co-authored-by: helen229 <gaoh@microsoft.com> Co-authored-by: Scott Beddall <45376673+scbedd@users.noreply.github.com> * Jessli/convert Fix bug (#43557) * add eval result converter * Add result converter * update converter params to optional * add eval meta data * fix type * remove useless file * get eval meta data as input * fix build errors * remove useless import * resolve comments * update * update comments * fix checker failure * add error msg and error code * Surface evaluator error msg * update UT * fix usage * make eval_meta_data optional * remove useless lines * update param name to add underscore * parse updated annotation results * update trace_id * expose sample data for sdk evaluators * update * Fix column mapping bug for AOAI evaluators with custom data mapping (#43429) * fix nesting bug for custom data mapping * address comments * remove extra code and fix test case * run formatter * use dumps * Modify logic for message body on Microsoft.ApplicationInsights.MessageData to include default message for messages with empty body and export logs (#43091) * Modify logic in PR (#43060) to include default message for messages with empty body and export logs * Update CHANGELOG * Update logic as per updated spec * Addressed comments * Set-VcpkgWriteModeCache -- add token timeout param for cmake generate's that exceed 1 hour (this can happen in C++ API View) (#43470) Co-authored-by: Daniel Jurek <djurek@microsoft.com> * update * fix UT * fix tests * Added Tests and Samples for Paginated Queries (#43472) * added tests and samples for paginated queries * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * added single partition pagination sample --------- Co-authored-by: Andrew Mathew <andrewmathew@microsoft.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [Test 
Proxy] Support AARCH64 platform (#43428) * Delete doc/dev/how_to_request_a_feature_in_sdk.md (#43415) this doc is outdated * fix test * [AutoRelease] t2-iothub-2025-10-03-03336(can only be merged by SDK owner) (#43230) * code and test * update pyproject.toml --------- Co-authored-by: azure-sdk <PythonSdkPipelines> Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> * [AutoRelease] t2-redisenterprise-2025-10-17-18412(can only be merged by SDK owner) (#43476) * code and test * update changelog * update changelog * Update CHANGELOG.md --------- Co-authored-by: azure-sdk <PythonSdkPipelines> Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com> * Extend basic test for "project_client.agents" to do more operations (#43516) * Sync eng/common directory with azure-sdk-tools for PR 12478 (#43457) * Updated validate pkg template to use packageInfo * Fixed typo * Fixed the right variable to use * output debug log * Fixed errors in expression evaluation * removed debug code * Fixed an issue in pipeline * Updated condition for variable setting step * Join paths of the script path * Use join-path * return from the function rather than exit --------- Co-authored-by: ray chen <raychen@microsoft.com> * Reorder error and warning log line processing (#43456) Co-authored-by: Wes Haggard <weshaggard@users.noreply.github.com> * [App Configuration] - Release 1.7.2 (#43520) * release 1.7.2 * update change log * Modify CODEOWNERS for Azure SDK ownership changes (#43524) Updated CODEOWNERS to reflect new ownership for Azure SDK components. 
* Migrate Confidential Ledger library from swagger to typespec codegen (#42664) * regen * add default cert endpoint with tsp * remove refs to old namespace * update async operation patch * fix operations patch * fix header impl * more header fixes * revert receipt directory removal * cspell * regen certificates under correct namespace * regen ledger client * update namespace name * revert certificate change * update shared files after regen * updates * delete extra files * cspell * match return type to current behavior * cspell * mypy * pylint * update docs * regen * regen * fix patch * Revert "mypy" This reverts commit 6351ead. * add info in tsp_location.yaml * regen * update patch files * update patch files * fix patch * update patch files * regen * update tsp-location.yaml * generate certificate client * update patch files * fixes * regen clients * update pyproject.toml deps * update assets * regen * revert test change * nit * fix test input * regen with new model * update tests * update tests * apiview props * regen * update tests * update assets * apiview props * temp relative package updates * fix name * fix ledger ci (#43181) * remove swagger * remove extra configs * wip revert package dep temporarily * update readme * fix config files * Revert "wip revert package dep temporarily" This reverts commit db553c4. 
* move tests * add identity samples --------- Co-authored-by: catalinaperalta <caperal@microsoft.com> * rm certificate files * update changelog * misc fixes * update shared reqs * test * pylint --------- Co-authored-by: catalinaperalta <caperal@microsoft.com> * update scripts (#43527) Co-authored-by: helen229 <gaoh@microsoft.com> * [AutoPR azure-mgmt-mongocluster]-generated-from-SDK Generation - Python-5459673 (#43448) * Configurations: 'specification/mongocluster/resource-manager/Microsoft.DocumentDB/MongoCluster/tspconfig.yaml', API Version: 2025-09-01, SDK Release Type: stable, and CommitSHA: 'c5601446fc65494f18157aecbcc79cebcfbab1fb' in SpecRepo: 'https://github.com/Azure/azure-rest-api-specs' Pipeline run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=5459673 Refer to https://eng.ms/docs/products/azure-developer-experience/develop/sdk-release/sdk-release-prerequisites to prepare for SDK release. * update changelog --------- Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> * App Configuration Provider - Key Vault Refresh (#41882) * Sync refresh changes * Key Vault Refresh * adding tests and fixing sync refresh * Updating Async * Fixed Async Tests * Updated tests and change log * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Fixing merge issue * Updating comments * Updating secret refresh * Update _azureappconfigurationproviderasync.py * Fixing Optional Endpoint * fix mypy issue * fixing async test * mixing merge * fixing test after merge * Update testcase.py * Secret Provider Base * removing unused imports * updating exception * updating resolve key vault references * Review comments * fixing tests * tox updates * Updating Tests * Updating Async to be the same as sync * Fixing formatting * fixing tox and unneeded "" * fixing tox items * fix cspell + tests recording * Update test_async_secret_provider.py * Post Merge updates * Move cache to shared code * removed unneeded 
disabled * Update Secret Provider * Updating usage * Update assets.json * Updated to make secret refresh update dictionary * removing _secret_version_cache * Update assets.json * Update _secret_provider_base.py --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Increment package version after release of azure-appconfiguration (#43531) * Patch `azure-template` back to `green` (#43533) * Update sdk/template/azure-template/pyproject.toml to use `repository` instead of `source` * added brackets for sql query keyword value (#43525) Co-authored-by: Andrew Mathew <andrewmathew@microsoft.com> * update changelog (#43532) Co-authored-by: catalinaperalta <caperal@microsoft.com> * App Config Provider - Provider Refactor (#43196) * Code Cleanup * Move validation to shared file * Updating Header Check * Update test_azureappconfigurationproviderbase.py * moved async tests to aio folder * post merge updates --------- Co-authored-by: Ethan Winters <etwinter@microsoft.com> Co-authored-by: rads-1996 <guptaradhika@microsoft.com> Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com> Co-authored-by: Daniel Jurek <djurek@microsoft.com> Co-authored-by: Andrew Mathew <80082032+andrewmathew1@users.noreply.github.com> Co-authored-by: Andrew Mathew <andrewmathew@microsoft.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: McCoy Patiño <39780829+mccoyp@users.noreply.github.com> Co-authored-by: Yuchao Yan <yuchaoyan@microsoft.com> Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com> Co-authored-by: Darren Cohen <39422044+dargilco@users.noreply.github.com> Co-authored-by: ray chen <raychen@microsoft.com> Co-authored-by: Wes Haggard <weshaggard@users.noreply.github.com> Co-authored-by: Zhiyuan Liang <141655842+zhiyuanliang-ms@users.noreply.github.com> Co-authored-by: Matthew Metcalf <mrm9084@gmail.com> Co-authored-by: 
catalinaperalta <9859037+catalinaperalta@users.noreply.github.com> Co-authored-by: catalinaperalta <caperal@microsoft.com> Co-authored-by: helen229 <gaoh@microsoft.com> Co-authored-by: Scott Beddall <45376673+scbedd@users.noreply.github.com> * fix bug --------- Co-authored-by: Ethan Winters <etwinter@microsoft.com> Co-authored-by: rads-1996 <guptaradhika@microsoft.com> Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com> Co-authored-by: Daniel Jurek <djurek@microsoft.com> Co-authored-by: Andrew Mathew <80082032+andrewmathew1@users.noreply.github.com> Co-authored-by: Andrew Mathew <andrewmathew@microsoft.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: McCoy Patiño <39780829+mccoyp@users.noreply.github.com> Co-authored-by: Yuchao Yan <yuchaoyan@microsoft.com> Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com> Co-authored-by: Darren Cohen <39422044+dargilco@users.noreply.github.com> Co-authored-by: ray chen <raychen@microsoft.com> Co-authored-by: Wes Haggard <weshaggard@users.noreply.github.com> Co-authored-by: Zhiyuan Liang <141655842+zhiyuanliang-ms@users.noreply.github.com> Co-authored-by: Matthew Metcalf <mrm9084@gmail.com> Co-authored-by: catalinaperalta <9859037+catalinaperalta@users.noreply.github.com> Co-authored-by: catalinaperalta <caperal@microsoft.com> Co-authored-by: helen229 <gaoh@microsoft.com> Co-authored-by: Scott Beddall <45376673+scbedd@users.noreply.github.com>
1 parent 0578a8a commit 0c9c911

1 file changed: +120 -1 lines changed
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py

Lines changed: 120 additions & 1 deletion
@@ -1736,6 +1736,8 @@ def _convert_results_to_aoai_evaluation_results(
             criteria_type = criteria_name_types_from_meta[criteria_name].get("type", None)
             evaluator_name = criteria_name_types_from_meta[criteria_name].get("evaluator_name", None)
             if evaluator_name:
+                if criteria_type == "azure_ai_evaluator" and evaluator_name.startswith("builtin."):
+                    evaluator_name = evaluator_name.replace("builtin.", "")
                 metrics_mapped = _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS.get(evaluator_name, [])
                 if metrics_mapped and len(metrics_mapped) > 0:
                     metrics.extend(metrics_mapped)
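Note on the hunk above: `str.replace("builtin.", "")` removes every occurrence of the substring, not only the leading one. A minimal sketch of a prefix-only variant follows. `normalize_evaluator_name` is a hypothetical helper name used for illustration and is not part of the SDK.

```python
from typing import Optional

BUILTIN_PREFIX = "builtin."


def normalize_evaluator_name(criteria_type: Optional[str], evaluator_name: Optional[str]) -> Optional[str]:
    """Strip one leading "builtin." from Azure AI evaluator names (hypothetical helper)."""
    if (
        criteria_type == "azure_ai_evaluator"
        and evaluator_name
        and evaluator_name.startswith(BUILTIN_PREFIX)
    ):
        # Slice off the prefix only; later occurrences of "builtin." are kept.
        return evaluator_name[len(BUILTIN_PREFIX):]
    return evaluator_name
```

For names that contain the substring only once, at the start, this behaves the same as the committed code.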
@@ -1798,6 +1800,7 @@ def _convert_results_to_aoai_evaluation_results(
                         result_per_metric[metric] = {"score": metric_value}
                     else:
                         result_per_metric[metric]["score"] = metric_value
+                    _append_indirect_attachments_to_results(result_per_metric, "score", metric, metric_value)
                 elif metric_key.endswith("_result") or metric_key == "result" or metric_key.endswith("_label"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     label = metric_value
@@ -1809,6 +1812,8 @@ def _convert_results_to_aoai_evaluation_results(
                     else:
                         result_per_metric[metric]["label"] = metric_value
                         result_per_metric[metric]["passed"] = passed
+                    _append_indirect_attachments_to_results(result_per_metric, "label", metric, label)
+                    _append_indirect_attachments_to_results(result_per_metric, "passed", metric, passed)
                 elif (
                     metric_key.endswith("_reason") and not metric_key.endswith("_finish_reason")
                 ) or metric_key == "reason":
@@ -1817,18 +1822,21 @@ def _convert_results_to_aoai_evaluation_results(
                         result_per_metric[metric] = {"reason": metric_value}
                     else:
                         result_per_metric[metric]["reason"] = metric_value
+                    _append_indirect_attachments_to_results(result_per_metric, "reason", metric, metric_value)
                 elif metric_key.endswith("_threshold") or metric_key == "threshold":
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     if metric not in result_per_metric:
                         result_per_metric[metric] = {"threshold": metric_value}
                     else:
                         result_per_metric[metric]["threshold"] = metric_value
+                    _append_indirect_attachments_to_results(result_per_metric, "threshold", metric, metric_value)
                 elif metric_key == "sample":
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     if metric not in result_per_metric:
                         result_per_metric[metric] = {"sample": metric_value}
                     else:
                         result_per_metric[metric]["sample"] = metric_value
+                    _append_indirect_attachments_to_results(result_per_metric, "sample", metric, metric_value)
                 elif metric_key.endswith("_finish_reason"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     if metric not in result_per_metric:
@@ -1841,6 +1849,9 @@ def _convert_results_to_aoai_evaluation_results(
                         and "finish_reason" not in result_per_metric[metric]["sample"]
                     ):
                         result_per_metric[metric]["sample"]["finish_reason"] = metric_value
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "sample", metric, metric_value, "finish_reason"
+                    )
                 elif metric_key.endswith("_model"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     if metric not in result_per_metric:
@@ -1853,6 +1864,7 @@ def _convert_results_to_aoai_evaluation_results(
                         and "model" not in result_per_metric[metric]["sample"]
                     ):
                         result_per_metric[metric]["sample"]["model"] = metric_value
+                    _append_indirect_attachments_to_results(result_per_metric, "sample", metric, metric_value, "model")
                 elif metric_key.endswith("_sample_input"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     input_metric_val_json: Optional[List[Dict[str, Any]]] = []
@@ -1870,6 +1882,9 @@ def _convert_results_to_aoai_evaluation_results(
                         and "input" not in result_per_metric[metric]["sample"]
                     ):
                         result_per_metric[metric]["sample"]["input"] = input_metric_val_json
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "sample", metric, input_metric_val_json, "input"
+                    )
                 elif metric_key.endswith("_sample_output"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     output_metric_val_json: Optional[List[Dict[str, Any]]] = []
@@ -1887,6 +1902,9 @@ def _convert_results_to_aoai_evaluation_results(
                         and "output" not in result_per_metric[metric]["sample"]
                     ):
                         result_per_metric[metric]["sample"]["output"] = output_metric_val_json
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "sample", metric, output_metric_val_json, "output"
+                    )
                 elif metric_key.endswith("_total_tokens"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     if metric not in result_per_metric:
@@ -1901,6 +1919,9 @@ def _convert_results_to_aoai_evaluation_results(
                         result_per_metric[metric]["sample"]["usage"] = {"total_tokens": metric_value}
                     else:
                         result_per_metric[metric]["sample"]["usage"]["total_tokens"] = metric_value
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "sample", metric, metric_value, "usage", "total_tokens"
+                    )
                 elif metric_key.endswith("_prompt_tokens"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     if metric not in result_per_metric:
@@ -1915,6 +1936,9 @@ def _convert_results_to_aoai_evaluation_results(
                         result_per_metric[metric]["sample"]["usage"] = {"prompt_tokens": metric_value}
                     else:
                         result_per_metric[metric]["sample"]["usage"]["prompt_tokens"] = metric_value
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "sample", metric, metric_value, "usage", "prompt_tokens"
+                    )
                 elif metric_key.endswith("_completion_tokens"):
                     metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                     if metric not in result_per_metric:
@@ -1929,6 +1953,9 @@ def _convert_results_to_aoai_evaluation_results(
                         result_per_metric[metric]["sample"]["usage"] = {"completion_tokens": metric_value}
                     else:
                         result_per_metric[metric]["sample"]["usage"]["completion_tokens"] = metric_value
+                    _append_indirect_attachments_to_results(
+                        result_per_metric, "sample", metric, metric_value, "usage", "completion_tokens"
+                    )
                 elif not any(
                     metric_key.endswith(suffix)
                     for suffix in [
@@ -1970,6 +1997,20 @@ def _convert_results_to_aoai_evaluation_results(
                     "metric": metric if metric is not None else criteria_name,  # Use criteria name as metric
                 }
                 # Add optional fields
+                if (
+                    metric in _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS["indirect_attack"]
+                    or metric in _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS["code_vulnerability"]
+                    or metric in _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS["protected_material"]
+                ):
+                    copy_label = label
+                    if copy_label is not None and isinstance(copy_label, bool) and copy_label == True:
+                        label = "fail"
+                        score = 0.0
+                        passed = False
+                    else:
+                        label = "pass"
+                        score = 1.0
+                        passed = True
                 result_obj["score"] = score
                 result_obj["label"] = label
                 result_obj["reason"] = reason
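Note on the hunk above: for the indirect-attack, code-vulnerability, and protected-material metrics, a boolean `True` label (content was flagged) is rewritten to a failing result, and every other label value, including `None` and non-boolean values, is rewritten to a passing one. A minimal sketch of that mapping, assuming it can be isolated as a pure function; `normalize_flag_label` is a hypothetical name, not an SDK function.

```python
from typing import Any, Tuple


def normalize_flag_label(label: Any) -> Tuple[str, float, bool]:
    """Return (label, score, passed) for a 'flagged' style label (hypothetical helper)."""
    # `label is True` covers the three checks in the hunk
    # (not None, is a bool, equals True) in a single test.
    if label is True:
        return "fail", 0.0, False
    # Anything else, including None and non-boolean values, is reported as a pass.
    return "pass", 1.0, True
```

A caller would unpack the tuple into `label`, `score`, and `passed` before filling `result_obj`.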
@@ -2044,6 +2085,67 @@ def _convert_results_to_aoai_evaluation_results(
     )
 
 
+def _append_indirect_attachments_to_results(
+    current_result_dict: Dict[str, Any],
+    result_name: str,
+    metric: str,
+    metric_value: Any,
+    nested_result_name: Optional[str] = None,
+    secondnested_result_name: Optional[str] = None,
+) -> None:
+    """
+    Append indirect attachments to the current result dictionary.
+
+    :param current_result_dict: The current result dictionary to update
+    :type current_result_dict: Dict[str, Any]
+    :param result_name: The result name
+    :type result_name: str
+    :param metric: The metric name
+    :type metric: str
+    :param metric_value: The value of the metric
+    :type metric_value: Any
+    """
+    if metric == "xpia" and result_name:
+        for metric_extended in ["xpia_manipulated_content", "xpia_intrusion", "xpia_information_gathering"]:
+            if nested_result_name is None:
+                if metric_extended not in current_result_dict:
+                    current_result_dict[metric_extended] = {result_name: metric_value}
+                else:
+                    current_result_dict[metric_extended][result_name] = metric_value
+            elif nested_result_name is not None and secondnested_result_name is None:
+                if metric_extended not in current_result_dict:
+                    current_result_dict[metric_extended] = {result_name: {nested_result_name: metric_value}}
+                elif metric_extended in current_result_dict and result_name not in current_result_dict[metric_extended]:
+                    current_result_dict[metric_extended][result_name] = {nested_result_name: metric_value}
+                elif (
+                    metric_extended in current_result_dict
+                    and result_name in current_result_dict[metric_extended]
+                    and nested_result_name not in current_result_dict[metric_extended][result_name]
+                ):
+                    current_result_dict[metric_extended][result_name][nested_result_name] = metric_value
+            elif nested_result_name is not None and secondnested_result_name is not None:
+                if metric_extended not in current_result_dict:
+                    current_result_dict[metric_extended] = {
+                        result_name: {nested_result_name: {secondnested_result_name: metric_value}}
+                    }
+                elif metric_extended in current_result_dict and result_name not in current_result_dict[metric_extended]:
+                    current_result_dict[metric_extended][result_name] = {
+                        nested_result_name: {secondnested_result_name: metric_value}
+                    }
+                elif (
+                    metric_extended in current_result_dict
+                    and result_name in current_result_dict[metric_extended]
+                    and nested_result_name not in current_result_dict[metric_extended][result_name]
+                ):
+                    current_result_dict[metric_extended][result_name][nested_result_name] = {
+                        secondnested_result_name: metric_value
+                    }
+                else:
+                    (
+                        current_result_dict[metric_extended][result_name][nested_result_name][secondnested_result_name]
+                    ) = metric_value
+
+
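Note on the function above: it copies each `xpia` field onto the three extended XPIA metrics at a nesting depth of one, two, or three keys. The same fan-out can be sketched more compactly with `dict.setdefault` and a variadic key path. This is an illustration under stated assumptions, not the committed implementation: `fan_out_xpia` and `XPIA_SUB_METRICS` are hypothetical names, and unlike the committed two-level branch, this sketch always overwrites an existing leaf value.

```python
from typing import Any, Dict

# The three extended metric names used by the committed function.
XPIA_SUB_METRICS = ("xpia_manipulated_content", "xpia_intrusion", "xpia_information_gathering")


def fan_out_xpia(results: Dict[str, Any], metric: str, value: Any, *path: str) -> None:
    """Write `value` at results[sub_metric][path[0]]...[path[-1]] for each XPIA sub-metric."""
    if metric != "xpia" or not path:
        return
    for sub_metric in XPIA_SUB_METRICS:
        node = results.setdefault(sub_metric, {})
        # Walk (and create) the intermediate dictionaries along the path.
        for key in path[:-1]:
            node = node.setdefault(key, {})
        # Assumption: the leaf is always overwritten.
        node[path[-1]] = value
```

For example, `fan_out_xpia(results, "xpia", 12, "sample", "usage", "total_tokens")` corresponds to the three-level call sites in the hunks above.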
 def _get_metric_from_criteria(testing_criteria_name: str, metric_key: str, metric_list: List[str]) -> str:
     """
     Get the metric name from the testing criteria and metric key.
@@ -2058,6 +2160,16 @@ def _get_metric_from_criteria(testing_criteria_name: str, metric_key: str, metri
     :rtype: str
     """
     metric = None
+
+    if metric_key == "xpia_manipulated_content":
+        metric = "xpia_manipulated_content"
+        return metric
+    elif metric_key == "xpia_intrusion":
+        metric = "xpia_intrusion"
+        return metric
+    elif metric_key == "xpia_information_gathering":
+        metric = "xpia_information_gathering"
+        return metric
     for expected_metric in metric_list:
         if metric_key.startswith(expected_metric):
             metric = expected_metric
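Note on the hunk above: the early returns appear to guard against prefix matching. Because the loop uses `startswith`, a key such as `xpia_intrusion` could resolve to `xpia` if `xpia` comes first in `metric_list`. A minimal sketch of the same idea with a set lookup follows. `resolve_metric` is a hypothetical name, and the `None` fallback is an assumption, since the end of the real function is outside this hunk.

```python
from typing import List, Optional

XPIA_SUB_METRICS = {"xpia_manipulated_content", "xpia_intrusion", "xpia_information_gathering"}


def resolve_metric(metric_key: str, metric_list: List[str]) -> Optional[str]:
    """Resolve a metric key to a metric name, exact XPIA sub-metric names first."""
    if metric_key in XPIA_SUB_METRICS:
        # Exact match wins over the prefix search below.
        return metric_key
    for expected_metric in metric_list:
        if metric_key.startswith(expected_metric):
            return expected_metric
    # Assumption: the real fallback is not shown in the diff.
    return None
```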
@@ -2124,9 +2236,16 @@ def _calculate_aoai_evaluation_summary(aoai_results: list, logger: logging.Logge
 
         # Extract usage statistics from aoai_result.sample
         sample_data_list = []
+        dup_usage_list = _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS["indirect_attack"].copy()
+        dup_usage_list.remove("xpia")
         if isinstance(aoai_result, dict) and aoai_result["results"] and isinstance(aoai_result["results"], list):
             for result_item in aoai_result["results"]:
-                if isinstance(result_item, dict) and "sample" in result_item and result_item["sample"]:
+                if (
+                    isinstance(result_item, dict)
+                    and "sample" in result_item
+                    and result_item["sample"]
+                    and result_item["metric"] not in dup_usage_list
+                ):
                     sample_data_list.append(result_item["sample"])
 
         for sample_data in sample_data_list:
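Note on the hunk above: since the `xpia` sample (including token usage) is now copied onto the extended XPIA metrics, counting all of them would count the same usage several times, so only the `xpia` entry is kept for the summary. A minimal sketch of that filter, assuming the `indirect_attack` mapping holds `xpia` plus the three extended names (the mapping contents are not shown in this diff). `collect_samples` is a hypothetical name, and it uses `.get("metric")` where the commit indexes `result_item["metric"]` directly.

```python
from typing import Any, Dict, List

# Assumed contents of the "indirect_attack" mapping.
INDIRECT_ATTACK_METRICS = ["xpia", "xpia_manipulated_content", "xpia_intrusion", "xpia_information_gathering"]


def collect_samples(results: List[Any]) -> List[Dict[str, Any]]:
    """Collect sample payloads, skipping the duplicated XPIA sub-metric entries."""
    skip = [name for name in INDIRECT_ATTACK_METRICS if name != "xpia"]
    return [
        item["sample"]
        for item in results
        if isinstance(item, dict) and item.get("sample") and item.get("metric") not in skip
    ]
```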
