
Commit e856faa

Jessli/convert Fix bug (#43557)
* add eval result converter * Add result converter * update converter params to optional * add eval meta data * fix type * remove useless file * get eval meta data as input * fix build errors * remove useless import * resolve comments * update * update comments * fix checker failure * add error msg and error code * Surface evaluator error msg * update UT * fix usage * make eval_meta_data optional * remove useless lines * update param name to add underscore * parse updated annotation results * update trace_id * expose sample data for sdk evaluators * update * Fix column mapping bug for AOAI evaluators with custom data mapping (#43429) * fix nesting bug for custom data mapping * address comments * remove extra code and fix test case * run formatter * use dumps * Modify logic for message body on Microsoft.ApplicationInsights.MessageData to include default message for messages with empty body and export logs (#43091) * Modify logic in PR (#43060) to include default message for messages with empty body and export logs * Update CHANGELOG * Update logic as per updated spec * Addressed comments * Set-VcpkgWriteModeCache -- add token timeout param for cmake generate's that exceed 1 hour (this can happen in C++ API View) (#43470) Co-authored-by: Daniel Jurek <djurek@microsoft.com> * update * fix UT * fix tests * Added Tests and Samples for Paginated Queries (#43472) * added tests and samples for paginated queries * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * added single partition pagination sample --------- Co-authored-by: Andrew Mathew <andrewmathew@microsoft.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [Test Proxy] Support AARCH64 platform (#43428) * Delete doc/dev/how_to_request_a_feature_in_sdk.md (#43415) this doc is outdated * fix test * [AutoRelease] t2-iothub-2025-10-03-03336(can only be merged by SDK owner) (#43230) * code and test * update pyproject.toml --------- Co-authored-by: azure-sdk <PythonSdkPipelines> Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> * [AutoRelease] t2-redisenterprise-2025-10-17-18412(can only be merged by SDK owner) (#43476) * code and test * update changelog * update changelog * Update CHANGELOG.md --------- Co-authored-by: azure-sdk <PythonSdkPipelines> Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com> * Extend basic test for "project_client.agents" to do more operations (#43516) * Sync eng/common directory with azure-sdk-tools for PR 12478 (#43457) * Updated validate pkg template to use packageInfo * Fixed typo * Fixed the right variable to use * output debug log * Fixed errors in expression evaluation * removed debug code * Fixed an issue in pipeline * Updated condition for variable setting step * Join paths of the script path * Use join-path * return from the function rather than exit --------- Co-authored-by: ray chen <raychen@microsoft.com> * Reorder error and warning log line processing (#43456) Co-authored-by: Wes Haggard <weshaggard@users.noreply.github.com> * [App Configuration] - Release 1.7.2 (#43520) * release 1.7.2 * update change log * Modify CODEOWNERS for Azure SDK ownership changes (#43524) Updated CODEOWNERS to reflect new ownership for Azure SDK components. 
* Migrate Confidential Ledger library from swagger to typespec codegen (#42664)
* regen
* add default cert endpoint with tsp
* remove refs to old namespace
* update async operation patch
* fix operations patch
* fix header impl
* more header fixes
* revert receipt directory removal
* cspell
* regen certificates under correct namespace
* regen ledger client
* update namespace name
* revert certificate change
* update shared files after regen
* updates
* delete extra files
* cspell
* match return type to current behavior
* cspell
* mypy
* pylint
* update docs
* regen
* regen
* fix patch
* Revert "mypy"
This reverts commit 6351ead.
* add info in tsp_location.yaml
* regen
* update patch files
* update patch files
* fix patch
* update patch files
* regen
* update tsp-location.yaml
* generate certificate client
* update patch files
* fixes
* regen clients
* update pyproject.toml deps
* update assets
* regen
* revert test change
* nit
* fix test input
* regen with new model
* update tests
* update tests
* apiview props
* regen
* update tests
* update assets
* apiview props
* temp relative package updates
* fix name
* fix ledger ci (#43181)
* remove swagger
* remove extra configs
* wip revert package dep temporarily
* update readme
* fix config files
* Revert "wip revert package dep temporarily"
This reverts commit db553c4.
* move tests
* add identity samples
---------
Co-authored-by: catalinaperalta <caperal@microsoft.com>
* rm certificate files
* update changelog
* misc fixes
* update shared reqs
* test
* pylint
---------
Co-authored-by: catalinaperalta <caperal@microsoft.com>
* update scripts (#43527)
Co-authored-by: helen229 <gaoh@microsoft.com>
* [AutoPR azure-mgmt-mongocluster]-generated-from-SDK Generation - Python-5459673 (#43448)
* Configurations: 'specification/mongocluster/resource-manager/Microsoft.DocumentDB/MongoCluster/tspconfig.yaml', API Version: 2025-09-01, SDK Release Type: stable, and CommitSHA: 'c5601446fc65494f18157aecbcc79cebcfbab1fb' in SpecRepo: 'https://github.com/Azure/azure-rest-api-specs'
Pipeline run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=5459673
Refer to https://eng.ms/docs/products/azure-developer-experience/develop/sdk-release/sdk-release-prerequisites to prepare for SDK release.
* update changelog
---------
Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com>
* App Configuration Provider - Key Vault Refresh (#41882)
* Sync refresh changes
* Key Vault Refresh
* adding tests and fixing sync refresh
* Updating Async
* Fixed Async Tests
* Updated tests and change log
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fixing merge issue
* Updating comments
* Updating secret refresh
* Update _azureappconfigurationproviderasync.py
* Fixing Optional Endpoint
* fix mypy issue
* fixing async test
* mixing merge
* fixing test after merge
* Update testcase.py
* Secret Provider Base
* removing unused imports
* updating exception
* updating resolve key vault references
* Review comments
* fixing tests
* tox updates
* Updating Tests
* Updating Async to be the same as sync
* Fixing formatting
* fixing tox and unneeded ""
* fixing tox items
* fix cspell + tests recording
* Update test_async_secret_provider.py
* Post Merge updates
* Move cache to shared code
* removed unneeded disabled
* Update Secret Provider
* Updating usage
* Update assets.json
* Updated to make secret refresh update dictionary
* removing _secret_version_cache
* Update assets.json
* Update _secret_provider_base.py
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Increment package version after release of azure-appconfiguration (#43531)
* Patch `azure-template` back to `green` (#43533)
* Update sdk/template/azure-template/pyproject.toml to use `repository` instead of `source`
* added brackets for sql query keyword value (#43525)
Co-authored-by: Andrew Mathew <andrewmathew@microsoft.com>
* update changelog (#43532)
Co-authored-by: catalinaperalta <caperal@microsoft.com>
* App Config Provider - Provider Refactor (#43196)
* Code Cleanup
* Move validation to shared file
* Updating Header Check
* Update test_azureappconfigurationproviderbase.py
* moved async tests to aio folder
* post merge updates
---------
Co-authored-by: Ethan Winters <etwinter@microsoft.com>
Co-authored-by: rads-1996 <guptaradhika@microsoft.com>
Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com>
Co-authored-by: Daniel Jurek <djurek@microsoft.com>
Co-authored-by: Andrew Mathew <80082032+andrewmathew1@users.noreply.github.com>
Co-authored-by: Andrew Mathew <andrewmathew@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: McCoy Patiño <39780829+mccoyp@users.noreply.github.com>
Co-authored-by: Yuchao Yan <yuchaoyan@microsoft.com>
Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com>
Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com>
Co-authored-by: Darren Cohen <39422044+dargilco@users.noreply.github.com>
Co-authored-by: ray chen <raychen@microsoft.com>
Co-authored-by: Wes Haggard <weshaggard@users.noreply.github.com>
Co-authored-by: Zhiyuan Liang <141655842+zhiyuanliang-ms@users.noreply.github.com>
Co-authored-by: Matthew Metcalf <mrm9084@gmail.com>
Co-authored-by: catalinaperalta <9859037+catalinaperalta@users.noreply.github.com>
Co-authored-by: catalinaperalta <caperal@microsoft.com>
Co-authored-by: helen229 <gaoh@microsoft.com>
Co-authored-by: Scott Beddall <45376673+scbedd@users.noreply.github.com>
1 parent 76b1951 commit e856faa

File tree

1 file changed: +126 -1 lines changed
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate


sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py

Lines changed: 126 additions & 1 deletion
@@ -1736,6 +1736,8 @@ def _convert_results_to_aoai_evaluation_results(
         criteria_type = criteria_name_types_from_meta[criteria_name].get("type", None)
         evaluator_name = criteria_name_types_from_meta[criteria_name].get("evaluator_name", None)
         if evaluator_name:
+            if criteria_type == "azure_ai_evaluator" and evaluator_name.startswith("builtin."):
+                evaluator_name = evaluator_name.replace("builtin.", "")
             metrics_mapped = _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS.get(evaluator_name, [])
             if metrics_mapped and len(metrics_mapped) > 0:
                 metrics.extend(metrics_mapped)
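
To see why the two added lines matter, here is a minimal sketch of the lookup, with a small hypothetical table standing in for `_EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS` (the real mapping lives in the SDK):

```python
# Hypothetical stand-in for _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS;
# the real table is keyed by bare evaluator names such as "violence" or "indirect_attack".
EVALUATOR_NAME_METRICS_MAPPINGS = {
    "violence": ["violence"],
    "indirect_attack": ["xpia"],
}

def resolve_metrics(criteria_type: str, evaluator_name: str) -> list:
    # Mirror of the added lines: strip the "builtin." prefix for azure_ai_evaluator criteria
    # before looking the evaluator up in the metrics mapping.
    if criteria_type == "azure_ai_evaluator" and evaluator_name.startswith("builtin."):
        evaluator_name = evaluator_name.replace("builtin.", "")
    return EVALUATOR_NAME_METRICS_MAPPINGS.get(evaluator_name, [])

# Before the fix, "builtin.violence" missed the lookup and returned [];
# after stripping the prefix it resolves to ["violence"].
assert resolve_metrics("azure_ai_evaluator", "builtin.violence") == ["violence"]
assert resolve_metrics("azure_ai_evaluator", "unknown") == []
```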
@@ -1798,6 +1800,9 @@ def _convert_results_to_aoai_evaluation_results(
                     result_per_metric[metric] = {"score": metric_value}
                 else:
                     result_per_metric[metric]["score"] = metric_value
+                _append_indirect_attachments_to_results(
+                    result_per_metric, "score", metric, metric_value
+                )
             elif metric_key.endswith("_result") or metric_key == "result" or metric_key.endswith("_label"):
                 metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                 label = metric_value
@@ -1809,6 +1814,12 @@ def _convert_results_to_aoai_evaluation_results(
                 else:
                     result_per_metric[metric]["label"] = metric_value
                     result_per_metric[metric]["passed"] = passed
+                _append_indirect_attachments_to_results(
+                    result_per_metric, "label", metric, label
+                )
+                _append_indirect_attachments_to_results(
+                    result_per_metric, "passed", metric, passed
+                )
             elif (
                 metric_key.endswith("_reason") and not metric_key.endswith("_finish_reason")
             ) or metric_key == "reason":
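
The new `_append_indirect_attachments_to_results` calls (the helper itself is added further down in this diff) copy each result field recorded for the aggregate `xpia` metric onto the three extended xpia metrics. A simplified, self-contained sketch of that fan-out with made-up values; the real helper also handles one and two levels of nesting for `sample` fields:

```python
from typing import Any, Dict

def fan_out_xpia(result_per_metric: Dict[str, Dict[str, Any]], result_name: str, value: Any) -> None:
    # Simplified stand-in for _append_indirect_attachments_to_results with no nesting:
    # copy a top-level xpia result field onto each extended xpia metric.
    for extended in ("xpia_manipulated_content", "xpia_intrusion", "xpia_information_gathering"):
        result_per_metric.setdefault(extended, {})[result_name] = value

result_per_metric: Dict[str, Dict[str, Any]] = {"xpia": {"score": 1.0}}
fan_out_xpia(result_per_metric, "score", 1.0)
fan_out_xpia(result_per_metric, "label", "pass")
fan_out_xpia(result_per_metric, "passed", True)

assert result_per_metric["xpia_intrusion"] == {"score": 1.0, "label": "pass", "passed": True}
```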
@@ -1817,18 +1828,27 @@ def _convert_results_to_aoai_evaluation_results(
                     result_per_metric[metric] = {"reason": metric_value}
                 else:
                     result_per_metric[metric]["reason"] = metric_value
+                _append_indirect_attachments_to_results(
+                    result_per_metric, "reason", metric, metric_value
+                )
             elif metric_key.endswith("_threshold") or metric_key == "threshold":
                 metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                 if metric not in result_per_metric:
                     result_per_metric[metric] = {"threshold": metric_value}
                 else:
                     result_per_metric[metric]["threshold"] = metric_value
+                _append_indirect_attachments_to_results(
+                    result_per_metric, "threshold", metric, metric_value
+                )
             elif metric_key == "sample":
                 metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                 if metric not in result_per_metric:
                     result_per_metric[metric] = {"sample": metric_value}
                 else:
                     result_per_metric[metric]["sample"] = metric_value
+                _append_indirect_attachments_to_results(
+                    result_per_metric, "sample", metric, metric_value
+                )
             elif metric_key.endswith("_finish_reason"):
                 metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                 if metric not in result_per_metric:
@@ -1841,6 +1861,9 @@ def _convert_results_to_aoai_evaluation_results(
                     and "finish_reason" not in result_per_metric[metric]["sample"]
                 ):
                     result_per_metric[metric]["sample"]["finish_reason"] = metric_value
+                _append_indirect_attachments_to_results(
+                    result_per_metric, "sample", metric, metric_value, "finish_reason"
+                )
             elif metric_key.endswith("_model"):
                 metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                 if metric not in result_per_metric:
@@ -1853,6 +1876,9 @@ def _convert_results_to_aoai_evaluation_results(
                     and "model" not in result_per_metric[metric]["sample"]
                 ):
                     result_per_metric[metric]["sample"]["model"] = metric_value
+                _append_indirect_attachments_to_results(
+                    result_per_metric, "sample", metric, metric_value, "model"
+                )
             elif metric_key.endswith("_sample_input"):
                 metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                 input_metric_val_json: Optional[List[Dict[str, Any]]] = []
@@ -1870,6 +1896,9 @@ def _convert_results_to_aoai_evaluation_results(
                     and "input" not in result_per_metric[metric]["sample"]
                 ):
                     result_per_metric[metric]["sample"]["input"] = input_metric_val_json
+                _append_indirect_attachments_to_results(
+                    result_per_metric, "sample", metric, input_metric_val_json, "input"
+                )
             elif metric_key.endswith("_sample_output"):
                 metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                 output_metric_val_json: Optional[List[Dict[str, Any]]] = []
@@ -1887,6 +1916,9 @@ def _convert_results_to_aoai_evaluation_results(
                     and "output" not in result_per_metric[metric]["sample"]
                 ):
                     result_per_metric[metric]["sample"]["output"] = output_metric_val_json
+                _append_indirect_attachments_to_results(
+                    result_per_metric, "sample", metric, output_metric_val_json, "output"
+                )
             elif metric_key.endswith("_total_tokens"):
                 metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                 if metric not in result_per_metric:
@@ -1901,6 +1933,9 @@ def _convert_results_to_aoai_evaluation_results(
                         result_per_metric[metric]["sample"]["usage"] = {"total_tokens": metric_value}
                     else:
                         result_per_metric[metric]["sample"]["usage"]["total_tokens"] = metric_value
+                _append_indirect_attachments_to_results(
+                    result_per_metric, "sample", metric, metric_value, "usage", "total_tokens"
+                )
             elif metric_key.endswith("_prompt_tokens"):
                 metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                 if metric not in result_per_metric:
@@ -1915,6 +1950,9 @@ def _convert_results_to_aoai_evaluation_results(
                         result_per_metric[metric]["sample"]["usage"] = {"prompt_tokens": metric_value}
                     else:
                         result_per_metric[metric]["sample"]["usage"]["prompt_tokens"] = metric_value
+                _append_indirect_attachments_to_results(
+                    result_per_metric, "sample", metric, metric_value, "usage", "prompt_tokens"
+                )
             elif metric_key.endswith("_completion_tokens"):
                 metric = _get_metric_from_criteria(criteria_name, metric_key, expected_metrics)
                 if metric not in result_per_metric:
@@ -1929,6 +1967,9 @@ def _convert_results_to_aoai_evaluation_results(
                         result_per_metric[metric]["sample"]["usage"] = {"completion_tokens": metric_value}
                     else:
                         result_per_metric[metric]["sample"]["usage"]["completion_tokens"] = metric_value
+                _append_indirect_attachments_to_results(
+                    result_per_metric, "sample", metric, metric_value, "usage", "completion_tokens"
+                )
             elif not any(
                 metric_key.endswith(suffix)
                 for suffix in [
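
For the token-usage branches, the helper receives two nesting keys ("usage" plus the token field), so each extended metric ends up with its own nested sample/usage copy. A rough sketch of that shape, using a simplified stand-in and made-up numbers:

```python
from typing import Any, Dict

result_per_metric: Dict[str, Dict[str, Any]] = {}

def set_usage(metric: str, usage_key: str, value: Any) -> None:
    # Simplified version of the sample/usage branch: build
    # result_per_metric[metric]["sample"]["usage"][usage_key] step by step.
    sample = result_per_metric.setdefault(metric, {}).setdefault("sample", {})
    sample.setdefault("usage", {})[usage_key] = value

for extended in ("xpia_manipulated_content", "xpia_intrusion", "xpia_information_gathering"):
    set_usage(extended, "total_tokens", 250)
    set_usage(extended, "prompt_tokens", 200)
    set_usage(extended, "completion_tokens", 50)

assert result_per_metric["xpia_intrusion"]["sample"]["usage"] == {
    "total_tokens": 250,
    "prompt_tokens": 200,
    "completion_tokens": 50,
}
```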
@@ -1970,6 +2011,18 @@ def _convert_results_to_aoai_evaluation_results(
                 "metric": metric if metric is not None else criteria_name,  # Use criteria name as metric
             }
             # Add optional fields
+            if (metric in _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS["indirect_attack"]
+                    or metric in _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS["code_vulnerability"]
+                    or metric in _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS["protected_material"]):
+                copy_label = label
+                if copy_label is not None and isinstance(copy_label, bool) and copy_label == True:
+                    label = "fail"
+                    score = 0.0
+                    passed = False
+                else:
+                    label = "pass"
+                    score = 1.0
+                    passed = True
             result_obj["score"] = score
             result_obj["label"] = label
             result_obj["reason"] = reason
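
The added block converts the boolean labels produced by the indirect-attack, code-vulnerability and protected-material evaluators into the pass/fail convention of the converted results, presumably because a detected issue should fail the row. In isolation, the mapping looks like this sketch:

```python
from typing import Any, Optional, Tuple

def normalize_boolean_label(label: Any) -> Tuple[str, Optional[float], Optional[bool]]:
    # Mirror of the added block: a boolean True label (issue detected) maps to
    # fail/0.0/False; anything else maps to pass/1.0/True.
    if label is not None and isinstance(label, bool) and label is True:
        return "fail", 0.0, False
    return "pass", 1.0, True

assert normalize_boolean_label(True) == ("fail", 0.0, False)
assert normalize_boolean_label(False) == ("pass", 1.0, True)
```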
@@ -2043,6 +2096,65 @@ def _convert_results_to_aoai_evaluation_results(
         f"Summary statistics calculated for {len(converted_rows)} rows, eval_id: {eval_id}, eval_run_id: {eval_run_id}"
     )
 
+def _append_indirect_attachments_to_results(current_result_dict: Dict[str, Any],
+                                            result_name: str,
+                                            metric: str,
+                                            metric_value: Any,
+                                            nested_result_name: Optional[str] = None,
+                                            secondnested_result_name: Optional[str] = None) -> None:
+    """
+    Append indirect attachments to the current result dictionary.
+
+    :param current_result_dict: The current result dictionary to update
+    :type current_result_dict: Dict[str, Any]
+    :param result_name: The result name
+    :type result_name: str
+    :param metric: The metric name
+    :type metric: str
+    :param metric_value: The value of the metric
+    :type metric_value: Any
+    """
+    if metric == "xpia" and result_name:
+        for metric_extended in ["xpia_manipulated_content", "xpia_intrusion", "xpia_information_gathering"]:
+            if nested_result_name is None:
+                if metric_extended not in current_result_dict:
+                    current_result_dict[metric_extended] = {result_name: metric_value}
+                else:
+                    current_result_dict[metric_extended][result_name] = metric_value
+            elif nested_result_name is not None and secondnested_result_name is None:
+                if metric_extended not in current_result_dict:
+                    current_result_dict[metric_extended] = {result_name: {nested_result_name: metric_value}}
+                elif (metric_extended in current_result_dict
+                      and result_name not in current_result_dict[metric_extended]
+                ):
+                    current_result_dict[metric_extended][result_name] = {nested_result_name: metric_value}
+                elif (
+                    metric_extended in current_result_dict
+                    and result_name in current_result_dict[metric_extended]
+                    and nested_result_name not in current_result_dict[metric_extended][result_name]
+                ):
+                    current_result_dict[metric_extended][result_name][nested_result_name] = metric_value
+            elif nested_result_name is not None and secondnested_result_name is not None:
+                if metric_extended not in current_result_dict:
+                    current_result_dict[metric_extended] = {
+                        result_name: {nested_result_name: {secondnested_result_name: metric_value}}
+                    }
+                elif (metric_extended in current_result_dict
+                      and result_name not in current_result_dict[metric_extended]
+                ):
+                    current_result_dict[metric_extended][result_name] = {
+                        nested_result_name: {secondnested_result_name: metric_value}
+                    }
+                elif (
+                    metric_extended in current_result_dict
+                    and result_name in current_result_dict[metric_extended]
+                    and nested_result_name not in current_result_dict[metric_extended][result_name]
+                ):
+                    current_result_dict[metric_extended][result_name][nested_result_name] = {
+                        secondnested_result_name: metric_value
+                    }
+                else:
+                    current_result_dict[metric_extended][result_name][nested_result_name][secondnested_result_name] = metric_value
 
 def _get_metric_from_criteria(testing_criteria_name: str, metric_key: str, metric_list: List[str]) -> str:
     """
@@ -2058,6 +2170,16 @@ def _get_metric_from_criteria(testing_criteria_name: str, metric_key: str, metri
     :rtype: str
     """
     metric = None
+
+    if metric_key == "xpia_manipulated_content":
+        metric = "xpia_manipulated_content"
+        return metric
+    elif metric_key == "xpia_intrusion":
+        metric = "xpia_intrusion"
+        return metric
+    elif metric_key == "xpia_information_gathering":
+        metric = "xpia_information_gathering"
+        return metric
     for expected_metric in metric_list:
         if metric_key.startswith(expected_metric):
             metric = expected_metric
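
The added short-circuit appears to keep the extended xpia keys from being prefix-matched back down to the aggregate `xpia` metric. A simplified sketch of the lookup behaviour (the real function also takes the testing criteria name and has further fallbacks):

```python
from typing import List, Optional

XPIA_EXTENDED_METRICS = ("xpia_manipulated_content", "xpia_intrusion", "xpia_information_gathering")

def resolve_metric(metric_key: str, metric_list: List[str]) -> Optional[str]:
    # Added short-circuit: extended xpia keys are metrics in their own right,
    # so they are returned as-is instead of being prefix-matched.
    if metric_key in XPIA_EXTENDED_METRICS:
        return metric_key
    for expected_metric in metric_list:
        if metric_key.startswith(expected_metric):
            return expected_metric
    return None

assert resolve_metric("xpia_intrusion", ["xpia"]) == "xpia_intrusion"
assert resolve_metric("xpia_score", ["xpia"]) == "xpia"
```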
@@ -2124,9 +2246,12 @@ def _calculate_aoai_evaluation_summary(aoai_results: list, logger: logging.Logge
 
         # Extract usage statistics from aoai_result.sample
         sample_data_list = []
+        dup_usage_list = _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS["indirect_attack"].copy()
+        dup_usage_list.remove("xpia")
         if isinstance(aoai_result, dict) and aoai_result["results"] and isinstance(aoai_result["results"], list):
             for result_item in aoai_result["results"]:
-                if isinstance(result_item, dict) and "sample" in result_item and result_item["sample"]:
+                if (isinstance(result_item, dict) and "sample" in result_item and result_item["sample"]
+                        and result_item["metric"] not in dup_usage_list):
                     sample_data_list.append(result_item["sample"])
 
         for sample_data in sample_data_list:
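
The new `dup_usage_list` filter exists because the three extended xpia entries carry copies of the same `sample` payload, which would otherwise be counted three extra times in the usage summary. An illustrative sketch, with a hypothetical list standing in for the `indirect_attack` entry of the metrics mapping:

```python
# Hypothetical stand-in for _EvaluatorMetricMapping.EVALUATOR_NAME_METRICS_MAPPINGS["indirect_attack"].
indirect_attack_metrics = ["xpia", "xpia_manipulated_content", "xpia_intrusion", "xpia_information_gathering"]

dup_usage_list = indirect_attack_metrics.copy()
dup_usage_list.remove("xpia")  # keep the aggregate row, drop its duplicated fan-outs

results = [
    {"metric": "xpia", "sample": {"usage": {"total_tokens": 250}}},
    {"metric": "xpia_intrusion", "sample": {"usage": {"total_tokens": 250}}},           # duplicate copy
    {"metric": "xpia_manipulated_content", "sample": {"usage": {"total_tokens": 250}}},  # duplicate copy
]

sample_data_list = [
    item["sample"]
    for item in results
    if isinstance(item, dict) and item.get("sample") and item["metric"] not in dup_usage_list
]

# Without the filter the total would be 750; with it, only the aggregate xpia row counts.
assert sum(s["usage"]["total_tokens"] for s in sample_data_list) == 250
```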
