Commit 84d5584

Update TSG for red team (#43870)

1 parent 57ce420 commit 84d5584
1 file changed: 55 additions & 20 deletions

sdk/evaluation/azure-ai-evaluation/TROUBLESHOOTING.md

@@ -14,8 +14,11 @@ This guide walks you through how to investigate failures, common errors in the `
 - [Need to generate simulations for specific harm type](#need-to-generate-simulations-for-specific-harm-type)
 - [Simulator is slow](#simulator-is-slow)
 - [Handle RedTeam Errors](#handle-redteam-errors)
+- [Permission or authentication failures](#permission-or-authentication-failures)
 - [Target resource not found](#target-resource-not-found)
+- [Agent name not found](#agent-name-not-found)
 - [Insufficient Storage Permissions](#insufficient-storage-permissions)
+- [PyRIT "Error sending prompt" message](#pyrit-error-sending-prompt-message)
 - [Logging](#logging)
 - [Get Additional Help](#get-additional-help)

@@ -56,39 +59,71 @@ Adversarial simulators use Azure AI Studio safety evaluation backend service to
 
 The Adversarial simulator does not support selecting individual harms; instead, we recommend running the `AdversarialSimulator` with `max_simulation_results` set to 4x the number of specific harms.
 
-
 ### Simulator is slow
 
 Identify the type of simulations being run (adversarial or non-adversarial).
 Adjust parameters such as `api_call_retry_sleep_sec`, `api_call_delay_sec`, and `concurrent_async_task`. Please note that rate limits on LLM calls can be both tokens per minute and requests per minute.
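As a tuning aid, the interaction between `concurrent_async_task` and `api_call_delay_sec` can be estimated with back-of-the-envelope arithmetic. This is a sketch under the assumption that each concurrent task issues one call per delay interval and response latency is ignored; `estimate_requests_per_minute` is a hypothetical helper, not part of the SDK:

```python
def estimate_requests_per_minute(concurrent_async_task: int, api_call_delay_sec: float) -> float:
    """Rough upper bound on request rate: each task fires once per delay interval."""
    if api_call_delay_sec <= 0:
        raise ValueError("api_call_delay_sec must be positive for this estimate")
    return concurrent_async_task * 60.0 / api_call_delay_sec

# Example: 3 concurrent tasks with a 2-second delay approach 90 requests/minute,
# which can exceed a low requests-per-minute quota even when token usage is modest.
print(estimate_requests_per_minute(3, 2.0))  # → 90.0
```

If the estimate sits above your deployment's requests-per-minute quota, raise `api_call_delay_sec` or lower `concurrent_async_task` before reaching for `api_call_retry_sleep_sec`.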
 
 ## Handle RedTeam errors
 
+### Permission or authentication failures
+
+- Run `az login` in the active shell before starting the scan and ensure the account has the **Azure AI User** role plus the `Storage Blob Data Contributor` assignment on the linked storage account. Both are required to create evaluation runs and upload artifacts.
+- In secured hubs, confirm the linked storage account allows access from your network (or private endpoint) and that Entra ID authentication is enabled on the storage resource.
+- If the helper warns `This may be due to missing environment variables or insufficient permissions.`, double-check the `AZURE_PROJECT_ENDPOINT`, `AGENT_NAME`, and storage role assignments before retrying.
+
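The environment-variable half of that checklist can be automated with a small preflight helper. A sketch: `missing_scan_env_vars` is a hypothetical name, and the two variables checked are the ones mentioned above:

```python
import os


def missing_scan_env_vars(required=("AZURE_PROJECT_ENDPOINT", "AGENT_NAME")) -> list[str]:
    """Return the names of required environment variables that are unset or blank."""
    return [name for name in required if not os.environ.get(name, "").strip()]


missing = missing_scan_env_vars()
if missing:
    print("Set these before starting the scan:", ", ".join(missing))
```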
 ### Target resource not found
-When initializing an Azure OpenAI model directly as `target` for a `RedTeam` scan, ensure `azure_endpoint` is specified in the format `https://<hub>.openai.azure.com/openai/deployments/<deployment_name>/chat/completions?api-version=2025-01-01-preview`. If using `AzureOpenAI`, `endpoint` should be specified in the format `https://<hub>.openai.azure.com/`.
+
+- When initializing an Azure OpenAI deployment directly as the `target`, specify `azure_endpoint` as `https://<hub>.openai.azure.com/openai/deployments/<deployment_name>/chat/completions?api-version=2025-01-01-preview`.
+- If you instantiate `AzureOpenAI`, use the resource-level endpoint format `https://<hub>.openai.azure.com/` and ensure the deployment name plus API version match an active deployment.
+- A cloud run error such as `Error code: 404 - {'error': {'code': '404', 'message': 'Resource not found'}}` when creating the eval group can also indicate that `azure-ai-projects>=2.0.0b1` is not installed. Upgrade to that version or later to access the preview APIs used by Red Team.
+
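A quick sanity check can tell the two endpoint shapes apart before a scan starts. This is a sketch: `endpoint_kind` and its regexes are illustrative helpers built only from the two formats described above, not part of the SDK:

```python
import re

# Deployment-level chat-completions URL vs. resource-level base URL, as described above.
DEPLOYMENT_URL = re.compile(
    r"^https://[^/]+\.openai\.azure\.com/openai/deployments/[^/]+"
    r"/chat/completions\?api-version=[\w.\-]+$"
)
RESOURCE_URL = re.compile(r"^https://[^/]+\.openai\.azure\.com/?$")


def endpoint_kind(url: str) -> str:
    if DEPLOYMENT_URL.match(url):
        return "deployment"  # usable as `azure_endpoint` for a direct model target
    if RESOURCE_URL.match(url):
        return "resource"    # usable as the `AzureOpenAI` resource endpoint
    return "unknown"


print(endpoint_kind("https://myhub.openai.azure.com/"))  # → resource
```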
+### Agent name not found
+
+- `(not_found) Agent <name> doesn't exist` means the Azure AI project could not resolve the agent `name`. Names are case sensitive and differ from display names.
+- Verify the `AZURE_PROJECT_ENDPOINT` points to the correct project and that the agent is published there.
+- Requires `DefaultAzureCredential` from `azure.identity` and `AIProjectClient` from `azure.ai.projects`.
+- Use the following helper to list agents in the current project and confirm the `name` column matches your `AGENT_NAME` value:
+
+```python
+import os
+
+from azure.identity import DefaultAzureCredential
+from azure.ai.projects import AIProjectClient
+
+
+def list_project_agents(endpoint: str | None = None) -> None:
+    project_endpoint = endpoint or os.environ.get("AZURE_PROJECT_ENDPOINT") or ""
+    if not project_endpoint:
+        print("Set AZURE_PROJECT_ENDPOINT before listing agents.")
+        return
+    with DefaultAzureCredential() as project_credential:
+        with AIProjectClient(
+            endpoint=project_endpoint,
+            credential=project_credential,
+            api_version="2025-11-15-preview",
+        ) as project_client:
+            agents = list(project_client.agents.list())
+            if not agents:
+                print(f"No agents found in project: {project_endpoint}")
+                return
+            print(f"Agents in {project_endpoint}:")
+            for agent in agents:
+                display_name = agent.get("display_name") if isinstance(agent, dict) else getattr(agent, "display_name", "")
+                name = agent.get("name") if isinstance(agent, dict) else getattr(agent, "name", "")
+                print(f"- name: {name} | display_name: {display_name}")
+```
 
 ### Insufficient Storage Permissions
-If you see an error like `WARNING: Failed to log artifacts to MLFlow: (UserError) Failed to upload evaluation run to the cloud due to insufficient permission to access the storage`, you need to ensure that proper permissions are assigned to the storage account linked to your Azure AI Project.
-
-To fix this issue:
-1. Open the associated resource group being used in your Azure AI Project in the Azure Portal
-2. Look up the storage accounts associated with that resource group
-3. Open each storage account and click on "Access control (IAM)" on the left side navigation
-4. Add permissions for the desired users with the "Storage Blob Data Contributor" role
-
-If you have Azure CLI, you can use the following command:
-
-```Shell
-# <mySubscriptionID>: Subscription ID of the Azure AI Studio hub's linked storage account (available in Azure AI hub resource view in Azure Portal).
-# <myResourceGroupName>: Resource group of the Azure AI Studio hub's linked storage account.
-# <user-id>: User object ID for role assignment (retrieve with "az ad user show" command).
-
-az role assignment create --role "Storage Blob Data Contributor" --scope /subscriptions/<mySubscriptionID>/resourceGroups/<myResourceGroupName> --assignee-principal-type User --assignee-object-id "<user-id>"
-```
+
+- `WARNING: Failed to log artifacts to MLFlow: (UserError) Failed to upload evaluation run to the cloud due to insufficient permission to access the storage` means the linked storage account is missing the necessary assignments.
+- Portal steps:
+  1. Open the resource group tied to the Azure AI Project in the Azure Portal.
+  2. Locate the linked storage account(s).
+  3. Select each storage account and choose **Access control (IAM)**.
+  4. Grant the affected identity the **Storage Blob Data Contributor** role.
+- Prefer CLI? Reuse the `az role assignment create` command described in [Troubleshoot Remote Tracking Issues](#troubleshoot-remote-tracking-issues).
+
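If you script the CLI route, the `az role assignment create` command shown above can be parameterized. A sketch: `storage_role_assignment_cmd` is a hypothetical helper that only composes the command string, using the same placeholders as the documented example:

```python
def storage_role_assignment_cmd(subscription_id: str, resource_group: str, user_object_id: str) -> str:
    """Compose the `az role assignment create` call granting Storage Blob Data Contributor."""
    scope = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    return (
        'az role assignment create --role "Storage Blob Data Contributor" '
        f"--scope {scope} "
        "--assignee-principal-type User "
        f'--assignee-object-id "{user_object_id}"'
    )


print(storage_role_assignment_cmd("<mySubscriptionID>", "<myResourceGroupName>", "<user-id>"))
```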
+### PyRIT "Error sending prompt" message
+
+- `Exception: Error sending prompt with conversation ID: <guid>` is raised by PyRIT when a target LLM call fails inside the `PromptSendingOrchestrator`. The runner retries the conversation up to the configured limit, so occasional occurrences usually resolve automatically.
+- Common triggers include transient network issues, 429 throttling, or 5xx responses from the target deployment. Even if retries succeed, you will still see the stack trace in notebook output.
+- Inspect the `redteam.log` file written to the scan output directory (typically `<working dir>/runs/<scan_id>/redteam.log`) for the underlying exception and HTTP status. Increase verbosity with `DEBUG=True` for deeper diagnostics.
+- Running in Azure AI Studio? Navigate to **Evaluate > Red Team > <run name> > Logs**, download `redteam.log`, and search for the conversation ID to inspect the payload.
+- If one conversation ID keeps failing after retries, verify the target credentials, check deployment health, and review Azure OpenAI quota or rate-limit alerts in the Azure portal.
 
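Searching `redteam.log` for a recurring conversation ID is easy to script. A stdlib sketch; the file contents and conversation ID in the demo are made up:

```python
import tempfile
from pathlib import Path


def lines_mentioning(log_path: Path, conversation_id: str) -> list[str]:
    """Return the log lines that mention the given conversation ID."""
    if not log_path.exists():
        return []
    text = log_path.read_text(encoding="utf-8", errors="replace")
    return [line for line in text.splitlines() if conversation_id in line]


# Demo against a throwaway log file:
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "redteam.log"
    log.write_text(
        "INFO scan started\n"
        "ERROR Error sending prompt with conversation ID: abc-123 (status 429)\n"
    )
    print(lines_mentioning(log, "abc-123"))
```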
 ## Logging
 
-You can set logging level via environment variable `PF_LOGGING_LEVEL`, valid values includes `CRITICAL`, `ERROR`, `WARNING`, `INFO`, `DEBUG`, default to `INFO`.
+You can set the logging level via the environment variable `PF_LOGGING_LEVEL`; valid values include `CRITICAL`, `ERROR`, `WARNING`, `INFO`, `DEBUG`; default is `INFO`.
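For local scripts that want to mirror that behavior, the documented values map directly onto Python's standard `logging` levels. A sketch: the SDK reads the variable itself, and `resolve_pf_logging_level` is only illustrative:

```python
import logging
import os

VALID_LEVELS = {"CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"}


def resolve_pf_logging_level(default: str = "INFO") -> int:
    """Read PF_LOGGING_LEVEL, falling back to the INFO default for unset or invalid values."""
    name = os.environ.get("PF_LOGGING_LEVEL", default).upper()
    if name not in VALID_LEVELS:
        name = default
    return getattr(logging, name)


logging.basicConfig(level=resolve_pf_logging_level())
```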
## Get Additional Help
