@@ -150,7 +151,7 @@ Let's create our first RCA entity using the Port UI:
 
 3. Toggle JSON mode and copy and paste the following JSON:
 <details>
-<summary>Database Connection Pool Exhaustion incident</summary>
+<summary><b>Database Connection Pool Exhaustion incident (click to expand)</b></summary>
 
 ```json showLineNumbers
 {
@@ -190,7 +191,7 @@ Let us add another RCA entity:
 2. Toggle JSON mode and copy and paste the following JSON:
 
 <details>
-<summary>Memory Leak incident</summary>
+<summary><b>Memory Leak incident (click to expand)</b></summary>
 
 ```json showLineNumbers
 {
@@ -340,7 +341,7 @@ else:
 **Example integration scenarios:**
 
 <details>
-<summary>GitHub Actions workflow</summary>
+<summary><b>GitHub Actions workflow (click to expand)</b></summary>
 
 ```yaml title=".github/workflows/create-rca.yml"
 name: Create RCA from Issue
@@ -389,11 +390,10 @@ jobs:
 print(f'RCA created: {response.status_code}')
 "
 ```
-
 </details>
 
 <details>
-<summary>Standalone Python script for bulk import</summary>
+<summary><b>Standalone Python script for bulk import (click to expand)</b></summary>
 
 ```python title="bulk_import_rcas.py"
 #!/usr/bin/env python3
@@ -465,7 +465,6 @@ def main():
 if __name__ == '__main__':
     main()
 ```
-
 </details>
 
 </TabItem>
@@ -598,87 +597,69 @@ For a more comprehensive knowledge base, consider adding 5-10 RCA documents cove
 
 ## Update AI agent configuration
 
-Now we'll modify the Incident Manager AI agent to include access to our RCA documents.
+The Incident Manager AI agent uses the MCP tools pattern (`^(list|get|search|track|describe)_.*`), which automatically provides access to all blueprints in your catalog - including the RCA blueprint you just created. This means the agent can already search and reference RCA documents without any configuration changes.
 
-<h3> Add RCA blueprint to allowed blueprints</h3>
-
-1. Go to the [AI Agents](https://app.getport.io/_ai_agents) page.
-
-2. Find the **Incident Manager** agent and click on the `...` on the far right of the row.
-
-3. Click on `Edit`.
+:::tip Automatic blueprint discovery
+With the MCP tools pattern, AI agents automatically discover new blueprints you create. You don't need to manually add `"rootCauseAnalysis"` to any configuration - the agent already has access to query and reference these documents.
+:::
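To make the tools pattern concrete, here is a minimal sketch of how a regex such as `^(list|get|search|track|describe)_.*` selects tools by name prefix rather than by blueprint. It assumes the pattern is evaluated with ordinary regex semantics against each tool's name, which this diff does not spell out, and every tool name in `candidates` is hypothetical and chosen only for illustration.

```python
import re

# Read-only tool pattern quoted in the guide text.
READ_ONLY_PATTERN = re.compile(r"^(list|get|search|track|describe)_.*")


def is_allowed(tool_name: str) -> bool:
    """Return True if the tool name starts with one of the read-only verbs."""
    return READ_ONLY_PATTERN.match(tool_name) is not None


# Hypothetical tool names, for illustration only (not a real tool inventory).
candidates = [
    "list_blueprints",
    "get_entity",
    "search_entities",
    "run_restart_k8s_workload",
    "delete_entity",
]

allowed = [name for name in candidates if is_allowed(name)]
print(allowed)
```

Because the pattern contains no blueprint identifier, a newly created blueprint such as `rootCauseAnalysis` needs no pattern change under this assumption; only tools whose names fall outside the five verb prefixes would have to be listed explicitly.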
 
-4. In the `allowed_blueprints` array, add `"rootCauseAnalysis"`:
 
-```json showLineNumbers
-"allowed_blueprints": [
-"pagerdutyService",
-"pagerdutyIncident",
-"pagerdutyEscalationPolicy",
-"pagerdutySchedule",
-"pagerdutyOncall",
-"pagerdutyUser",
-"_user",
-"_team",
-"service",
-"rootCauseAnalysis" //highlight
-]
-```
+<h3> Update the agent prompt (Optional)</h3>
 
-5. Click `Save` to save the changes.
+While the agent can automatically access RCA documents, updating the prompt helps guide it on when and how to use this information effectively.
 
+1. Go to the [AI Agents](https://app.getport.io/_ai_agents) page.
 
-<h3> Update the agent prompt</h3>
+2. Find the **Incident Manager** agent and click on the `...` on the far right of the row.
 
-Enhance the prompt to include instructions about using RCA context:
+3. Click on `Edit`.
 
-1. Click on `Edit property` on the `Prompt` field.
+4. Click on `Edit property` on the `Prompt` field.
 
-2. Replace the existing content with the following:
+5. Replace the existing content with the following:
 
 <details>
-<summary>Enhanced agent prompt</summary>
+<summary><b>Enhanced agent prompt (click to expand)</b></summary>
 
-```markdown showLineNumbers"
-You are an agent responsible for answering questions about PagerDuty incidents, services, escalation policies, schedules, and on-call rotations.
-You also have access to historical Root Cause Analysis (RCA) documents from past incidents.
-
-## Guidelines
-- Provide clear information about incidents
-- Identify who is on-call for services (both primary and secondary on-call)
-- Report on incident statistics and resolution times
-- When relevant, reference past RCA documents to provide context and suggest solutions
-- Use RCA lessons learned to help prevent similar incidents
-- Suggest preventive measures based on historical incident patterns
+```markdown showLineNumbers"
+You are an agent responsible for answering questions about PagerDuty incidents, services, escalation policies, schedules, and on-call rotations.
+You also have access to historical Root Cause Analysis (RCA) documents from past incidents.
+
+### Guidelines
+- Provide clear information about incidents
+- Identify who is on-call for services (both primary and secondary on-call)
+- Report on incident statistics and resolution times
+- When relevant, reference past RCA documents to provide context and suggest solutions
+- Use RCA lessons learned to help prevent similar incidents
+- Suggest preventive measures based on historical incident patterns
docs/guides/all/heal-unhealthy-k8s-pods.md (12 additions, 14 deletions)
@@ -1107,20 +1107,6 @@ Next, we will create an AI agent that analyzes pod health issues and creates com
 "properties": {
 "description": "AI agent specialized in diagnosing and automatically fixing unhealthy Kubernetes workloads. Monitors pod health, identifies root causes, and applies appropriate fixes.",
 "status": "active",
-"allowed_blueprints": [
-"k8s_workload",
-"k8s_pod",
-"k8s_replicaSet",
-"k8s_namespace",
-"k8s_cluster"
-],
-"allowed_actions": [
-"get_k8s_pod_logs",
-"restart_k8s_workload",
-"scale_k8s_workload",
-"update_k8s_workload_config",
-"create_k8s_fix_pr"
-],
 "prompt": "You are a Kubernetes healing AI agent with access to comprehensive SDLC data and pod logs.\n\n**Your healing process:**\n1. **FIRST - Get Logs**: Always start by retrieving pod logs using the get_k8s_pod_logs action to understand what's actually happening\n2. **Analyze with Context**: Combine log data with SDLC information (recent deployments, code changes, configuration updates) to build a complete picture\n3. **Intelligent Diagnosis**: Based on logs and context, determine the root cause (crashes, resource constraints, configuration issues, etc.)\n4. **Targeted Fix**: Execute only the specific action that will resolve the issue:\n - Restart for crashes and transient issues\n - Scale for resource constraints (CPU/memory)\n - Update config for resource limit issues\n - Create PR for complex fixes requiring code changes\n5. **Explain Your Actions**: Always explain what you found in the logs and why you chose the specific fix in your response",
 "execution_mode": "Automatic",
 "conversation_starters": [
@@ -1130,13 +1116,25 @@ Next, we will create an AI agent that analyzes pod health issues and creates com
 "Scale up resources for this memory-constrained workload",
 "What's causing this ImagePullBackOff error?",
 "Why are my pods stuck in Pending state?"
+],
+"tools": [
+"^(list|get|search|track|describe)_.*",
+"run_get_k8s_pod_logs",
+"run_restart_k8s_workload",
+"run_scale_k8s_workload",
+"run_update_k8s_workload_config",
+"run_create_k8s_fix_pr"
 ]
 },
 "relations": {}
 }
 ```
 </details>
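The change to the agent JSON above can be read as a mechanical migration: drop `allowed_blueprints`, prefix each `allowed_actions` entry with `run_`, and place the read-only pattern first in a new `tools` array. The following is a minimal Python sketch of that transformation. `migrate_agent_properties` is a hypothetical helper written for illustration, not part of any Port SDK, and the `run_` prefix rule is inferred only from the five actions renamed in this diff.

```python
import copy

# Read-only tool pattern taken from the diff.
READ_ONLY_PATTERN = "^(list|get|search|track|describe)_.*"


def migrate_agent_properties(properties: dict) -> dict:
    """Return a copy of the agent properties using a `tools` list.

    Assumption: each old action identifier maps to a tool named
    `run_<action>`, as the five entries in this diff suggest.
    """
    migrated = copy.deepcopy(properties)
    # Blueprint access is covered by the read-only pattern, so the key goes away.
    migrated.pop("allowed_blueprints", None)
    actions = migrated.pop("allowed_actions", [])
    migrated["tools"] = [READ_ONLY_PATTERN] + [f"run_{action}" for action in actions]
    return migrated


# Trimmed-down example input; values are taken from the removed lines of the diff.
old_properties = {
    "status": "active",
    "allowed_blueprints": ["k8s_workload", "k8s_pod"],
    "allowed_actions": ["get_k8s_pod_logs", "restart_k8s_workload"],
}

new_properties = migrate_agent_properties(old_properties)
print(new_properties["tools"])
```

The helper works on a deep copy so the original configuration stays available for comparison or rollback.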
 
+:::tip MCP Enhanced Capabilities
+The AI agent uses MCP (Model Context Protocol) enhanced capabilities to automatically discover important and relevant blueprint entities via its tools. The `^(list|get|search|track|describe)_.*` pattern allows the agent to access and analyze related entities across your entire software catalog, including Kubernetes resources, recent deployments, code changes, runbooks, and history logs. This gives the agent rich SDLC context to make intelligent healing decisions. Additionally, we explicitly add the remediation action tools (`run_get_k8s_pod_logs`, `run_restart_k8s_workload`, etc.), which the agent calls sequentially to diagnose and fix unhealthy workloads.