Commit 2e48873

feat(kubevirt): add VM troubleshooting tool with diagnostic plan template
Implements a new troubleshoot tool for VirtualMachines in the kubevirt toolset. The tool provides automated diagnostic plans based on VM status, conditions, and common issues, helping users identify and resolve VM problems efficiently.

Adds:
- Troubleshoot tool implementation with structured diagnostic output
- Template-based diagnostic plan generation
- Comprehensive test coverage for the troubleshoot functionality

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Lee Yarwood <lyarwood@redhat.com>
1 parent 22e8460 commit 2e48873

4 files changed: +361 -0 lines changed

pkg/toolsets/kubevirt/toolset.go

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,7 @@ import (
 	internalk8s "github.com/containers/kubernetes-mcp-server/pkg/kubernetes"
 	"github.com/containers/kubernetes-mcp-server/pkg/toolsets"
 	vm_create "github.com/containers/kubernetes-mcp-server/pkg/toolsets/kubevirt/vm/create"
+	vm_troubleshoot "github.com/containers/kubernetes-mcp-server/pkg/toolsets/kubevirt/vm/troubleshoot"
 )
 
 type Toolset struct{}
@@ -24,6 +25,7 @@ func (t *Toolset) GetDescription() string {
 func (t *Toolset) GetTools(o internalk8s.Openshift) []api.ServerTool {
 	return slices.Concat(
 		vm_create.Tools(),
+		vm_troubleshoot.Tools(),
 	)
 }
 
Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
+# VirtualMachine Troubleshooting Guide
+
+## VM: {{.Name}} (namespace: {{.Namespace}})
+
+Follow these steps to diagnose issues with the VirtualMachine:
+
+---
+
+## Step 1: Check VirtualMachine Status
+
+Use the `resources_get` tool to inspect the VirtualMachine:
+- **apiVersion**: `kubevirt.io/v1`
+- **kind**: `VirtualMachine`
+- **namespace**: `{{.Namespace}}`
+- **name**: `{{.Name}}`
+
+**What to look for:**
+- `status.printableStatus` - Should be "Running" for a healthy VM
+- `status.ready` - Should be `true`
+- `status.conditions` - Look for conditions with `status: "False"` or error messages
+- `spec.runStrategy` - Check if it's "Always", "Manual", "Halted", or "RerunOnFailure"
+
+---
+
+## Step 2: Check VirtualMachineInstance Status
+
+If the VM exists but isn't running, check if a VirtualMachineInstance was created:
+
+Use the `resources_get` tool:
+- **apiVersion**: `kubevirt.io/v1`
+- **kind**: `VirtualMachineInstance`
+- **namespace**: `{{.Namespace}}`
+- **name**: `{{.Name}}`
+
+**What to look for:**
+- `status.phase` - Should be "Running" for a healthy VMI
+- `status.conditions` - Check for "Ready" condition with `status: "True"`
+- `status.guestOSInfo` - Confirms guest agent is running
+- If VMI doesn't exist and VM runStrategy is "Always", this indicates a problem
+
+---
+
+## Step 3: Check DataVolume Status (if applicable)
+
+If the VM uses DataVolumeTemplates, check their status:
+
+Use the `resources_list` tool:
+- **apiVersion**: `cdi.kubevirt.io/v1beta1`
+- **kind**: `DataVolume`
+- **namespace**: `{{.Namespace}}`
+
+Look for DataVolumes with names starting with `{{.Name}}-`
+
+**What to look for:**
+- `status.phase` - Should be "Succeeded" when ready
+- `status.progress` - Shows import/clone progress (e.g., "100.0%")
+- Common issues:
+  - Phase "Pending" - Waiting for resources
+  - Phase "ImportScheduled" or "ImportInProgress" - Still importing
+  - Phase "Failed" - Check `status.conditions` for error details
+
+---
+
+## Step 4: Check virt-launcher Pod
+
+The virt-launcher pod runs the actual VM. Find and inspect it:
+
+Use the `pods_list_in_namespace` tool:
+- **namespace**: `{{.Namespace}}`
+- **labelSelector**: `kubevirt.io=virt-launcher,vm.kubevirt.io/name={{.Name}}`
+
+**What to look for:**
+- Pod should be in "Running" phase
+- All containers should be ready (e.g., "2/2")
+- Check pod events and conditions for errors
+
+If pod exists, get detailed status with `pods_get`:
+- **namespace**: `{{.Namespace}}`
+- **name**: `virt-launcher-{{.Name}}-xxxxx` (use actual pod name from list)
+
+Get pod logs with `pods_log`:
+- **namespace**: `{{.Namespace}}`
+- **name**: `virt-launcher-{{.Name}}-xxxxx`
+- **container**: `compute` (main VM container)
+
+---
+
+## Step 5: Check Events
+
+Events provide crucial diagnostic information:
+
+Use the `events_list` tool:
+- **namespace**: `{{.Namespace}}`
+
+Filter output for events related to `{{.Name}}` - look for warnings or errors.
+
+---
+
+## Step 6: Check Instance Type and Preference (if used)
+
+If the VM uses instance types or preferences, verify they exist:
+
+For instance types, use `resources_get`:
+- **apiVersion**: `instancetype.kubevirt.io/v1beta1`
+- **kind**: `VirtualMachineClusterInstancetype`
+- **name**: (check VM spec for instancetype name)
+
+For preferences, use `resources_get`:
+- **apiVersion**: `instancetype.kubevirt.io/v1beta1`
+- **kind**: `VirtualMachineClusterPreference`
+- **name**: (check VM spec for preference name)
+
+---
+
+## Common Issues and Solutions
+
+### VM stuck in "Stopped" or "Halted"
+- Check `spec.runStrategy` - if "Halted", the VM is intentionally stopped
+- Change runStrategy to "Always" to start the VM
+
+### VMI doesn't exist
+- Check VM conditions for admission errors
+- Verify instance type and preference exist
+- Check resource quotas in the namespace
+
+### DataVolume stuck in "ImportInProgress"
+- Check CDI controller pods in `cdi` namespace
+- Verify source image is accessible
+- Check PVC storage class exists and has available capacity
+
+### virt-launcher pod in CrashLoopBackOff
+- Check pod logs for container `compute`
+- Common causes:
+  - Insufficient resources (CPU/memory)
+  - Invalid VM configuration
+  - Storage issues (PVC not available)
+
+### VM starts but guest doesn't boot
+- Check virt-launcher logs for QEMU errors
+- Verify boot disk is properly configured
+- Check if guest agent is installed (for cloud images)
+- Ensure correct architecture (amd64 vs arm64)
+
+---
+
+## Additional Resources
+
+For more detailed diagnostics:
+- Check KubeVirt components: `pods_list` in `kubevirt` namespace
+- Check CDI components: `pods_list` in `cdi` namespace (if using DataVolumes)
+- Review resource consumption: `pods_top` for the virt-launcher pod
Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+package troubleshoot
+
+import (
+	_ "embed"
+	"fmt"
+	"strings"
+	"text/template"
+
+	"github.com/containers/kubernetes-mcp-server/pkg/api"
+	"github.com/google/jsonschema-go/jsonschema"
+	"k8s.io/utils/ptr"
+)
+
+//go:embed plan.tmpl
+var planTemplate string
+
+func Tools() []api.ServerTool {
+	return []api.ServerTool{
+		{
+			Tool: api.Tool{
+				Name:        "vm_troubleshoot",
+				Description: "Generate a comprehensive troubleshooting guide for a VirtualMachine, providing step-by-step instructions to diagnose common issues",
+				InputSchema: &jsonschema.Schema{
+					Type: "object",
+					Properties: map[string]*jsonschema.Schema{
+						"namespace": {
+							Type:        "string",
+							Description: "The namespace of the virtual machine",
+						},
+						"name": {
+							Type:        "string",
+							Description: "The name of the virtual machine",
+						},
+					},
+					Required: []string{"namespace", "name"},
+				},
+				Annotations: api.ToolAnnotations{
+					Title:           "Virtual Machine: Troubleshoot",
+					ReadOnlyHint:    ptr.To(true),
+					DestructiveHint: ptr.To(false),
+					IdempotentHint:  ptr.To(true),
+					OpenWorldHint:   ptr.To(false),
+				},
+			},
+			Handler: troubleshoot,
+		},
+	}
+}
+
+type troubleshootParams struct {
+	Namespace string
+	Name      string
+}
+
+func troubleshoot(params api.ToolHandlerParams) (*api.ToolCallResult, error) {
+	// Parse required parameters
+	namespace, err := getRequiredString(params, "namespace")
+	if err != nil {
+		return api.NewToolCallResult("", err), nil
+	}
+
+	name, err := getRequiredString(params, "name")
+	if err != nil {
+		return api.NewToolCallResult("", err), nil
+	}
+
+	// Prepare template parameters
+	templateParams := troubleshootParams{
+		Namespace: namespace,
+		Name:      name,
+	}
+
+	// Render template
+	tmpl, err := template.New("troubleshoot").Parse(planTemplate)
+	if err != nil {
+		return api.NewToolCallResult("", fmt.Errorf("failed to parse template: %w", err)), nil
+	}
+
+	var result strings.Builder
+	if err := tmpl.Execute(&result, templateParams); err != nil {
+		return api.NewToolCallResult("", fmt.Errorf("failed to render template: %w", err)), nil
+	}
+
+	return api.NewToolCallResult(result.String(), nil), nil
+}
+
+func getRequiredString(params api.ToolHandlerParams, key string) (string, error) {
+	args := params.GetArguments()
+	val, ok := args[key]
+	if !ok {
+		return "", fmt.Errorf("%s parameter required", key)
+	}
+	str, ok := val.(string)
+	if !ok {
+		return "", fmt.Errorf("%s parameter must be a string", key)
+	}
+	return str, nil
+}
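
Reviewer note: the argument check in `getRequiredString` has three outcomes (present string, missing key, wrong type). The sketch below reproduces that logic against a plain `map[string]any` so it can be run without the server's `api` package. `requiredString` is a hypothetical stand-in introduced for this illustration, not a function in the commit.

```go
package main

import "fmt"

// requiredString is a hypothetical stand-in for getRequiredString that takes
// the argument map directly instead of api.ToolHandlerParams.
func requiredString(args map[string]any, key string) (string, error) {
	val, ok := args[key]
	if !ok {
		// Key absent from the tool call arguments.
		return "", fmt.Errorf("%s parameter required", key)
	}
	str, ok := val.(string)
	if !ok {
		// Key present but decoded as a non-string value (e.g. a JSON number).
		return "", fmt.Errorf("%s parameter must be a string", key)
	}
	return str, nil
}

func main() {
	args := map[string]any{"namespace": "test-ns", "name": 42}
	for _, key := range []string{"namespace", "name", "missing"} {
		val, err := requiredString(args, key)
		fmt.Printf("%s: value=%q err=%v\n", key, val, err)
	}
}
```

Only presence and type are checked, so an empty string still passes. Whether empty values should be rejected is a design choice the commit leaves open.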
Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
+package troubleshoot
+
+import (
+	"context"
+	"strings"
+	"testing"
+
+	"github.com/containers/kubernetes-mcp-server/pkg/api"
+	internalk8s "github.com/containers/kubernetes-mcp-server/pkg/kubernetes"
+)
+
+type mockToolCallRequest struct {
+	arguments map[string]interface{}
+}
+
+func (m *mockToolCallRequest) GetArguments() map[string]any {
+	return m.arguments
+}
+
+func TestTroubleshoot(t *testing.T) {
+	tests := []struct {
+		name      string
+		args      map[string]interface{}
+		wantErr   bool
+		checkFunc func(t *testing.T, result string)
+	}{
+		{
+			name: "generates troubleshooting guide",
+			args: map[string]interface{}{
+				"namespace": "test-ns",
+				"name":      "test-vm",
+			},
+			wantErr: false,
+			checkFunc: func(t *testing.T, result string) {
+				if !strings.Contains(result, "VirtualMachine Troubleshooting Guide") {
+					t.Errorf("Expected troubleshooting guide header")
+				}
+				if !strings.Contains(result, "test-vm") {
+					t.Errorf("Expected VM name in guide")
+				}
+				if !strings.Contains(result, "test-ns") {
+					t.Errorf("Expected namespace in guide")
+				}
+				if !strings.Contains(result, "Step 1: Check VirtualMachine Status") {
+					t.Errorf("Expected step 1 header")
+				}
+				if !strings.Contains(result, "resources_get") {
+					t.Errorf("Expected resources_get tool reference")
+				}
+				if !strings.Contains(result, "VirtualMachineInstance") {
+					t.Errorf("Expected VMI section")
+				}
+				if !strings.Contains(result, "virt-launcher") {
+					t.Errorf("Expected virt-launcher pod section")
+				}
+			},
+		},
+		{
+			name: "missing namespace",
+			args: map[string]interface{}{
+				"name": "test-vm",
+			},
+			wantErr: true,
+		},
+		{
+			name: "missing name",
+			args: map[string]interface{}{
+				"namespace": "test-ns",
+			},
+			wantErr: true,
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			params := api.ToolHandlerParams{
+				Context:         context.Background(),
+				Kubernetes:      &internalk8s.Kubernetes{},
+				ToolCallRequest: &mockToolCallRequest{arguments: tt.args},
+			}
+
+			result, err := troubleshoot(params)
+			if err != nil {
+				t.Errorf("troubleshoot() unexpected Go error: %v", err)
+				return
+			}
+
+			if result == nil {
+				t.Error("Expected non-nil result")
+				return
+			}
+
+			if tt.wantErr {
+				if result.Error == nil {
+					t.Error("Expected error in result.Error, got nil")
+				}
+			} else {
+				if result.Error != nil {
+					t.Errorf("Expected no error in result, got: %v", result.Error)
+				}
+				if result.Content == "" {
+					t.Error("Expected non-empty result content")
+				}
+				if tt.checkFunc != nil {
+					tt.checkFunc(t, result.Content)
+				}
+			}
+		})
+	}
+}
