diff --git a/deploy/kubernetes/README.md b/deploy/kubernetes/README.md index 175763cdc..e51d6aa45 100644 --- a/deploy/kubernetes/README.md +++ b/deploy/kubernetes/README.md @@ -1,13 +1,16 @@ # Semantic Router Kubernetes Deployment -This directory contains Kubernetes manifests for deploying the Semantic Router using Kustomize. +This directory contains Kubernetes manifests for deploying the Semantic Router using Kustomize. It provides two modes similar to docker-compose profiles: + +- core: only the semantic-router (no llm-katan) +- llm-katan: semantic-router plus an llm-katan sidecar listening on 8002 (served model name `qwen3`) ## Architecture The deployment consists of: - **ConfigMap**: Contains `config.yaml` and `tools_db.json` configuration files -- **PersistentVolumeClaim**: 10Gi storage for model files +- **PersistentVolumeClaim**: 30Gi storage for model files (adjust based on models you enable) - **Deployment**: - **Init Container**: Downloads/copies model files to persistent volume - **Main Container**: Runs the semantic router service @@ -25,15 +28,24 @@ The deployment consists of: ### Standard Kubernetes Deployment +First-time apply (creates PVC via storage overlay): + ```bash -kubectl apply -k deploy/kubernetes/ +kubectl apply -k deploy/kubernetes/overlays/storage +kubectl apply -k deploy/kubernetes/overlays/core # or overlays/llm-katan # Check deployment status -kubectl get pods -l app=semantic-router -n semantic-router -kubectl get services -l app=semantic-router -n semantic-router +kubectl get pods -l app=semantic-router -n vllm-semantic-router-system +kubectl get services -l app=semantic-router -n vllm-semantic-router-system # View logs -kubectl logs -l app=semantic-router -n semantic-router -f +kubectl logs -l app=semantic-router -n vllm-semantic-router-system -f +``` + +Day-2 updates (do not touch PVC): + +```bash +kubectl apply -k deploy/kubernetes/overlays/core # or overlays/llm-katan ``` ### Kind (Kubernetes in Docker) Deployment @@ -83,23 +95,27 @@ kubectl wait --for=condition=Ready nodes --all --timeout=300s **Step 2: Deploy the application** ```bash +# First-time storage (PVC) +kubectl apply -k deploy/kubernetes/overlays/storage + +# Deploy app kubectl apply -k deploy/kubernetes/ # Wait for deployment to be ready -kubectl wait --for=condition=Available deployment/semantic-router -n semantic-router --timeout=600s +kubectl wait --for=condition=Available deployment/semantic-router -n vllm-semantic-router-system --timeout=600s ``` **Step 3: Check deployment status** ```bash # Check pods -kubectl get pods -n semantic-router -o wide +kubectl get pods -n vllm-semantic-router-system -o wide # Check services -kubectl get services -n semantic-router +kubectl get services -n vllm-semantic-router-system # View logs -kubectl logs -l app=semantic-router -n semantic-router -f +kubectl logs -l app=semantic-router -n vllm-semantic-router-system -f ``` #### Resource Requirements for Kind @@ -137,13 +153,13 @@ Or using kubectl directly: ```bash # Access Classification API (HTTP REST) -kubectl port-forward -n semantic-router svc/semantic-router 8080:8080 +kubectl port-forward -n vllm-semantic-router-system svc/semantic-router 8080:8080 # Access gRPC API -kubectl port-forward -n semantic-router svc/semantic-router 50051:50051 +kubectl port-forward -n vllm-semantic-router-system svc/semantic-router 50051:50051 # Access metrics -kubectl port-forward -n semantic-router svc/semantic-router-metrics 9190:9190 +kubectl port-forward -n vllm-semantic-router-system svc/semantic-router-metrics 
9190:9190
```

#### Testing the Deployment
@@ -195,6 +211,11 @@
kubectl delete -k deploy/kubernetes/
kind delete cluster --name semantic-router-cluster
```

+## Notes on dependencies
+
+- Gateway API Inference Extension CRDs are required only when using the Envoy AI Gateway integration in `deploy/kubernetes/ai-gateway/`. Follow the installation steps in `website/docs/installation/kubernetes.md` if you plan to use the gateway path.
+- The core kustomize deployment in this folder does not install Envoy Gateway or AI Gateway; those are optional components documented separately.
+
## Make Commands Reference

The project provides comprehensive make targets for managing kind clusters and deployments:
@@ -290,9 +311,14 @@
-kubectl logs -n semantic-router -l app=semantic-router -c model-downloader
+kubectl logs -n vllm-semantic-router-system -l app=semantic-router -c model-downloader

# Check resource usage
-kubectl top pods -n semantic-router
+kubectl top pods -n vllm-semantic-router-system

-# Adjust resource limits in deployment.yaml if needed
+# Adjust resource limits in base/deployment.yaml if needed
```

+### Storage sizing
+
+- The default PVC is 30Gi. If the enabled models are small, you can reduce it; otherwise reserve at least 2–3x the total model size.
+- If your cluster's default StorageClass isn't named `standard`, change `storageClassName` in `pvc.yaml` accordingly, or remove the field to use the default class.
+
### Resource Optimization

For different environments, you can adjust resource requirements:
@@ -301,22 +327,49 @@ For different environments, you can adjust resource requirements:

- **Testing**: 4Gi memory, 1 CPU
- **Production**: 8Gi+ memory, 2+ CPU

-Edit the `resources` section in `deployment.yaml` accordingly.
+Edit the `resources` section in `base/deployment.yaml` accordingly.

## Files Overview

### Kubernetes Manifests (`deploy/kubernetes/`)

-- `deployment.yaml` - Main application deployment with optimized resource settings
-- `service.yaml` - Services for gRPC, HTTP API, and metrics
-- `pvc.yaml` - Persistent volume claim for model storage
-- `namespace.yaml` - Dedicated namespace for the application
-- `config.yaml` - Application configuration
-- `tools_db.json` - Tools database for semantic routing
-- `kustomization.yaml` - Kustomize configuration for easy deployment
+- `base/` - Shared resources (Namespace, Service, ConfigMap, Deployment)
+  - `namespace.yaml` - Dedicated namespace for the application
+  - `service.yaml` - gRPC, HTTP API, and metrics services
+  - `deployment.yaml` - App deployment (init downloads by default; imagePullPolicy IfNotPresent)
+  - `config.yaml` - Application configuration (defaults to qwen3 @ 127.0.0.1:8002)
+  - `tools_db.json` - Tools database for semantic routing
+  - `pv.yaml` - OPTIONAL hostPath PV for local models (edit path as needed)
+- `overlays/core/` - Core deployment (no llm-katan), references `base/`
+- `overlays/llm-katan/` - Adds llm-katan sidecar via local patch (no parent file references)
+- `overlays/storage/` - PVC only (self-contained `namespace.yaml` + `pvc.yaml`), run once to create storage
+- `kustomization.yaml` - Root entry (defaults to `overlays/core`)

### Development Tools

- `tools/kind/kind-config.yaml` - Kind cluster configuration for local development
- `tools/make/kube.mk` - Make targets for Kubernetes operations
- `Makefile` - Root makefile including all make targets

+## Choose a mode: core or llm-katan
+
+- Core mode (default root points here):
+
+  ```bash
+  kubectl apply -k deploy/kubernetes
+  # or explicitly
+  kubectl apply -k deploy/kubernetes/overlays/core
+  ```
+
+- llm-katan mode:
+
+  ```bash
+  kubectl apply -k deploy/kubernetes/overlays/llm-katan
+  ```
+
+Notes for llm-katan:
+
+- The init container will attempt to download `Qwen/Qwen3-0.6B` into `/app/models/Qwen/Qwen3-0.6B` and the embedding model `sentence-transformers/all-MiniLM-L12-v2` into `/app/models/all-MiniLM-L12-v2`. In restricted networks, these downloads may fail; pre-populate the PV or point the init script to your internal artifact store as needed.
+- The default Kubernetes `config.yaml` has been aligned to use `qwen3` and endpoint `127.0.0.1:8002`.
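+
+To sanity-check the sidecar once the pod is Running, a quick sketch (this assumes llm-katan exposes an OpenAI-compatible API on 8002, consistent with its `--served-model-name qwen3` argument):
+
+```bash
+kubectl port-forward -n vllm-semantic-router-system deploy/semantic-router 8002:8002 &
+curl -s http://localhost:8002/v1/models   # should list the served model name "qwen3"
+```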
diff --git a/deploy/kubernetes/ai-gateway/inference-pool/inference-pool.yaml b/deploy/kubernetes/ai-gateway/inference-pool/inference-pool.yaml
index 64afc6f93..7b52e07b1 100644
--- a/deploy/kubernetes/ai-gateway/inference-pool/inference-pool.yaml
+++ b/deploy/kubernetes/ai-gateway/inference-pool/inference-pool.yaml
@@ -11,7 +11,7 @@ spec:
     - number: 50051
   selector:
     matchLabels:
-      app: vllm-semantic-router
+      app: semantic-router
   endpointPickerRef:
     name: semantic-router
     port:
diff --git a/deploy/kubernetes/base/config.yaml b/deploy/kubernetes/base/config.yaml
new file mode 100644
index 000000000..5f5159a3e
--- /dev/null
+++ b/deploy/kubernetes/base/config.yaml
@@ -0,0 +1,169 @@
+bert_model:
+  model_id: models/all-MiniLM-L12-v2
+  threshold: 0.6
+  use_cpu: true
+
+semantic_cache:
+  enabled: true
+  backend_type: "memory" # Options: "memory" or "milvus"
+  similarity_threshold: 0.8
+  max_entries: 1000 # Only applies to memory backend
+  ttl_seconds: 3600
+  eviction_policy: "fifo"
+
+tools:
+  enabled: true
+  top_k: 3
+  similarity_threshold: 0.2
+  tools_db_path: "config/tools_db.json"
+  fallback_to_empty: true
+
+prompt_guard:
+  enabled: true
+  use_modernbert: true
+  model_id: "models/jailbreak_classifier_modernbert-base_model"
+  threshold: 0.7
+  use_cpu: true
+  jailbreak_mapping_path: "models/jailbreak_classifier_modernbert-base_model/jailbreak_type_mapping.json"
+
+# vLLM Endpoints Configuration
+# IMPORTANT: 'address' field must be a valid IP address (IPv4 or IPv6)
+# Supported formats: 127.0.0.1, 192.168.1.1, ::1, 2001:db8::1
+# NOT supported: domain names (example.com), protocol prefixes (http://), paths (/api), ports in address (use 'port' field)
+vllm_endpoints:
+  - name: "endpoint1"
+    address: "127.0.0.1" # llm-katan sidecar or local backend
+    port: 8002
+    weight: 1
+
+model_config:
+  "qwen3":
+    reasoning_family: "qwen3" # Match docker-compose default model name
+    preferred_endpoints: ["endpoint1"]
+    pii_policy:
+      allow_by_default: true
+
+# Classifier configuration
+classifier:
+  category_model:
+    model_id: "models/category_classifier_modernbert-base_model"
+    use_modernbert: true
+    threshold: 0.6
+    use_cpu: true
+    category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json"
+  pii_model:
+    model_id: "models/pii_classifier_modernbert-base_presidio_token_model"
+    use_modernbert: true
+    threshold: 0.7
+    use_cpu: true
+    pii_mapping_path: "models/pii_classifier_modernbert-base_presidio_token_model/pii_type_mapping.json"
+
+# Categories with new use_reasoning field structure
+categories:
+  - name: business
+    model_scores:
+      - model: qwen3
+        score: 0.7
+        use_reasoning: false # Business performs better without reasoning
+  - name: law
+    model_scores:
+      - model: qwen3
+        score: 0.4
+        use_reasoning: false
+  - name: psychology
+    model_scores:
+      - model: qwen3
+        score: 0.6
+        use_reasoning: false
+  - name: biology
+    model_scores:
+      - model: qwen3
+        score: 0.9
+        use_reasoning: false
+  - name: chemistry
+    model_scores:
+      - model: qwen3
+        score: 0.6
+
use_reasoning: true # Enable reasoning for complex chemistry + - name: history + model_scores: + - model: qwen3 + score: 0.7 + use_reasoning: false + - name: other + model_scores: + - model: qwen3 + score: 0.7 + use_reasoning: false + - name: health + model_scores: + - model: qwen3 + score: 0.5 + use_reasoning: false + - name: economics + model_scores: + - model: qwen3 + score: 1.0 + use_reasoning: false + - name: math + model_scores: + - model: qwen3 + score: 1.0 + use_reasoning: true # Enable reasoning for complex math + - name: physics + model_scores: + - model: qwen3 + score: 0.7 + use_reasoning: true # Enable reasoning for physics + - name: computer science + model_scores: + - model: qwen3 + score: 0.6 + use_reasoning: false + - name: philosophy + model_scores: + - model: qwen3 + score: 0.5 + use_reasoning: false + - name: engineering + model_scores: + - model: qwen3 + score: 0.7 + use_reasoning: false + +default_model: qwen3 + +# Reasoning family configurations +reasoning_families: + deepseek: + type: "chat_template_kwargs" + parameter: "thinking" + + qwen3: + type: "chat_template_kwargs" + parameter: "enable_thinking" + + gpt-oss: + type: "reasoning_effort" + parameter: "reasoning_effort" + gpt: + type: "reasoning_effort" + parameter: "reasoning_effort" + +# Global default reasoning effort level +default_reasoning_effort: high + +# API Configuration +api: + batch_classification: + max_batch_size: 100 + concurrency_threshold: 5 + max_concurrency: 8 + metrics: + enabled: true + detailed_goroutine_tracking: true + high_resolution_timing: false + sample_rate: 1.0 + duration_buckets: + [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30] + size_buckets: [1, 2, 5, 10, 20, 50, 100, 200] diff --git a/deploy/kubernetes/base/deployment.yaml b/deploy/kubernetes/base/deployment.yaml new file mode 100644 index 000000000..5baecc953 --- /dev/null +++ b/deploy/kubernetes/base/deployment.yaml @@ -0,0 +1,144 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: semantic-router + namespace: vllm-semantic-router-system + labels: + app: semantic-router +spec: + replicas: 1 + selector: + matchLabels: + app: semantic-router + template: + metadata: + labels: + app: semantic-router + spec: + initContainers: + - name: model-downloader + image: python:3.11-slim + securityContext: + runAsNonRoot: false + allowPrivilegeEscalation: false + command: ["/bin/bash", "-c"] + args: + - | + set -e + echo "Installing Hugging Face CLI..." + pip install --no-cache-dir huggingface_hub[cli] + + echo "Downloading models to persistent volume..." + cd /app/models + + # Download category classifier model + if [ ! -d "category_classifier_modernbert-base_model" ]; then + echo "Downloading category classifier model..." + huggingface-cli download LLM-Semantic-Router/category_classifier_modernbert-base_model --local-dir category_classifier_modernbert-base_model + else + echo "Category classifier model already exists, skipping..." + fi + + # Download PII classifier model + if [ ! -d "pii_classifier_modernbert-base_model" ]; then + echo "Downloading PII classifier model..." + huggingface-cli download LLM-Semantic-Router/pii_classifier_modernbert-base_model --local-dir pii_classifier_modernbert-base_model + else + echo "PII classifier model already exists, skipping..." + fi + + # Download jailbreak classifier model + if [ ! -d "jailbreak_classifier_modernbert-base_model" ]; then + echo "Downloading jailbreak classifier model..." 
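+          # NOTE: downloads use huggingface-cli and honor HF_ENDPOINT; export it to a mirror first in restricted networks (see the network tips doc)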
+ huggingface-cli download LLM-Semantic-Router/jailbreak_classifier_modernbert-base_model --local-dir jailbreak_classifier_modernbert-base_model + else + echo "Jailbreak classifier model already exists, skipping..." + fi + + # Download PII token classifier model + if [ ! -d "pii_classifier_modernbert-base_presidio_token_model" ]; then + echo "Downloading PII token classifier model..." + huggingface-cli download LLM-Semantic-Router/pii_classifier_modernbert-base_presidio_token_model --local-dir pii_classifier_modernbert-base_presidio_token_model + else + echo "PII token classifier model already exists, skipping..." + fi + + # Download embedding model all-MiniLM-L12-v2 + if [ ! -d "all-MiniLM-L12-v2" ]; then + echo "Downloading all-MiniLM-L12-v2 embedding model..." + huggingface-cli download sentence-transformers/all-MiniLM-L12-v2 --local-dir all-MiniLM-L12-v2 + else + echo "all-MiniLM-L12-v2 already exists, skipping..." + fi + + + echo "Model setup complete." + ls -la /app/models/ + env: + - name: HF_HUB_CACHE + value: /tmp/hf_cache + resources: + requests: + memory: "512Mi" + cpu: "250m" + limits: + memory: "1Gi" + cpu: "500m" + volumeMounts: + - name: models-volume + mountPath: /app/models + containers: + - name: semantic-router + image: ghcr.io/vllm-project/semantic-router/extproc:latest + imagePullPolicy: IfNotPresent + args: ["--secure=true"] + securityContext: + runAsNonRoot: false + allowPrivilegeEscalation: false + ports: + - containerPort: 50051 + name: grpc + protocol: TCP + - containerPort: 9190 + name: metrics + protocol: TCP + - containerPort: 8080 + name: classify-api + protocol: TCP + env: + - name: LD_LIBRARY_PATH + value: "/app/lib" + volumeMounts: + - name: config-volume + mountPath: /app/config + readOnly: true + - name: models-volume + mountPath: /app/models + livenessProbe: + tcpSocket: + port: 50051 + initialDelaySeconds: 60 + periodSeconds: 30 + timeoutSeconds: 10 + failureThreshold: 3 + readinessProbe: + tcpSocket: + port: 50051 + initialDelaySeconds: 90 + periodSeconds: 30 + timeoutSeconds: 10 + failureThreshold: 3 + resources: + requests: + memory: "3Gi" + cpu: "1" + limits: + memory: "6Gi" + cpu: "2" + volumes: + - name: config-volume + configMap: + name: semantic-router-config + - name: models-volume + persistentVolumeClaim: + claimName: semantic-router-models diff --git a/deploy/kubernetes/base/kustomization.yaml b/deploy/kubernetes/base/kustomization.yaml new file mode 100644 index 000000000..eeb933939 --- /dev/null +++ b/deploy/kubernetes/base/kustomization.yaml @@ -0,0 +1,15 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - ./namespace.yaml + - ./service.yaml + - ./deployment.yaml + +configMapGenerator: + - name: semantic-router-config + files: + - ./config.yaml + - ./tools_db.json + +namespace: vllm-semantic-router-system diff --git a/deploy/kubernetes/namespace.yaml b/deploy/kubernetes/base/namespace.yaml similarity index 100% rename from deploy/kubernetes/namespace.yaml rename to deploy/kubernetes/base/namespace.yaml diff --git a/deploy/kubernetes/base/pv.yaml b/deploy/kubernetes/base/pv.yaml new file mode 100644 index 000000000..7ea00f491 --- /dev/null +++ b/deploy/kubernetes/base/pv.yaml @@ -0,0 +1,16 @@ +apiVersion: v1 +kind: PersistentVolume +metadata: + name: semantic-router-models-pv + labels: + app: semantic-router +spec: + capacity: + storage: 50Gi + accessModes: + - ReadWriteOnce + storageClassName: standard + persistentVolumeReclaimPolicy: Retain + hostPath: + path: /tmp/hostpath-provisioner/models + type: 
DirectoryOrCreate diff --git a/deploy/kubernetes/service.yaml b/deploy/kubernetes/base/service.yaml similarity index 64% rename from deploy/kubernetes/service.yaml rename to deploy/kubernetes/base/service.yaml index 5d674a6fd..5d2ed1b61 100644 --- a/deploy/kubernetes/service.yaml +++ b/deploy/kubernetes/base/service.yaml @@ -8,14 +8,14 @@ metadata: spec: type: ClusterIP ports: - - port: 50051 - targetPort: grpc - protocol: TCP - name: grpc - - port: 8080 - targetPort: 8080 - protocol: TCP - name: classify-api + - port: 50051 + targetPort: grpc + protocol: TCP + name: grpc + - port: 8080 + targetPort: 8080 + protocol: TCP + name: classify-api selector: app: semantic-router --- @@ -30,9 +30,9 @@ metadata: spec: type: ClusterIP ports: - - port: 9190 - targetPort: metrics - protocol: TCP - name: metrics + - port: 9190 + targetPort: metrics + protocol: TCP + name: metrics selector: app: semantic-router diff --git a/deploy/kubernetes/base/tools_db.json b/deploy/kubernetes/base/tools_db.json new file mode 100644 index 000000000..4f62f26e7 --- /dev/null +++ b/deploy/kubernetes/base/tools_db.json @@ -0,0 +1,142 @@ +[ + { + "tool": { + "type": "function", + "function": { + "name": "get_weather", + "description": "Get current weather information for a location", + "parameters": { + "type": "object", + "properties": { + "location": { + "type": "string", + "description": "The city and state, e.g. San Francisco, CA" + }, + "unit": { + "type": "string", + "enum": ["celsius", "fahrenheit"], + "description": "Temperature unit" + } + }, + "required": ["location"] + } + } + }, + "description": "Get current weather information, temperature, conditions, forecast for any location, city, or place. Check weather today, now, current conditions, temperature, rain, sun, cloudy, hot, cold, storm, snow", + "category": "weather", + "tags": ["weather", "temperature", "forecast", "climate"] + }, + { + "tool": { + "type": "function", + "function": { + "name": "search_web", + "description": "Search the web for information", + "parameters": { + "type": "object", + "properties": { + "query": { + "type": "string", + "description": "The search query" + }, + "num_results": { + "type": "integer", + "description": "Number of results to return", + "default": 5 + } + }, + "required": ["query"] + } + } + }, + "description": "Search the internet, web search, find information online, browse web content, lookup, research, google, find answers, discover, investigate", + "category": "search", + "tags": ["search", "web", "internet", "information", "browse"] + }, + { + "tool": { + "type": "function", + "function": { + "name": "calculate", + "description": "Perform mathematical calculations", + "parameters": { + "type": "object", + "properties": { + "expression": { + "type": "string", + "description": "Mathematical expression to evaluate" + } + }, + "required": ["expression"] + } + } + }, + "description": "Calculate mathematical expressions, solve math problems, arithmetic operations, compute numbers, addition, subtraction, multiplication, division, equations, formula", + "category": "math", + "tags": ["math", "calculation", "arithmetic", "compute", "numbers"] + }, + { + "tool": { + "type": "function", + "function": { + "name": "send_email", + "description": "Send an email message", + "parameters": { + "type": "object", + "properties": { + "to": { + "type": "string", + "description": "Recipient email address" + }, + "subject": { + "type": "string", + "description": "Email subject" + }, + "body": { + "type": "string", + "description": "Email 
body content" + } + }, + "required": ["to", "subject", "body"] + } + } + }, + "description": "Send email messages, email communication, contact people via email, mail, message, correspondence, notify, inform", + "category": "communication", + "tags": ["email", "send", "communication", "message", "contact"] + }, + { + "tool": { + "type": "function", + "function": { + "name": "create_calendar_event", + "description": "Create a new calendar event or appointment", + "parameters": { + "type": "object", + "properties": { + "title": { + "type": "string", + "description": "Event title" + }, + "date": { + "type": "string", + "description": "Event date in YYYY-MM-DD format" + }, + "time": { + "type": "string", + "description": "Event time in HH:MM format" + }, + "duration": { + "type": "integer", + "description": "Duration in minutes" + } + }, + "required": ["title", "date", "time"] + } + } + }, + "description": "Schedule meetings, create calendar events, set appointments, manage calendar, book time, plan meeting, organize schedule, reminder, agenda", + "category": "productivity", + "tags": ["calendar", "event", "meeting", "appointment", "schedule"] + } +] diff --git a/deploy/kubernetes/config.yaml b/deploy/kubernetes/config.yaml deleted file mode 100644 index 5bc40cbbe..000000000 --- a/deploy/kubernetes/config.yaml +++ /dev/null @@ -1,168 +0,0 @@ -bert_model: - model_id: sentence-transformers/all-MiniLM-L12-v2 - threshold: 0.6 - use_cpu: true - -semantic_cache: - enabled: true - backend_type: "memory" # Options: "memory" or "milvus" - similarity_threshold: 0.8 - max_entries: 1000 # Only applies to memory backend - ttl_seconds: 3600 - eviction_policy: "fifo" - -tools: - enabled: true - top_k: 3 - similarity_threshold: 0.2 - tools_db_path: "config/tools_db.json" - fallback_to_empty: true - -prompt_guard: - enabled: true - use_modernbert: true - model_id: "models/jailbreak_classifier_modernbert-base_model" - threshold: 0.7 - use_cpu: true - jailbreak_mapping_path: "models/jailbreak_classifier_modernbert-base_model/jailbreak_type_mapping.json" - -# vLLM Endpoints Configuration -# IMPORTANT: 'address' field must be a valid IP address (IPv4 or IPv6) -# Supported formats: 127.0.0.1, 192.168.1.1, ::1, 2001:db8::1 -# NOT supported: domain names (example.com), protocol prefixes (http://), paths (/api), ports in address (use 'port' field) -vllm_endpoints: - - name: "endpoint1" - address: "127.0.0.1" # IPv4 address - REQUIRED format - port: 8000 - weight: 1 - -model_config: - "openai/gpt-oss-20b": - reasoning_family: "gpt-oss" # This model uses GPT-OSS reasoning syntax - preferred_endpoints: ["endpoint1"] - pii_policy: - allow_by_default: true - -# Classifier configuration -classifier: - category_model: - model_id: "models/category_classifier_modernbert-base_model" - use_modernbert: true - threshold: 0.6 - use_cpu: true - category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json" - pii_model: - model_id: "models/pii_classifier_modernbert-base_presidio_token_model" - use_modernbert: true - threshold: 0.7 - use_cpu: true - pii_mapping_path: "models/pii_classifier_modernbert-base_presidio_token_model/pii_type_mapping.json" - -# Categories with new use_reasoning field structure -categories: - - name: business - model_scores: - - model: openai/gpt-oss-20b - score: 0.7 - use_reasoning: false # Business performs better without reasoning - - name: law - model_scores: - - model: openai/gpt-oss-20b - score: 0.4 - use_reasoning: false - - name: psychology - model_scores: - - model: 
openai/gpt-oss-20b - score: 0.6 - use_reasoning: false - - name: biology - model_scores: - - model: openai/gpt-oss-20b - score: 0.9 - use_reasoning: false - - name: chemistry - model_scores: - - model: openai/gpt-oss-20b - score: 0.6 - use_reasoning: true # Enable reasoning for complex chemistry - - name: history - model_scores: - - model: openai/gpt-oss-20b - score: 0.7 - use_reasoning: false - - name: other - model_scores: - - model: openai/gpt-oss-20b - score: 0.7 - use_reasoning: false - - name: health - model_scores: - - model: openai/gpt-oss-20b - score: 0.5 - use_reasoning: false - - name: economics - model_scores: - - model: openai/gpt-oss-20b - score: 1.0 - use_reasoning: false - - name: math - model_scores: - - model: openai/gpt-oss-20b - score: 1.0 - use_reasoning: true # Enable reasoning for complex math - - name: physics - model_scores: - - model: openai/gpt-oss-20b - score: 0.7 - use_reasoning: true # Enable reasoning for physics - - name: computer science - model_scores: - - model: openai/gpt-oss-20b - score: 0.6 - use_reasoning: false - - name: philosophy - model_scores: - - model: openai/gpt-oss-20b - score: 0.5 - use_reasoning: false - - name: engineering - model_scores: - - model: openai/gpt-oss-20b - score: 0.7 - use_reasoning: false - -default_model: openai/gpt-oss-20b - -# Reasoning family configurations -reasoning_families: - deepseek: - type: "chat_template_kwargs" - parameter: "thinking" - - qwen3: - type: "chat_template_kwargs" - parameter: "enable_thinking" - - gpt-oss: - type: "reasoning_effort" - parameter: "reasoning_effort" - gpt: - type: "reasoning_effort" - parameter: "reasoning_effort" - -# Global default reasoning effort level -default_reasoning_effort: high - -# API Configuration -api: - batch_classification: - max_batch_size: 100 - concurrency_threshold: 5 - max_concurrency: 8 - metrics: - enabled: true - detailed_goroutine_tracking: true - high_resolution_timing: false - sample_rate: 1.0 - duration_buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30] - size_buckets: [1, 2, 5, 10, 20, 50, 100, 200] diff --git a/deploy/kubernetes/deployment.yaml b/deploy/kubernetes/deployment.yaml deleted file mode 100644 index ab7000f9a..000000000 --- a/deploy/kubernetes/deployment.yaml +++ /dev/null @@ -1,136 +0,0 @@ -apiVersion: apps/v1 -kind: Deployment -metadata: - name: semantic-router - namespace: vllm-semantic-router-system - labels: - app: semantic-router -spec: - replicas: 1 - selector: - matchLabels: - app: semantic-router - template: - metadata: - labels: - app: semantic-router - spec: - initContainers: - - name: model-downloader - image: python:3.11-slim - securityContext: - runAsNonRoot: false - allowPrivilegeEscalation: false - command: ["/bin/bash", "-c"] - args: - - | - set -e - echo "Installing Hugging Face CLI..." - pip install --no-cache-dir huggingface_hub[cli] - - echo "Downloading models to persistent volume..." - cd /app/models - - # Download category classifier model - if [ ! -d "category_classifier_modernbert-base_model" ]; then - echo "Downloading category classifier model..." - huggingface-cli download LLM-Semantic-Router/category_classifier_modernbert-base_model --local-dir category_classifier_modernbert-base_model - else - echo "Category classifier model already exists, skipping..." - fi - - # Download PII classifier model - if [ ! -d "pii_classifier_modernbert-base_model" ]; then - echo "Downloading PII classifier model..." 
- huggingface-cli download LLM-Semantic-Router/pii_classifier_modernbert-base_model --local-dir pii_classifier_modernbert-base_model - else - echo "PII classifier model already exists, skipping..." - fi - - # Download jailbreak classifier model - if [ ! -d "jailbreak_classifier_modernbert-base_model" ]; then - echo "Downloading jailbreak classifier model..." - huggingface-cli download LLM-Semantic-Router/jailbreak_classifier_modernbert-base_model --local-dir jailbreak_classifier_modernbert-base_model - else - echo "Jailbreak classifier model already exists, skipping..." - fi - - # Download PII token classifier model - if [ ! -d "pii_classifier_modernbert-base_presidio_token_model" ]; then - echo "Downloading PII token classifier model..." - huggingface-cli download LLM-Semantic-Router/pii_classifier_modernbert-base_presidio_token_model --local-dir pii_classifier_modernbert-base_presidio_token_model - else - echo "PII token classifier model already exists, skipping..." - fi - - echo "All models downloaded successfully!" - ls -la /app/models/ - env: - - name: HF_HUB_CACHE - value: /tmp/hf_cache - # Reduced resource requirements for init container - resources: - requests: - memory: "512Mi" - cpu: "250m" - limits: - memory: "1Gi" - cpu: "500m" - volumeMounts: - - name: models-volume - mountPath: /app/models - containers: - - name: semantic-router - image: ghcr.io/vllm-project/semantic-router/extproc:latest - args: ["--secure=true"] - securityContext: - runAsNonRoot: false - allowPrivilegeEscalation: false - ports: - - containerPort: 50051 - name: grpc - protocol: TCP - - containerPort: 9190 - name: metrics - protocol: TCP - - containerPort: 8080 - name: classify-api - protocol: TCP - env: - - name: LD_LIBRARY_PATH - value: "/app/lib" - volumeMounts: - - name: config-volume - mountPath: /app/config - readOnly: true - - name: models-volume - mountPath: /app/models - livenessProbe: - tcpSocket: - port: 50051 - initialDelaySeconds: 60 - periodSeconds: 30 - timeoutSeconds: 10 - failureThreshold: 3 - readinessProbe: - tcpSocket: - port: 50051 - initialDelaySeconds: 90 - periodSeconds: 30 - timeoutSeconds: 10 - failureThreshold: 3 - # Significantly reduced resource requirements for kind cluster - resources: - requests: - memory: "3Gi" # Reduced from 8Gi - cpu: "1" # Reduced from 2 - limits: - memory: "6Gi" # Reduced from 12Gi - cpu: "2" # Reduced from 4 - volumes: - - name: config-volume - configMap: - name: semantic-router-config - - name: models-volume - persistentVolumeClaim: - claimName: semantic-router-models diff --git a/deploy/kubernetes/kustomization.yaml b/deploy/kubernetes/kustomization.yaml index 3eae4ac99..65b2ccae5 100644 --- a/deploy/kubernetes/kustomization.yaml +++ b/deploy/kubernetes/kustomization.yaml @@ -1,25 +1,6 @@ apiVersion: kustomize.config.k8s.io/v1beta1 kind: Kustomization -metadata: - name: semantic-router - +# This root points to the 'core' overlay by default for clarity. 
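+# Note: storage (PVC) is created separately: kubectl apply -k deploy/kubernetes/overlays/storage (first run only)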
resources: -- namespace.yaml -- pvc.yaml -- deployment.yaml -- service.yaml - -# Generate ConfigMap -configMapGenerator: -- name: semantic-router-config - files: - - config.yaml - - tools_db.json - -# Namespace for all resources -namespace: vllm-semantic-router-system - -images: -- name: ghcr.io/vllm-project/semantic-router/extproc - newTag: latest + - overlays/core diff --git a/deploy/kubernetes/overlays/core/kustomization.yaml b/deploy/kubernetes/overlays/core/kustomization.yaml new file mode 100644 index 000000000..774a422d0 --- /dev/null +++ b/deploy/kubernetes/overlays/core/kustomization.yaml @@ -0,0 +1,5 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - ../../base diff --git a/deploy/kubernetes/overlays/llm-katan/kustomization.yaml b/deploy/kubernetes/overlays/llm-katan/kustomization.yaml new file mode 100644 index 000000000..dacb15f6b --- /dev/null +++ b/deploy/kubernetes/overlays/llm-katan/kustomization.yaml @@ -0,0 +1,11 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - ../../base + +patches: + - target: + kind: Deployment + name: semantic-router + path: patch-llm-katan.yaml diff --git a/deploy/kubernetes/overlays/llm-katan/patch-llm-katan.yaml b/deploy/kubernetes/overlays/llm-katan/patch-llm-katan.yaml new file mode 100644 index 000000000..6d149109f --- /dev/null +++ b/deploy/kubernetes/overlays/llm-katan/patch-llm-katan.yaml @@ -0,0 +1,30 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: semantic-router +spec: + template: + spec: + containers: + - name: semantic-router + imagePullPolicy: IfNotPresent + - name: llm-katan + image: ghcr.io/vllm-project/semantic-router/llm-katan:latest + imagePullPolicy: IfNotPresent + args: + - llm-katan + - --model + - /app/models/Qwen/Qwen3-0.6B + - --served-model-name + - qwen3 + - --host + - 0.0.0.0 + - --port + - "8002" + ports: + - containerPort: 8002 + name: katan + protocol: TCP + volumeMounts: + - name: models-volume + mountPath: /app/models diff --git a/deploy/kubernetes/overlays/storage/kustomization.yaml b/deploy/kubernetes/overlays/storage/kustomization.yaml new file mode 100644 index 000000000..349f724d9 --- /dev/null +++ b/deploy/kubernetes/overlays/storage/kustomization.yaml @@ -0,0 +1,6 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - ./namespace.yaml + - ./pvc.yaml diff --git a/deploy/kubernetes/overlays/storage/namespace.yaml b/deploy/kubernetes/overlays/storage/namespace.yaml new file mode 100644 index 000000000..0bdc316f5 --- /dev/null +++ b/deploy/kubernetes/overlays/storage/namespace.yaml @@ -0,0 +1,4 @@ +apiVersion: v1 +kind: Namespace +metadata: + name: vllm-semantic-router-system diff --git a/deploy/kubernetes/pvc.yaml b/deploy/kubernetes/overlays/storage/pvc.yaml similarity index 91% rename from deploy/kubernetes/pvc.yaml rename to deploy/kubernetes/overlays/storage/pvc.yaml index 089293069..43b66eb95 100644 --- a/deploy/kubernetes/pvc.yaml +++ b/deploy/kubernetes/overlays/storage/pvc.yaml @@ -9,5 +9,5 @@ spec: - ReadWriteOnce resources: requests: - storage: 10Gi + storage: 30Gi storageClassName: standard diff --git a/deploy/kubernetes/tools_db.json b/deploy/kubernetes/tools_db.json deleted file mode 100644 index dccbf48aa..000000000 --- a/deploy/kubernetes/tools_db.json +++ /dev/null @@ -1,142 +0,0 @@ -[ - { - "tool": { - "type": "function", - "function": { - "name": "get_weather", - "description": "Get current weather information for a location", - "parameters": { - "type": "object", - "properties": 
{ - "location": { - "type": "string", - "description": "The city and state, e.g. San Francisco, CA" - }, - "unit": { - "type": "string", - "enum": ["celsius", "fahrenheit"], - "description": "Temperature unit" - } - }, - "required": ["location"] - } - } - }, - "description": "Get current weather information, temperature, conditions, forecast for any location, city, or place. Check weather today, now, current conditions, temperature, rain, sun, cloudy, hot, cold, storm, snow", - "category": "weather", - "tags": ["weather", "temperature", "forecast", "climate"] - }, - { - "tool": { - "type": "function", - "function": { - "name": "search_web", - "description": "Search the web for information", - "parameters": { - "type": "object", - "properties": { - "query": { - "type": "string", - "description": "The search query" - }, - "num_results": { - "type": "integer", - "description": "Number of results to return", - "default": 5 - } - }, - "required": ["query"] - } - } - }, - "description": "Search the internet, web search, find information online, browse web content, lookup, research, google, find answers, discover, investigate", - "category": "search", - "tags": ["search", "web", "internet", "information", "browse"] - }, - { - "tool": { - "type": "function", - "function": { - "name": "calculate", - "description": "Perform mathematical calculations", - "parameters": { - "type": "object", - "properties": { - "expression": { - "type": "string", - "description": "Mathematical expression to evaluate" - } - }, - "required": ["expression"] - } - } - }, - "description": "Calculate mathematical expressions, solve math problems, arithmetic operations, compute numbers, addition, subtraction, multiplication, division, equations, formula", - "category": "math", - "tags": ["math", "calculation", "arithmetic", "compute", "numbers"] - }, - { - "tool": { - "type": "function", - "function": { - "name": "send_email", - "description": "Send an email message", - "parameters": { - "type": "object", - "properties": { - "to": { - "type": "string", - "description": "Recipient email address" - }, - "subject": { - "type": "string", - "description": "Email subject" - }, - "body": { - "type": "string", - "description": "Email body content" - } - }, - "required": ["to", "subject", "body"] - } - } - }, - "description": "Send email messages, email communication, contact people via email, mail, message, correspondence, notify, inform", - "category": "communication", - "tags": ["email", "send", "communication", "message", "contact"] - }, - { - "tool": { - "type": "function", - "function": { - "name": "create_calendar_event", - "description": "Create a new calendar event or appointment", - "parameters": { - "type": "object", - "properties": { - "title": { - "type": "string", - "description": "Event title" - }, - "date": { - "type": "string", - "description": "Event date in YYYY-MM-DD format" - }, - "time": { - "type": "string", - "description": "Event time in HH:MM format" - }, - "duration": { - "type": "integer", - "description": "Duration in minutes" - } - }, - "required": ["title", "date", "time"] - } - } - }, - "description": "Schedule meetings, create calendar events, set appointments, manage calendar, book time, plan meeting, organize schedule, reminder, agenda", - "category": "productivity", - "tags": ["calendar", "event", "meeting", "appointment", "schedule"] - } -] \ No newline at end of file diff --git a/website/docs/installation/kubernetes.md b/website/docs/installation/kubernetes.md index 80821ad9c..a679bd5d7 100644 --- 
a/website/docs/installation/kubernetes.md
+++ b/website/docs/installation/kubernetes.md
@@ -31,16 +31,73 @@ kind create cluster --name semantic-router-cluster --config tools/kind/kind-conf
kubectl wait --for=condition=Ready nodes --all --timeout=300s
```

-**Note**: The kind configuration provides sufficient resources (8GB+ RAM, 4+ CPU cores) for running the semantic router and AI gateway components.
+Note: The kind configuration provides sufficient resources (8GB+ RAM, 4+ CPU cores).

## Step 2: Deploy vLLM Semantic Router

-Configure the semantic router by editing `deploy/kubernetes/config.yaml`. This file contains the vLLM configuration, including model config, endpoints, and policies.
+Edit `deploy/kubernetes/base/config.yaml` (models, endpoints, policies). Two overlays are provided:

-Deploy the semantic router service with all required components:
+- core (default): only the semantic-router
+  - Path: `deploy/kubernetes/overlays/core` (root `deploy/kubernetes/` points here by default)
+- llm-katan: semantic-router + an llm-katan sidecar listening on 8002 and serving model name `qwen3`
+  - Path: `deploy/kubernetes/overlays/llm-katan`
+
+### Repository layout (deploy/kubernetes/)
+
+```
+deploy/kubernetes/
+  base/
+    kustomization.yaml      # base kustomize: namespace, service, deployment, ConfigMap generator
+    namespace.yaml          # Namespace for all resources
+    service.yaml            # Service exposing gRPC/metrics/HTTP ports
+    deployment.yaml         # Semantic Router Deployment (init downloads by default)
+    config.yaml             # Router config (mounted via ConfigMap)
+    tools_db.json           # Tools DB (mounted via ConfigMap)
+    pv.yaml                 # OPTIONAL: hostPath PV for local models (edit path as needed)
+  overlays/
+    core/
+      kustomization.yaml    # Uses only base
+    llm-katan/
+      kustomization.yaml    # Patches base to add llm-katan sidecar
+      patch-llm-katan.yaml  # Strategic-merge patch injecting sidecar
+    storage/
+      kustomization.yaml    # PVC only; run once to create storage, not for day-2 updates
+      namespace.yaml        # Local copy for self-contained apply
+      pvc.yaml              # PVC definition
+  kustomization.yaml        # Root points to overlays/core by default
+  README.md                 # Additional notes
+```
+
+Notes:
+
+- Base downloads models on first run (initContainer).
+- In restricted networks, prefer local models via PV/PVC; see Network Tips for hostPath PV, mirrors, and image preload. Mount point is `/app/models`.
+
+First-time apply (creates PVC):
+
+```bash
+kubectl apply -k deploy/kubernetes/overlays/storage
+kubectl apply -k deploy/kubernetes/overlays/core # or overlays/llm-katan
+```
+
+Day-2 updates (do not touch PVC):

```bash
-# Deploy semantic router using Kustomize
+kubectl apply -k deploy/kubernetes/overlays/core # or overlays/llm-katan
+```
+
+Important:
+
+- `vllm_endpoints.address` must be an IP reachable inside the cluster (no scheme/path).
+- PVC default size is 30Gi; adjust to model footprint. StorageClass name may differ by cluster.
+- core downloads classifiers + `all-MiniLM-L12-v2`; llm-katan also prepares `Qwen/Qwen3-0.6B`.
+- Default config uses `qwen3@127.0.0.1:8002` (matches llm-katan); if using core, update endpoints accordingly (see the sketch below).
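+
+For core mode against a real backend, a minimal sketch of the endpoint edit in `deploy/kubernetes/base/config.yaml` (the IP, port, and model name below are illustrative, not repo defaults):
+
+```yaml
+vllm_endpoints:
+  - name: "endpoint1"
+    address: "10.96.0.42" # illustrative ClusterIP of your vLLM Service; must be an IP, not a hostname
+    port: 8000
+    weight: 1
+
+model_config:
+  "my-served-model": # hypothetical served model name; must match your server's --served-model-name
+    preferred_endpoints: ["endpoint1"]
+    pii_policy:
+      allow_by_default: true
+
+default_model: my-served-model
+```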
+
+Deploy the semantic router service with all required components (core mode by default):
+
+```bash
+# Deploy semantic router (core mode)
kubectl apply -k deploy/kubernetes/

# Wait for deployment to be ready (this may take several minutes for model downloads)
@@ -48,8 +105,15 @@ kubectl wait --for=condition=Available deployment/semantic-router -n vllm-semant

# Verify deployment status
kubectl get pods -n vllm-semantic-router-system
+```
+
+To run with the llm-katan overlay instead:
+
+```bash
+kubectl apply -k deploy/kubernetes/overlays/llm-katan
+```
+
+Note: The llm-katan overlay no longer references parent files directly. It uses a local patch (`deploy/kubernetes/overlays/llm-katan/patch-llm-katan.yaml`) to inject the sidecar, avoiding kustomize parent-directory restrictions.
+
## Step 3: Install Envoy Gateway

Install the core Envoy Gateway for traffic management:
@@ -63,7 +127,7 @@
helm upgrade -i eg oci://docker.io/envoyproxy/gateway-helm \

# Wait for Envoy Gateway to be ready
kubectl wait --timeout=300s -n envoy-gateway-system deployment/envoy-gateway --for=condition=Available
```

## Step 4: Install Envoy AI Gateway
@@ -135,26 +199,28 @@
Expected output should show the inference pool in `Accepted` state:

```yaml
status:
  parent:
-  - conditions:
-    - lastTransitionTime: "2025-09-27T09:27:32Z"
-      message: 'InferencePool has been Accepted by controller ai-gateway-controller:
-        InferencePool reconciled successfully'
-      observedGeneration: 1
-      reason: Accepted
-      status: "True"
-      type: Accepted
-    - lastTransitionTime: "2025-09-27T09:27:32Z"
-      message: 'Reference resolution by controller ai-gateway-controller: All references
-        resolved successfully'
-      observedGeneration: 1
-      reason: ResolvedRefs
-      status: "True"
-      type: ResolvedRefs
-    parentRef:
-      group: gateway.networking.k8s.io
-      kind: Gateway
-      name: vllm-semantic-router
-      namespace: vllm-semantic-router-system
+    - conditions:
+        - lastTransitionTime: "2025-09-27T09:27:32Z"
+          message:
+            "InferencePool has been Accepted by controller ai-gateway-controller:
+            InferencePool reconciled successfully"
+          observedGeneration: 1
+          reason: Accepted
+          status: "True"
+          type: Accepted
+        - lastTransitionTime: "2025-09-27T09:27:32Z"
+          message:
+            "Reference resolution by controller ai-gateway-controller: All references
+            resolved successfully"
+          observedGeneration: 1
+          reason: ResolvedRefs
+          status: "True"
+          type: ResolvedRefs
+      parentRef:
+        group: gateway.networking.k8s.io
+        kind: Gateway
+        name: vllm-semantic-router
+        namespace: vllm-semantic-router-system
```

## Testing the Deployment
diff --git a/website/docs/troubleshooting/network-tips.md b/website/docs/troubleshooting/network-tips.md
index c4a29bcef..2e27f3e92 100644
--- a/website/docs/troubleshooting/network-tips.md
+++ b/website/docs/troubleshooting/network-tips.md
@@ -26,12 +26,12 @@ The router will download embedding models on first run unless you provide them l

### Option A — Use local models (no external network)

-1) Download the required model(s) with any reachable method (VPN/offline) into the repo’s `./models` folder. Example layout:
+1. Download the required model(s) with any reachable method (VPN/offline) into the repo’s `./models` folder. Example layout:

- `models/all-MiniLM-L12-v2/`
- `models/category_classifier_modernbert-base_model`

-2) In `config/config.yaml`, point to the local path. Example:
+2. In `config/config.yaml`, point to the local path.
Example: ```yaml bert_model: @@ -39,7 +39,7 @@ The router will download embedding models on first run unless you provide them l model_id: /app/models/all-MiniLM-L12-v2 ``` -3) No extra env is required. `deploy/docker-compose/docker-compose.yml` already mounts `./models:/app/models:ro`. +3. No extra env is required. `deploy/docker-compose/docker-compose.yml` already mounts `./models:/app/models:ro`. ### Option B — Use HF cache + mirror @@ -53,7 +53,7 @@ services: environment: - HUGGINGFACE_HUB_CACHE=/root/.cache/huggingface - HF_HUB_ENABLE_HF_TRANSFER=1 - - HF_ENDPOINT=https://hf-mirror.com # example mirror endpoint (China) + - HF_ENDPOINT=https://hf-mirror.com # example mirror endpoint (China) ``` Optional: pre-warm cache on the host (only if you have `huggingface_hub` installed): @@ -70,7 +70,7 @@ PY When building `Dockerfile.extproc`, the Go stage may hang on `proxy.golang.org`. Create an override Dockerfile that enables mirrors without touching the original. -1) Create `Dockerfile.extproc.cn` at repo root with this content: +1. Create `Dockerfile.extproc.cn` at repo root with this content: ```Dockerfile # syntax=docker/dockerfile:1 @@ -118,7 +118,7 @@ RUN chmod +x /app/entrypoint.sh ENTRYPOINT ["/app/entrypoint.sh"] ``` -2) Point compose to the override Dockerfile by extending `docker-compose.override.yml`: +2. Point compose to the override Dockerfile by extending `docker-compose.override.yml`: ```yaml services: @@ -131,7 +131,7 @@ services: For the optional testing profile, create an override Dockerfile to configure pip mirrors. -1) Create `tools/mock-vllm/Dockerfile.cn`: +1. Create `tools/mock-vllm/Dockerfile.cn`: ```Dockerfile FROM python:3.11-slim @@ -150,7 +150,7 @@ EXPOSE 8000 CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"] ``` -2) Extend `docker-compose.override.yml` to use the override Dockerfile for `mock-vllm`: +2. Extend `docker-compose.override.yml` to use the override Dockerfile for `mock-vllm`: ```yaml services: @@ -180,10 +180,43 @@ Container runtimes on Kubernetes nodes do not automatically reuse the host Docke ### 5.1 Configure containerd or CRI mirrors -- For clusters backed by containerd (Kind, k3s, kubeadm), edit `/etc/containerd/config.toml` or use Kind’s `containerdConfigPatches` to add regional mirror endpoints for registries such as `docker.io`, `ghcr.io`, or `quay.io`. +- For clusters backed by containerd (Kind, k3s, kubeadm), edit `/etc/containerd/config.toml` or use Kind’s `containerdConfigPatches` to add regional mirror endpoints for registries such as `docker.io`, `ghcr.io`, or `registry.k8s.io`. - Restart containerd and kubelet after changes so the new mirrors take effect. - Avoid pointing mirrors to loopback proxies unless every node can reach that proxy address. 
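+For Kind specifically, mirrors can be baked in at cluster-creation time via `containerdConfigPatches`; a minimal sketch (the mirror endpoint is illustrative):
+
+```yaml
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+containerdConfigPatches:
+  - |-
+    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
+      endpoint = ["https://docker.m.daocloud.io"]
+```
+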
+Example `/etc/containerd/config.toml` mirrors (China): + +```toml +[plugins."io.containerd.grpc.v1.cri".registry.mirrors] + [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"] + endpoint = [ + "https://docker.m.daocloud.io", + "https://mirror.ccs.tencentyun.com", + "https://mirror.baidubce.com", + "https://docker.mirrors.ustc.edu.cn", + "https://hub-mirror.c.163.com" + ] + [plugins."io.containerd.grpc.v1.cri".registry.mirrors."ghcr.io"] + endpoint = [ + "https://ghcr.nju.edu.cn", + "https://ghcr.dockerproxy.com", + "https://ghcr.bj.bcebos.com" + ] + [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.k8s.io"] + endpoint = [ + "https://k8s.m.daocloud.io", + "https://mirror.ccs.tencentyun.com", + "https://registry.aliyuncs.com" + ] +``` + +Apply and restart: + +```bash +sudo systemctl restart containerd +sudo systemctl restart kubelet +``` + ### 5.2 Preload or sideload images - Build required images locally, then push them into the cluster runtime. For Kind, run `kind load docker-image --name `; for other clusters, use `crictl pull` or `ctr -n k8s.io images import` on each node. @@ -203,13 +236,39 @@ Container runtimes on Kubernetes nodes do not automatically reuse the host Docke - Use `kubectl describe pod ` or `kubectl get events` to confirm pull errors disappear. - Check that services such as `semantic-router-metrics` now expose endpoints and respond via port-forward (`kubectl port-forward svc/ :`). +### 5.6 Mount local models via PV/PVC (no external HF) + +When you already have models under `./models` locally, mount them into the Pod and skip downloads: + +1. Create a PV (optional; edit `deploy/kubernetes/base/pv.yaml` hostPath to your node path and apply it). If you use a dynamic StorageClass, you can skip the PV. + +2. Create the PVC once via the storage overlay: + +```bash +kubectl apply -k deploy/kubernetes/overlays/storage +``` + +3. Copy your local models to the node path (hostPath example for kind): + +```bash +docker cp ./models semantic-router-cluster-control-plane:/tmp/hostpath-provisioner/ +``` + +4. Ensure the Deployment mounts the PVC at `/app/models` and set `imagePullPolicy: IfNotPresent` (already configured in `base/deployment.yaml`). + +5. If the PV is tied to a specific node path, pin the Pod to that node using `nodeSelector` or add tolerations if you untainted the control-plane node. + +This path avoids Hugging Face downloads and is the most reliable in restricted networks. + ## 6. Troubleshooting - Go modules still time out: + - Verify `GOPROXY` and `GOSUMDB` are present in the go-builder stage logs. - Try a clean build: `docker compose build --no-cache`. - HF models still download slowly: + - Prefer Option A (local models). - Ensure the cache volume is mounted and `HF_ENDPOINT`/`HF_HUB_ENABLE_HF_TRANSFER` are set.
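+
+Finally, to confirm the pod actually sees the pre-populated models (the mount point is `/app/models`):
+
+```bash
+kubectl -n vllm-semantic-router-system exec deploy/semantic-router -c semantic-router -- ls /app/models
+```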