Commit bbbe3e6 (parent fb52631)

cooktheryan and rootfs authored

[Integration]: Add integration with Kserve functionality (#566)

Squashed commit messages:

* WIP: kserve functionality
* fix of lint
* removal of spellcheck errs
* remove toleration
* update based on success
* directions including the usage of a new svc
* working solution
* linting due to missing line
* line spacing fix
* fix lint
* more of a brief readme

Signed-off-by: Ryan Cook <rcook@redhat.com>
Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>

17 files changed: +2198 −0

deploy/kserve/README.md (289 additions, 0 deletions)
# Semantic Router Integration with OpenShift AI KServe

Deploy vLLM Semantic Router as an intelligent gateway for your OpenShift AI KServe InferenceServices.

> **Deployment Focus**: This guide is specifically for deploying semantic router on **OpenShift AI with KServe**.
>
> **Looking for feature details?** See the links to feature documentation throughout this guide.

## Overview

The semantic router acts as an intelligent API gateway that provides:

- **Intelligent Model Selection**: Automatically routes requests to the best model based on semantic understanding
- **PII Detection & Protection**: Blocks or redacts sensitive information before it is sent to models
- **Prompt Guard**: Detects and blocks jailbreak attempts
- **Semantic Caching**: Reduces latency and costs through intelligent response caching
- **Category-Specific Prompts**: Injects domain-specific system prompts for better results
- **Tools Auto-Selection**: Automatically selects relevant tools for function calling
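The model-selection idea in the first bullet can be sketched in a few lines of Python. This is an illustrative toy, not the router's actual implementation: the category names, scores, and classifier stub below are made-up assumptions, mirroring the `categories`/model-score configuration edited later in the Manual Deployment section.

```python
# Toy sketch of category-based model selection (illustrative only).
# A real deployment classifies the prompt with a trained category
# classifier; here the classification step is faked.

CATEGORY_MODEL_SCORES = {
    # category -> {model: score}; values are made-up examples
    "math": {"granite32-8b": 0.9, "small-model": 0.4},
    "chitchat": {"granite32-8b": 0.5, "small-model": 0.8},
}

def classify(prompt: str) -> str:
    """Stand-in for the semantic category classifier."""
    return "math" if any(ch.isdigit() for ch in prompt) else "chitchat"

def select_model(prompt: str, default_model: str = "granite32-8b") -> str:
    """Pick the highest-scoring model for the prompt's category."""
    scores = CATEGORY_MODEL_SCORES.get(classify(prompt))
    if not scores:
        return default_model
    return max(scores, key=scores.get)

print(select_model("What is 2+2?"))        # math category -> granite32-8b
print(select_model("Tell me about cats"))  # chitchat category -> small-model
```

The real router makes this decision per request behind a single OpenAI-compatible endpoint, so clients never need to know which backend model answered.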
## Prerequisites

Before deploying, ensure you have:

1. **OpenShift Cluster** with OpenShift AI (RHOAI) installed
2. **KServe InferenceService** already deployed and running
3. **OpenShift CLI (oc)** installed and logged in
4. **Cluster admin or namespace admin** permissions

## Quick Deployment

Use the `deploy.sh` script for automated deployment. It handles validation, model downloads, and resource creation:

```bash
./deploy.sh --namespace <namespace> --inferenceservice <name> --model <model>
```

**Example:**

```bash
./deploy.sh -n semantic -i granite32-8b -m granite32-8b
```

The script validates prerequisites, creates a stable service for your predictor, downloads classification models (~2-3 min), and deploys all resources. Optional flags include `--embedding-model`, `--storage-class`, `--models-pvc-size`, and `--cache-pvc-size`. For manual step-by-step deployment, continue reading below.

## Manual Deployment

### Step 1: Verify InferenceService

Check that your InferenceService is deployed and ready:

```bash
NAMESPACE=<your-namespace>
INFERENCESERVICE_NAME=<your-inferenceservice-name>

# List InferenceServices
oc get inferenceservice -n $NAMESPACE

# Create stable ClusterIP service for predictor
cat <<EOF | oc apply -f - -n $NAMESPACE
apiVersion: v1
kind: Service
metadata:
  name: ${INFERENCESERVICE_NAME}-predictor-stable
spec:
  type: ClusterIP
  selector:
    serving.kserve.io/inferenceservice: ${INFERENCESERVICE_NAME}
  ports:
    - name: http
      port: 8080
      targetPort: 8080
EOF

# Get the stable ClusterIP
PREDICTOR_SERVICE_IP=$(oc get svc "${INFERENCESERVICE_NAME}-predictor-stable" -n $NAMESPACE -o jsonpath='{.spec.clusterIP}')
echo "Predictor service ClusterIP: $PREDICTOR_SERVICE_IP"
```
### Step 2: Configure Router Settings

Edit `configmap-router-config.yaml`:

1. Update `vllm_endpoints` with your predictor service ClusterIP
2. Configure `model_config` with your model name and PII policies
3. Update `categories` with model scores for routing
4. Set `default_model` to your model name

Edit `configmap-envoy-config.yaml`:

1. Update `kserve_dynamic_cluster` address to: `<inferenceservice>-predictor.<namespace>.svc.cluster.local`
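As a rough orientation, the four router-config edits above touch sections shaped roughly like the fragment below. This is an illustrative assumption, not the shipped config: check `configmap-router-config.yaml` itself for the authoritative nesting and key names, and note that the IP, model name, and scores here are placeholders.

```yaml
# Illustrative fragment only - the shipped configmap-router-config.yaml
# is authoritative for structure and key names.
vllm_endpoints:
  - name: kserve-endpoint
    address: 172.30.45.67      # predictor service ClusterIP from Step 1
    port: 8080
model_config:
  granite32-8b:
    pii_policy:
      allow_by_default: true   # tighten per your PII requirements
categories:
  - name: math
    model_scores:
      - model: granite32-8b
        score: 0.9
default_model: granite32-8b
```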
### Step 3: Deploy Resources

Apply manifests in order:

```bash
NAMESPACE=<your-namespace>

# Deploy resources
oc apply -f serviceaccount.yaml -n $NAMESPACE
oc apply -f pvc.yaml -n $NAMESPACE
oc apply -f configmap-router-config.yaml -n $NAMESPACE
oc apply -f configmap-envoy-config.yaml -n $NAMESPACE
oc apply -f peerauthentication.yaml -n $NAMESPACE
oc apply -f deployment.yaml -n $NAMESPACE
oc apply -f service.yaml -n $NAMESPACE
oc apply -f route.yaml -n $NAMESPACE
```

### Step 4: Wait for Ready

Monitor deployment progress:

```bash
# Watch pod status
oc get pods -l app=semantic-router -n $NAMESPACE -w

# Check logs
oc logs -l app=semantic-router -c semantic-router -n $NAMESPACE -f
```

The pod downloads the classification models (~2-3 minutes) and then starts serving traffic.
## Accessing Services

Get the route URL:

```bash
ROUTER_URL=$(oc get route semantic-router-kserve -n $NAMESPACE -o jsonpath='{.spec.host}')
echo "External URL: https://$ROUTER_URL"
```

Test the deployment:

```bash
# Test models endpoint
curl -k "https://$ROUTER_URL/v1/models"

# Test chat completion
curl -k "https://$ROUTER_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<your-model>",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "max_tokens": 50
  }'
```
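Because the chat-completions endpoint is OpenAI-compatible, responses can be handled with standard JSON tooling. The snippet below parses a sample response to show which backend model ultimately served the request; the sample payload is a hand-written illustration of the usual OpenAI response shape, not captured output from this deployment.

```python
import json

# Hand-written sample in the OpenAI chat-completions response shape
# (illustrative; not real output from this deployment).
sample_response = '''
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "granite32-8b",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "2 + 2 = 4."},
     "finish_reason": "stop"}
  ]
}
'''

def served_model_and_answer(raw: str) -> tuple:
    """Extract the model that served the request and its first answer."""
    body = json.loads(raw)
    return body["model"], body["choices"][0]["message"]["content"]

model, answer = served_model_and_answer(sample_response)
print(model)   # which backend the router selected
print(answer)
```

Checking the `model` field after a request is a quick way to confirm that the router's category-based selection picked the backend you expected.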
Run validation tests:

```bash
# Auto-detect configuration
./test-semantic-routing.sh

# Or specify explicitly
NAMESPACE=$NAMESPACE MODEL_NAME=<model> ./test-semantic-routing.sh
```
## Monitoring

### Check Deployment Status

```bash
# Check pods
oc get pods -l app=semantic-router -n $NAMESPACE

# Check services
oc get svc -n $NAMESPACE

# Check routes
oc get routes -n $NAMESPACE
```

### View Logs

```bash
# Router logs
oc logs -l app=semantic-router -c semantic-router -n $NAMESPACE -f

# Model download logs (init container)
oc logs -l app=semantic-router -c model-downloader -n $NAMESPACE

# Envoy logs
oc logs -l app=semantic-router -c envoy-proxy -n $NAMESPACE -f
```

### Metrics

```bash
# Port-forward metrics endpoint
POD=$(oc get pods -l app=semantic-router -n $NAMESPACE -o jsonpath='{.items[0].metadata.name}')
oc port-forward $POD 9190:9190 -n $NAMESPACE

# View metrics
curl http://localhost:9190/metrics
```
## Cleanup

Remove all deployed resources:

```bash
NAMESPACE=<your-namespace>

oc delete route semantic-router-kserve -n $NAMESPACE
oc delete service semantic-router-kserve -n $NAMESPACE
oc delete deployment semantic-router-kserve -n $NAMESPACE
oc delete configmap semantic-router-kserve-config semantic-router-envoy-kserve-config -n $NAMESPACE
oc delete pvc semantic-router-models semantic-router-cache -n $NAMESPACE
oc delete peerauthentication semantic-router-kserve-permissive -n $NAMESPACE
oc delete serviceaccount semantic-router -n $NAMESPACE
```

**Warning**: Deleting PVCs will remove downloaded models and cache data. To preserve data, skip PVC deletion.
## Troubleshooting

### Pod Not Starting

```bash
# Check pod status and events
oc get pods -l app=semantic-router -n $NAMESPACE
oc describe pod -l app=semantic-router -n $NAMESPACE

# Check init container logs (model download)
oc logs -l app=semantic-router -c model-downloader -n $NAMESPACE
```

**Common causes:**

- Network issues downloading models
- PVC not bound - check the storage class
- Insufficient memory - increase the init container resources

### Router Container Crashing

```bash
# Check router logs
oc logs -l app=semantic-router -c semantic-router -n $NAMESPACE --previous
```

**Common causes:**

- Configuration error - validate the YAML syntax
- Invalid IP address - use the ClusterIP, not a DNS name, in `vllm_endpoints.address`
- Missing models - verify the init container completed

### Cannot Connect to InferenceService

```bash
# Test from the router pod
POD=$(oc get pods -l app=semantic-router -n $NAMESPACE -o jsonpath='{.items[0].metadata.name}')
oc exec $POD -c semantic-router -n $NAMESPACE -- \
  curl -v http://<inferenceservice>-predictor.$NAMESPACE.svc.cluster.local:8080/v1/models
```

**Common causes:**

- InferenceService not ready - check `oc get inferenceservice -n $NAMESPACE`
- Wrong DNS name - verify the format: `<inferenceservice>-predictor.<namespace>.svc.cluster.local`
- Network policy blocking traffic
- mTLS mode mismatch - ensure PERMISSIVE mode in the PeerAuthentication
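A quick way to rule out a typo in the predictor DNS name is to build it from the same variables used earlier; the example values here (`granite32-8b`, `semantic`) are taken from the quick-deployment example above.

```bash
INFERENCESERVICE_NAME=granite32-8b
NAMESPACE=semantic

# Predictor DNS name in the format this guide expects:
# <inferenceservice>-predictor.<namespace>.svc.cluster.local
PREDICTOR_HOST="${INFERENCESERVICE_NAME}-predictor.${NAMESPACE}.svc.cluster.local"
echo "$PREDICTOR_HOST"
```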
## Configuration

For detailed configuration options, see the main project documentation:

- **Category Classification**: Train custom models at [Category Classifier Training](../../src/training/classifier_model_fine_tuning/)
- **PII Detection**: Train custom models at [PII Detection Training](../../src/training/pii_model_fine_tuning/)
- **Prompt Guard**: Train custom models at [Prompt Guard Training](../../src/training/prompt_guard_fine_tuning/)

## Related Documentation

### Within This Repository

- **[Category Classifier Training](../../src/training/classifier_model_fine_tuning/)** - Train custom category classification models
- **[PII Detector Training](../../src/training/pii_model_fine_tuning/)** - Train custom PII detection models
- **[Prompt Guard Training](../../src/training/prompt_guard_fine_tuning/)** - Train custom jailbreak detection models

### Other Deployment Options

- **[OpenShift Deployment](../openshift/)** - Deploy with standalone vLLM containers (not KServe)
- *This directory* - OpenShift AI KServe deployment (you are here)

### External Resources

- **Main Project**: https://github.com/vllm-project/semantic-router
- **Full Documentation**: https://vllm-semantic-router.com
- **KServe Docs**: https://kserve.github.io/website/