# Semantic Router Integration with OpenShift AI KServe

Deploy vLLM Semantic Router as an intelligent gateway for your OpenShift AI KServe InferenceServices.

> **Deployment Focus**: This guide is specifically for deploying semantic router on **OpenShift AI with KServe**.
>
> **Want to learn about the features?** Links to the relevant feature documentation appear throughout this guide.

## Overview

The semantic router acts as an intelligent API gateway that provides:

- **Intelligent Model Selection**: Automatically routes requests to the best model based on semantic understanding
- **PII Detection & Protection**: Blocks or redacts sensitive information before sending to models
- **Prompt Guard**: Detects and blocks jailbreak attempts
- **Semantic Caching**: Reduces latency and costs through intelligent response caching
- **Category-Specific Prompts**: Injects domain-specific system prompts for better results
- **Tools Auto-Selection**: Automatically selects relevant tools for function calling

## Prerequisites

Before deploying, ensure you have:

1. **OpenShift Cluster** with OpenShift AI (RHOAI) installed
2. **KServe InferenceService** already deployed and running
3. **OpenShift CLI (oc)** installed and logged in
4. **Cluster admin or namespace admin** permissions

## Quick Deployment

Use the `deploy.sh` script for automated deployment. It handles validation, model downloads, and resource creation:

```bash
./deploy.sh --namespace <namespace> --inferenceservice <name> --model <model>
```

**Example:**

```bash
./deploy.sh -n semantic -i granite32-8b -m granite32-8b
```

The script validates prerequisites, creates a stable service for your predictor, downloads classification models (~2-3 min), and deploys all resources. Optional flags include `--embedding-model`, `--storage-class`, `--models-pvc-size`, and `--cache-pvc-size`. For manual step-by-step deployment, continue reading below.
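
For reference, a fully specified invocation using the optional flags might look like this (the sizes and names shown are placeholders, not recommended values):

```bash
./deploy.sh \
  --namespace semantic \
  --inferenceservice granite32-8b \
  --model granite32-8b \
  --embedding-model <embedding-model> \
  --storage-class <storage-class> \
  --models-pvc-size 20Gi \
  --cache-pvc-size 5Gi
```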

## Manual Deployment

### Step 1: Verify InferenceService

Check that your InferenceService is deployed and ready, then create a stable ClusterIP service in front of its predictor so the router configuration can reference a fixed IP:

```bash
NAMESPACE=<your-namespace>
INFERENCESERVICE_NAME=<your-inferenceservice-name>

# List InferenceServices
oc get inferenceservice -n $NAMESPACE

# Create stable ClusterIP service for predictor
cat <<EOF | oc apply -f - -n $NAMESPACE
apiVersion: v1
kind: Service
metadata:
  name: ${INFERENCESERVICE_NAME}-predictor-stable
spec:
  type: ClusterIP
  selector:
    serving.kserve.io/inferenceservice: ${INFERENCESERVICE_NAME}
  ports:
    - name: http
      port: 8080
      targetPort: 8080
EOF

# Get the stable ClusterIP
PREDICTOR_SERVICE_IP=$(oc get svc "${INFERENCESERVICE_NAME}-predictor-stable" -n $NAMESPACE -o jsonpath='{.spec.clusterIP}')
echo "Predictor service ClusterIP: $PREDICTOR_SERVICE_IP"
```
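
Before moving on, it is worth confirming that the stable service actually selects the predictor pods; if the `ENDPOINTS` column is empty, the label selector does not match your predictor:

```bash
# The stable service should list at least one predictor pod IP
oc get endpoints "${INFERENCESERVICE_NAME}-predictor-stable" -n $NAMESPACE
```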

### Step 2: Configure Router Settings

Edit `configmap-router-config.yaml` (an illustrative sketch of these settings follows the list):

1. Update `vllm_endpoints` with your predictor service ClusterIP
2. Configure `model_config` with your model name and PII policies
3. Update `categories` with model scores for routing
4. Set `default_model` to your model name
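
A minimal sketch of those four settings is shown below. The top-level keys come from the steps above, but the nested field names (such as `pii_policy` and `model_scores`) are illustrative assumptions; match them against the structure already present in the shipped `configmap-router-config.yaml` rather than copying this verbatim:

```yaml
vllm_endpoints:
  - name: kserve-endpoint
    address: "172.30.45.67"    # ClusterIP from step 1 (not a DNS name)
    port: 8080

model_config:
  granite32-8b:
    pii_policy:                # illustrative; use the policy keys from the shipped file
      allow_by_default: true

categories:
  - name: math
    model_scores:              # illustrative; higher score = preferred for this category
      - model: granite32-8b
        score: 0.9

default_model: granite32-8b
```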

Edit `configmap-envoy-config.yaml`:

1. Update the `kserve_dynamic_cluster` address to `<inferenceservice>-predictor.<namespace>.svc.cluster.local` (an illustrative fragment follows)
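
For orientation, an Envoy v3 cluster of this shape typically looks like the fragment below; treat it as a sketch and preserve the surrounding structure of the shipped `configmap-envoy-config.yaml` (the address shown uses the example InferenceService and namespace from this guide):

```yaml
- name: kserve_dynamic_cluster
  type: STRICT_DNS
  load_assignment:
    cluster_name: kserve_dynamic_cluster
    endpoints:
      - lb_endpoints:
          - endpoint:
              address:
                socket_address:
                  address: granite32-8b-predictor.semantic.svc.cluster.local
                  port_value: 8080
```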

### Step 3: Deploy Resources

Apply manifests in order:

```bash
NAMESPACE=<your-namespace>

# Deploy resources
oc apply -f serviceaccount.yaml -n $NAMESPACE
oc apply -f pvc.yaml -n $NAMESPACE
oc apply -f configmap-router-config.yaml -n $NAMESPACE
oc apply -f configmap-envoy-config.yaml -n $NAMESPACE
oc apply -f peerauthentication.yaml -n $NAMESPACE
oc apply -f deployment.yaml -n $NAMESPACE
oc apply -f service.yaml -n $NAMESPACE
oc apply -f route.yaml -n $NAMESPACE
```

### Step 4: Wait for Ready

Monitor deployment progress:

```bash
# Watch pod status
oc get pods -l app=semantic-router -n $NAMESPACE -w

# Check logs
oc logs -l app=semantic-router -c semantic-router -n $NAMESPACE -f
```

The pod downloads the classification models (~2-3 minutes) and then starts serving traffic.
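
Alternatively, block until the rollout completes (the deployment name matches the manifests in this directory):

```bash
oc rollout status deployment/semantic-router-kserve -n $NAMESPACE --timeout=10m
```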

## Accessing Services

Get the route URL:

```bash
ROUTER_URL=$(oc get route semantic-router-kserve -n $NAMESPACE -o jsonpath='{.spec.host}')
echo "External URL: https://$ROUTER_URL"
```

Test the deployment:

```bash
# Test models endpoint
curl -k "https://$ROUTER_URL/v1/models"

# Test chat completion
curl -k "https://$ROUTER_URL/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "<your-model>",
"messages": [{"role": "user", "content": "What is 2+2?"}],
"max_tokens": 50
}'
```
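
If `jq` is installed, you can reduce the models response to just the served model IDs, which is convenient for scripting (this assumes the endpoint returns the usual OpenAI-style list object):

```bash
curl -sk "https://$ROUTER_URL/v1/models" | jq -r '.data[].id'
```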

Run validation tests:

```bash
# Auto-detect configuration
./test-semantic-routing.sh

# Or specify explicitly
NAMESPACE=$NAMESPACE MODEL_NAME=<model> ./test-semantic-routing.sh
```

## Monitoring

### Check Deployment Status

```bash
# Check pods
oc get pods -l app=semantic-router -n $NAMESPACE

# Check services
oc get svc -n $NAMESPACE

# Check routes
oc get routes -n $NAMESPACE
```

### View Logs

```bash
# Router logs
oc logs -l app=semantic-router -c semantic-router -n $NAMESPACE -f

# Model download logs (init container)
oc logs -l app=semantic-router -c model-downloader -n $NAMESPACE

# Envoy logs
oc logs -l app=semantic-router -c envoy-proxy -n $NAMESPACE -f
```

### Metrics

```bash
# Port-forward metrics endpoint
POD=$(oc get pods -l app=semantic-router -n $NAMESPACE -o jsonpath='{.items[0].metadata.name}')
oc port-forward $POD 9190:9190 -n $NAMESPACE

# View metrics
curl http://localhost:9190/metrics
```
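
To narrow the output to router-specific series, filter by a keyword; the exact metric names vary by router version, so adjust the pattern to what the previous command actually returns:

```bash
curl -s http://localhost:9190/metrics | grep -i semantic | head -20
```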

## Cleanup

Remove all deployed resources:

```bash
NAMESPACE=<your-namespace>

oc delete route semantic-router-kserve -n $NAMESPACE
oc delete service semantic-router-kserve -n $NAMESPACE
oc delete deployment semantic-router-kserve -n $NAMESPACE
oc delete configmap semantic-router-kserve-config semantic-router-envoy-kserve-config -n $NAMESPACE
oc delete pvc semantic-router-models semantic-router-cache -n $NAMESPACE
oc delete peerauthentication semantic-router-kserve-permissive -n $NAMESPACE
oc delete serviceaccount semantic-router -n $NAMESPACE
```

**Warning**: Deleting PVCs will remove downloaded models and cache data. To preserve data, skip PVC deletion.

## Troubleshooting

### Pod Not Starting

```bash
# Check pod status and events
oc get pods -l app=semantic-router -n $NAMESPACE
oc describe pod -l app=semantic-router -n $NAMESPACE

# Check init container logs (model download)
oc logs -l app=semantic-router -c model-downloader -n $NAMESPACE
```

**Common causes:**

- Network issues downloading models
- PVC not bound - check storage class (a quick check follows this list)
- Insufficient memory - increase init container resources
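
The PVC cause is quick to check (the PVC names match the manifests in this directory):

```bash
# Both PVCs should report STATUS=Bound
oc get pvc semantic-router-models semantic-router-cache -n $NAMESPACE

# If they are Pending, confirm a usable storage class exists
oc get storageclass
```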

### Router Container Crashing

```bash
# Check router logs
oc logs -l app=semantic-router -c semantic-router -n $NAMESPACE --previous
```

**Common causes:**

- Configuration error - validate YAML syntax (a parse check follows this list)
- Invalid IP address - use the ClusterIP, not a DNS name, in `vllm_endpoints.address`
- Missing models - verify init container completed
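
For the configuration-error cause, one way to check the syntax is to dump the ConfigMap and parse it locally. This is a sketch assuming `python3` with PyYAML is available and that the ConfigMap carries a single YAML document:

```bash
# oc extract writes each ConfigMap key to stdout with a "# <key>" comment header
oc extract configmap/semantic-router-kserve-config -n $NAMESPACE --to=- \
  | python3 -c 'import sys, yaml; yaml.safe_load(sys.stdin); print("YAML parses cleanly")'
```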

### Cannot Connect to InferenceService

```bash
# Test from router pod
POD=$(oc get pods -l app=semantic-router -n $NAMESPACE -o jsonpath='{.items[0].metadata.name}')
oc exec $POD -c semantic-router -n $NAMESPACE -- \
  curl -v http://<inferenceservice>-predictor.$NAMESPACE.svc.cluster.local:8080/v1/models
```

**Common causes:**

- InferenceService not ready - check `oc get inferenceservice -n $NAMESPACE`
- Wrong DNS name - verify format: `<inferenceservice>-predictor.<namespace>.svc.cluster.local`
- Network policy blocking traffic
- mTLS mode mismatch - ensure PERMISSIVE mode in PeerAuthentication (a check follows this list)
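
To check the mTLS cause, confirm that the PeerAuthentication applied in step 3 exists and is permissive (the resource name matches the manifests in this directory):

```bash
# Expect: PERMISSIVE
oc get peerauthentication semantic-router-kserve-permissive -n $NAMESPACE \
  -o jsonpath='{.spec.mtls.mode}{"\n"}'
```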

## Configuration

For detailed configuration options, see the full project documentation at https://vllm-semantic-router.com. To train the custom models the router relies on for category classification, PII detection, and prompt guarding, use the training guides linked under Related Documentation below.

## Related Documentation

### Within This Repository

- **[Category Classifier Training](../../src/training/classifier_model_fine_tuning/)** - Train custom category classification models
- **[PII Detector Training](../../src/training/pii_model_fine_tuning/)** - Train custom PII detection models
- **[Prompt Guard Training](../../src/training/prompt_guard_fine_tuning/)** - Train custom jailbreak detection models

### Other Deployment Options

- **[OpenShift Deployment](../openshift/)** - Deploy with standalone vLLM containers (not KServe)
- *This directory* - OpenShift AI KServe deployment (you are here)

### External Resources

- **Main Project**: https://github.com/vllm-project/semantic-router
- **Full Documentation**: https://vllm-semantic-router.com
- **KServe Docs**: https://kserve.github.io/website/