@cooktheryan
Contributor

What type of PR is this?
Allow people to use KServe for the model-serving portion when using the semantic router

What this PR does / why we need it:
We need to be able to demonstrate this on Open Data Hub and Red Hat OpenShift AI (RHOAI)

Which issue(s) this PR fixes:

Fixes #565

Release Notes: Yes

Signed-off-by: Ryan Cook <rcook@redhat.com>
@netlify

netlify bot commented Oct 31, 2025

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: f7652ed
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/69188e03b1ea840008d4c951
😎 Deploy Preview: https://deploy-preview-566--vllm-semantic-router.netlify.app
@github-actions

github-actions bot commented Oct 31, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 deploy

Owners: @rootfs, @Xunzhuo
Files changed:

  • deploy/kserve/README.md
  • deploy/kserve/configmap-envoy-config.yaml
  • deploy/kserve/configmap-router-config.yaml
  • deploy/kserve/deploy.sh
  • deploy/kserve/deployment.yaml
  • deploy/kserve/example-multi-model-config.yaml
  • deploy/kserve/inference-examples/README.md
  • deploy/kserve/inference-examples/inferenceservice-granite32-8b.yaml
  • deploy/kserve/inference-examples/servingruntime-granite32-8b.yaml
  • deploy/kserve/kustomization.yaml
  • deploy/kserve/peerauthentication.yaml
  • deploy/kserve/pvc.yaml
  • deploy/kserve/route.yaml
  • deploy/kserve/service-predictor-stable.yaml
  • deploy/kserve/service.yaml
  • deploy/kserve/serviceaccount.yaml
  • deploy/kserve/test-semantic-routing.sh

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@cooktheryan
Contributor Author

First request: 719ms
Second request: 362ms
✓ Cache appears to be working (49% faster)

==================================================
Validation Complete
==================================================
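
For reference, the 49% figure above is consistent with truncating integer arithmetic on the two timings (the exact value is closer to 49.7%). A minimal sketch of that calculation follows. The function name `percent_faster` is hypothetical, and floor division is an assumption about how the validation script rounds; the script itself is not shown in this conversation.

```python
def percent_faster(first_ms: int, second_ms: int) -> int:
    """Return how much faster the second request was, as a whole percent.

    Floor division is an assumption about the rounding used by the
    validation script; it is not confirmed by the PR.
    """
    if first_ms <= 0:
        raise ValueError("first_ms must be positive")
    return (first_ms - second_ms) * 100 // first_ms


if __name__ == "__main__":
    # Timings reported in the validation output above.
    print(percent_faster(719, 362))  # prints 49
```

A single pair of requests is a small sample, so the percentage is best read as a sign that the cache path is being hit rather than as a benchmark.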

@cooktheryan changed the title from "WIP: kserve functionality" to "kserve functionality" on Nov 14, 2025
@cooktheryan
Contributor Author

@rootfs @Xunzhuo ready for review

apiVersion: v1
kind: ConfigMap
metadata:
  name: semantic-router-envoy-kserve-config
Member

I am not going to block this PR, but can you split the following into four sub-issues as follow-ups?

  1. In k8s, can you switch from a static Envoy config to a gateway, such as Envoy AI Gateway (please refer to https://vllm-semantic-router.com/docs/installation/k8s/ai-gateway) or Istio? The static config works for a demo, but in production we need an Envoy xDS control plane.
  2. Can you later convert this guide's resources, especially the semantic-router-related ones, into Helm (which we recently added support for)?
  3. Can you add an integration test after "[Feat] Add automate e2e test framework for extensible integration tests" (#655)?
  4. Can you move the docs into the website?

@Xunzhuo changed the title from "kserve functionality" to "[Integration]: Add integration with Kserve functionality" on Nov 15, 2025
@rootfs merged commit bbbe3e6 into vllm-project:main on Nov 15, 2025
18 checks passed
@rootfs
Collaborator

rootfs commented Nov 15, 2025

Thank you for making this happen!

Development

Successfully merging this pull request may close these issues.

[Integration] Add kserve integration for using semantic router