|
1 | | -# Serverless Configuration and Documentation |
2 | | - |
3 | | -## Overview |
4 | | - |
5 | | -This document provides a detailed guide on configuring serverless environments using Kubernetes, with a focus on integrating Prometheus for monitoring and KEDA for scaling. The configuration aims to ensure efficient resource utilization and seamless scaling of applications. |
6 | | - |
7 | | -## Concepts |
8 | | - |
9 | | -### Prometheus Configuration |
10 | | - |
11 | | -Prometheus is used for monitoring and alerting. To let a ServiceMonitor select Services in other namespaces, set `namespaceSelector` on the ServiceMonitor. In the Prometheus resource, set `serviceMonitorSelector` to choose which ServiceMonitors it picks up. |
12 | | - |
13 | | -```yaml |
14 | | -apiVersion: monitoring.coreos.com/v1 |
15 | | -kind: ServiceMonitor |
16 | | -metadata: |
17 | | - name: qwen2-0--5b-lb-monitor |
18 | | - namespace: llmaz-system |
19 | | - labels: |
20 | | - control-plane: controller-manager |
21 | | - app.kubernetes.io/name: servicemonitor |
22 | | -spec: |
23 | | - namespaceSelector: |
24 | | - any: true |
25 | | - selector: |
26 | | - matchLabels: |
27 | | - llmaz.io/model-name: qwen2-0--5b |
28 | | - endpoints: |
29 | | - - port: http |
30 | | - path: /metrics |
31 | | - scheme: http |
32 | | -``` |
33 | | -
|
34 | | -- Ensure that the `namespaceSelector` is set to allow cross-namespace monitoring. |
35 | | -- Label your services appropriately to be discovered by Prometheus. |
36 | | - |
37 | | -### KEDA Configuration |
38 | | - |
39 | | -KEDA (Kubernetes Event-driven Autoscaling) is used for scaling applications based on custom metrics. It can be integrated with Prometheus to trigger scaling actions. |
40 | | - |
41 | | - |
42 | | -```yaml |
43 | | -apiVersion: keda.sh/v1alpha1 |
44 | | -kind: ScaledObject |
45 | | -metadata: |
46 | | - name: qwen2-0--5b-scaler |
47 | | - namespace: default |
48 | | -spec: |
49 | | - scaleTargetRef: |
50 | | - apiVersion: inference.llmaz.io/v1alpha1 |
51 | | - kind: Playground |
52 | | - name: qwen2-0--5b |
53 | | - pollingInterval: 30 |
54 | | - cooldownPeriod: 50 |
55 | | - minReplicaCount: 0 |
56 | | - maxReplicaCount: 3 |
57 | | - triggers: |
58 | | - - type: prometheus |
59 | | - metadata: |
60 | | - serverAddress: http://prometheus-operated.llmaz-system.svc.cluster.local:9090 |
61 | | - metricName: llamacpp:requests_processing |
62 | | - query: sum(llamacpp:requests_processing) |
63 | | - threshold: "0.2" |
64 | | -``` |
65 | | - |
66 | | -- Ensure that the `serverAddress` points to the correct Prometheus service. |
67 | | -- Adjust `pollingInterval` and `cooldownPeriod` to optimize scaling behavior and avoid conflicts with other scaling mechanisms. |
| 1 | +# Serverless Examples |
68 | 2 |
|
69 | | -### Integration with Activator |
| 3 | +This directory contains example configurations for setting up serverless deployments with llmaz using KEDA for event-driven autoscaling. |
70 | 4 |
|
71 | | -Consider integrating the serverless configuration with an activator for scale-from-zero scenarios. The activator can be implemented using a controller pattern or as a standalone goroutine. |
| 5 | +> For detailed documentation on serverless concepts, architecture, and configuration, please refer to the [Serverless Features Documentation](../../../site/content/en/docs/features/serverless.md). |
72 | 6 |
|
73 | | -### Controller Runtime Framework |
| 7 | +## Files |
74 | 8 |
|
75 | | -Using the Controller Runtime framework can simplify the development of Kubernetes controllers. It provides abstractions for managing resources and handling events. |
| 9 | +- **basic.yaml**: Example configuration showing a complete serverless setup including: |
| 10 | + - OpenModel definition for Qwen2-0.5B |
| 11 | + - Playground deployment with zero initial replicas |
| 12 | + - Gateway and AIGatewayRoute configuration |
| 13 | + - AIServiceBackend setup |
76 | 14 |
|
77 | | -#### Key Components |
| 15 | +- **service-monitor.yaml**: Prometheus ServiceMonitor for cross-namespace metric collection |
78 | 16 |
|
79 | | -1. **Controller**: Monitors resource states and triggers actions to align actual and desired states. |
80 | | -2. **Reconcile Function**: Core logic for transitioning resource states. |
81 | | -3. **Manager**: Manages the lifecycle of controllers and shared resources. |
82 | | -4. **Client**: Interface for interacting with the Kubernetes API. |
83 | | -5. **Scheme**: Registry for resource types. |
84 | | -6. **Event Source and Handler**: Define event sources and handling logic. |
| 17 | +- **scaled-object.yaml**: KEDA ScaledObject configuration for scaling based on Prometheus metrics |
85 | 18 |
|
| 19 | +## Quick Start |
86 | 20 |
|
87 | | -## Quick Start Guide |
88 | | - |
89 | | -1. Install Prometheus and KEDA using Helm charts, following the official [Install Guide](https://llmaz.inftyai.com/docs/getting-started/installation/). |
| 21 | +1. Install prerequisites (llmaz, Prometheus, and KEDA): |
90 | 22 |
|
91 | 23 | ```bash |
92 | 24 | helm install llmaz oci://registry-1.docker.io/inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.10 |
93 | | -make install-keda |
94 | 25 | make install-prometheus |
| 26 | +make install-keda |
95 | 27 | ``` |
96 | 28 |
|
97 | | -2. Create a ServiceMonitor for Prometheus to discover your services. |
| 29 | +2. Deploy the example configuration: |
| 30 | + |
98 | 31 | ```bash |
99 | | -kubectl apply -f service-monitor.yaml |
| 32 | +kubectl apply -f basic.yaml |
100 | 33 | ``` |
101 | 34 |
|
102 | | -3. Create a ScaledObject for KEDA to manage scaling. |
| 35 | +3. Create ServiceMonitor and ScaledObject: |
| 36 | + |
103 | 37 | ```bash |
| 38 | +kubectl apply -f service-monitor.yaml |
104 | 39 | kubectl apply -f scaled-object.yaml |
105 | 40 | ``` |
106 | 41 |
|
107 | | -4. Test with a cold start application. |
| 42 | +4. Test cold start by sending a request: |
| 43 | + |
108 | 44 | ```bash |
109 | 45 | kubectl exec -it -n kube-system deploy/activator -- wget -O- qwen2-0--5b-lb.default.svc:8080 |
110 | 46 | ``` |
111 | 47 |
|
112 | | -5. Open the Prometheus and KEDA dashboards in a browser to monitor metrics and scaling activity. |
| 48 | +5. Monitor metrics and scaling activity: |
| 49 | + |
113 | 50 | ```bash |
114 | 51 | kubectl port-forward services/prometheus-operated 9090:9090 --address 0.0.0.0 -n llmaz-system |
115 | 52 | ``` |
116 | 53 |
|
117 | | -## Conclusion |
| 54 | +## Configuration Notes |
118 | 55 |
|
119 | | -This configuration guide provides a comprehensive approach to setting up a serverless environment with Kubernetes, Prometheus, and KEDA. By following these guidelines, you can ensure efficient scaling and monitoring of your applications. |
| 56 | +- The example uses `minReplicaCount: 0` to enable scale-to-zero |
| 57 | +- Scaling is triggered based on the `llamacpp:requests_processing` metric |
| 58 | +- The activator component intercepts requests when replicas are at zero |
| 59 | +- Adjust `pollingInterval` and `cooldownPeriod` in the ScaledObject to optimize scaling behavior |
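
For reference, the Configuration Notes above map onto a ScaledObject roughly as follows. This is a sketch that mirrors the ScaledObject removed from the old README in this diff; the shipped `scaled-object.yaml` is assumed to be similar but may differ, and the comments are annotations added here rather than part of the original file.

```yaml
# Sketch only: mirrors the ScaledObject removed from the old README;
# the actual scaled-object.yaml may use different values.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: qwen2-0--5b-scaler
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: inference.llmaz.io/v1alpha1
    kind: Playground
    name: qwen2-0--5b
  pollingInterval: 30   # seconds between metric queries
  cooldownPeriod: 50    # seconds to wait after the last active trigger before scaling to zero
  minReplicaCount: 0    # enables scale-to-zero
  maxReplicaCount: 3
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.llmaz-system.svc.cluster.local:9090
        metricName: llamacpp:requests_processing
        query: sum(llamacpp:requests_processing)
        threshold: "0.2"
```

A short `cooldownPeriod` releases resources sooner but makes cold starts more frequent, so the two timing fields are the ones most worth tuning per workload.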