
Commit 6ff79f5

committed
fix: simplify readme with basic introduction.
Signed-off-by: X1aoZEOuO <nizefeng2002@outlook.com>
1 parent b73e6f0 commit 6ff79f5

File tree

1 file changed: +29 -89 lines changed


docs/examples/serverless/README.md

Lines changed: 29 additions & 89 deletions
@@ -1,119 +1,59 @@
-# Serverless Configuration and Documentation
-
-## Overview
-
-This document provides a detailed guide on configuring serverless environments using Kubernetes, with a focus on integrating Prometheus for monitoring and KEDA for scaling. The configuration aims to ensure efficient resource utilization and seamless scaling of applications.
-
-## Concepts
-
-### Prometheus Configuration
-
-Prometheus is used for monitoring and alerting. To enable cross-namespace ServiceMonitor discovery, use `namespaceSelector`. In Prometheus, define `serviceMonitorSelector` to associate with ServiceMonitors.
-
-```yaml
-apiVersion: monitoring.coreos.com/v1
-kind: ServiceMonitor
-metadata:
-  name: qwen2-0--5b-lb-monitor
-  namespace: llmaz-system
-  labels:
-    control-plane: controller-manager
-    app.kubernetes.io/name: servicemonitor
-spec:
-  namespaceSelector:
-    any: true
-  selector:
-    matchLabels:
-      llmaz.io/model-name: qwen2-0--5b
-  endpoints:
-    - port: http
-      path: /metrics
-      scheme: http
-```
-
-- Ensure that the `namespaceSelector` is set to allow cross-namespace monitoring.
-- Label your services appropriately to be discovered by Prometheus.
-
-### KEDA Configuration
-
-KEDA (Kubernetes Event-driven Autoscaling) is used for scaling applications based on custom metrics. It can be integrated with Prometheus to trigger scaling actions.
-
-
-```yaml
-apiVersion: keda.sh/v1alpha1
-kind: ScaledObject
-metadata:
-  name: qwen2-0--5b-scaler
-  namespace: default
-spec:
-  scaleTargetRef:
-    apiVersion: inference.llmaz.io/v1alpha1
-    kind: Playground
-    name: qwen2-0--5b
-  pollingInterval: 30
-  cooldownPeriod: 50
-  minReplicaCount: 0
-  maxReplicaCount: 3
-  triggers:
-    - type: prometheus
-      metadata:
-        serverAddress: http://prometheus-operated.llmaz-system.svc.cluster.local:9090
-        metricName: llamacpp:requests_processing
-        query: sum(llamacpp:requests_processing)
-        threshold: "0.2"
-```
-
-- Ensure that the `serverAddress` points to the correct Prometheus service.
-- Adjust `pollingInterval` and `cooldownPeriod` to optimize scaling behavior and avoid conflicts with other scaling mechanisms.
+# Serverless Examples
 
-### Integration with Activator
+This directory contains example configurations for setting up serverless deployments with llmaz using KEDA for event-driven autoscaling.
 
-Consider integrating the serverless configuration with an activator for scale-from-zero scenarios. The activator can be implemented using a controller pattern or as a standalone goroutine.
+> For detailed documentation on serverless concepts, architecture, and configuration, please refer to the [Serverless Features Documentation](../../../site/content/en/docs/features/serverless.md).
 
-### Controller Runtime Framework
+## Files
 
-Using the Controller Runtime framework can simplify the development of Kubernetes controllers. It provides abstractions for managing resources and handling events.
+- **basic.yaml**: Example configuration showing a complete serverless setup including:
+  - OpenModel definition for Qwen2-0.5B
+  - Playground deployment with zero initial replicas
+  - Gateway and AIGatewayRoute configuration
+  - AIServiceBackend setup
 
-#### Key Components
+- **service-monitor.yaml**: Prometheus ServiceMonitor for cross-namespace metric collection
 
-1. **Controller**: Monitors resource states and triggers actions to align actual and desired states.
-2. **Reconcile Function**: Core logic for transitioning resource states.
-3. **Manager**: Manages the lifecycle of controllers and shared resources.
-4. **Client**: Interface for interacting with the Kubernetes API.
-5. **Scheme**: Registry for resource types.
-6. **Event Source and Handler**: Define event sources and handling logic.
+- **scaled-object.yaml**: KEDA ScaledObject configuration for scaling based on Prometheus metrics
 
+## Quick Start
 
-## Quick Start Guide
-
-1. Install Prometheus and KEDA using Helm charts, following the official documentation [Install Guide](https://llmaz.inftyai.com/docs/getting-started/installation/).
+1. Install prerequisites (llmaz, Prometheus, and KEDA):
 
 ```bash
 helm install llmaz oci://registry-1.docker.io/inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.10
-make install-keda
 make install-prometheus
+make install-keda
 ```
 
-2. Create a ServiceMonitor for Prometheus to discover your services.
+2. Deploy the example configuration:
+
 ```bash
-kubectl apply -f service-monitor.yaml
+kubectl apply -f basic.yaml
 ```
 
-3. Create a ScaledObject for KEDA to manage scaling.
+3. Create ServiceMonitor and ScaledObject:
+
 ```bash
+kubectl apply -f service-monitor.yaml
 kubectl apply -f scaled-object.yaml
 ```
 
-4. Test with a cold start application.
+4. Test cold start by sending a request:
+
 ```bash
 kubectl exec -it -n kube-system deploy/activator -- wget -O- qwen2-0--5b-lb.default.svc:8080
 ```
 
-5. Check with Prometheus and KEDA dashboards to monitor metrics and scaling activities in web page.
+5. Monitor metrics and scaling activity:
+
 ```bash
 kubectl port-forward services/prometheus-operated 9090:9090 --address 0.0.0.0 -n llmaz-system
 ```
 
-## Conclusion
+## Configuration Notes
 
-This configuration guide provides a comprehensive approach to setting up a serverless environment with Kubernetes, Prometheus, and KEDA. By following these guidelines, you can ensure efficient scaling and monitoring of your applications.
+- The example uses `minReplicaCount: 0` to enable scale-to-zero
+- Scaling is triggered based on the `llamacpp:requests_processing` metric
+- The activator component intercepts requests when replicas are at zero
+- Adjust `pollingInterval` and `cooldownPeriod` in the ScaledObject to optimize scaling behavior
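
The removed guide tells readers to define `serviceMonitorSelector` in Prometheus so that it picks up ServiceMonitors, but never shows that side of the pairing. A minimal sketch of what it could look like, assuming a prometheus-operator `Prometheus` resource in `llmaz-system` (the resource name and the label it matches on are illustrative, not taken from the repository):

```yaml
# Illustrative sketch only: a Prometheus resource that would select the
# ServiceMonitor from the example above. Name and label values are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: llmaz-system
spec:
  # Select ServiceMonitors carrying this label; the ServiceMonitor example
  # above sets control-plane: controller-manager in its metadata.labels.
  serviceMonitorSelector:
    matchLabels:
      control-plane: controller-manager
  # An empty namespace selector lets Prometheus discover ServiceMonitors in
  # all namespaces, matching the cross-namespace intent of the example.
  serviceMonitorNamespaceSelector: {}
```

With this pairing, the ServiceMonitor's own `namespaceSelector: any: true` controls which namespaces its targets are scraped from, while the two selectors above control which ServiceMonitor objects Prometheus loads at all.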

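The new Configuration Notes call out `minReplicaCount`, `pollingInterval`, and `cooldownPeriod` as the knobs that shape scaling behavior. Here is the ScaledObject from the removed guide, annotated with what each field controls; the interval and cooldown values are changed purely for illustration, and the repository's `scaled-object.yaml` remains the authoritative version:

```yaml
# Annotated restatement of the example above; tuning values are illustrative.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: qwen2-0--5b-scaler
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: inference.llmaz.io/v1alpha1
    kind: Playground
    name: qwen2-0--5b
  pollingInterval: 15   # how often (seconds) KEDA evaluates the Prometheus query
  cooldownPeriod: 300   # wait after the last active trigger before scaling back to zero
  minReplicaCount: 0    # enables scale-to-zero; the activator intercepts requests while at zero
  maxReplicaCount: 3    # upper bound for scale-out under sustained load
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.llmaz-system.svc.cluster.local:9090
        metricName: llamacpp:requests_processing
        query: sum(llamacpp:requests_processing)  # in-flight requests reported by the llama.cpp backend
        threshold: "0.2"                          # target value per replica used for the scaling decision
```

A longer `cooldownPeriod` avoids flapping between zero and one replica under bursty traffic, at the cost of keeping an idle replica around longer.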