docs/architecture/autoscaling.md (28 additions & 5 deletions)
@@ -121,7 +121,7 @@ mean per pod = 90 / 1 = 90
* Capacity `capacity`

Based upon inflight requests (or connections), ideal for long-running functions or functions which can only handle a limited number of requests at once.

A hard limit can be enforced through the `max_inflight` environment variable on the function, so the caller will sometimes need to retry requests. The OpenFaaS Pro queue-worker does this automatically, see also: [Retries](/openfaas-pro/retries).

* RPS `rps`
@@ -134,6 +134,10 @@ mean per pod = 90 / 1 = 90
Based upon CPU usage of the function, this strategy is ideal for CPU-bound workloads, or where Capacity and RPS are not giving the optimal scaling profile. The value configured here is in milli-CPU, so 1000 corresponds to *1 CPU core*. A minimal label sketch for this mode is shown after this list.

* Queue-depth `queue`

Based upon the number of async invocations that are queued for a function. This allows you to scale functions rapidly and proactively to the desired number of replicas to process the queue as quickly as possible. Ideal for functions that are only invoked asynchronously.

* Scaling to zero

[Scaling to zero](/openfaas-pro/scale-to-zero) is an opt-in feature on a per function basis. It can be used in combination with any scaling mode, including *Static scaling*.
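As a minimal sketch of how a mode is selected per function (the function name `figlet` and the `500` milli-CPU target are illustrative values, and the `cpu` mode identifier is assumed by analogy with `capacity`, `rps` and `queue`), the scaling labels can be combined like this:

```bash
# Sketch: CPU-based autoscaling, targeting 500 milli-CPU (half a core) per replica.
# The function name, target value and the `cpu` identifier are illustrative assumptions.
faas-cli store deploy figlet \
  --label com.openfaas.scale.type=cpu \
  --label com.openfaas.scale.target=500 \
  --label com.openfaas.scale.min=1 \
  --label com.openfaas.scale.max=10
```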
@@ -175,7 +179,7 @@ hey -t 10 -z 3m -c 5 -q 5 \
  http://127.0.0.1:8080/function/sleep
```

To apply a hard limit, add `--env max_inflight=5` to the `faas-cli store deploy` command.
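For illustration, a sketch of what that command might look like, assuming the same `sleep` function from the store and keeping any other flags from the original deploy command (not shown in this excerpt) unchanged:

```bash
# Sketch only: redeploy the sleep function with a hard limit of 5 in-flight
# requests per replica. Other labels/flags from the original deploy command
# would stay as they were.
faas-cli store deploy sleep \
  --env max_inflight=5
```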
What if you need to limit a function to processing only one request at a time?
@@ -260,6 +264,26 @@ hey -m POST -d data -z 3m -c 5 -q 10 \
Note that `com.openfaas.scale.zero=false` is a default, so this is not strictly required.

**4) Queue-depth based scaling**

When the number of incoming async invocations increases, the queue depth grows. By scaling functions based on this metric, you can proactively add more replicas to process messages faster.
```bash
faas-cli store deploy sleep \
  --label com.openfaas.scale.max=10 \
  --label com.openfaas.scale.target=10 \
  --label com.openfaas.scale.type=queue \
  --label com.openfaas.scale.target-proportion=1 \
  --env max_inflight=1

hey -m POST -n 30 -c 30 \
  http://127.0.0.1:8080/async-function/sleep
```
This sleep function takes 2 seconds to complete, and has a *hard limit* of 1 concurrent request per replica.

With the above scaling configuration, if 30 messages are submitted to the queue via async invocations, the sleep function will scale to 3 replicas immediately.
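A rough sketch of the arithmetic (the autoscaler's exact formula is not shown in this excerpt, so treat this as an approximation): a target of 10 with a target-proportion of 1 means roughly 10 queued messages per replica, so 30 queued messages call for 3 replicas.

```
queued messages = 30
target per replica = 10 x 1 (com.openfaas.scale.target x target-proportion)
desired replicas = 30 / 10 = 3
```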
## Smoothing out scaling down with a stable window
The `com.openfaas.scale.down.window` label can be set with a Go duration up to a maximum of `5m` or `300s`. When set, the autoscaler will record recommendations on each cycle, and will only scale a function down to the highest replica count recommended during that window.
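As a minimal sketch (the function name and the window length are illustrative), the stable window is set at deploy time like any other scaling label:

```bash
# Sketch: record scale-down recommendations for 5 minutes, and only scale the
# function down to the highest replica count recommended during that window.
faas-cli store deploy figlet \
  --label com.openfaas.scale.down.window=5m
```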
@@ -306,7 +330,7 @@ Scaling functions to zero replicas can improve efficiency and reduce costs in yo
1. **Cost Savings**: By scaling down to zero when idle, you can reduce the number of nodes required in your cluster, leading to lower infrastructure costs through fewer or smaller nodes.
2. **Resource Efficiency**: Scaling down to zero helps to free up resources in your cluster, which also helps with on-premises clusters where the number of nodes may be fixed.
3. **Security**: By scaling functions down, the attack surface is reduced to only the active functions.
### Scaling down to zero replicas
@@ -364,10 +388,9 @@ The minimum (initial) and maximum replica count can be set at deployment time by
* `com.openfaas.scale.factor` by default this is set to `20%` and has to be a value between 0 and 100 (inclusive)

> Note:
> Setting `com.openfaas.scale.min` and `com.openfaas.scale.max` to the same value disables the auto-scaling functionality of OpenFaaS.
> Setting `com.openfaas.scale.factor=0` also disables the auto-scaling functionality of OpenFaaS.

For each alert fired, the auto-scaler will add a number of replicas, which is a defined percentage of the maximum replicas. This percentage can be set using `com.openfaas.scale.factor`. For example, setting `com.openfaas.scale.factor=100` will instantly scale to the maximum replicas. This label lets you define the overall scaling behavior of the function.
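As a worked example with illustrative numbers: with `com.openfaas.scale.max=10` and the default `com.openfaas.scale.factor=20`, each alert that fires adds 20% of 10, i.e. 2 replicas, so it would take roughly five consecutive alerts to climb from the minimum to the maximum.

```
max replicas = 10
scale factor = 20%
replicas added per alert = 10 x 20% = 2
```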
> Note: Active alerts can be viewed in the "Alerts" tab of Prometheus, which is deployed with OpenFaaS.
docs/openfaas-pro/comparison.md (2 additions & 2 deletions)
@@ -54,10 +54,10 @@ Did you know? OpenFaaS Pro's autoscaling engine can scale many different types o
| Maximum replicas per function | 5 | 1 | No limit applied | as per Standard |
| Scale from Zero | Not available | Supported | Supported, with additional checks for Istio | as per Standard |
| Zero downtime updates | Not available | Not available | Supported with readiness probes and rolling updates | as per Standard |
| Autoscaling strategy | RPS | Not applicable | [CPU utilization, Capacity (inflight requests), RPS, async queue-depth and Custom (e.g. Memory)](/architecture/autoscaling) | as per Standard |
| Autoscaling granularity | One global rule | Not applicable | Configurable per function | as per Standard |

Data-driven, intensive, or long-running functions are best suited to capacity-based or queue-based autoscaling, which is only available in OpenFaaS Pro.

Scaling to zero is also a commercial feature, which can be opted into on a per function basis, with a custom idle threshold.
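As a minimal sketch of the per-function opt-in (the `com.openfaas.scale.zero-duration` label name and the `15m` value are assumptions for illustration; the scale-to-zero documentation is authoritative for the exact label names):

```bash
# Sketch: opt one function into scale to zero with a custom idle threshold.
# com.openfaas.scale.zero appears in the autoscaling docs above; the
# com.openfaas.scale.zero-duration label name and the 15m value are assumptions.
faas-cli store deploy figlet \
  --label com.openfaas.scale.zero=true \
  --label com.openfaas.scale.zero-duration=15m
```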