docs/deployments/api-configuration.md (15 additions & 15 deletions)
@@ -34,11 +34,11 @@ Reference the section below which corresponds to your Predictor type: [Python](#
 max_replica_concurrency: <int> # the maximum number of in-flight requests per replica before requests are rejected with error code 503 (default: 1024)
 window: <duration> # the time over which to average the API's concurrency (default: 60s)
 downscale_stabilization_period: <duration> # the API will not scale below the highest recommendation made during this period (default: 5m)
-upscale_stabilization_period: <duration> # the API will not scale above the lowest recommendation made during this period (default: 0m)
-max_downscale_factor: <float> # the maximum factor by which to scale down the API on a single scaling event (default: 0.5)
-max_upscale_factor: <float> # the maximum factor by which to scale up the API on a single scaling event (default: 10)
-downscale_tolerance: <float> # any recommendation falling within this factor below the current number of replicas will not trigger a scale down event (default: 0.1)
-upscale_tolerance: <float> # any recommendation falling within this factor above the current number of replicas will not trigger a scale up event (default: 0.1)
+upscale_stabilization_period: <duration> # the API will not scale above the lowest recommendation made during this period (default: 1m)
+max_downscale_factor: <float> # the maximum factor by which to scale down the API on a single scaling event (default: 0.75)
+max_upscale_factor: <float> # the maximum factor by which to scale up the API on a single scaling event (default: 1.5)
+downscale_tolerance: <float> # any recommendation falling within this factor below the current number of replicas will not trigger a scale down event (default: 0.05)
+upscale_tolerance: <float> # any recommendation falling within this factor above the current number of replicas will not trigger a scale up event (default: 0.05)
 update_strategy:
 max_surge: <string | int> # maximum number of replicas that can be scheduled above the desired number of replicas during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
 max_unavailable: <string | int> # maximum number of replicas that can be unavailable during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
@@ -76,11 +76,11 @@ See additional documentation for [autoscaling](autoscaling.md), [compute](comput
 max_replica_concurrency: <int> # the maximum number of in-flight requests per replica before requests are rejected with error code 503 (default: 1024)
 window: <duration> # the time over which to average the API's concurrency (default: 60s)
 downscale_stabilization_period: <duration> # the API will not scale below the highest recommendation made during this period (default: 5m)
-upscale_stabilization_period: <duration> # the API will not scale above the lowest recommendation made during this period (default: 0m)
-max_downscale_factor: <float> # the maximum factor by which to scale down the API on a single scaling event (default: 0.5)
-max_upscale_factor: <float> # the maximum factor by which to scale up the API on a single scaling event (default: 10)
-downscale_tolerance: <float> # any recommendation falling within this factor below the current number of replicas will not trigger a scale down event (default: 0.1)
-upscale_tolerance: <float> # any recommendation falling within this factor above the current number of replicas will not trigger a scale up event (default: 0.1)
+upscale_stabilization_period: <duration> # the API will not scale above the lowest recommendation made during this period (default: 1m)
+max_downscale_factor: <float> # the maximum factor by which to scale down the API on a single scaling event (default: 0.75)
+max_upscale_factor: <float> # the maximum factor by which to scale up the API on a single scaling event (default: 1.5)
+downscale_tolerance: <float> # any recommendation falling within this factor below the current number of replicas will not trigger a scale down event (default: 0.05)
+upscale_tolerance: <float> # any recommendation falling within this factor above the current number of replicas will not trigger a scale up event (default: 0.05)
 update_strategy:
 max_surge: <string | int> # maximum number of replicas that can be scheduled above the desired number of replicas during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
 max_unavailable: <string | int> # maximum number of replicas that can be unavailable during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
@@ -117,11 +117,11 @@ See additional documentation for [autoscaling](autoscaling.md), [compute](comput
 max_replica_concurrency: <int> # the maximum number of in-flight requests per replica before requests are rejected with error code 503 (default: 1024)
 window: <duration> # the time over which to average the API's concurrency (default: 60s)
 downscale_stabilization_period: <duration> # the API will not scale below the highest recommendation made during this period (default: 5m)
-upscale_stabilization_period: <duration> # the API will not scale above the lowest recommendation made during this period (default: 0m)
-max_downscale_factor: <float> # the maximum factor by which to scale down the API on a single scaling event (default: 0.5)
-max_upscale_factor: <float> # the maximum factor by which to scale up the API on a single scaling event (default: 10)
-downscale_tolerance: <float> # any recommendation falling within this factor below the current number of replicas will not trigger a scale down event (default: 0.1)
-upscale_tolerance: <float> # any recommendation falling within this factor above the current number of replicas will not trigger a scale up event (default: 0.1)
+upscale_stabilization_period: <duration> # the API will not scale above the lowest recommendation made during this period (default: 1m)
+max_downscale_factor: <float> # the maximum factor by which to scale down the API on a single scaling event (default: 0.75)
+max_upscale_factor: <float> # the maximum factor by which to scale up the API on a single scaling event (default: 1.5)
+downscale_tolerance: <float> # any recommendation falling within this factor below the current number of replicas will not trigger a scale down event (default: 0.05)
+upscale_tolerance: <float> # any recommendation falling within this factor above the current number of replicas will not trigger a scale up event (default: 0.05)
 update_strategy:
 max_surge: <string | int> # maximum number of replicas that can be scheduled above the desired number of replicas during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
 max_unavailable: <string | int> # maximum number of replicas that can be unavailable during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
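
Note on `update_strategy`: the comments on `max_surge` and `max_unavailable` say each value can be an absolute number or a percentage of desired replicas. A minimal sketch of how such a value might be resolved to a replica count is shown here. The helper name `resolve_update_bound` is hypothetical, and the rounding directions (surge rounds up, unavailable rounds down) follow the Kubernetes Deployment convention; that Cortex applies the same rounding is an assumption, not something this diff states.

```python
def resolve_update_bound(value, desired_replicas, *, round_up):
    """Turn an absolute number or a percentage string into a replica count.

    Hypothetical helper for illustration only. `round_up=True` mirrors the
    Kubernetes convention for max surge, `round_up=False` the convention for
    max unavailable.
    """
    if isinstance(value, int):
        return value  # already an absolute number, e.g. 5

    text = value.strip()
    if not text.endswith("%"):
        return int(text)  # absolute number given as a string, e.g. "5"

    percent = int(text[:-1])  # assumes a whole-number percentage, e.g. "25%"
    scaled = desired_replicas * percent
    # Integer arithmetic avoids floating point rounding surprises.
    if round_up:
        return -(-scaled // 100)  # ceiling division
    return scaled // 100  # floor division


# With the default of 25% and 10 desired replicas:
surge = resolve_update_bound("25%", 10, round_up=True)
unavailable = resolve_update_bound("25%", 10, round_up=False)
```

Under these assumptions, a rolling update of 10 desired replicas could run up to 3 extra replicas while at most 2 are unavailable.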
docs/deployments/autoscaling.md (5 additions & 5 deletions)
@@ -36,15 +36,15 @@ Cortex autoscales your web services based on your configuration.

 * `downscale_stabilization_period` (default: 5m): The API will not scale below the highest recommendation made during this period. Every 10 seconds, the autoscaler makes a recommendation based on all of the other configuration parameters described here. It will then take the max of the current recommendation and all recommendations made during the `downscale_stabilization_period`, and use that to determine the final number of replicas to scale to. Increasing this value will cause the cluster to react more slowly to decreased traffic, and will reduce thrashing.

-* `upscale_stabilization_period` (default: 0m): The API will not scale above the lowest recommendation made during this period. Every 10 seconds, the autoscaler makes a recommendation based on all of the other configuration parameters described here. It will then take the min of the current recommendation and all recommendations made during the `upscale_stabilization_period`, and use that to determine the final number of replicas to scale to. Increasing this value will cause the cluster to react more slowly to increased traffic, and will reduce thrashing. The default is 0 minutes, which means that the cluster will react quickly to increased traffic.
+* `upscale_stabilization_period` (default: 1m): The API will not scale above the lowest recommendation made during this period. Every 10 seconds, the autoscaler makes a recommendation based on all of the other configuration parameters described here. It will then take the min of the current recommendation and all recommendations made during the `upscale_stabilization_period`, and use that to determine the final number of replicas to scale to. Increasing this value will cause the cluster to react more slowly to increased traffic, and will reduce thrashing. The default is 1 minute, which means that the API scales up only after recommendations have stayed elevated for a full minute.

-* `max_downscale_factor` (default: 0.5): The maximum factor by which to scale down the API on a single scaling event. For example, if `max_downscale_factor` is 0.5 and there are 10 running replicas, the autoscaler will not recommend fewer than 5 replicas. Decreasing this number will allow the cluster to shrink more quickly in response to dramatic dips in traffic.
+* `max_downscale_factor` (default: 0.75): The maximum factor by which to scale down the API on a single scaling event. For example, if `max_downscale_factor` is 0.5 and there are 10 running replicas, the autoscaler will not recommend fewer than 5 replicas. Decreasing this number will allow the cluster to shrink more quickly in response to dramatic dips in traffic.

-* `max_upscale_factor` (default: 10): The maximum factor by which to scale up the API on a single scaling event. For example, if `max_upscale_factor` is 10 and there are 5 running replicas, the autoscaler will not recommend more than 50 replicas. Increasing this number will allow the cluster to grow more quickly in response to dramatic spikes in traffic.
+* `max_upscale_factor` (default: 1.5): The maximum factor by which to scale up the API on a single scaling event. For example, if `max_upscale_factor` is 10 and there are 5 running replicas, the autoscaler will not recommend more than 50 replicas. Increasing this number will allow the cluster to grow more quickly in response to dramatic spikes in traffic.

-* `downscale_tolerance` (default: 0.1): Any recommendation falling within this factor below the current number of replicas will not trigger a scale down event. For example, if `downscale_tolerance` is 0.1 and there are 20 running replicas, a recommendation of 18 or 19 replicas will not be acted on, and the API will remain at 20 replicas. Increasing this value will prevent thrashing, but setting it too high will prevent the cluster from maintaining its optimal size.
+* `downscale_tolerance` (default: 0.05): Any recommendation falling within this factor below the current number of replicas will not trigger a scale down event. For example, if `downscale_tolerance` is 0.1 and there are 20 running replicas, a recommendation of 18 or 19 replicas will not be acted on, and the API will remain at 20 replicas. Increasing this value will prevent thrashing, but setting it too high will prevent the cluster from maintaining its optimal size.

-* `upscale_tolerance` (default: 0.1): Any recommendation falling within this factor above the current number of replicas will not trigger a scale up event. For example, if `upscale_tolerance` is 0.1 and there are 20 running replicas, a recommendation of 21 or 22 replicas will not be acted on, and the API will remain at 20 replicas. Increasing this value will prevent thrashing, but setting it too high will prevent the cluster from maintaining its optimal size.
+* `upscale_tolerance` (default: 0.05): Any recommendation falling within this factor above the current number of replicas will not trigger a scale up event. For example, if `upscale_tolerance` is 0.1 and there are 20 running replicas, a recommendation of 21 or 22 replicas will not be acted on, and the API will remain at 20 replicas. Increasing this value will prevent thrashing, but setting it too high will prevent the cluster from maintaining its optimal size.
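
Note on how these parameters interact: the bullets above describe a tolerance band, per-event factor limits, and two stabilization periods. The following is a rough sketch of that logic as described in the prose, not the autoscaler's actual code. The function names `clamp_recommendation` and `stabilize`, the order in which the checks run, the rounding of the factor limits, and the floor of one replica are all assumptions made for illustration.

```python
import math


def clamp_recommendation(
    raw,
    current,
    *,
    max_downscale_factor=0.75,
    max_upscale_factor=1.5,
    downscale_tolerance=0.05,
    upscale_tolerance=0.05,
):
    """Apply the tolerance band, then the per-event factor limits."""
    # Inside the tolerance band: stay at the current replica count.
    low_band = current * (1 - downscale_tolerance)
    high_band = current * (1 + upscale_tolerance)
    if low_band <= raw <= high_band:
        return current

    # Rounding is an assumption: floor the lower limit and ceil the upper
    # limit so that small replica counts can still move by one replica.
    lowest = max(1, math.floor(current * max_downscale_factor))
    highest = math.ceil(current * max_upscale_factor)
    return max(lowest, min(raw, highest))


def stabilize(recommendation, current, recent_downscale, recent_upscale):
    """Apply the stabilization periods to a clamped recommendation.

    `recent_downscale` holds the recommendations made during the
    downscale_stabilization_period, `recent_upscale` those made during the
    upscale_stabilization_period (one entry per 10-second tick).
    """
    if recommendation < current:
        # Do not scale below the highest recent recommendation.
        floor = max([recommendation, *recent_downscale])
        return min(current, floor)
    if recommendation > current:
        # Do not scale above the lowest recent recommendation.
        ceiling = min([recommendation, *recent_upscale])
        return max(current, ceiling)
    return current
```

With `downscale_tolerance=0.1` and 20 running replicas, `clamp_recommendation(18, 20, downscale_tolerance=0.1)` stays at 20, matching the example in the `downscale_tolerance` bullet. With the new default `max_upscale_factor` of 1.5, a spike that would previously have jumped from 5 to as many as 50 replicas in one event is limited to 8 under this sketch's rounding.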