+The default value for `target_replica_concurrency` is `processes_per_replica` * `threads_per_process`, which behaves well in many situations (see above for an explanation of how `target_replica_concurrency` affects autoscaling). However, if your application is sensitive to spikes in traffic or if creating new replicas takes too long (see below), you may find it helpful to maintain extra capacity to handle the increased traffic while new replicas are being created. This can be accomplished by setting `target_replica_concurrency` to a lower value relative to the expected replica's concurrency. The smaller `target_replica_concurrency` is, the more unused capacity your API will have, and the more room it will have to handle sudden increased load. The increased request rate will still trigger the autoscaler, and your API will stabilize again (maintaining the overprovisioned capacity).
0 commit comments