
Commit b9bc1f5

Update README.md
1 parent 9f4b031 commit b9bc1f5

File tree

1 file changed (+1 -1 lines changed)
  • End_to_end_Solutions/Load Balance and Utilize Quota Limits [LINK]

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
  # Fully utilize AOAI quotas and limits
- As at 12 June 2023, one subscription can provision 30 AOAI resources per region, sharing the same TPM and RPM limits. For example, you can allocate 3 deployment instances of GPT-35-Turbo with 80K TPM/480 RPM each to utilize the whole TPM/RPM limits for one region.
+ As of 12 June 2023, one subscription can provision 30 AOAI resources per region, sharing the same TPM and RPM limits. For example, you can allocate 3 deployment instances of GPT-35-Turbo with 80K TPM/480 RPM each to utilize the whole TPM/RPM limits for one region.
  Currently, there are four regional Cognitive Services locations that support Azure OpenAI (East US, South Central US, West Europe and France Central), which allows a maximum of 120 instances of the same AOAI model to be provisioned across these four regions. With 1440 RPM per region (3 deployments x 480 RPM each), this means you can achieve up to (1440 x 4 / 60) = a maximum of 96 requests per second for your ChatGPT model. If this is still not enough to meet your production workload requirements, you can consider getting additional subscriptions to build an AOAI resources RAID.

  [Read More](https://github.com/denlai-mshk/aoai-fwdproxy-funcapp)
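
The README text in this diff describes spreading a workload across multiple AOAI deployments and regions to consume the combined TPM/RPM quota. Below is a minimal round-robin sketch of that idea in Python, calling the Azure OpenAI REST chat-completions endpoint with `requests`. The endpoint URLs, deployment names, API keys, and the `2023-05-15` API version are placeholder assumptions, not values from this repository; see the linked aoai-fwdproxy-funcapp project for a fuller approach to the same pattern.

```python
import itertools
import requests

# Hypothetical pool of AOAI resources across regions; replace with your own
# endpoints, deployment names, and keys. Each entry is one "stripe" of the RAID.
BACKENDS = [
    {"endpoint": "https://my-aoai-eastus.openai.azure.com",
     "deployment": "gpt-35-turbo", "api_key": "<eastus-key>"},
    {"endpoint": "https://my-aoai-scus.openai.azure.com",
     "deployment": "gpt-35-turbo", "api_key": "<southcentralus-key>"},
    {"endpoint": "https://my-aoai-weu.openai.azure.com",
     "deployment": "gpt-35-turbo", "api_key": "<westeurope-key>"},
    {"endpoint": "https://my-aoai-frc.openai.azure.com",
     "deployment": "gpt-35-turbo", "api_key": "<francecentral-key>"},
]

API_VERSION = "2023-05-15"  # assumed API version for chat completions
_rotation = itertools.cycle(BACKENDS)


def chat(messages, attempts=len(BACKENDS)):
    """Send one chat-completions call, rotating through the backend pool.

    Round-robin spreads RPM/TPM consumption evenly across resources; on a
    429 (rate limit) the request simply moves on to the next resource.
    """
    last_throttled = None
    for _ in range(attempts):
        backend = next(_rotation)
        url = (f"{backend['endpoint']}/openai/deployments/"
               f"{backend['deployment']}/chat/completions")
        resp = requests.post(
            url,
            params={"api-version": API_VERSION},
            headers={"api-key": backend["api_key"]},
            json={"messages": messages},
            timeout=60,
        )
        if resp.status_code == 429:  # this resource is throttled
            last_throttled = backend["endpoint"]
            continue                 # try the next stripe in the pool
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError(f"All backends throttled; last tried {last_throttled}")


if __name__ == "__main__":
    reply = chat([{"role": "user", "content": "Hello!"}])
    print(reply["choices"][0]["message"]["content"])
```

In production you would likely add retry backoff, health checks, and quota-aware weighting; centralizing that logic in a forward proxy (as the linked repository does) keeps client code unchanged while the resource pool grows.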
