11 changes: 11 additions & 0 deletions .changelog/1763060740.md
@@ -0,0 +1,11 @@
---
applies_to:
- client
authors:
- annahay
references: []
breaking: false
new_feature: true
bug_fix: false
---
Add support for configurable token bucket success reward and fractional token management
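The feature described above can be pictured with a minimal, self-contained sketch. The names below (`FractionalTokenBucket`, `success_reward`) are illustrative only and are not the smithy-rs API; the real implementation is built on a semaphore. The idea: each successful attempt earns a configurable fractional reward, and only whole accumulated tokens are returned to the bucket, capped at its maximum capacity.

```rust
// Hypothetical sketch, not the smithy-rs API: a token bucket that
// accumulates a fractional reward per success and converts whole
// parts into permits, clamped to the bucket's capacity.
struct FractionalTokenBucket {
    permits: usize,      // whole tokens currently available
    max_permits: usize,  // bucket capacity
    fraction: f64,       // accumulated fractional reward, kept < 1.0
    success_reward: f64, // configurable reward per successful attempt
}

impl FractionalTokenBucket {
    fn new(max_permits: usize, success_reward: f64) -> Self {
        Self { permits: max_permits, max_permits, fraction: 0.0, success_reward }
    }

    /// Called on every successful attempt: accumulate the fractional
    /// reward and release any whole tokens back into the bucket.
    fn reward_success(&mut self) {
        self.fraction += self.success_reward;
        let whole = self.fraction.floor() as usize;
        self.fraction -= whole as f64;
        self.permits = (self.permits + whole).min(self.max_permits);
    }
}

fn main() {
    let mut bucket = FractionalTokenBucket::new(500, 0.25);
    bucket.permits = 0; // pretend earlier retries drained the bucket
    for _ in 0..8 {
        bucket.reward_success();
    }
    // 8 successes at a 0.25 reward yield 2 whole tokens
    println!("permits: {}", bucket.permits); // prints "permits: 2"
}
```

With a reward of 0.25, four successes are needed per restored token; the `min` against `max_permits` keeps rewards from growing the bucket past capacity.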
2 changes: 1 addition & 1 deletion .github/workflows/manual-pull-request-bot.yml
@@ -56,7 +56,7 @@ jobs:
contents: read
needs:
- get-pr-info
runs-on: ubuntu-latest
runs-on: smithy_ubuntu-latest_8-core
steps:
- uses: GitHubSecurityLab/actions-permissions/monitor@v1
- uses: actions/checkout@v4
@@ -114,6 +114,7 @@ class RetryPartitionTest {
"RetryPartition" to RuntimeType.smithyRuntime(ctx.runtimeConfig).resolve("client::retries::RetryPartition"),
"RuntimeComponents" to RuntimeType.runtimeComponents(ctx.runtimeConfig),
"TokenBucket" to RuntimeType.smithyRuntime(ctx.runtimeConfig).resolve("client::retries::TokenBucket"),
"MAXIMUM_CAPACITY" to RuntimeType.smithyRuntime(ctx.runtimeConfig).resolve("client::retries::token_bucket::MAXIMUM_CAPACITY"),
)
crate.integrationTest("custom_retry_partition") {
tokioTest("test_custom_token_bucket") {
@@ -139,7 +140,8 @@ class RetryPartitionTest {
) -> Result<(), #{BoxError}> {
self.called.fetch_add(1, Ordering::Relaxed);
let token_bucket = cfg.load::<#{TokenBucket}>().unwrap();
let expected = format!("permits: {}", tokio::sync::Semaphore::MAX_PERMITS);
let max_capacity = #{MAXIMUM_CAPACITY};
let expected = format!("permits: {}", max_capacity);
assert!(
format!("{token_bucket:?}").contains(&expected),
"Expected debug output to contain `{expected}`, but got: {token_bucket:?}"
@@ -210,8 +210,11 @@ impl RetryStrategy for StandardRetryStrategy {
.unwrap_or(false);
update_rate_limiter_if_exists(runtime_components, cfg, is_throttling_error);

// on success release any retry quota held by previous attempts
// on success release any retry quota held by previous attempts and award success tokens
if !ctx.is_failed() {
// When a request succeeds, we grant an award, if present
token_bucket.reward_success();
Contributor: This will reward every successful attempt, even retries. Is that what you were going for?

Contributor Author: Yes, that is the intention.

if let NoPermitWasReleased = self.release_retry_permit() {
Contributor: Do we need to worry about add_permits panicking if we exceed MAX_PERMITS? E.g. since we are now rewarding on success (and, I believe, eventually refilling on time as well), on Drop a released permit puts its tokens back.

Contributor Author: Yes, CI testing revealed that let tokens_to_add = amount.min(self.max_permits - self.semaphore.available_permits()); was causing panics when available > max. I've updated this in a new commit so that when available > max, we simply return without adding permits.

If a race condition leaves the number of available permits greater than max after a Drop occurs, the bucket will self-correct once enough tokens are acquired. My initial assessment is that we are at risk of this regardless of where the Drop and add_permits sit relative to each other, due to multi-threading.
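The fix described above can be sketched in isolation. This is a simplified model, not the actual smithy-rs code: a plain `usize` field stands in for `semaphore.available_permits()`, and the field names are illustrative. The key points are the early return when the bucket is already at or over capacity, and the `min` clamp so the add can never push the count past `max_permits`.

```rust
// Simplified model of the clamped add_permits logic (illustrative
// names; the real implementation uses a tokio-style semaphore).
struct Bucket {
    available: usize,  // stands in for semaphore.available_permits()
    max_permits: usize,
}

impl Bucket {
    fn add_permits(&mut self, amount: usize) {
        // If a racing Drop already returned permits past the cap,
        // skip the add entirely; the bucket self-corrects as tokens
        // are later acquired.
        if self.available >= self.max_permits {
            return;
        }
        // Clamp so we never exceed capacity (and never underflow
        // when computing the remaining headroom).
        let tokens_to_add = amount.min(self.max_permits - self.available);
        self.available += tokens_to_add;
    }
}

fn main() {
    let mut b = Bucket { available: 498, max_permits: 500 };
    b.add_permits(5); // only 2 permits of headroom, so clamped to 2
    println!("available: {}", b.available); // prints "available: 500"
    b.add_permits(5); // already at max: no-op
    println!("available: {}", b.available); // prints "available: 500"
}
```

Without the early return, `self.max_permits - self.available` would underflow (and panic in debug builds) whenever `available > max`, which matches the CI failure described above.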

// In the event that there was no retry permit to release, we generate new
// permits from nothing. We do this to make up for permits we had to "forget".
Expand Down