@tw4l tw4l commented Nov 18, 2025

Fixes #2957

Full backend and frontend implementation, with a new email notification to org admins when a crawl is paused because an org quota has been reached.

Backend changes

  • Modify operator to auto-pause crawls when quotas are reached or archiving is disabled rather than stopping the crawls
  • Add new crawl states: paused_storage_quota_reached, paused_time_quota_reached, paused_org_readonly
  • Add uploaded WACZs to org storage totals immediately after upload so that auto-paused crawls will actually put the org's bytesStored above the storage quota
  • Send an email from new template to all org admins when a crawl is auto-paused with information about what to do
  • Fix datetime deprecation in tests
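The quota checks described in the bullets above could look roughly like the following sketch. This is illustrative only: names like `Org` and `choose_paused_state` are hypothetical, not the actual operator API, though the three paused state strings match the ones introduced in this PR.

```python
# Hypothetical sketch of mapping org quota conditions to the new paused
# crawl states; the Org dataclass and function name are illustrative.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Org:
    bytes_stored: int
    storage_quota: int  # 0 means no storage quota set
    exec_minutes_used: int
    exec_minutes_quota: int  # 0 means no execution time quota set
    read_only: bool


def choose_paused_state(org: Org, pending_crawl_bytes: int) -> Optional[str]:
    """Return the auto-pause state for a running crawl, or None to keep running."""
    if org.read_only:
        return "paused_org_readonly"
    # Count bytes the crawl has written but not yet committed, so the crawl
    # pauses before the org actually exceeds its storage quota
    if org.storage_quota and org.bytes_stored + pending_crawl_bytes >= org.storage_quota:
        return "paused_storage_quota_reached"
    if org.exec_minutes_quota and org.exec_minutes_used >= org.exec_minutes_quota:
        return "paused_time_quota_reached"
    return None
```
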

Updated nightly tests all pass: https://github.com/webrecorder/browsertrix/actions/runs/19684324914

Frontend changes

  • Add new paused crawl states
  • Update checks throughout frontend for whether crawl is paused to compare against all paused states

Dependencies

Relies on crawler changes introduced in webrecorder/browsertrix-crawler#919

Out of scope

Crawl workflow counts are a bit off: all crawls that complete are counted as successful regardless of state, and workflow storage counts are sometimes incremented incorrectly. I started trying to address that in this branch, but it's a bit involved and may require a migration, so it's best handled separately, I think. Issue: #3011

@tw4l tw4l force-pushed the issue-2957-pause-crawl-on-quota-reached branch 6 times, most recently from 4e5d015 to 6730c7f on November 25, 2025 17:03
@tw4l tw4l marked this pull request as ready for review November 25, 2025 20:14
Comment on lines +1410 to +1413

# sizes = await redis.hkeys(f"{crawl.id}:size")
# for size in sizes:
# await redis.hmset(f"{crawl.id}:size", {size: 0 for size in sizes})
Member Author

Suggested change
# sizes = await redis.hkeys(f"{crawl.id}:size")
# for size in sizes:
# await redis.hmset(f"{crawl.id}:size", {size: 0 for size in sizes})

Remove before merging

Comment on lines +1543 to +1551
print(f"pending size: {pending_size}", flush=True)
print(f"status.filesAdded: {status.filesAdded}", flush=True)
print(f"status.filesAddedSize: {status.filesAddedSize}", flush=True)
print(f"total: {total_size}", flush=True)
print(
f"org quota: {crawl.org.bytesStored + stats.size} <= {crawl.org.quotas.storageQuota}",
flush=True,
)

Member Author

Suggested change
print(f"pending size: {pending_size}", flush=True)
print(f"status.filesAdded: {status.filesAdded}", flush=True)
print(f"status.filesAddedSize: {status.filesAddedSize}", flush=True)
print(f"total: {total_size}", flush=True)
print(
f"org quota: {crawl.org.bytesStored + stats.size} <= {crawl.org.quotas.storageQuota}",
flush=True,
)

Remove before merging, useful for testing

@tw4l tw4l requested review from SuaYoo, emma-sg and ikreymer November 25, 2025 20:15

tw4l commented Nov 25, 2025

Tagging @emma-sg @SuaYoo for review in addition to @ikreymer , with particular interest in getting your eyes on the frontend, email, and email copy parts of this. Thanks!

@SuaYoo SuaYoo left a comment


Nice! Still doing manual testing; my initial impression is that it's probably worth adding an isPaused helper to utils/crawler.

export function isPaused({ state }: { state: string | null }) {
  return state && (PAUSED_STATES as readonly string[]).includes(state);
}

@ikreymer

We want to send the e-mails multiple times if a crawl reaches quota, then is resumed, then reaches quota again, right?
If so, we should also clear autoPausedEmailsSent when the crawl is running again
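A minimal sketch of what clearing the flag on resume could look like (`autoPausedEmailsSent` is the field name from this discussion; the helper and the surrounding logic are hypothetical, not the actual implementation):

```python
# Hypothetical sketch: keep the emails-sent flag set while a crawl is
# auto-paused, and clear it once the crawl is running again so that a
# later quota pause triggers a fresh notification.
PAUSED_STATES = {
    "paused_storage_quota_reached",
    "paused_time_quota_reached",
    "paused_org_readonly",
}


def update_pause_email_flag(state: str, auto_paused_emails_sent: bool) -> bool:
    """Return the new value of the emails-sent flag for the given crawl state."""
    if state in PAUSED_STATES:
        # Still paused: keep the flag so admins get only one email per pause
        return auto_paused_emails_sent
    # Running (or otherwise not auto-paused): clear the flag so the next
    # auto-pause sends email again
    return False
```
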


tw4l commented Nov 26, 2025

Nice! Still doing manual testing; my initial impression is that it's probably worth adding an isPaused helper to utils/crawler.

export function isPaused({ state }: { state: string | null }) {
  return state && (PAUSED_STATES as readonly string[]).includes(state);
}

I added a helper but made it accept a string or null rather than an object with a state property, as none of the uses of this take an object with that key. Take a look and let me know what you think.


tw4l commented Nov 26, 2025

We want to send the e-mails multiple times if a crawl reaches quota, then is resumed, then reaches quota again, right? If so, we should also clear autoPausedEmailsSent when the crawl is running again

Done, and now storing this state in the db to be more reliable.

@SuaYoo SuaYoo self-requested a review November 26, 2025 19:19

@SuaYoo SuaYoo left a comment


Frontend portion looks good!

@tw4l tw4l force-pushed the issue-2957-pause-crawl-on-quota-reached branch from 7726a59 to 0ad1644 on November 26, 2025 20:34

@emma-sg emma-sg left a comment


Email language looks good! Left a few suggestions: one splitting a sentence into two, and a few just using curly quotes or removing unused code. Nice work!

I'll take another look for frontend & backend changes, just wanted to get you some feedback on the email template now.

@SuaYoo SuaYoo force-pushed the issue-2957-pause-crawl-on-quota-reached branch from d2cba1b to 97dd148 on November 27, 2025 00:03
@ikreymer ikreymer added this to the 1.21 Release milestone Dec 2, 2025
@tw4l tw4l force-pushed the issue-2957-pause-crawl-on-quota-reached branch 2 times, most recently from 1aa8519 to 93b2bfd on December 2, 2025 20:25
- Backend implementation with new crawl pause states:
paused_storage_quota_reached, paused_time_quota_reached,
paused_org_readonly
- Send an email to all org admins when crawl is auto-paused
- Frontend updates

Partially dependent on crawler changes introduced in
webrecorder/browsertrix-crawler#919
@ikreymer ikreymer force-pushed the issue-2957-pause-crawl-on-quota-reached branch from 93b2bfd to 6b9d101 on December 2, 2025 20:58

Development

Successfully merging this pull request may close these issues.

[Feature]: When a quota is reached, the crawl should be paused instead of stopped.

5 participants