Pause crawls instead of stopping when quotas are reached or archiving is disabled #2997
Conversation
# sizes = await redis.hkeys(f"{crawl.id}:size")
# for size in sizes:
#     await redis.hmset(f"{crawl.id}:size", {size: 0 for size in sizes})
Remove before merging
| print(f"pending size: {pending_size}", flush=True) | ||
| print(f"status.filesAdded: {status.filesAdded}", flush=True) | ||
| print(f"status.filesAddedSize: {status.filesAddedSize}", flush=True) | ||
| print(f"total: {total_size}", flush=True) | ||
| print( | ||
| f"org quota: {crawl.org.bytesStored + stats.size} <= {crawl.org.quotas.storageQuota}", | ||
| flush=True, | ||
| ) | ||
Remove before merging, useful for testing
SuaYoo left a comment
Nice! Still doing manual testing; my initial impression is that it's probably worth adding an isPaused helper to utils/crawler:
export function isPaused({ state }: { state: string | null }) {
  return state && (PAUSED_STATES as readonly string[]).includes(state);
}
We want to send the e-mails multiple times if a crawl reaches quota, then is resumed, then reaches quota again, right?
I added a helper but made it accept a string or null rather than an object with state.
Done, and now storing this state in the db to be more reliable.
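For illustration, a minimal sketch of the reset-on-resume idea discussed above, so the quota email can go out again after a crawl is resumed and hits a quota a second time. The field name, structure, and function names here are hypothetical, not the actual schema or code from this PR.

from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch only; the field name and structure are assumptions.
@dataclass
class CrawlPauseState:
    quota_pause_email_sent: bool = False  # assumed to be persisted with the crawl record

def on_auto_pause(state: CrawlPauseState, send_admin_email: Callable[[], None]) -> None:
    # Email org admins at most once per pause.
    if not state.quota_pause_email_sent:
        send_admin_email()
        state.quota_pause_email_sent = True

def on_resume(state: CrawlPauseState) -> None:
    # Clearing the flag lets the email be sent again if the crawl
    # reaches a quota a second time after being resumed.
    state.quota_pause_email_sent = False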
SuaYoo left a comment
Frontend portion looks good!
emma-sg left a comment
Email language looks good! Left a few suggestions: one splitting a sentence into two, and a few just using curly quotes or removing unused code. Nice work!
I'll take another look at the frontend & backend changes; just wanted to get you some feedback on the email template now.
- Backend implementation with new crawl pause states: paused_storage_quota_reached, paused_time_quota_reached, paused_org_readonly
- Send an email to all org admins when crawl is auto-paused
- Frontend updates

Partially dependent on crawler changes introduced in webrecorder/browsertrix-crawler#919
Fixes #2957
Full backend and frontend implementation, with a new email notification to org admins when a crawl is paused because an org quota has been reached.
Backend changes
- New crawl pause states: paused_storage_quota_reached, paused_time_quota_reached, paused_org_readonly (see the sketch after this list)
- Crawls are paused rather than stopped when continuing would put org bytesStored above the storage quota
- Updated nightly tests all pass: https://github.com/webrecorder/browsertrix/actions/runs/19684324914
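For context, here is a minimal sketch of the kind of check these states correspond to, based on the comparison shown in the debug output quoted earlier in the review (crawl.org.bytesStored + stats.size vs. crawl.org.quotas.storageQuota). The helper functions, and treating a quota of 0 as "no quota set", are assumptions for illustration only, not the actual operator code in this PR.

# Illustrative sketch only -- not the operator code from this PR.
PAUSED_STORAGE_QUOTA_REACHED = "paused_storage_quota_reached"
PAUSED_TIME_QUOTA_REACHED = "paused_time_quota_reached"
PAUSED_ORG_READONLY = "paused_org_readonly"

def storage_quota_reached(bytes_stored: int, crawl_size: int, storage_quota: int) -> bool:
    """True when the org's stored bytes plus the crawl's current size exceed the quota."""
    if not storage_quota:
        # Assumption for this sketch: a quota of 0 means no storage quota is set.
        return False
    return bytes_stored + crawl_size > storage_quota

def pause_state_for(storage_exceeded: bool, time_exceeded: bool, org_readonly: bool):
    """Map the triggering condition to a pause state, or None if no auto-pause is needed."""
    if storage_exceeded:
        return PAUSED_STORAGE_QUOTA_REACHED
    if time_exceeded:
        return PAUSED_TIME_QUOTA_REACHED
    if org_readonly:
        return PAUSED_ORG_READONLY
    return None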
Frontend changes
Dependencies
Relies on crawler changes introduced in webrecorder/browsertrix-crawler#919
Out of scope
Crawl workflow counts are a bit off: all crawls that complete are counted as successful regardless of state, and workflow storage counts are sometimes incremented incorrectly. I started trying to address that in this branch, but it's a bit involved and may require a migration, so it's best handled separately, I think. Issue: #3011