Commit 2725686

Regression fix: Update failed crawl database object after deleting files (#2993)
Fixes #2991

After deleting files (e.g. WACZs uploaded while a crawl was paused) for canceled or otherwise failed crawls, ensure we also update the crawl database object.

This fixes a regression introduced by crawl pausing: the crawl files were deleted from storage but not from the database at the same time, so org storage numbers ended up incorrect when the canceled crawl was later deleted.

It also renames the basecrawls `delete_crawl_files` method to `delete_failed_crawl_files` to make its purpose clearer, as it is only used by the operator and should only be used for failed crawls (when deleting successful crawls, there are other workflow- and org-related updates that are handled by other codepaths).
1 parent 9a5a9d1 commit 2725686

2 files changed: +14 -4 lines changed

backend/btrixcloud/basecrawls.py

Lines changed: 13 additions & 3 deletions
@@ -426,11 +426,21 @@ async def _delete_crawl_files(
 
         return size
 
-    async def delete_crawl_files(self, crawl_id: str, oid: UUID):
-        """Delete crawl files"""
+    async def delete_failed_crawl_files(self, crawl_id: str, oid: UUID):
+        """Delete crawl files for failed crawl"""
         crawl = await self.get_base_crawl(crawl_id)
         org = await self.orgs.get_org_by_id(oid)
-        return await self._delete_crawl_files(crawl, org)
+        await self._delete_crawl_files(crawl, org)
+        await self.crawls.find_one_and_update(
+            {"_id": crawl_id, "oid": oid},
+            {
+                "$set": {
+                    "files": [],
+                    "fileCount": 0,
+                    "fileSize": 0,
+                }
+            },
+        )
 
     async def delete_all_crawl_qa_files(self, crawl_id: str, org: Organization):
         """Delete files for all qa runs in a crawl"""

backend/btrixcloud/operator/crawls.py

Lines changed: 1 addition & 1 deletion
@@ -1707,7 +1707,7 @@ async def do_crawl_finished_tasks(
         )
 
         if state in FAILED_STATES:
-            await self.crawl_ops.delete_crawl_files(crawl.id, crawl.oid)
+            await self.crawl_ops.delete_failed_crawl_files(crawl.id, crawl.oid)
             await self.page_ops.delete_crawl_pages(crawl.id, crawl.oid)
 
         await self.event_webhook_ops.create_crawl_finished_notification(
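
The change to `delete_failed_crawl_files` can be illustrated with a small standalone sketch. This is not the project's code: `FakeCrawls`, the `storage` dict, and `run_example` are hypothetical stand-ins invented here, and the fake `find_one_and_update` only handles the `_id`/`oid` filter and `$set` update that appear in the diff. It shows the idea of removing the stored files and then resetting the file fields on the crawl record in the same step.

```python
import asyncio
from uuid import UUID, uuid4


class FakeCrawls:
    """Hypothetical in-memory stand-in for the crawls collection."""

    def __init__(self) -> None:
        self.docs: dict = {}

    async def find_one_and_update(self, query: dict, update: dict):
        # Match on both _id and oid, as the filter in the diff does.
        doc = self.docs.get(query["_id"])
        if doc is None or doc["oid"] != query["oid"]:
            return None
        doc.update(update["$set"])
        return doc


async def delete_failed_crawl_files(
    crawls: FakeCrawls, storage: dict, crawl_id: str, oid: UUID
) -> None:
    """Delete stored files, then reset the file fields on the crawl record."""
    crawl = crawls.docs[crawl_id]
    for file_ in crawl["files"]:
        # Assumption: a plain dict pop stands in for the real storage deletion.
        storage.pop(file_["filename"], None)
    await crawls.find_one_and_update(
        {"_id": crawl_id, "oid": oid},
        {"$set": {"files": [], "fileCount": 0, "fileSize": 0}},
    )


def run_example():
    """Build one failed crawl with a single stored WACZ and clean it up."""
    oid = uuid4()
    crawls = FakeCrawls()
    storage = {"crawl-1.wacz": b"x" * 10}
    crawls.docs["crawl-1"] = {
        "_id": "crawl-1",
        "oid": oid,
        "files": [{"filename": "crawl-1.wacz", "size": 10}],
        "fileCount": 1,
        "fileSize": 10,
    }
    asyncio.run(delete_failed_crawl_files(crawls, storage, "crawl-1", oid))
    return crawls.docs["crawl-1"], storage
```

Calling `run_example()` returns the crawl record and the storage dict after cleanup. In this sketch both agree that no files remain, so any later step that reads `fileSize` from the record would see 0 rather than the size of files that no longer exist.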
