@tw4l tw4l commented Nov 27, 2025

Fixes #2935

Adds:

  • Backend API support for the robots config option (now named useRobots, thanks Sua for the suggestion)
  • A checkbox in the Scope section of the crawl workflow editor in the frontend, available for all scope types
  • Documentation

I have not added the robotsAgent param that the crawler also supports, since it seems like a pretty niche use case at this point, but I can add it if we'd prefer to do it all in one go.
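
For context on what a robots agent setting would change, here is a minimal sketch of how a robots.txt check can depend on the user agent. It uses Python's standard-library `urllib.robotparser`, not the crawler's own implementation, and the robots.txt content, the `examplebot` agent name, and the `allowed` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: examplebot
Disallow: /
"""


def allowed(url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The wildcard group only blocks /private/.
    print(allowed("https://example.com/public/page"))
    print(allowed("https://example.com/private/page"))
    # The agent-specific group blocks everything for examplebot.
    print(allowed("https://example.com/public/page", "examplebot"))
```

The same URL can be allowed for one agent and disallowed for another, which is the case an agent override would address.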

Dependencies

Browsertrix Crawler 1.10 (not yet released as of writing this), which should include webrecorder/browsertrix-crawler#932

@tw4l tw4l requested review from SuaYoo, emma-sg and ikreymer November 27, 2025 18:01

@ikreymer ikreymer left a comment

Working as expected! I tested the logging of robots.txt checks with the option enabled.
I think we can come back to the --robotsAgent option if/when it is requested.

@SuaYoo SuaYoo left a comment

Minor copy text suggestions. Also, a somewhat opinionated take as an API user: if the field name included a verb, like useRobots, it would be consistent with useSitemap and slightly more self-documenting.

tw4l and others added 4 commits December 2, 2025 16:49
@tw4l tw4l force-pushed the issue-2935-robots branch from 2795376 to 7a61027 December 2, 2025 21:50
@tw4l tw4l changed the title from "Add support for --robots crawler flag to Browsertrix" to "Add support for --useRobots crawler flag to Browsertrix" Dec 2, 2025
ikreymer pushed a commit to webrecorder/browsertrix-crawler that referenced this pull request Dec 2, 2025
Follow-up to #631

Based on feedback from webrecorder/browsertrix#3029

Renaming `--robots` to `--useRobots` will allow us to keep the
Browsertrix backend API more consistent with similar flags like
`--useSitemap`. `--robots` is kept as a shorthand alias.
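
As an illustration of the alias approach described in this commit message, here is a minimal sketch using Python's `argparse`, where both spellings set the same value. This is not the crawler's actual option parsing, which is not shown in this thread; the parser name and the `use_robots` destination are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser, for illustration only.
    parser = argparse.ArgumentParser(prog="crawler-sketch")
    # Listing two option strings makes --robots an alias of --useRobots:
    # either spelling stores True in the same destination.
    parser.add_argument(
        "--useRobots",
        "--robots",
        dest="use_robots",
        action="store_true",
        help="fetch robots.txt and skip disallowed pages",
    )
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args(["--useRobots"]).use_robots)
    print(parser.parse_args(["--robots"]).use_robots)
    print(parser.parse_args([]).use_robots)
```

Keeping the old spelling as an alias means existing invocations continue to work while the canonical name matches `--useSitemap`.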
@ikreymer ikreymer merged commit c0e75cd into main Dec 4, 2025
29 of 31 checks passed
@ikreymer ikreymer deleted the issue-2935-robots branch December 4, 2025 02:09

Development

Successfully merging this pull request may close these issues.

[Feature]: Add checkbox to workflow editor to fetch robots.txt and respect disallows

5 participants