Add support for --useRobots crawler flag to Browsertrix #3029
Conversation
ikreymer left a comment
Working as expected! Tested the robots-check logging when the option is enabled.
I think we can come back to the --robotsAgent option if/when it is requested.
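The robots-check behavior being tested above can be illustrated with a small sketch. This is not Browsertrix Crawler's actual implementation (which is JavaScript); it is just the standard-library equivalent of the check a crawler applies when robots enforcement is on, with a hypothetical `is_allowed` helper and made-up rules:

```python
# Sketch of the robots.txt gate a crawler applies when robots checking
# is enabled. Hypothetical helper; not Browsertrix Crawler's own code.
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if robots_txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Example robots.txt that disallows one path for all agents:
robots = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(robots, "browsertrix", "https://example.com/page"))       # True
print(is_allowed(robots, "browsertrix", "https://example.com/private/x"))  # False
```

A deferred `--robotsAgent` option would simply change the `agent` string matched against `User-agent:` lines.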
SuaYoo left a comment
Minor copy text suggestions. Also, somewhat opinionated as an API user: if the field name included a verb, like `useRobots`, the field would be consistent with `useSitemap` and slightly more self-documenting.
Co-authored-by: sua yoo <sua@webrecorder.org>
Force-pushed from 2795376 to 7a61027
Follow-up to #631. Based on feedback from webrecorder/browsertrix#3029, renaming `--robots` to `--useRobots` keeps the Browsertrix backend API consistent with similar flags like `--useSitemap`. `--robots` is kept as a shorthand alias.
Fixes #2935
Adds:
- `useRobots` option (thanks Sua for the suggestion)

I have not added the `robotsAgent` param that the crawler also supports, as it seems like a pretty niche use case at this point, but can add it if we'd prefer to do it all in one go.

Dependencies:
- Browsertrix Crawler 1.10 (not yet released as of writing this), which should include webrecorder/browsertrix-crawler#932
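The rename-with-alias pattern described above (`--useRobots` with `--robots` kept as shorthand) can be sketched as follows. This is a hypothetical argparse version for illustration only, not the crawler's actual (yargs-based) CLI code:

```python
# Illustrative sketch: one option exposed under two spellings, so the
# renamed flag and its legacy alias set the same value. Hypothetical;
# not Browsertrix Crawler's real argument parser.
import argparse

parser = argparse.ArgumentParser()
# Both "--useRobots" and the "--robots" alias write to use_robots:
parser.add_argument("--useRobots", "--robots", dest="use_robots",
                    action="store_true",
                    help="respect robots.txt when crawling")
parser.add_argument("--useSitemap", dest="use_sitemap",
                    action="store_true",
                    help="discover additional seeds from the sitemap")

print(parser.parse_args(["--useRobots"]).use_robots)  # True
print(parser.parse_args(["--robots"]).use_robots)     # True
```

Because both option strings share one destination, the backend API only ever sees the canonical `useRobots` name.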