Also, be sure to [install the `asyncio`-based Twisted reactor](https://docs.scrapy.org/en/latest/topics/asyncio.html#installing-the-asyncio-reactor):

```python
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

### Settings

`scrapy-playwright` accepts the following settings (a combined sketch follows the list):

* `PLAYWRIGHT_BROWSER_TYPE` (type `str`, default `chromium`)

* `PLAYWRIGHT_CONTEXT_ARGS` (type `dict`, default `{}`)

  A dictionary with default keyword arguments to be passed when creating the
  "default" Browser context.

  **Deprecated: use `PLAYWRIGHT_CONTEXTS` instead.**

* `PLAYWRIGHT_CONTEXTS` (type `dict[str, dict]`, default `{}`)

  A dictionary which defines Browser contexts to be created on startup.
  It should be a mapping of (name, keyword arguments). For instance:
  ```python
  {
      "first": {
          "context_arg1": "value",
          "context_arg2": "value",
      },
      "second": {
          "context_arg1": "value",
      },
  }
  ```
  If no contexts are defined, a default context (called `default`) is created.
  The arguments passed here take precedence over the ones defined in `PLAYWRIGHT_CONTEXT_ARGS`.
  See the docs for [`Browser.new_context`](https://playwright.dev/python/docs/api/class-browser#browsernew_contextkwargs).

* `PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT` (type `Optional[int]`, default `None`)
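
Taken together, a sketch of how these settings might look in a project's
`settings.py` (the values and the `"mobile"` context name are illustrative,
not defaults):

```python
# settings.py -- illustrative values for the settings described above
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

PLAYWRIGHT_BROWSER_TYPE = "firefox"  # launch Firefox instead of the default Chromium

# 10 seconds, expressed in milliseconds as Playwright timeouts are
PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT = 10 * 1000

# two contexts created on startup; each inner dict is passed to Browser.new_context
PLAYWRIGHT_CONTEXTS = {
    "default": {"ignore_https_errors": True},
    "mobile": {
        "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 14_4 like Mac OS X)",
        "viewport": {"width": 375, "height": 812},
    },
}
```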

## Receiving the Page object in the callback

Specifying a non-False value for the `playwright_include_page` `meta` key for a
request will result in the corresponding `playwright.async_api.Page` object
being available in the `playwright_page` meta key in the request callback.
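
A minimal sketch of a spider using this meta key (the spider name and URL are
illustrative):

```python
import scrapy


class IncludePageSpider(scrapy.Spider):
    name = "include_page"  # illustrative name

    def start_requests(self):
        yield scrapy.Request(
            url="https://example.org",
            meta={"playwright": True, "playwright_include_page": True},
        )

    async def parse(self, response):
        # the Page object for this response, exposed by scrapy-playwright
        page = response.meta["playwright_page"]
        title = await page.title()
        await page.close()  # close the page once it is no longer needed
        return {"title": title}
```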
* Any network operations resulting from awaiting a coroutine on a `Page` object
  (`goto`, `go_back`, etc.) will be executed directly by Playwright, bypassing the
  Scrapy request workflow (Scheduler, Middlewares, etc).


## Multiple browser contexts

Multiple [browser contexts](https://playwright.dev/python/docs/core-concepts/#browser-contexts)
to be launched at startup can be defined via the `PLAYWRIGHT_CONTEXTS` [setting](#settings).

### Choosing a specific context for a request

Pass the name of the desired context in the `playwright_context` meta key:

```python
yield scrapy.Request(
    url="https://example.org",
    meta={"playwright": True, "playwright_context": "first"},
)
```

### Creating a context during a crawl

If the context specified in the `playwright_context` meta key does not exist, it will be created.
You can specify keyword arguments to be passed to
[`Browser.new_context`](https://playwright.dev/python/docs/api/class-browser#browsernew_contextkwargs)
in the `playwright_context_kwargs` meta key:

```python
yield scrapy.Request(
    url="https://example.org",
    meta={
        "playwright": True,
        "playwright_context": "new",
        "playwright_context_kwargs": {
            "java_script_enabled": False,
            "ignore_https_errors": True,
            "proxy": {
                "server": "http://myproxy.com:3128",
                "username": "user",
                "password": "pass",
            },
        },
    },
)
```

Please note that if a context with the specified name already exists,
that context is used and `playwright_context_kwargs` are ignored.

### Closing a context during a crawl

After [receiving the Page object in your callback](#receiving-the-page-object-in-the-callback),
you can access a context through the corresponding [`Page.context`](https://playwright.dev/python/docs/api/class-page#page-context)
attribute, and await [`close`](https://playwright.dev/python/docs/api/class-browsercontext#browser-context-close) on it.

```python
def parse(self, response):
    yield scrapy.Request(
        url="https://example.org",
        callback=self.parse_in_new_context,
        meta={"playwright": True, "playwright_context": "new", "playwright_include_page": True},
    )

async def parse_in_new_context(self, response):
    page = response.meta["playwright_page"]
    title = await page.title()
    await page.context.close()  # close the context
    await page.close()
    return {"title": title}
```


## Page coroutines

A sorted iterable (`list`, `tuple` or `dict`, for instance) can be passed
in the `playwright_page_coroutines`
[Request.meta](https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.meta)
key to request coroutines to be awaited on the `Page` before returning the final
`Response` to the callback.

This is useful when you need to perform certain actions on a page, like scrolling
down or clicking links, and you want everything to count as a single Scrapy
Response, containing the final result.
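
For instance, a request along the following lines scrolls to the bottom of the
page and takes a screenshot before the callback receives the response. It uses
the `PageCoroutine` helper described in the next section; the URL and the
scrolling script are illustrative:

```python
from scrapy_playwright.page import PageCoroutine

yield scrapy.Request(
    url="https://example.org",
    meta={
        "playwright": True,
        "playwright_page_coroutines": [
            PageCoroutine("evaluate", "window.scrollBy(0, document.body.scrollHeight)"),
            PageCoroutine("wait_for_timeout", 1000),  # give the page a moment to settle
            PageCoroutine("screenshot", path="scroll.png", fullPage=True),
        ],
    },
)
```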

### Supported actions

* `scrapy_playwright.page.PageCoroutine(method: str, *args, **kwargs)`:

  _Represents a coroutine to be awaited on a `playwright.page.Page` object,
  such as "click", "screenshot", "evaluate", etc.
  `method` should be the name of the coroutine; `*args` and `**kwargs`
  are passed to the function call._

  _The coroutine result will be stored in the `PageCoroutine.result` attribute._

  For instance:

  ```python
  PageCoroutine("screenshot", path="quotes.png", fullPage=True)
  ```

  produces the same effect as:

  ```python
  # 'page' is a playwright.async_api.Page object
  await page.screenshot(path="quotes.png", fullPage=True)
  ```
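
Because each `PageCoroutine` keeps its result, passing a `dict` makes the results
easy to look up from the callback afterwards. A sketch (the `"screenshot"` key is
illustrative):

```python
# request side: pass a dict so each coroutine can be looked up by name
yield scrapy.Request(
    url="https://example.org",
    meta={
        "playwright": True,
        "playwright_page_coroutines": {
            "screenshot": PageCoroutine("screenshot", path="quotes.png", fullPage=True),
        },
    },
)

# callback side: read the stored result back from the request meta
def parse(self, response):
    screenshot = response.meta["playwright_page_coroutines"]["screenshot"]
    png_bytes = screenshot.result  # bytes returned by Page.screenshot
```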


## Examples

**Click on a link, save the resulting page as PDF**