
Commit 251bdc7

Set playwright_page request meta key early (#91)
1 parent fa7d5f1 commit 251bdc7

2 files changed: +9 -7 lines changed

README.md

Lines changed: 4 additions & 2 deletions
```diff
@@ -239,8 +239,10 @@ class AwesomeSpiderWithPage(scrapy.Spider):
 * In order to avoid memory issues, it is recommended to manually close the page
   by awaiting the `Page.close` coroutine.
 * Be careful about leaving pages unclosed, as they count towards the limit set by
-  `PLAYWRIGHT_MAX_PAGES_PER_CONTEXT`. It's recommended to set a Request errback to
-  make sure pages are closed even if a request fails.
+  `PLAYWRIGHT_MAX_PAGES_PER_CONTEXT`. When passing `playwright_include_page=True`,
+  it's recommended to set a Request errback to make sure pages are closed even
+  if a request fails (if `playwright_include_page=False` or unset, pages are
+  automatically closed upon encountering an exception).
 * Any network operations resulting from awaiting a coroutine on a `Page` object
   (`goto`, `go_back`, etc) will be executed directly by Playwright, bypassing the
   Scrapy request workflow (Scheduler, Middlewares, etc).
```
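
For context, the recommendation in the changed README lines could look like the sketch below. The class name `AwesomeSpiderWithPage` comes from the surrounding README example; the URL, parsing logic, and method names are illustrative assumptions, not part of this commit.

```python
import scrapy


class AwesomeSpiderWithPage(scrapy.Spider):
    name = "awesome"

    def start_requests(self):
        # placeholder URL; any site works for the sketch
        yield scrapy.Request(
            url="https://example.org",
            meta={"playwright": True, "playwright_include_page": True},
            errback=self.errback_close_page,
        )

    async def parse(self, response):
        page = response.meta["playwright_page"]
        title = await page.title()
        # close the page manually so it does not count against
        # PLAYWRIGHT_MAX_PAGES_PER_CONTEXT
        await page.close()
        yield {"title": title}

    async def errback_close_page(self, failure):
        # the request failed, but the page still needs to be closed
        page = failure.request.meta["playwright_page"]
        await page.close()
```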

scrapy_playwright/handler.py

Lines changed: 5 additions & 5 deletions
```diff
@@ -211,17 +211,17 @@ async def _download_request(self, request: Request, spider: Spider) -> Response:
         return result
 
     async def _download_request_with_page(self, request: Request, page: Page) -> Response:
+        # set this early to make it available in errbacks even if something fails
+        if request.meta.get("playwright_include_page"):
+            request.meta["playwright_page"] = page
+
         start_time = time()
         response = await page.goto(request.url)
-
         await self._apply_page_methods(page, request)
-
         body_str = await page.content()
         request.meta["download_latency"] = time() - start_time
 
-        if request.meta.get("playwright_include_page"):
-            request.meta["playwright_page"] = page
-        else:
+        if not request.meta.get("playwright_include_page"):
             await page.close()
             self.stats.inc_value("playwright/page_count/closed")
 
```
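
The point of moving the assignment above `page.goto` is that the `playwright_page` key is already in the request meta by the time an errback runs, even if navigation or one of the page methods raised. A minimal errback sketch under that assumption follows; the defensive `.get()` is an extra precaution of this sketch, covering failures that happen before a page exists at all.

```python
# a Scrapy errback can be any callable taking the Failure; a plain coroutine
# function is used here to keep the sketch self-contained
async def errback_close_page(failure):
    # with the meta key set before page.goto, the page is normally available
    # here even when navigation itself failed
    page = failure.request.meta.get("playwright_page")
    if page is not None:
        await page.close()
```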
