
Commit 449ac0c

Merge pull request #235 from autoscrape-labs/feat/expect-download
Create `expect_download` context manager for easier download handling
2 parents 4bd50d3 + ae8e6d5 commit 449ac0c


11 files changed: +550 -21 lines changed


README.md

Lines changed: 33 additions & 0 deletions
@@ -97,6 +97,39 @@ await tab.request.get('https://api.example.com/data', headers=headers)

 This opens up incredible possibilities for automation scenarios where you need both browser interaction AND API efficiency!

+### New expect_download() context manager — robust file downloads made easy!
+Tired of fighting with flaky download flows, missing files, or racy event listeners? Meet `tab.expect_download()`, a delightful, reliable way to handle file downloads.
+
+- Automatically sets the browser’s download behavior
+- Works with your own directory or a temporary folder (auto-cleaned!)
+- Waits for completion with a timeout (so your tests don’t hang)
+- Gives you a handy handle to read bytes/base64 or check `file_path`
+
+Tiny example that just works:
+
+```python
+import asyncio
+from pathlib import Path
+from pydoll.browser import Chrome
+
+async def download_report():
+    async with Chrome() as browser:
+        tab = await browser.start()
+        await tab.go_to('https://example.com/reports')
+
+        target_dir = Path('/tmp/my-downloads')
+        async with tab.expect_download(keep_file_at=target_dir, timeout=10) as download:
+            # Trigger the download in the page (button/link/etc.)
+            await (await tab.find(text='Download latest report')).click()
+            # Wait until finished and read the content
+            data = await download.read_bytes()
+            print(f"Downloaded {len(data)} bytes to: {download.file_path}")
+
+asyncio.run(download_report())
+```
+
+Want zero-hassle cleanup? Omit `keep_file_at` and we’ll create a temp folder and remove it automatically after the context exits. Perfect for tests.
+
 ### Total browser control with custom preferences! (thanks to [@LucasAlvws](https://github.com/LucasAlvws))
 Want to completely customize how Chrome behaves? **Now you can control EVERYTHING!**<br>
 The new `browser_preferences` system gives you access to hundreds of internal Chrome settings that were previously impossible to change programmatically. We're talking about deep browser customization that goes way beyond command-line flags!
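
Reviewer note on the README hunk above: the new bullet list mentions reading a download as bytes or base64. The sketch below assumes that `read_base64()` returns the standard base64 encoding of the same bytes `read_bytes()` would return; that is an assumption, since the implementation is not part of the hunks shown here. `save_base64_payload` is a hypothetical helper and uses only the standard library.

```python
import base64
import tempfile
from pathlib import Path


def save_base64_payload(payload_b64: str, destination: Path) -> int:
    """Decode a base64 string, write the raw bytes, and return the byte count."""
    # validate=True rejects characters outside the base64 alphabet.
    data = base64.b64decode(payload_b64, validate=True)
    destination.write_bytes(data)
    return len(data)


if __name__ == '__main__':
    # Stand-in for `await download.read_base64()` (assumed to be plain base64).
    fake_payload = base64.b64encode(b'%PDF-1.7 example').decode('ascii')
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / 'report.pdf'
        written = save_base64_payload(fake_payload, target)
        print(written, target.read_bytes()[:8])
```

If the base64 string is only needed to move the file through a text channel such as JSON, decoding on the receiving side like this restores the original bytes unchanged.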

README_zh.md

Lines changed: 34 additions & 0 deletions
@@ -195,6 +195,40 @@ options = ChromiumOptions()
 options.start_timeout = 20 # 等待 20 秒
 ```

+### 新的 expect_download() 上下文管理器 —— 稳健、优雅的文件下载!
+还在为不稳定的下载流程、丢失的文件或混乱的事件监听而头疼吗?`tab.expect_download()` 来了:一种可靠、简洁的下载方式。
+
+- 自动配置浏览器下载行为
+- 支持自定义下载目录或临时目录(自动清理!)
+- 内置超时等待,防止任务卡住
+- 提供便捷句柄:读取字节/BASE64,获取 `file_path`
+
+一个“开箱即用”的小示例:
+
+```python
+import asyncio
+from pathlib import Path
+from pydoll.browser import Chrome
+
+async def download_report():
+    async with Chrome() as browser:
+        tab = await browser.start()
+        await tab.go_to('https://example.com/reports')
+
+        target_dir = Path('/tmp/my-downloads')
+        async with tab.expect_download(keep_file_at=target_dir, timeout=10) as dl:
+            # 触发页面上的下载(按钮/链接等)
+            await (await tab.find(text='Download latest report')).click()
+
+            # 等待完成并读取内容
+            data = await dl.read_bytes()
+            print(f"已下载 {len(data)} 字节,保存至: {dl.file_path}")
+
+asyncio.run(download_report())
+```
+
+想要“零成本清理”?不传 `keep_file_at` 即可——我们会创建临时目录,并在上下文退出后自动清理。对测试场景非常友好。
+
 ## 📦 安装

 ```bash

docs/features.md

Lines changed: 55 additions & 0 deletions
@@ -209,6 +209,61 @@ asyncio.run(background_bypass_example())

 Access websites that actively block automation tools without using third-party captcha solving services. This native captcha handling makes Pydoll suitable for automating previously inaccessible websites.

+## Reliable Download Handling with expect_download
+
+The `tab.expect_download()` context manager provides a robust, event-driven way to capture file downloads.
+
+- Configures browser download behavior for you
+- Supports persistent target directory (`keep_file_at`) or temporary directory with auto-cleanup
+- Exposes a `_DownloadHandle` with convenience methods
+- Includes timeout protection to avoid indefinite waits
+
+### API Overview
+
+```python
+async with tab.expect_download(
+    keep_file_at: Optional[str | Path] = None,
+    timeout: Optional[float] = None,
+) as handle:
+    ...  # trigger download action in page
+```
+
+- `keep_file_at`: Target directory to keep the downloaded file. If `None`, a temporary directory is created and removed automatically when the context exits.
+- `timeout`: Maximum seconds to wait for completion (defaults to 60 if not provided).
+
+`handle` exposes:
+
+- `handle.file_path: Optional[str]` — final resolved path after completion
+- `await handle.read_bytes() -> bytes`
+- `await handle.read_base64() -> str`
+- `await handle.wait_started(timeout: Optional[float] = None) -> None`
+- `await handle.wait_finished(timeout: Optional[float] = None) -> None`
+
+### Usage Examples
+
+Persist file in a specific directory:
+
+```python
+async with tab.expect_download(keep_file_at='/tmp/dl', timeout=15) as dl:
+    await (await tab.find(text='Export CSV')).click()
+    data = await dl.read_bytes()
+    print('Saved at:', dl.file_path)
+```
+
+Use a temporary directory (auto-cleanup) for tests:
+
+```python
+async with tab.expect_download() as dl:
+    await (await tab.find(text='Download PDF')).click()
+    pdf_b64 = await dl.read_base64()
+    # temp directory is cleaned automatically when leaving the context
+```
+
+Notes:
+
+- When the page emits no completion event within the configured `timeout`, a `DownloadTimeout` exception is raised.
+- If the browser does not provide a `filePath`, the manager falls back to the suggested filename in the chosen directory.
+
 ## Multi-Tab Management

 Pydoll provides sophisticated tab management capabilities with a singleton pattern that ensures efficient resource usage and prevents duplicate Tab instances for the same browser tab.
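
Reviewer note on the docs/features.md hunk above: the semantics it describes (temporary directory removed on exit unless `keep_file_at` is given, a 60-second default timeout, a handle that waits for completion before reading) can be illustrated with the standard library alone. The following is a minimal sketch under those stated assumptions and is not pydoll's implementation. `expect_download_sketch`, `SketchHandle`, `mark_finished`, and `DownloadTimeoutError` are hypothetical names, and completion is signalled by hand where a real integration would react to a browser event.

```python
import asyncio
import shutil
import tempfile
from contextlib import asynccontextmanager
from pathlib import Path
from typing import AsyncIterator, Optional, Union


class DownloadTimeoutError(Exception):
    """Hypothetical stand-in for the DownloadTimeout named in the docs."""


class SketchHandle:
    """Hypothetical handle; not pydoll's _DownloadHandle."""

    def __init__(self, directory: Path, timeout: float) -> None:
        self.directory = directory
        self.file_path: Optional[str] = None
        self._timeout = timeout
        self._finished = asyncio.Event()

    def mark_finished(self, file_path: str) -> None:
        # A real integration would call this from a browser event callback.
        self.file_path = file_path
        self._finished.set()

    async def wait_finished(self, timeout: Optional[float] = None) -> None:
        limit = self._timeout if timeout is None else timeout
        try:
            await asyncio.wait_for(self._finished.wait(), limit)
        except asyncio.TimeoutError as exc:
            raise DownloadTimeoutError('download did not finish in time') from exc

    async def read_bytes(self) -> bytes:
        await self.wait_finished()
        assert self.file_path is not None
        return Path(self.file_path).read_bytes()


@asynccontextmanager
async def expect_download_sketch(
    keep_file_at: Optional[Union[str, Path]] = None,
    timeout: Optional[float] = None,
) -> AsyncIterator[SketchHandle]:
    cleanup = keep_file_at is None
    directory = Path(tempfile.mkdtemp()) if keep_file_at is None else Path(keep_file_at)
    directory.mkdir(parents=True, exist_ok=True)
    handle = SketchHandle(directory, 60.0 if timeout is None else timeout)
    try:
        yield handle
    finally:
        if cleanup:
            # Only directories created by the sketch itself are removed.
            shutil.rmtree(directory, ignore_errors=True)


async def main() -> None:
    async with expect_download_sketch(timeout=1) as handle:
        # Simulate the browser writing a file and reporting completion.
        target = handle.directory / 'report.csv'
        target.write_bytes(b'a,b\n1,2\n')
        handle.mark_finished(str(target))
        data = await handle.read_bytes()
        print(len(data), 'bytes read from', handle.file_path)
    print('temp dir still exists:', handle.directory.exists())


if __name__ == '__main__':
    asyncio.run(main())
```

The cleanup sits in a `finally` block so the temporary directory is removed even when the body raises, which matches the documented advice to read the file inside the context when no `keep_file_at` is given.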

docs/zh/features.md

Lines changed: 55 additions & 0 deletions
@@ -212,6 +212,61 @@ asyncio.run(background_bypass_example())

 无需使用第三方验证码服务,即可访问屏蔽自动化工具的网站。

+## 可靠的下载处理:expect_download
+
+`tab.expect_download()` 提供稳健的、基于事件的文件下载捕获方式。
+
+- 自动为您配置浏览器下载行为
+- 支持持久化目录(`keep_file_at`),或使用临时目录并在退出上下文后自动清理
+- 提供 `_DownloadHandle` 便捷接口
+- 内置超时保护,避免无限等待
+
+### API 概览
+
+```python
+async with tab.expect_download(
+    keep_file_at: Optional[str | Path] = None,
+    timeout: Optional[float] = None,
+) as handle:
+    ...  # 在页面中触发下载
+```
+
+- `keep_file_at`:指定持久化目录。若为 `None`,则使用临时目录并在退出上下文后自动清理。
+- `timeout`:完成等待的最大秒数(未提供时默认 60)。
+
+`handle` 提供:
+
+- `handle.file_path: Optional[str]` — 完成后解析出的最终文件路径
+- `await handle.read_bytes() -> bytes`
+- `await handle.read_base64() -> str`
+- `await handle.wait_started(timeout: Optional[float] = None) -> None`
+- `await handle.wait_finished(timeout: Optional[float] = None) -> None`
+
+### 使用示例
+
+在指定目录中持久化下载文件:
+
+```python
+async with tab.expect_download(keep_file_at='/tmp/dl', timeout=15) as dl:
+    await (await tab.find(text='Export CSV')).click()
+    data = await dl.read_bytes()
+    print('Saved at:', dl.file_path)
+```
+
+用于测试的临时目录(自动清理):
+
+```python
+async with tab.expect_download() as dl:
+    await (await tab.find(text='Download PDF')).click()
+    pdf_b64 = await dl.read_base64()
+    # 退出上下文后临时目录会被自动清理
+```
+
+注意:
+
+- 如果在配置的 `timeout` 内页面未发出完成事件,将抛出 `DownloadTimeout` 异常。
+- 如果浏览器未提供 `filePath`,管理器将回退到使用建议文件名并写入选定目录。
+
 ## 多标签页管理

 Pydoll 采用单例模式提供完善的标签页管理功能,确保资源高效利用,并防止同一浏览器标签页出现重复的标签页实例。

pydoll/browser/chromium/base.py

Lines changed: 14 additions & 4 deletions
@@ -8,7 +8,7 @@
 from functools import partial
 from random import randint
 from tempfile import TemporaryDirectory
-from typing import Any, Callable, Optional
+from typing import Any, Awaitable, Callable, Optional, overload

 from pydoll.browser.interfaces import BrowserOptionsManager
 from pydoll.browser.managers import (
@@ -118,9 +118,9 @@ async def start(self, headless: bool = False) -> Tab:
         if headless:
             warnings.warn(
                 "The 'headless' parameter is deprecated and will be removed in a future version. "
-                "Use `options.headless = True` instead.",
+                'Use `options.headless = True` instead.',
                 DeprecationWarning,
-                stacklevel=2
+                stacklevel=2,
             )
             self.options.headless = headless

@@ -378,9 +378,15 @@ async def reset_permissions(self, browser_context_id: Optional[str] = None):
         """Reset all permissions to defaults and restore prompting behavior."""
         return await self._execute_command(BrowserCommands.reset_permissions(browser_context_id))

+    @overload
     async def on(
         self, event_name: str, callback: Callable[[Any], Any], temporary: bool = False
-    ) -> int:
+    ) -> int: ...
+    @overload
+    async def on(
+        self, event_name: str, callback: Callable[[Any], Awaitable[Any]], temporary: bool = False
+    ) -> int: ...
+    async def on(self, event_name, callback, temporary: bool = False) -> int:
         """
         Register CDP event listener at browser level.

@@ -409,6 +415,10 @@ async def callback_wrapper(event):
             event_name, function_to_register, temporary
         )

+    async def remove_callback(self, callback_id: int):
+        """Remove callback from browser."""
+        return await self._connection_handler.remove_callback(callback_id)
+
     async def enable_fetch_events(
         self,
         handle_auth_requests: bool = False,
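
Reviewer note on the base.py hunks above: the change pairs `@overload` signatures for sync and async callbacks with a new `remove_callback` that takes the id returned by `on`. The sketch below shows that pattern in isolation using only the standard library. `ListenerRegistry` and `dispatch` are hypothetical names, the methods are synchronous for brevity, and the narrower awaitable-returning overload is listed first here as a matter of convention; none of this is pydoll's code.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Dict, List, Tuple, overload


class ListenerRegistry:
    """Hypothetical stand-in for a browser-level event hub (not pydoll code)."""

    def __init__(self) -> None:
        self._callbacks: Dict[int, Tuple[str, Callable[[Any], Any], bool]] = {}
        self._next_id = 0

    @overload
    def on(
        self, event_name: str, callback: Callable[[Any], Awaitable[Any]], temporary: bool = False
    ) -> int: ...
    @overload
    def on(
        self, event_name: str, callback: Callable[[Any], Any], temporary: bool = False
    ) -> int: ...
    def on(self, event_name, callback, temporary=False):
        # The returned id is what remove_callback() expects later.
        self._next_id += 1
        self._callbacks[self._next_id] = (event_name, callback, temporary)
        return self._next_id

    def remove_callback(self, callback_id: int) -> bool:
        """Return True if a listener with this id existed and was removed."""
        return self._callbacks.pop(callback_id, None) is not None

    async def dispatch(self, event_name: str, event: Any) -> None:
        # Iterate over a copy so temporary listeners can be removed safely.
        for callback_id, (name, callback, temporary) in list(self._callbacks.items()):
            if name != event_name:
                continue
            result = callback(event)
            if inspect.isawaitable(result):
                await result  # async callbacks are awaited; sync ones already ran
            if temporary:
                self.remove_callback(callback_id)


async def main() -> None:
    registry = ListenerRegistry()
    seen: List[str] = []

    async def on_progress(event: Any) -> None:
        seen.append(f'async:{event}')

    sync_id = registry.on('progress', lambda event: seen.append(f'sync:{event}'))
    registry.on('progress', on_progress, temporary=True)

    await registry.dispatch('progress', 'started')
    registry.remove_callback(sync_id)
    await registry.dispatch('progress', 'ignored')  # no listeners remain
    print(seen)


if __name__ == '__main__':
    asyncio.run(main())
```

Returning an id from `on` and accepting it in `remove_callback` lets short-lived consumers, such as a download context manager, detach exactly the listeners they registered.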

pydoll/browser/options.py

Lines changed: 1 addition & 4 deletions
@@ -312,10 +312,7 @@ def headless(self) -> bool:
     def headless(self, headless: bool):
         self._headless = headless
         has_argument = '--headless' in self.arguments
-        methods_map = {
-            True: self.add_argument,
-            False: self.remove_argument
-        }
+        methods_map = {True: self.add_argument, False: self.remove_argument}
         if headless == has_argument:
             return
         methods_map[headless]('--headless')
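
Reviewer note on the options.py hunk above: the change is formatting only, but the setter's dispatch-table logic is easier to follow when run in isolation. The sketch below copies the four statements from the diff into a hypothetical, trimmed-down `OptionsSketch` class. Its `add_argument` and `remove_argument` are simplified assumptions and may not match how the real methods treat duplicate or missing arguments.

```python
from typing import List


class OptionsSketch:
    """Hypothetical, trimmed-down options object (not pydoll's ChromiumOptions)."""

    def __init__(self) -> None:
        self.arguments: List[str] = []
        self._headless = False

    def add_argument(self, argument: str) -> None:
        if argument not in self.arguments:
            self.arguments.append(argument)

    def remove_argument(self, argument: str) -> None:
        if argument in self.arguments:
            self.arguments.remove(argument)

    @property
    def headless(self) -> bool:
        return self._headless

    @headless.setter
    def headless(self, headless: bool) -> None:
        self._headless = headless
        has_argument = '--headless' in self.arguments
        # Dispatch table keyed by the requested state, as in the diff.
        methods_map = {True: self.add_argument, False: self.remove_argument}
        if headless == has_argument:
            return  # the argument list already matches the requested state
        methods_map[headless]('--headless')


options = OptionsSketch()
options.headless = True
options.headless = True   # no-op: the flag is already present
print(options.arguments)  # ['--headless']
options.headless = False
print(options.arguments)  # []
```

The early return keeps the setter idempotent, so assigning the same value twice never duplicates the flag or tries to remove one that is absent.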
