Description
Currently, if a user wants to perform multiple actions in parallel (like interacting with multiple tabs at the same time), they must manually create tasks and use asyncio.gather. While functional, this exposes internal async logic and adds unnecessary verbosity.
We should introduce a new method in the Browser class: run_in_parallel(*coroutines), which would act as a typed and thread-safe wrapper around asyncio.gather.
This method would:
- Accept any number of coroutine objects
- Internally run them using asyncio.gather
- Return their results in order
This is a small abstraction, but it drastically improves developer experience when writing concurrent scraping logic.
Requirements:
- Must be thread-safe (in case the user calls it from different threads)
- Should be fully typed (generic return types, inference for the result list)
- Should propagate exceptions the same way asyncio.gather does
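
Below is a minimal sketch of what the wrapper could look like. The `self._loop` attribute and the separate `run_in_parallel_threadsafe` helper are assumptions for illustration, not existing pydoll API; the async method itself is just a typed pass-through to asyncio.gather.

```python
import asyncio
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


class Browser:
    # self._loop is a hypothetical attribute holding the event loop the
    # browser runs on; the real class would set it at startup.

    async def run_in_parallel(self, *coroutines: Coroutine[Any, Any, T]) -> list[T]:
        """Run coroutines concurrently and return results in call order.

        With return_exceptions left at its default (False), the first
        exception raised by any coroutine propagates to the caller,
        exactly as asyncio.gather behaves.
        """
        return list(await asyncio.gather(*coroutines))

    def run_in_parallel_threadsafe(self, *coroutines: Coroutine[Any, Any, T]) -> list[T]:
        """Blocking variant for callers on a thread other than the loop's.

        Calling this from the loop's own thread would deadlock, since
        future.result() blocks until the loop finishes the work.
        """
        async def _gather() -> list[T]:
            return list(await asyncio.gather(*coroutines))

        future = asyncio.run_coroutine_threadsafe(_gather(), self._loop)
        return future.result()
```

A single TypeVar only covers the homogeneous case; precise per-position result types (e.g. tuple[str, int] for two differently-typed coroutines) would need @overload stubs like the ones typeshed ships for asyncio.gather.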
Usage example:
```python
async def scrap_google(tab: Tab):
    await tab.go_to("https://google.com")
    return await tab.find(tag_name="input")


async def scrap_github(tab: Tab):
    await tab.go_to("https://github.com")
    return await tab.find(tag_name="h1")


results = await browser.run_in_parallel(
    scrap_google(tab1),
    scrap_github(tab2)
)
```

In case a large number of coroutines is passed (e.g. 20+), it may not be efficient or safe to run them all in parallel. It would be interesting to introduce a configurable limit for the maximum number of concurrent coroutines. This could be defined via the Options object (max_parallel_tasks, for example).
The run_in_parallel method would then respect this limit, executing the coroutines in batches or gating them with a bounded semaphore internally.
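
For the semaphore approach, a rough sketch of the internal helper (the max_parallel_tasks parameter here stands in for the proposed Options field):

```python
import asyncio
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


async def gather_bounded(
    *coroutines: Coroutine[Any, Any, T], max_parallel_tasks: int = 10
) -> list[T]:
    """Run coroutines concurrently, at most max_parallel_tasks at a time."""
    semaphore = asyncio.Semaphore(max_parallel_tasks)

    async def bounded(coro: Coroutine[Any, Any, T]) -> T:
        # Each coroutine waits for a free slot before it starts executing.
        async with semaphore:
            return await coro

    # Results still come back in argument order, as with plain gather.
    return list(await asyncio.gather(*(bounded(c) for c in coroutines)))
```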
This behavior is open to discussion. The main idea is to give users more control over concurrent load without manually handling throttling logic.