# LangChain – ScraperAPI

Give your AI agent the ability to browse websites and search Google and Amazon in just a few lines of code.

The `langchain-scraperapi` package adds three ready-to-use LangChain tools backed by the [ScraperAPI](https://www.scraperapi.com/) service:

| Tool class | Use it to |
|------------|-----------|
| `ScraperAPITool` | Grab the HTML/text/markdown of any web page |
| `ScraperAPIGoogleSearchTool` | Get structured Google Search SERP data |
| `ScraperAPIAmazonSearchTool` | Get structured Amazon product-search data |

## Installation

```bash
pip install -U langchain-scraperapi
```

## Setup

Create an account at https://www.scraperapi.com/ and get an API key, then set it as an environment variable:

```python
import os

os.environ["SCRAPERAPI_API_KEY"] = "your-api-key"
```

## Quick Start

### ScraperAPITool — Browse any website

Scrape HTML, text, or markdown from any webpage:

```python
from langchain_scraperapi.tools import ScraperAPITool

tool = ScraperAPITool()

# Get text content
result = tool.invoke({
    "url": "https://example.com",
    "output_format": "text",
    "render": True
})
print(result)
```

**Parameters:**
- `url` (required) – target page URL
- `output_format` – `"text"` | `"markdown"` (default returns HTML)
- `country_code` – e.g. `"us"`, `"de"`
- `device_type` – `"desktop"` | `"mobile"`
- `premium` – use premium proxies
- `render` – run JavaScript before returning content
- `keep_headers` – include response headers

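These options map onto query parameters of ScraperAPI's HTTP endpoint at `https://api.scraperapi.com/`. A minimal sketch of that mapping, assuming the documented query-parameter interface; the `build_scraperapi_url` helper is hypothetical and not part of this package:

```python
from urllib.parse import urlencode

def build_scraperapi_url(api_key: str, url: str, **params) -> str:
    """Hypothetical helper: assemble a raw ScraperAPI request URL."""
    query = {"api_key": api_key, "url": url}
    for name, value in params.items():
        # Booleans become lowercase strings, e.g. render=true
        query[name] = str(value).lower() if isinstance(value, bool) else value
    return "https://api.scraperapi.com/?" + urlencode(query)

print(build_scraperapi_url("YOUR-KEY", "https://example.com", render=True))
# -> https://api.scraperapi.com/?api_key=YOUR-KEY&url=https%3A%2F%2Fexample.com&render=true
```

The tool handles this assembly for you; the sketch only shows where each parameter ends up.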
### ScraperAPIGoogleSearchTool — Structured Google Search

Get structured Google Search results:

```python
from langchain_scraperapi.tools import ScraperAPIGoogleSearchTool

google_search = ScraperAPIGoogleSearchTool()

results = google_search.invoke({
    "query": "what is langchain",
    "num": 20,
    "output_format": "json"
})
print(results)
```

**Parameters:**
- `query` (required) – search terms
- `output_format` – `"json"` (default) or `"csv"`
- `country_code`, `tld`, `num`, `hl`, `gl` – optional search modifiers

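Because the JSON output is a plain Python structure, it can be post-processed with ordinary dict and list code. A sketch, assuming an `organic_results` field in the structured response; the sample data below is fabricated, not a live response:

```python
# Fabricated stand-in for: results = google_search.invoke({...})
results = {
    "organic_results": [
        {"title": "LangChain", "link": "https://www.langchain.com/"},
        {"title": "LangChain docs", "link": "https://python.langchain.com/"},
    ]
}

# Pull out just the titles and links
hits = [(r["title"], r["link"]) for r in results.get("organic_results", [])]
for title, link in hits:
    print(f"{title}: {link}")
```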
### ScraperAPIAmazonSearchTool — Structured Amazon Search

Get structured Amazon product search results:

```python
from langchain_scraperapi.tools import ScraperAPIAmazonSearchTool

amazon_search = ScraperAPIAmazonSearchTool()

products = amazon_search.invoke({
    "query": "noise cancelling headphones",
    "tld": "co.uk",
    "page": 2
})
print(products)
```

**Parameters:**
- `query` (required) – product search terms
- `output_format` – `"json"` (default) or `"csv"`
- `country_code`, `tld`, `page` – optional search modifiers

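With `output_format="csv"` the tool returns a CSV string rather than a dict, which the standard library can load directly. The column names below are illustrative, not the guaranteed schema:

```python
import csv
import io

# Fabricated stand-in for: products = amazon_search.invoke({..., "output_format": "csv"})
csv_text = "name,price\nHeadphones A,199.99\nHeadphones B,149.99\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]["name"], rows[0]["price"])  # -> Headphones A 199.99
```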
## Example: AI Agent that can browse the web

```python
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_scraperapi.tools import ScraperAPITool

# Set up tools and LLM
tools = [ScraperAPITool()]
llm = ChatOpenAI(model_name="gpt-4o", temperature=0)

# Create prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that can browse websites. Use ScraperAPITool to access web content."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

# Create and run agent
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

response = agent_executor.invoke({
    "input": "Browse Hacker News and summarize the top story"
})
```

## Documentation

For complete parameter details and advanced usage, see the [ScraperAPI documentation](https://docs.scraperapi.com/python/making-requests/customizing-requests).