| 1 | +# TOON Format Integration |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +The ScrapeGraph SDK now supports [TOON (Token-Oriented Object Notation)](https://github.com/ScrapeGraphAI/toonify) format for API responses. TOON is a compact data format that reduces LLM token usage by **30-60%** compared to JSON, significantly lowering API costs while maintaining human readability. |
| 6 | + |
| 7 | +## What is TOON? |
| 8 | + |
| 9 | +TOON is a serialization format optimized for LLM token efficiency. It represents structured data in a more compact form than JSON while preserving all information. |
| 10 | + |
| 11 | +### Example Comparison |
| 12 | + |
| 13 | +**JSON** (247 bytes): |
| 14 | +```json |
| 15 | +{ |
| 16 | +  "products": [ |
| 17 | +    {"id": 101, "name": "Laptop Pro", "price": 1299}, |
| 18 | +    {"id": 102, "name": "Magic Mouse", "price": 79}, |
| 19 | +    {"id": 103, "name": "USB-C Cable", "price": 19} |
| 20 | +  ] |
| 21 | +} |
| 22 | +``` |
| 23 | + |
| 24 | +**TOON** (98 bytes, **60% reduction**): |
| 25 | +``` |
| 26 | +products[3]{id,name,price}: |
| 27 | +  101,Laptop Pro,1299 |
| 28 | +  102,Magic Mouse,79 |
| 29 | +  103,USB-C Cable,19 |
| 30 | +``` |
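
To make the comparison above concrete, here is a minimal sketch that produces the tabular layout from a list of flat records and compares its size with indented JSON. It is a hand-rolled illustration, not the `toonify` encoder: the helper name `encode_table` is hypothetical, and it assumes every row has the same keys and that no value contains a comma or newline (a real encoder has to handle quoting).

```python
import json


def encode_table(key, rows):
    """Encode a list of flat dicts sharing the same keys as a TOON-style table.

    Illustration only: no quoting or escaping is performed.
    """
    fields = list(rows[0].keys())
    field_list = ",".join(fields)
    # Header: name, row count in brackets, field names in braces
    lines = [f"{key}[{len(rows)}]{{{field_list}}}:"]
    for row in rows:
        # One indented, comma-separated line per record
        lines.append("  " + ",".join(str(row[field]) for field in fields))
    return "\n".join(lines)


products = [
    {"id": 101, "name": "Laptop Pro", "price": 1299},
    {"id": 102, "name": "Magic Mouse", "price": 79},
    {"id": 103, "name": "USB-C Cable", "price": 19},
]

toon_text = encode_table("products", products)
json_text = json.dumps({"products": products}, indent=2)

print(toon_text)
print(len(json_text.encode("utf-8")), "bytes as JSON")
print(len(toon_text.encode("utf-8")), "bytes as TOON")
```

The saving comes from stating the field names once in the header instead of repeating them in every record, which is why uniform lists of records benefit the most.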
| 31 | + |
| 32 | +## Benefits |
| 33 | + |
| 34 | +- ✅ **30-60% reduction** in token usage |
| 35 | +- ✅ **Lower LLM API costs** (a quoted saving of $2,147 per million requests at GPT-4 pricing) |
| 36 | +- ✅ **Faster processing** downstream, since the payloads passed to an LLM are smaller |
| 37 | +- ✅ **Human-readable** format |
| 38 | +- ✅ **Lossless** conversion (preserves all data) |
| 39 | + |
| 40 | +## Usage |
| 41 | + |
| 42 | +### Installation |
| 43 | + |
| 44 | +The TOON integration is automatically available when you install the SDK: |
| 45 | + |
| 46 | +```bash |
| 47 | +pip install scrapegraph-py |
| 48 | +``` |
| 49 | + |
| 50 | +The `toonify` library is included as a dependency. |
| 51 | + |
| 52 | +### Basic Usage |
| 53 | + |
| 54 | +All scraping methods now support a `return_toon` parameter. Set it to `True` to receive responses in TOON format: |
| 55 | + |
| 56 | +```python |
| 57 | +from scrapegraph_py import Client |
| 58 | + |
| 59 | +client = Client(api_key="your-api-key") |
| 60 | + |
| 61 | +# Get response in JSON format (default) |
| 62 | +json_result = client.smartscraper( |
| 63 | +    website_url="https://example.com", |
| 64 | +    user_prompt="Extract product information", |
| 65 | +    return_toon=False  # or omit this parameter |
| 66 | +) |
| 67 | + |
| 68 | +# Get response in TOON format (30-60% fewer tokens) |
| 69 | +toon_result = client.smartscraper( |
| 70 | +    website_url="https://example.com", |
| 71 | +    user_prompt="Extract product information", |
| 72 | +    return_toon=True |
| 73 | +) |
| 74 | +``` |
| 75 | + |
| 76 | +### Async Usage |
| 77 | + |
| 78 | +The async client also supports TOON format: |
| 79 | + |
| 80 | +```python |
| 81 | +import asyncio |
| 82 | +from scrapegraph_py import AsyncClient |
| 83 | + |
| 84 | +async def main(): |
| 85 | +    async with AsyncClient(api_key="your-api-key") as client: |
| 86 | +        # Get response in TOON format |
| 87 | +        toon_result = await client.smartscraper( |
| 88 | +            website_url="https://example.com", |
| 89 | +            user_prompt="Extract product information", |
| 90 | +            return_toon=True |
| 91 | +        ) |
| 92 | +        print(toon_result) |
| 93 | + |
| 94 | +asyncio.run(main()) |
| 95 | +``` |
| 96 | + |
| 97 | +## Supported Methods |
| 98 | + |
| 99 | +The `return_toon` parameter is available for all scraping methods: |
| 100 | + |
| 101 | +### SmartScraper |
| 102 | +```python |
| 103 | +# Sync |
| 104 | +client.smartscraper(..., return_toon=True) |
| 105 | +client.get_smartscraper(request_id, return_toon=True) |
| 106 | + |
| 107 | +# Async |
| 108 | +await client.smartscraper(..., return_toon=True) |
| 109 | +await client.get_smartscraper(request_id, return_toon=True) |
| 110 | +``` |
| 111 | + |
| 112 | +### SearchScraper |
| 113 | +```python |
| 114 | +# Sync |
| 115 | +client.searchscraper(..., return_toon=True) |
| 116 | +client.get_searchscraper(request_id, return_toon=True) |
| 117 | + |
| 118 | +# Async |
| 119 | +await client.searchscraper(..., return_toon=True) |
| 120 | +await client.get_searchscraper(request_id, return_toon=True) |
| 121 | +``` |
| 122 | + |
| 123 | +### Crawl |
| 124 | +```python |
| 125 | +# Sync |
| 126 | +client.crawl(..., return_toon=True) |
| 127 | +client.get_crawl(crawl_id, return_toon=True) |
| 128 | + |
| 129 | +# Async |
| 130 | +await client.crawl(..., return_toon=True) |
| 131 | +await client.get_crawl(crawl_id, return_toon=True) |
| 132 | +``` |
| 133 | + |
| 134 | +### AgenticScraper |
| 135 | +```python |
| 136 | +# Sync |
| 137 | +client.agenticscraper(..., return_toon=True) |
| 138 | +client.get_agenticscraper(request_id, return_toon=True) |
| 139 | + |
| 140 | +# Async |
| 141 | +await client.agenticscraper(..., return_toon=True) |
| 142 | +await client.get_agenticscraper(request_id, return_toon=True) |
| 143 | +``` |
| 144 | + |
| 145 | +### Markdownify |
| 146 | +```python |
| 147 | +# Sync |
| 148 | +client.markdownify(..., return_toon=True) |
| 149 | +client.get_markdownify(request_id, return_toon=True) |
| 150 | + |
| 151 | +# Async |
| 152 | +await client.markdownify(..., return_toon=True) |
| 153 | +await client.get_markdownify(request_id, return_toon=True) |
| 154 | +``` |
| 155 | + |
| 156 | +### Scrape |
| 157 | +```python |
| 158 | +# Sync |
| 159 | +client.scrape(..., return_toon=True) |
| 160 | +client.get_scrape(request_id, return_toon=True) |
| 161 | + |
| 162 | +# Async |
| 163 | +await client.scrape(..., return_toon=True) |
| 164 | +await client.get_scrape(request_id, return_toon=True) |
| 165 | +``` |
| 166 | + |
| 167 | +## Examples |
| 168 | + |
| 169 | +Complete examples are available in the `examples/` directory: |
| 170 | + |
| 171 | +- `examples/toon_example.py` - Sync examples demonstrating TOON format |
| 172 | +- `examples/toon_async_example.py` - Async examples demonstrating TOON format |
| 173 | + |
| 174 | +Run the examples: |
| 175 | + |
| 176 | +```bash |
| 177 | +# Set your API key |
| 178 | +export SGAI_API_KEY="your-api-key" |
| 179 | + |
| 180 | +# Run sync example |
| 181 | +python examples/toon_example.py |
| 182 | + |
| 183 | +# Run async example |
| 184 | +python examples/toon_async_example.py |
| 185 | +``` |
| 186 | + |
| 187 | +## When to Use TOON |
| 188 | + |
| 189 | +**Use TOON when:** |
| 190 | +- ✅ Passing scraped data to LLM APIs (reduces token costs) |
| 191 | +- ✅ Working with large structured datasets |
| 192 | +- ✅ Context window is limited |
| 193 | +- ✅ Token cost optimization is important |
| 194 | + |
| 195 | +**Use JSON when:** |
| 196 | +- ❌ Maximum compatibility with third-party tools is required |
| 197 | +- ❌ Data needs to be processed by JSON-only tools |
| 198 | +- ❌ Working with highly irregular/nested data |
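
As a rough illustration of the guidance above, the following sketch checks whether a payload contains a uniform list of flat records (the shape that compresses well) before suggesting a format. The helpers `is_uniform_table` and `suggest_format` are hypothetical and not part of the SDK; the rule they apply is one possible heuristic, not a documented recommendation.

```python
def is_uniform_table(value):
    """Return True for a non-empty list of dicts with identical keys and scalar values."""
    if not isinstance(value, list) or not value:
        return False
    if not all(isinstance(item, dict) for item in value):
        return False
    keys = set(value[0])
    scalars = (str, int, float, bool, type(None))
    return all(
        set(item) == keys and all(isinstance(v, scalars) for v in item.values())
        for item in value
    )


def suggest_format(payload):
    """Suggest "toon" if any top-level field is a uniform table, otherwise "json"."""
    if isinstance(payload, dict) and any(is_uniform_table(v) for v in payload.values()):
        return "toon"
    return "json"


tabular = {"products": [{"id": 101, "price": 1299}, {"id": 102, "price": 79}]}
irregular = {"page": {"title": "Home", "sections": [{"a": 1}, {"b": {"c": 2}}]}}

print(suggest_format(tabular))
print(suggest_format(irregular))
```

A result from `suggest_format` could then decide the value passed as `return_toon` on a later request for the same kind of data.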
| 199 | + |
| 200 | +## Cost Savings Example |
| 201 | + |
| 202 | +At GPT-4 pricing: |
| 203 | +- **Input tokens**: $0.01 per 1K tokens |
| 204 | +- **Output tokens**: $0.03 per 1K tokens |
| 205 | + |
| 206 | +With 50% token reduction using TOON: |
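| 207 | +- **Workload**: 1 million API requests with 1K input tokens each, i.e. 1 billion tokens costing $10,000 at the input rate above |
| 208 | +- **Estimated savings**: about $5,000 on input tokens for this workload |
| 209 | +- **Quoted figures**: $2,147 per million requests and $5,408 per billion tokens, which presumably reflect a different request size and input/output mix than this simplified workload |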
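
The arithmetic behind an estimate like this can be sketched as follows. The model is deliberately simple and rests on assumptions: only input tokens are counted, every request carries exactly 1K tokens, and the reduction is a flat 50%. Under those assumptions the saving works out to about $5,000, so the $2,147 and $5,408 figures quoted in this document presumably rest on different workload assumptions.

```python
INPUT_PRICE_PER_1K = 0.01    # USD per 1K input tokens (rate listed above)
REQUESTS = 1_000_000         # number of API requests
TOKENS_PER_REQUEST = 1_000   # assumed input tokens per request
REDUCTION = 0.50             # assumed token reduction from TOON


def input_cost(tokens, price_per_1k=INPUT_PRICE_PER_1K):
    """Cost in USD of sending `tokens` input tokens."""
    return tokens / 1000 * price_per_1k


baseline_tokens = REQUESTS * TOKENS_PER_REQUEST
toon_tokens = baseline_tokens * (1 - REDUCTION)

baseline = input_cost(baseline_tokens)
with_toon = input_cost(toon_tokens)
savings = baseline - with_toon

print(f"baseline: ${baseline:,.0f}, with TOON: ${with_toon:,.0f}, savings: ${savings:,.0f}")
```

Changing `TOKENS_PER_REQUEST` or `REDUCTION` to values measured on your own data gives a more realistic estimate than any headline figure.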
| 210 | + |
| 211 | +## Technical Details |
| 212 | + |
| 213 | +The TOON integration is implemented through a converter utility (`scrapegraph_py.utils.toon_converter`) that: |
| 214 | + |
| 215 | +1. Takes the API response (dict) |
| 216 | +2. Converts it to TOON format using the `toonify` library |
| 217 | +3. Returns the TOON-formatted string |
| 218 | + |
| 219 | +The conversion is **lossless** - all data is preserved and can be converted back to the original structure using the TOON decoder. |
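
The three steps above can be sketched as a small wrapper. This is not the code in `scrapegraph_py.utils.toon_converter`: the names `process_response` and `is_lossless` are hypothetical, and the encoder is passed in as a parameter so the sketch runs without extra dependencies. Here `json.dumps` and `json.loads` stand in for a TOON encoder and decoder, and the sample response is made up.

```python
import json
from typing import Any, Callable, Dict, Union

Encoder = Callable[[Dict[str, Any]], str]
Decoder = Callable[[str], Dict[str, Any]]


def process_response(
    response: Dict[str, Any], return_toon: bool, encoder: Encoder
) -> Union[Dict[str, Any], str]:
    """Return the response unchanged, or encoded as a string when return_toon is set."""
    if not return_toon:
        return response
    return encoder(response)


def is_lossless(data: Dict[str, Any], encoder: Encoder, decoder: Decoder) -> bool:
    """Check that decoding the encoded data gives back the original structure."""
    return decoder(encoder(data)) == data


# Made-up sample response; json.dumps/json.loads are stand-ins for a TOON codec
api_response = {
    "request_id": "abc123",
    "result": {"products": [{"id": 101, "name": "Laptop Pro"}]},
}

as_dict = process_response(api_response, return_toon=False, encoder=json.dumps)
as_text = process_response(api_response, return_toon=True, encoder=json.dumps)

print(type(as_dict).__name__, type(as_text).__name__)
print(is_lossless(api_response, json.dumps, json.loads))
```

A round-trip check of this kind is a simple way to confirm the lossless property on your own responses once a real TOON encoder and decoder are plugged in.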
| 220 | + |
| 221 | +## Learn More |
| 222 | + |
| 223 | +- [Toonify GitHub Repository](https://github.com/ScrapeGraphAI/toonify) |
| 224 | +- [TOON Format Specification](https://github.com/toon-format/toon) |
| 225 | +- [ScrapeGraph Documentation](https://docs.scrapegraphai.com) |
| 226 | + |
| 227 | +## Contributing |
| 228 | + |
| 229 | +Found a bug or have a suggestion for the TOON integration? Please open an issue or submit a pull request on our [GitHub repository](https://github.com/ScrapeGraphAI/scrapegraph-sdk). |
| 230 | + |