Commit 9a8587c

Prepare sections
1 parent 442a569 commit 9a8587c

8 files changed, with 27 additions and 5 deletions

content/academy/advanced_web_scraping.md

Lines changed: 3 additions & 1 deletion
@@ -13,6 +13,8 @@ In this course, we'll be tackling some of the most challenging and advanced web-

 If you've managed to follow along with all of the courses prior to this one, then you're more than ready to take these upcoming lessons on 😎

+Just like the [**Web scraping for beginners**]({{@link web_scraping_for_beginners.md}}) course, this course is divided into two main sections: **Data collection** and **Crawling**.
+
 ## [](#first-up) First up

-This course's [first lesson]({{@link advanced_web_scraping/scraping_paginated_sites.md}}) dives head-first into one of the most valuable skills you can have as a scraper developer: **Scraping paginated sites**.
+This course's [first lesson]({{@link advanced_web_scraping/crawling/scraping_paginated_sites.md}}) dives head-first into one of the most valuable skills you can have as a scraper developer: **Scraping paginated sites**.
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+---
+title: Advanced crawling
+description: section description
+menuWeight: 8.2
+category: courses
+paths:
+- advanced-web-scraping/crawling
+---
+
+# Advanced crawling

content/academy/advanced_web_scraping/scraping_paginated_sites.md renamed to content/academy/advanced_web_scraping/crawling/scraping_paginated_sites.md

Lines changed: 4 additions & 4 deletions
@@ -1,16 +1,16 @@
 ---
 title: Scraping paginated sites
 description: Learn how to extract all of a website's listings even if they limit the number of results pages. See code examples for setting up your scraper.
-menuWeight: 8.1
+menuWeight: 1
 paths:
-- advanced-web-scraping/scraping-paginated-sites
+- advanced-web-scraping/crawling/scraping-paginated-sites
 ---

 # Scraping websites with limited pagination

 Limited pagination is a common practice on e-commerce sites and is becoming more popular over time. It makes sense: a real user will never want to look through more than 200 pages of results – only bots love unlimited pagination. Fortunately, there are ways to overcome this limit while keeping our code clean and generic.

-![Pagination in on Google search results page]({{@asset advanced_web_scraping/images/pagination.webp}})
+![Pagination in on Google search results page]({{@asset advanced_web_scraping/crawling/images/pagination.webp}})

 > In a rush? Skip the tutorial and get the [full code example](https://github.com/metalwarrior665/apify-utils/tree/master/examples/crawler-with-filters).

@@ -52,7 +52,7 @@ This has several benefits:

 In the previous section, we analyzed different options to split the pages to overcome the pagination limit. We have chosen range filters as the most reliable way to do that. In this section, we will discuss a generic algorithm to work with ranges, look at a few special cases and then write an example crawler.

-![An example of range filters on a website]({{@asset advanced_web_scraping/images/pagination-filters.webp}})
+![An example of range filters on a website]({{@asset advanced_web_scraping/crawling/images/pagination-filters.webp}})

 ### [](#the-algorithm) The algorithm

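Note on the hunk above: its context paragraph mentions a generic algorithm for working with range filters, but the algorithm itself falls outside the lines shown in this commit. As a rough illustration only, here is a minimal Python sketch of one plausible approach, recursive bisection of a numeric filter range. It is not the lesson's implementation. The names `split_ranges`, `count_results` and `max_results` are hypothetical, and the sketch assumes the site exposes an integer range filter (such as price) and reports how many results match a given range.

```python
from typing import Callable, List, Tuple

# An inclusive (low, high) filter range, e.g. a price range in whole units.
Range = Tuple[int, int]


def split_ranges(
    low: int,
    high: int,
    count_results: Callable[[int, int], int],
    max_results: int,
) -> List[Range]:
    """Split [low, high] into sub-ranges that each match at most max_results items.

    count_results(low, high) is a hypothetical callback that returns how many
    listings the site reports for that inclusive filter range.
    """
    total = count_results(low, high)
    if total == 0:
        # Nothing to scrape in this range, so drop it.
        return []
    if total <= max_results or low == high:
        # Either the range fits under the pagination limit, or it cannot be
        # split any further (a single value may still exceed the limit).
        return [(low, high)]
    mid = (low + high) // 2
    return (
        split_ranges(low, mid, count_results, max_results)
        + split_ranges(mid + 1, high, count_results, max_results)
    )


if __name__ == "__main__":
    # Made-up catalogue standing in for a real site's result counts.
    prices = [5, 5, 7, 12, 12, 12, 30, 45, 45, 99]

    def count(lo: int, hi: int) -> int:
        return sum(lo <= p <= hi for p in prices)

    for lo, hi in split_ranges(0, 100, count, max_results=3):
        print(f"{lo}-{hi}: {count(lo, hi)} results")
```

In a real crawler, each returned range would become one filtered listing URL to paginate through. A single-value range that still exceeds the limit would need a second filter, which this sketch does not attempt.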

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+---
+title: Advanced data collection
+description: section description
+menuWeight: 8.1
+category: courses
+paths:
+- advanced-web-scraping/data-collection
+---
+
+# Advanced data collection
