A userscript designed to scrape lab and course information from the Google Skills Catalog

chriskyfung/qwiklabs-catalog-scraper

Qwiklabs Catalog Scraper

This project is a userscript that scrapes lab and course information from the Google Skills Catalog. It is built with Vite and vite-plugin-monkey and runs in the browser through a userscript manager. The script adds download links for labs and courses to the catalog page, and it can also scrape the catalog directly, following pagination and exporting the results as a CSV file.

Features

  • Download Links: Adds buttons to the Google Skills Catalog page to directly download lab and course data.
  • Catalog Scraping: Automatically navigates through catalog pages, scrapes activity data, and compiles it into a CSV file.
  • Userscript Integration: Leverages userscript managers (like Tampermonkey or Violentmonkey) for seamless browser integration.
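
As a rough illustration of the "compiles it into a CSV file" step, the sketch below turns a list of scraped records into CSV text. The function names (`escapeCsvValue`, `toCsv`), the column names, and the sample records are all hypothetical and are not taken from the project's source; the quoting follows the common CSV convention of wrapping values that contain commas, quotes, or line breaks.

```javascript
// Hypothetical helper names; the project's real CSV code may differ.

// Quote a single value using the usual CSV convention: wrap it in double
// quotes when it contains a comma, a quote, or a line break, and double any
// embedded quotes.
function escapeCsvValue(value) {
  const text = value == null ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn an array of plain objects into CSV text with a header row.
function toCsv(rows, columns) {
  const header = columns.map(escapeCsvValue).join(',');
  const lines = rows.map((row) =>
    columns.map((column) => escapeCsvValue(row[column])).join(',')
  );
  return [header, ...lines].join('\r\n');
}

// Made-up activity records, for illustration only.
const sample = [
  { title: 'Intro to "Cloud" Basics', type: 'Lab', duration: '45 minutes' },
  { title: 'Networking, Part 1', type: 'Course', duration: '2 hours' },
];

console.log(toCsv(sample, ['title', 'type', 'duration']));
```

Passing the column list explicitly keeps the column order stable even if individual records are missing a field; missing values simply become empty cells.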

Technologies Used

  • Vite: A fast build tool for modern web projects.
  • vite-plugin-monkey: A Vite plugin for developing userscripts.
  • JavaScript: The core language for the scraper logic.

Installation and Usage

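Prerequisites

  • Node.js and npm, to install dependencies and build the userscript.
  • A userscript manager such as Tampermonkey or Violentmonkey, installed in your browser.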

Building the Userscript

  1. Clone the repository:
    git clone https://github.com/chriskyfung/qwiklabs-catalog-scraper.git
    cd qwiklabs-catalog-scraper
  2. Install dependencies:
    npm install
  3. Build the userscript:
    npm run build
    This will generate the userscript file (e.g., dist/qwiklabs-catalog-scraper.user.js).

Installing the Userscript

  1. Open your userscript manager's dashboard in your browser.
  2. Create a new userscript.
  3. Copy the content of the generated userscript file (dist/qwiklabs-catalog-scraper.user.js) and paste it into the new userscript editor.
  4. Save the userscript.

Running the Scraper

  1. Navigate to the Google Skills Catalog in your browser.
  2. The userscript will automatically activate.
    • Download Links: On the main catalog page, you will see "Scrape catalog as CSV" buttons to download data for labs and courses.
    • Direct Scraping: If you navigate to a specific catalog URL (e.g., one generated by the download links), the script will automatically begin scraping the data and prompt you to download a CSV file once complete.
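
The two behaviours above depend on the URL the script finds itself on. The sketch below shows one way such URL-based branching and page-by-page navigation could work. It is an assumption, not the project's actual logic: the `scrape=1` flag, the `page` parameter, the `/catalog` path check, and the `example.com` host are placeholders chosen for illustration.

```javascript
// Assumptions (not taken from the project): the scraping URL carries a
// hypothetical "scrape=1" query flag, and pages are selected with "page=N".

// Decide what the script should do on the current URL.
function decideMode(href) {
  const url = new URL(href);
  if (!url.pathname.startsWith('/catalog')) return 'idle';
  return url.searchParams.get('scrape') === '1' ? 'scrape' : 'add-links';
}

// Build the URL of the following results page, keeping all other parameters.
function nextPageUrl(href) {
  const url = new URL(href);
  const current = Number.parseInt(url.searchParams.get('page') ?? '1', 10);
  const next = (Number.isNaN(current) ? 1 : current) + 1;
  url.searchParams.set('page', String(next));
  return url.toString();
}

console.log(decideMode('https://example.com/catalog'));
console.log(decideMode('https://example.com/catalog?format=labs&scrape=1'));
console.log(nextPageUrl('https://example.com/catalog?format=labs&scrape=1'));
```

In a scheme like this, the "Scrape catalog as CSV" links would only need to point at the catalog URL with the flag set; the script would then collect each page's cards, move to `nextPageUrl(location.href)`, and stop when a page yields no results.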

Screenshot

[Screenshot: the catalog page heading "Discover learning for in-demand skills", followed by the added "Scrape catalog as CSV" links for "All labs" and "All courses".]

Development

Development Commands

  • Development (watch mode):
    npm run dev
    This will build the userscript and watch for changes, automatically rebuilding on file modifications.
  • Linting and Formatting:
    npm run lint
    This command runs both ESLint and Prettier to ensure code quality and consistency.
  • ESLint only:
    npm run eslint
  • Prettier only:
    npm run prettier

Code Structure

  • src/index.js: The main entry point of the userscript, handling URL-based logic and userscript menu commands.
  • src/modules/catalog-page.js: Contains functions for adding download links to the catalog page.
  • src/modules/scraper.js: Implements the core scraping logic, including pagination handling and CSV generation.
  • src/modules/dom-utils.js: Provides utility functions for DOM manipulation and data extraction from activity cards.
  • src/modules/downloader.js: Handles the client-side download of generated CSV files.
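
To make the downloader's role more concrete, here is a minimal sketch of the standard browser technique for a client-side file download: wrap the text in a `Blob`, create an object URL for it, and click a temporary link with a `download` attribute. The function name `downloadTextFile` and the `env` parameter are hypothetical; `env` exists only so the function can be exercised outside a browser, and this is not necessarily how `src/modules/downloader.js` is written.

```javascript
// Sketch of a client-side download. `env` is a hypothetical seam for testing;
// in a userscript it defaults to the page's own globals.
function downloadTextFile(text, filename, env = globalThis) {
  const blob = new env.Blob([text], { type: 'text/csv;charset=utf-8' });
  const objectUrl = env.URL.createObjectURL(blob);

  // A temporary <a download> element asks the browser to save the file
  // instead of navigating to it.
  const link = env.document.createElement('a');
  link.href = objectUrl;
  link.download = filename;
  env.document.body.appendChild(link);
  link.click();
  link.remove();

  // Release the object URL after the click has been dispatched. Some code
  // defers this with a short timeout to be safe in older browsers.
  env.URL.revokeObjectURL(objectUrl);
  return filename;
}
```

In the browser, a call such as `downloadTextFile(csvText, 'catalog.csv')` would prompt the user to save the generated CSV; both the variable and the file name here are placeholders.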

Contributing

Feel free to open issues or submit pull requests if you have suggestions or improvements.

License

This project is licensed under the GPL-3.0 License. See the LICENSE file for details.
