This project is a userscript that scrapes lab and course information from the Google Skills Catalog. It is built with Vite and vite-plugin-monkey and runs in the browser through a userscript manager. The script adds download links for labs and courses to the catalog page, and it can also scrape the catalog directly, handling pagination and exporting the data as a CSV file.
- Download Links: Adds buttons to the Google Skills Catalog page to directly download lab and course data.
- Catalog Scraping: Automatically navigates through catalog pages, scrapes activity data, and compiles it into a CSV file.
- Userscript Integration: Leverages userscript managers (like Tampermonkey or Violentmonkey) for seamless browser integration.
- Vite: A fast build tool for modern web projects.
- vite-plugin-monkey: A Vite plugin for developing userscripts.
- JavaScript: The core language for the scraper logic.
- A userscript manager installed in your browser (e.g., Tampermonkey or Violentmonkey).
- Clone the repository:
git clone https://github.com/chriskyfung/qwiklabs-catalog-scraper.git
cd qwiklabs-catalog-scraper
- Install dependencies:
npm install
- Build the userscript:
npm run build
This will generate the userscript file (e.g., dist/qwiklabs-catalog-scraper.user.js).
- Open your userscript manager's dashboard in your browser.
- Create a new userscript.
- Copy the content of the generated userscript file (dist/qwiklabs-catalog-scraper.user.js) and paste it into the new userscript editor.
- Save the userscript.
- Navigate to the Google Skills Catalog in your browser.
- The userscript will automatically activate.
- Download Links: On the main catalog page, you will see "Scrape catalog as CSV" buttons to download data for labs and courses.
- Direct Scraping: If you navigate to a specific catalog URL (e.g., one generated by the download links), the script will automatically begin scraping the data and prompt you to download a CSV file once complete.
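The hand-off between the two modes above can be pictured as a small URL check. The sketch below is illustrative only: the `scrape` and `format` query parameters, the `example.com` host, and both function names are hypothetical and are not taken from this project's source, which may detect scrape URLs differently.

```javascript
// Hypothetical marker parameter; the real script may use another signal.
const SCRAPE_FLAG = 'scrape';

// Build a catalog URL that asks the script to scrape one activity type.
function buildScrapeUrl(catalogUrl, activityType) {
  const url = new URL(catalogUrl);
  url.searchParams.set('format', activityType); // assumed filter parameter
  url.searchParams.set(SCRAPE_FLAG, '1');
  return url.toString();
}

// Decide what the script should do on the current page:
// 'scrape' when the marker is present, otherwise 'add-links'.
function resolveMode(currentUrl) {
  const url = new URL(currentUrl);
  return url.searchParams.get(SCRAPE_FLAG) === '1' ? 'scrape' : 'add-links';
}

const labsUrl = buildScrapeUrl('https://example.com/catalog', 'labs');
console.log(resolveMode('https://example.com/catalog')); // add-links
console.log(resolveMode(labsUrl)); // scrape
```

Keeping the decision in a pure function of the URL makes it easy to check outside the browser, since it touches neither the DOM nor any userscript API.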
- Development (watch mode):
npm run dev
This will build the userscript and watch for changes, automatically rebuilding on file modifications.
- Linting and Formatting:
npm run lint
This command runs both ESLint and Prettier to ensure code quality and consistency.
- ESLint only:
npm run eslint
- Prettier only:
npm run prettier
- src/index.js: The main entry point of the userscript, handling URL-based logic and userscript menu commands.
- src/modules/catalog-page.js: Contains functions for adding download links to the catalog page.
- src/modules/scraper.js: Implements the core scraping logic, including pagination handling and CSV generation.
- src/modules/dom-utils.js: Provides utility functions for DOM manipulation and data extraction from activity cards.
- src/modules/downloader.js: Handles the client-side download of generated CSV files.
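As a rough picture of what pagination handling and CSV generation involve, here is a minimal sketch. It is not the code in src/modules/scraper.js: the function names, the `fetchPage` callback, the `maxPages` limit, and the `title`/`type` columns are all assumptions made for illustration, and the fake pages stand in for real catalog data.

```javascript
// Quote a single CSV field (RFC 4180 style): wrap it in double quotes when it
// contains a comma, a quote, or a line break, and double any embedded quotes.
function escapeCsvField(value) {
  const text = String(value ?? '');
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn an array of plain objects into CSV text with a header row.
function toCsv(rows, columns) {
  const lines = [columns.join(',')];
  for (const row of rows) {
    lines.push(columns.map((col) => escapeCsvField(row[col])).join(','));
  }
  return lines.join('\r\n');
}

// Collect rows page by page until a page comes back empty.
// `fetchPage` is a hypothetical async callback: (pageNumber) => array of rows.
// `maxPages` is a safety limit so a misbehaving page cannot loop forever.
async function scrapeAllPages(fetchPage, maxPages = 100) {
  const all = [];
  for (let page = 1; page <= maxPages; page += 1) {
    const rows = await fetchPage(page);
    if (rows.length === 0) break;
    all.push(...rows);
  }
  return all;
}

// Usage with fake in-memory pages standing in for the real catalog.
const fakePages = [
  [{ title: 'Intro lab', type: 'lab' }],
  [{ title: 'Data, "big" and small', type: 'course' }],
];
scrapeAllPages(async (page) => fakePages[page - 1] ?? [])
  .then((rows) => console.log(toCsv(rows, ['title', 'type'])));
```

The quoting step matters because activity titles can contain commas or quotation marks, which would otherwise shift columns when the file is opened in a spreadsheet.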
Feel free to open issues or submit pull requests if you have suggestions or improvements.
This project is licensed under the GPL-3.0 License. See the LICENSE file for details.
