@@ -5,15 +5,17 @@ SPDX-License-Identifier: CC-BY-4.0

 # Snakemake Storage Plugin: Zenodo

-A Snakemake storage plugin for downloading files from Zenodo with local caching and intelligent rate limiting.
+A Snakemake storage plugin for downloading files from Zenodo with local caching, checksum verification, and adaptive rate limiting.

 ## Features

-- **Local caching**: Downloads are cached to avoid redundant transfers
-- **Rate limit handling**: Automatically respects Zenodo's rate limits using `X-RateLimit-*` headers
+- **Local caching**: Downloads are cached to avoid redundant transfers (can be disabled)
+- **Checksum verification**: Automatically verifies MD5 checksums from Zenodo API
+- **Rate limit handling**: Automatically respects Zenodo's rate limits using `X-RateLimit-*` headers with exponential backoff retry
 - **Concurrent download control**: Limits simultaneous downloads to prevent overwhelming Zenodo
 - **Progress bars**: Shows download progress with tqdm
 - **Immutable URLs**: Returns mtime=0 since Zenodo URLs are persistent
+- **Environment variable support**: Configure via environment variables for CI/CD workflows

 ## Installation

@@ -43,12 +45,20 @@ If you don't explicitly configure it, the plugin will use default settings autom
 ### Settings

 - **cache** (optional): Cache directory for downloaded files
-  - Default: `~/.cache/snakemake/pypsaeur`
+  - Default: Platform-dependent user cache directory (via `platformdirs.user_cache_dir("snakemake-pypsa-eur")`)
+  - Set to `""` (empty string) to disable caching
   - Files are cached here to avoid re-downloading
+  - Environment variable: `SNAKEMAKE_STORAGE_ZENODO_CACHE`
+
+- **skip_remote_checks** (optional): Skip metadata checking with Zenodo API
+  - Default: `False` (perform checks)
+  - Set to `True` or `"1"` to skip remote existence/size checks (useful for CI/CD)
+  - Environment variable: `SNAKEMAKE_STORAGE_ZENODO_SKIP_REMOTE_CHECKS`

 - **max_concurrent_downloads** (optional): Maximum concurrent downloads
   - Default: `3`
   - Controls how many Zenodo files can be downloaded simultaneously
+  - No environment variable support

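The way these settings could be resolved from explicit values and environment variables can be sketched as follows. This is an illustrative sketch only, not the plugin's code: `resolve_settings`, the `default_cache` placeholder path, the precedence order (explicit value, then environment, then default), and the acceptance of `"true"` alongside `"1"` are all assumptions made for the example.

```python
import os


def resolve_settings(cache=None, skip_remote_checks=None,
                     max_concurrent_downloads=3, environ=None,
                     default_cache="/tmp/zenodo-cache-placeholder"):
    """Hypothetical helper: explicit values win, then environment, then defaults."""
    environ = os.environ if environ is None else environ

    # Assumption: the environment variable is consulted only when no
    # explicit cache setting was given.
    if cache is None:
        cache = environ.get("SNAKEMAKE_STORAGE_ZENODO_CACHE", default_cache)

    if skip_remote_checks is None:
        skip_remote_checks = environ.get(
            "SNAKEMAKE_STORAGE_ZENODO_SKIP_REMOTE_CHECKS", False
        )

    # True and "1" come from the settings list; "true" is an added assumption.
    skip = str(skip_remote_checks).strip().lower() in {"1", "true"}

    return {
        # An empty string disables caching, represented here as None.
        "cache": cache or None,
        "skip_remote_checks": skip,
        # No environment variable is read for this setting.
        "max_concurrent_downloads": max_concurrent_downloads,
    }
```

Passing a dictionary as `environ` keeps the helper easy to test without touching the real process environment.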
 ## Usage

@@ -79,15 +89,30 @@ rule download_data:
7989```

 The plugin will:
-1. Check if the file exists in the cache
+1. Check if the file exists in the cache (if caching is enabled)
 2. If cached, copy from cache (fast)
 3. If not cached, download from Zenodo with:
    - Progress bar showing download status
-   - Automatic rate limit handling
+   - Automatic rate limit handling with exponential backoff retry
    - Concurrent download limiting
-4. Store in cache for future use
+   - MD5 checksum verification against Zenodo API metadata
+4. Store in cache for future use (if caching is enabled)
+
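The MD5 verification step in the list above can be illustrated with a small standard-library sketch. It is not taken from the plugin: `md5_of_file` and `verify_checksum` are hypothetical names, and the `md5:<hex>` form of the expected value is an assumption about how the Zenodo API metadata reports checksums.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected):
    """Raise ValueError if the file's MD5 does not match the expected value.

    Assumption: `expected` is either a bare hex digest or "md5:<hex>".
    """
    expected_hex = expected.split(":", 1)[-1].lower()
    actual = md5_of_file(path)
    if actual != expected_hex:
        raise ValueError(
            f"checksum mismatch for {path}: expected {expected_hex}, got {actual}"
        )
    return actual
```

Raising an exception on a mismatch lets a surrounding retry loop treat a corrupted download like any other transient failure.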
+### Example: CI/CD Configuration
+
+For continuous integration environments where you want to skip caching and remote checks:
+
+```yaml
+# GitHub Actions example
+- name: Run snakemake workflows
+  env:
+    SNAKEMAKE_STORAGE_ZENODO_CACHE: ""
+    SNAKEMAKE_STORAGE_ZENODO_SKIP_REMOTE_CHECKS: "1"
+  run: |
+    snakemake --cores all
+```

-## Rate Limiting
+## Rate Limiting and Retry

 Zenodo API limits:
 - **Guest users**: 60 requests/minute
@@ -97,6 +122,8 @@ The plugin automatically:
 - Monitors `X-RateLimit-Remaining` header
 - Waits when rate limit is reached
 - Uses `X-RateLimit-Reset` to calculate wait time
+- Retries failed requests with exponential backoff (up to 5 attempts)
+- Handles transient errors: HTTP errors, timeouts, checksum mismatches, and network issues

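The rate-limit wait and the retry behaviour listed above might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the plugin's implementation: the function names, the 1-second base delay, the doubling schedule, and the reading of `X-RateLimit-Reset` as a Unix timestamp in seconds are all assumptions. Only the header names and the limit of 5 attempts come from the text.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, based on rate-limit headers.

    Assumption: X-RateLimit-Reset holds a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds, doubling the delay after each failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            # A real client would catch only transient error types here.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the delay schedule can be checked without actually waiting.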
 ## URL Handling
