Commit 6174908

Merge pull request #10 from NSAPH-Data-Processing/develop
Add Nepal to Pipeline
2 parents: 7db187f + 5a3e0f9

30 files changed: +7686 −1486 lines

.gitignore

Lines changed: 2 additions & 1 deletion
```diff
@@ -3,4 +3,5 @@ logs/*
 .snakemake/*
 .DS_Store
 sandbox
-slurm*
+slurm*
+data
```

.renvignore

Lines changed: 6 additions & 0 deletions
```diff
@@ -0,0 +1,6 @@
+^logs/
+^data/
+^slurm-.*\.out
+^_.github/
+sandbox
+logs
```

README.md

Lines changed: 27 additions & 2 deletions
```diff
@@ -1,3 +1,28 @@
-This file will be overwritten by `index.ipynb`
+# ERA5 Exposure Aggregation Pipeline
 
-In the meantime, see `notes/index.ipynb` for the notes..
+This repository contains a pipeline for aggregating ERA5 environmental exposure data to a 0.1 degree grid. The pipeline is designed to be run on FASRC. We developed
+this pipeline using `nbdev`, which means that we can create modules and scripts from notebooks.
+Hence, all of the documentation for how the pipeline was developed and validated is
+available in `notes/index.ipynb` and the associated notebooks.
+
+## How to Review a PR
+
+To review a PR on this repository, follow these steps:
+
+0. Obtain an API key for the ERA5 datastore from [here](https://cds.climate.copernicus.eu/how-to-api), and ask Tinashe for access to the Golden Lab `googledriver` API key
+
+1. Clone this repository to your workspace on FASRC
+
+2. Create a conda environment with `conda create -n era5_sandbox python=3.10` and install all of the necessary dependencies for the package with `pip install -e .`
+
+3. Run the `core` module to test your API key and set up the data
+directory structure
+
+`python src/era5_sandbox/core.py`
+
+4. Symlink your local data directory to the original work
+`ln -s [YOUR WORKING DIRECTORY]/data /n/dominici_lab/lab/data_processing/csph-era5_sandbox/data`
+
+5. Dry run by removing a file from data: `snakemake --dry-run`
+
+6. Run the pipeline: `sbatch snakemake.sbatch`
```

conf/aggregation/aggregation.yaml

Lines changed: 42 additions & 7 deletions
```diff
@@ -1,9 +1,44 @@
-daily:
-  function: "numpy.mean"
-  string: mean
+aggregation:
+  t2m:
+    hourly_to_daily:
+      - name: mean
+        function: "numpy.nanmean"
+      - name: min
+        function: "numpy.nanmin"
+      - name: max
+        function: "numpy.nanmax"
+    daily_to_healthshed:
+      - name: mean
+        function: "numpy.nanmean"
 
-monthly:
-  function: "numpy.mean"
-  string: mean
+  d2m:
+    hourly_to_daily:
+      - name: mean
+        function: "numpy.nanmean"
+      - name: min
+        function: "numpy.nanmin"
+      - name: max
+        function: "numpy.nanmax"
+    daily_to_healthshed:
+      - name: mean
+        function: "numpy.nanmean"
 
-variable: ['t2m', 'd2m']
+  tp:
+    hourly_to_daily:
+      - name: total
+        function: "numpy.nansum"
+    daily_to_healthshed:
+      - name: mean
+        function: "numpy.nanmean"
+
+  swvl1:
+    hourly_to_daily:
+      - name: mean
+        function: "numpy.nanmean"
+      - name: min
+        function: "numpy.nanmin"
+      - name: max
+        function: "numpy.nanmax"
+    daily_to_healthshed:
+      - name: mean
+        function: "numpy.nanmean"
```
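The dotted `function` strings in this config can be resolved to NumPy callables at runtime. A minimal sketch of that pattern (the `resolve_function` helper and the toy data are illustrative, not the pipeline's actual code):

```python
import importlib

import numpy as np

def resolve_function(path: str):
    """Resolve a dotted string such as "numpy.nanmean" to a callable."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)

# The hourly_to_daily list for t2m, as in conf/aggregation/aggregation.yaml.
t2m_aggs = [
    {"name": "mean", "function": "numpy.nanmean"},
    {"name": "min", "function": "numpy.nanmin"},
    {"name": "max", "function": "numpy.nanmax"},
]

hourly = np.array([10.0, 12.0, np.nan, 14.0])  # toy hourly values for one day
daily = {a["name"]: float(resolve_function(a["function"])(hourly)) for a in t2m_aggs}
print(daily)  # {'mean': 12.0, 'min': 10.0, 'max': 14.0}
```

Using the `nan*` variants (as the new config does) means grid cells with missing hours still aggregate instead of propagating NaN.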

conf/config.yaml

Lines changed: 4 additions & 9 deletions
```diff
@@ -2,6 +2,7 @@ defaults:
   - _self_
   - datapaths: datapaths
   - aggregation: aggregation
+  - geographies: geographies
 
 development_mode: false
 
@@ -17,20 +18,14 @@ mdg_shapefile: "https://data.humdata.org/dataset/26fa506b-0727-4d9d-a590-d2abee2
 dataset: "reanalysis-era5-single-levels"
 
 query:
+  geography: ["madagascar", "nepal"]
+
   product_type: reanalysis
-  # check precipitation
-  # variable: ["2m_dewpoint_temperature", "2m_temperature", "skin_temperature", "total_precipitation"]
-  variable: ["2m_dewpoint_temperature", "2m_temperature"]
+  variable: ["2m_dewpoint_temperature", "2m_temperature", "total_precipitation", "volumetric_soil_water_layer_1"]
   year: [2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]
   month: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
   day: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
   time: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
-
-  # this may have to be added for the
-  #levtype: pl
-  # in the current workflow we can test with a small number of healthsheds
-  # this bounding box will need to be expanded by ~ 50km (in G's dataset it is 50) or even up to 70 or 08
-  # we can also experiment with a buffer that follows the coastline precisely by 100KM
 
 area: [0, 360, -90, 90]
 data_format: netcdf
```
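The `query` section maps naturally onto a Climate Data Store request. A sketch of how one month's request might be assembled from it (`build_request` is a hypothetical helper, and the zero-padding convention is an assumption about how the pipeline formats fields; the actual download code lives in `src/era5_sandbox`):

```python
# Mirror of the query section in conf/config.yaml, as a plain dict.
query = {
    "product_type": "reanalysis",
    "variable": ["2m_dewpoint_temperature", "2m_temperature",
                 "total_precipitation", "volumetric_soil_water_layer_1"],
    "year": list(range(2009, 2025)),
    "month": list(range(1, 13)),
    "day": list(range(1, 32)),
    "time": list(range(0, 24)),
}

def build_request(query: dict, year: int, month: int) -> dict:
    """Build one month's CDS request; string fields are zero-padded."""
    return {
        "product_type": query["product_type"],
        "variable": query["variable"],
        "year": str(year),
        "month": f"{month:02d}",
        "day": [f"{d:02d}" for d in query["day"]],
        "time": [f"{t:02d}:00" for t in query["time"]],
        "area": [0, 360, -90, 90],
        "data_format": "netcdf",
    }

request = build_request(query, 2024, 1)
print(request["month"], request["time"][0])  # 01 00:00
```

Splitting the 2009–2024 range into per-month requests keeps each download within CDS size limits.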

conf/geographies/geographies.yaml

Lines changed: 9 additions & 0 deletions
```diff
@@ -0,0 +1,9 @@
+madagascar:
+  shapefile: "https://data.humdata.org/dataset/26fa506b-0727-4d9d-a590-d2abee21ee22/resource/ed94d52e-349e-41be-80cb-62dc0435bd34/download/mdg_adm_bngrc_ocha_20181031_shp.zip"
+  healthsheds: "healthsheds2022.zip"
+  unique_id: "fs_uid"
+
+nepal:
+  shapefile: "https://data.humdata.org/dataset/07db728a-4f0f-4e98-8eb0-8fa9df61f01c/resource/2eb4c47f-fd6e-425d-b623-d35be1a7640e/download/npl_adm_nd_20240314_ab_shp.zip"
+  healthsheds: "Nepal_Healthsheds2024.zip"
+  unique_id: "fid"
```
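Each geography entry carries the identifier column used when joining grid cells to healthsheds. A small sketch of that lookup, with the YAML mirrored as a plain dict (shapefile URLs omitted for brevity; the `healthshed_key` helper is hypothetical, not project code):

```python
# Per-geography settings mirroring conf/geographies/geographies.yaml.
GEOGRAPHIES = {
    "madagascar": {"healthsheds": "healthsheds2022.zip", "unique_id": "fs_uid"},
    "nepal": {"healthsheds": "Nepal_Healthsheds2024.zip", "unique_id": "fid"},
}

def healthshed_key(geography: str) -> str:
    """Column identifying healthsheds for the daily_to_healthshed step."""
    return GEOGRAPHIES[geography]["unique_id"]

print(healthshed_key("nepal"))  # fid
```

Keeping the join key in config rather than code is what lets the Nepal geography slot in without touching the aggregation logic.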

data/.gitignore

Lines changed: 0 additions & 4 deletions
This file was deleted.
