
Commit f90969d

Adding files for new Case Study post
1 parent 8f0e822 commit f90969d


8 files changed: +1300 additions, -0 deletions

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Import temperature data from the DWD and process it\n",
"\n",
"This notebook pulls historical temperature data from the DWD server and formats it for later use in other projects. The data is delivered at an hourly frequency, in one .zip file per weather station. To use the data, we need everything in a single .csv file, with all stations side by side. We also need the daily average.\n",
"\n",
"To reduce computing time, we also drop all data recorded before 2007.\n",
"\n",
"Run the notebooks in the following order:\n",
"* 1-dwd_konverter_download\n",
"* 2-dwd_konverter_extract\n",
"* 3-dwd_konverter_build_df\n",
"* 4-dwd_konverter_final_processing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1.) Download files from the DWD-API\n",
"Here we download all relevant files from the DWD server. The server is a plain HTTP file index, so we scrape the index page for all links that match `stundenwerte_TU_.*_hist.zip` and download them to the folder 'download'.\n",
"\n",
"Link to the relevant DWD page: https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/historical/"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done\n"
]
}
],
"source": [
"import re\n",
"from pathlib import Path\n",
"\n",
"import requests\n",
"from bs4 import BeautifulSoup\n",
"\n",
"# Set base values\n",
"download_folder = Path.cwd() / 'download'\n",
"download_folder.mkdir(exist_ok=True)  # open() below fails if the folder is missing\n",
"base_url = 'https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/historical/'\n",
"\n",
"# For testing, only download 10 files\n",
"file_max = 10\n",
"\n",
"# One session for the index page and all downloads\n",
"with requests.Session() as s:\n",
"    resp = s.get(base_url)\n",
"    resp.raise_for_status()\n",
"\n",
"    # Parse the index page for all relevant <a href> tags\n",
"    soup = BeautifulSoup(resp.content, 'html.parser')\n",
"    links = soup.find_all('a', href=re.compile(r'stundenwerte_TU_.*_hist\\.zip'))\n",
"\n",
"    # Download the .zip files to download_folder; slicing enforces the test limit before any request is made\n",
"    for link in links[:file_max]:\n",
"        with s.get(base_url + link['href'], stream=True) as zip_response:\n",
"            zip_response.raise_for_status()\n",
"            with open(download_folder / link['href'], 'wb') as file:\n",
"                for chunk in zip_response.iter_content(chunk_size=8192):\n",
"                    file.write(chunk)\n",
"\n",
"print('Done')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
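The download cell depends on `requests`, BeautifulSoup and a live connection to the DWD server. To illustrate just the link-matching step, here is a stdlib-only sketch that collects the matching archive links from an index page's HTML, turns them into absolute URLs and applies the testing cap before anything is downloaded. `ArchiveLinkParser`, `find_archive_links` and the `limit` parameter are hypothetical names for illustration, not part of the notebook; the pattern is the notebook's own, with the dot before `zip` escaped.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Same file pattern as the notebook, with the literal dot escaped
ARCHIVE_PATTERN = re.compile(r"stundenwerte_TU_.*_hist\.zip")


class ArchiveLinkParser(HTMLParser):
    """Collects the href of every <a> tag whose target matches ARCHIVE_PATTERN."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and ARCHIVE_PATTERN.search(href):
            self.hrefs.append(href)


def find_archive_links(html, base_url, limit=None):
    """Return absolute URLs of matching archives, optionally capped at `limit`."""
    parser = ArchiveLinkParser()
    parser.feed(html)
    urls = [urljoin(base_url, href) for href in parser.hrefs]
    # Apply the cap here, so no request is ever made for a link past the limit
    return urls if limit is None else urls[:limit]
```

Because the cap is applied to the list of URLs, the download loop can simply iterate over the result; in the original cell the 11th request was sent before the counter check stopped the loop.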
Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Import temperature data from the DWD and process it\n",
"\n",
"This notebook pulls historical temperature data from the DWD server and formats it for later use in other projects. The data is delivered at an hourly frequency, in one .zip file per weather station. To use the data, we need everything in a single .csv file, with all stations side by side. We also need the daily average.\n",
"\n",
"To reduce computing time, we also drop all data recorded before 2007.\n",
"\n",
"Run the notebooks in the following order:\n",
"* 1-dwd_konverter_download\n",
"* 2-dwd_konverter_extract\n",
"* 3-dwd_konverter_build_df\n",
"* 4-dwd_konverter_final_processing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.) Extract all .zip-archives\n",
"In this step, we extract one file (the data file whose name starts with `produkt`) from each downloaded .zip archive and save it to the 'import' folder. Beware: this produces a lot of data (~6 GB of .csv files)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Done'"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import re\n",
"from pathlib import Path\n",
"from zipfile import ZipFile\n",
"\n",
"# Folder definitions\n",
"download_folder = Path.cwd() / 'download'\n",
"import_folder = Path.cwd() / 'import'\n",
"import_folder.mkdir(exist_ok=True)\n",
"\n",
"# Find all downloaded .zip files (relative to download_folder, not the working directory)\n",
"unzip_files = sorted(download_folder.glob('stundenwerte_TU_*_hist.zip'))\n",
"\n",
"# Set the name pattern of the file we need\n",
"regex_name = re.compile(r'produkt.*')\n",
"\n",
"# Open each archive, find the member that matches the regex pattern, extract it to 'import'\n",
"for file in unzip_files:\n",
"    with ZipFile(file, 'r') as zip_obj:\n",
"        extract_filename = next(filter(regex_name.match, zip_obj.namelist()), None)\n",
"        if extract_filename is None:\n",
"            print(f'No produkt file found in {file.name}')\n",
"            continue\n",
"        zip_obj.extract(extract_filename, import_folder)\n",
"\n",
"display('Done')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
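The extraction notebook runs as a top-level loop over the working directory. As a hedged, testable variant, here is the same step wrapped in a function that takes both folders as parameters, assuming (as the notebook's regex does) that the data file inside each archive is the member whose name starts with `produkt`. `extract_data_files` is a hypothetical helper name, not something defined in the notebook.

```python
import re
from pathlib import Path
from zipfile import ZipFile

# Data files inside the archives start with "produkt" (same pattern as the notebook)
PRODUKT_PATTERN = re.compile(r"produkt.*")


def extract_data_files(download_folder, import_folder):
    """Extract the first 'produkt*' member of every archive; return the extracted paths."""
    download_folder = Path(download_folder)
    import_folder = Path(import_folder)
    import_folder.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(download_folder.glob("stundenwerte_TU_*_hist.zip")):
        with ZipFile(archive) as zip_obj:
            # next(..., None) avoids the IndexError that [0] raises when nothing matches
            member = next(filter(PRODUKT_PATTERN.match, zip_obj.namelist()), None)
            if member is None:
                continue  # skip archives without a data file
            # ZipFile.extract returns the path of the file it wrote
            extracted.append(Path(zip_obj.extract(member, import_folder)))
    return extracted
```

Returning the extracted paths lets a later step (such as building the combined DataFrame) work from an explicit file list instead of re-scanning the 'import' folder.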

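Both notebooks state the end goal: daily averages, no data before 2007, and all stations side by side in one .csv file. Those steps belong to notebooks 3 and 4, which are not part of the files shown here, so the following is only a stdlib sketch of the idea, not the author's implementation. It assumes the extracted `produkt` files are semicolon-separated with the columns `MESS_DATUM` (timestamp as `YYYYMMDDHH`) and `TT_TU` (air temperature), and that `-999` marks a missing value; `daily_means` and `side_by_side` are hypothetical helper names.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

MISSING = -999.0  # assumed DWD marker for a missing reading


def daily_means(produkt_text, start_year=2007):
    """Return {YYYYMMDD: mean hourly temperature} for one station, from start_year on."""
    reader = csv.DictReader(io.StringIO(produkt_text), delimiter=";")
    by_day = defaultdict(list)
    for row in reader:
        # Strip padding spaces; drop the None key csv uses for surplus fields
        row = {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        stamp = row["MESS_DATUM"]  # assumed format: YYYYMMDDHH
        if int(stamp[:4]) < start_year:
            continue  # crop everything before start_year
        temp = float(row["TT_TU"])
        if temp == MISSING:
            continue  # ignore missing readings instead of averaging them in
        by_day[stamp[:8]].append(temp)
    return {day: mean(values) for day, values in sorted(by_day.items())}


def side_by_side(station_means):
    """Merge {station: {day: value}} into rows of [date, station1, station2, ...]."""
    stations = sorted(station_means)
    days = sorted({day for means in station_means.values() for day in means})
    header = ["date"] + stations
    # Days a station has no data for are left as empty cells
    rows = [[day] + [station_means[s].get(day, "") for s in stations] for day in days]
    return [header] + rows
```

The rows from `side_by_side` can be written directly with `csv.writer(...).writerows(...)` to get the single, wide .csv file the notebooks describe.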