Commit 40ba09b

[AL-5857]Add ISO format to exports V2 date filters (#1108)
1 parent c3925c9 commit 40ba09b

4 files changed

+126
-82
lines changed

examples/label_export/images.ipynb

Lines changed: 55 additions & 66 deletions
Original file line number | Diff line number | Diff line change
@@ -1,18 +1,18 @@
11
{
2+
"nbformat": 4,
3+
"nbformat_minor": 0,
4+
"metadata": {},
25
"cells": [
36
{
4-
"attachments": {},
5-
"cell_type": "markdown",
67
"metadata": {},
78
"source": [
89
"<td>\n",
910
" <a target=\"_blank\" href=\"https://labelbox.com\" ><img src=\"https://labelbox.com/blog/content/images/2021/02/logo-v4.svg\" width=256/></a>\n",
1011
"</td>"
11-
]
12+
],
13+
"cell_type": "markdown"
1214
},
1315
{
14-
"attachments": {},
15-
"cell_type": "markdown",
1616
"metadata": {},
1717
"source": [
1818
"<td>\n",
@@ -24,79 +24,73 @@
2424
"<a href=\"https://github.com/Labelbox/labelbox-python/tree/master/examples/label_export/images.ipynb\" target=\"_blank\"><img\n",
2525
"src=\"https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white\" alt=\"GitHub\"></a>\n",
2626
"</td>"
27-
]
27+
],
28+
"cell_type": "markdown"
2829
},
2930
{
30-
"attachments": {},
31-
"cell_type": "markdown",
3231
"metadata": {},
3332
"source": [
3433
"How to export data, with examples for each type of export along with details on optional parameters and filters."
35-
]
34+
],
35+
"cell_type": "markdown"
3636
},
3737
{
38-
"attachments": {},
39-
"cell_type": "markdown",
4038
"metadata": {},
4139
"source": [
4240
"# Image Data Export"
43-
]
41+
],
42+
"cell_type": "markdown"
4443
},
4544
{
46-
"cell_type": "code",
47-
"execution_count": null,
4845
"metadata": {},
49-
"outputs": [],
5046
"source": [
5147
"!pip install \"labelbox[data]\" -q"
52-
]
48+
],
49+
"cell_type": "code",
50+
"outputs": [],
51+
"execution_count": null
5352
},
5453
{
55-
"cell_type": "code",
56-
"execution_count": null,
5754
"metadata": {},
58-
"outputs": [],
5955
"source": [
6056
"import labelbox as lb"
61-
]
57+
],
58+
"cell_type": "code",
59+
"outputs": [],
60+
"execution_count": null
6261
},
6362
{
64-
"attachments": {},
65-
"cell_type": "markdown",
6663
"metadata": {},
6764
"source": [
6865
"# API Key and Client\n",
6966
"Provide a valid api key below in order to properly connect to the Labelbox Client."
70-
]
67+
],
68+
"cell_type": "markdown"
7169
},
7270
{
73-
"cell_type": "code",
74-
"execution_count": null,
7571
"metadata": {},
76-
"outputs": [],
7772
"source": [
7873
"# Add your api key\n",
7974
"API_KEY = \"API KEY here\"\n",
8075
"client = lb.Client(api_key=API_KEY)"
81-
]
76+
],
77+
"cell_type": "code",
78+
"outputs": [],
79+
"execution_count": null
8280
},
8381
{
84-
"attachments": {},
85-
"cell_type": "markdown",
8682
"metadata": {},
8783
"source": [
8884
"# Export data rows from a project\n",
8985
"\n",
9086
"When you export data rows from a project, you can narrow down your data rows by last_activity_at, label_created_at, and data_row_ids. Then, when you export from a project, you may choose to include or exclude certain attributes in your export.\n",
9187
"\n",
9288
"You can view the JSON export formats for each data type here: https://docs.labelbox.com/reference/label-export#export-specifications"
93-
]
89+
],
90+
"cell_type": "markdown"
9491
},
9592
{
96-
"cell_type": "code",
97-
"execution_count": null,
9893
"metadata": {},
99-
"outputs": [],
10094
"source": [
10195
"# Set the export params to include/exclude certain fields. Make sure each of these fields are correctly grabbed \n",
10296
"export_params= {\n",
@@ -107,8 +101,10 @@
107101
" \"performance_details\": True\n",
108102
"}\n",
109103
"\n",
110-
"# You can set the range for last_activity_at and label_created_at. You can also set a list of data \n",
111-
"# row ids to export. \n",
104+
"# You can set the range for last_activity_at and label_created_at in one of these formats: \n",
105+
"# YYYY-MM-DD or YYYY-MM-DD hh:mm:ss or ISO 8601 format which is YYYY-MM-DDThh:mm:ss\u00b1hhmm\n",
106+
"# ISO 8601 lets you specify the timezone; the other two formats assume the timezone from the user's workspace settings.\n",
107+
"# You can also set a list of data row ids to export. \n",
112108
"# For context, last_activity_at captures the creation and modification of labels, metadata, status, comments and reviews.\n",
113109
"\n",
114110
"# Note: This is an AND logic between the filters, so usually using one filter is sufficient.\n",
@@ -127,11 +123,12 @@
127123
"\n",
128124
"export_json = export_task.result\n",
129125
"print(\"results: \", export_json)"
130-
]
126+
],
127+
"cell_type": "code",
128+
"outputs": [],
129+
"execution_count": null
131130
},
132131
{
133-
"attachments": {},
134-
"cell_type": "markdown",
135132
"metadata": {},
136133
"source": [
137134
"# Export from a dataset\n",
@@ -141,13 +138,11 @@
141138
"When exporting from Catalog, you can include information about a data row from all projects and model runs to which it belongs. Specifically, for the selected data rows, you can export the labels from multiple projects and/or the predictions from multiple model runs.\n",
142139
"\n",
143140
"As shown below, the project_ids and model_run_ids parameters accept a list of IDs"
144-
]
141+
],
142+
"cell_type": "markdown"
145143
},
146144
{
147-
"cell_type": "code",
148-
"execution_count": null,
149145
"metadata": {},
150-
"outputs": [],
151146
"source": [
152147
"# Set the export params to include/exclude certain fields. Make sure each of these fields are correctly grabbed \n",
153148
"export_params= {\n",
@@ -176,21 +171,20 @@
176171
" print(export_task.errors)\n",
177172
"export_json = export_task.result\n",
178173
"print(\"results: \", export_json)"
179-
]
174+
],
175+
"cell_type": "code",
176+
"outputs": [],
177+
"execution_count": null
180178
},
181179
{
182-
"attachments": {},
183-
"cell_type": "markdown",
184180
"metadata": {},
185181
"source": [
186182
"# Export from a slice"
187-
]
183+
],
184+
"cell_type": "markdown"
188185
},
189186
{
190-
"cell_type": "code",
191-
"execution_count": null,
192187
"metadata": {},
193-
"outputs": [],
194188
"source": [
195189
"# Set the export params to include/exclude certain fields. Make sure each of these fields are correctly grabbed \n",
196190
"export_params= {\n",
@@ -212,21 +206,20 @@
212206
" print(export_task.errors)\n",
213207
"export_json = export_task.result\n",
214208
"print(\"results: \", export_json)"
215-
]
209+
],
210+
"cell_type": "code",
211+
"outputs": [],
212+
"execution_count": null
216213
},
217214
{
218-
"attachments": {},
219-
"cell_type": "markdown",
220215
"metadata": {},
221216
"source": [
222217
"# Export data rows from a model run"
223-
]
218+
],
219+
"cell_type": "markdown"
224220
},
225221
{
226-
"cell_type": "code",
227-
"execution_count": null,
228222
"metadata": {},
229-
"outputs": [],
230223
"source": [
231224
"# Set the export params to include/exclude certain fields. Make sure each of these fields are correctly grabbed \n",
232225
"export_params= {\n",
@@ -242,14 +235,10 @@
242235
"print(export_task.errors)\n",
243236
"export_json = export_task.result\n",
244237
"print(\"results: \", export_json)"
245-
]
246-
}
247-
],
248-
"metadata": {
249-
"language_info": {
250-
"name": "python"
238+
],
239+
"cell_type": "code",
240+
"outputs": [],
241+
"execution_count": null
251242
}
252-
},
253-
"nbformat": 4,
254-
"nbformat_minor": 0
255-
}
243+
]
244+
}
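
For illustration, the three date formats named in the updated notebook comment can be checked locally with the standard library before they are passed as filter values. This is a minimal sketch, not the SDK's own code: `ACCEPTED_FORMATS` and `matching_format` are hypothetical names introduced here, and the filter values are made-up dates.

```python
from datetime import datetime
from typing import Optional

# Formats listed in the notebook comment. The tuple name is hypothetical.
ACCEPTED_FORMATS = (
    "%Y-%m-%d",              # YYYY-MM-DD
    "%Y-%m-%d %H:%M:%S",     # YYYY-MM-DD hh:mm:ss
    "%Y-%m-%dT%H:%M:%S%z",   # ISO 8601 with UTC offset
)


def matching_format(value: str) -> Optional[str]:
    """Return the first format that parses `value`, or None if none does."""
    for fmt in ACCEPTED_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None


# Example filter values (illustrative dates only). The ISO 8601 range carries
# its own offset; the second range relies on the workspace timezone.
filters = {
    "last_activity_at": ["2023-05-01T00:00:00+0530", "2023-05-23T14:30:00+0530"],
    "label_created_at": ["2023-05-01 00:00:00", "2023-05-23 23:59:59"],
}

for name, (start, end) in filters.items():
    print(name, matching_format(start), matching_format(end))
```

Keeping both ends of a range in the same format avoids mixing an explicit offset with the workspace default.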

labelbox/schema/export_filters.py

Lines changed: 37 additions & 15 deletions
@@ -1,14 +1,14 @@
11
import sys
22

3-
from datetime import datetime
3+
from datetime import datetime, timezone
44
from typing import Collection, Dict, Tuple, List, Optional
5-
65
if sys.version_info >= (3, 8):
76
from typing import TypedDict
87
else:
98
from typing_extensions import TypedDict
109

1110
MAX_DATA_ROW_IDS_PER_EXPORT_V2 = 2_000
11+
ISO_8061_FORMAT = "%Y-%m-%dT%H:%M:%S%z"
1212

1313

1414
class SharedExportFilters(TypedDict):
@@ -44,20 +44,34 @@ class DatasetExportFilters(SharedExportFilters):
4444
pass
4545

4646

47-
def validate_datetime(string_date: str) -> bool:
48-
"""helper function validate that datetime is as follows: YYYY-MM-DD for the export"""
49-
if string_date:
50-
for fmt in ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S"):
47+
def validate_datetime(datetime_str: str) -> bool:
48+
"""Helper function to validate the datetime format: "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss"
49+
or ISO 8601 format "YYYY-MM-DDThh:mm:ss±hhmm" (Example: "2023-05-23T14:30:00+0530")"""
50+
if datetime_str:
51+
for fmt in ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S", ISO_8061_FORMAT):
5152
try:
52-
datetime.strptime(string_date, fmt)
53+
datetime.strptime(datetime_str, fmt)
5354
return True
5455
except ValueError:
5556
pass
56-
raise ValueError(f"""Incorrect format for: {string_date}.
57-
Format must be \"YYYY-MM-DD\" or \"YYYY-MM-DD hh:mm:ss\"""")
57+
raise ValueError(f"""Incorrect format for: {datetime_str}.
58+
Format must be \"YYYY-MM-DD\" or \"YYYY-MM-DD hh:mm:ss\" or ISO 8601 format \"YYYY-MM-DDThh:mm:ss±hhmm\""""
59+
)
5860
return True
5961

6062

63+
def convert_to_utc_if_iso8061(datetime_str: str, timezone_str: Optional[str]):
64+
"""helper function to convert datetime to UTC if it is in ISO_8061_FORMAT and set timezone to UTC"""
65+
try:
66+
date_obj = datetime.strptime(datetime_str, ISO_8061_FORMAT)
67+
date_obj_utc = date_obj.astimezone(timezone.utc)
68+
datetime_str = date_obj_utc.strftime(ISO_8061_FORMAT)
69+
timezone_str = "UTC"
70+
except ValueError:
71+
pass
72+
return datetime_str, timezone_str
73+
74+
6175
def build_filters(client, filters):
6276
search_query: List[Dict[str, Collection[str]]] = []
6377
timezone: Optional[str] = None
@@ -69,11 +83,12 @@ def _get_timezone() -> str:
6983

7084
last_activity_at = filters.get("last_activity_at")
7185
if last_activity_at:
72-
if timezone is None:
73-
timezone = _get_timezone()
86+
timezone = _get_timezone()
7487
start, end = last_activity_at
7588
if (start is not None and end is not None):
7689
[validate_datetime(date) for date in last_activity_at]
90+
start, timezone = convert_to_utc_if_iso8061(start, timezone)
91+
end, timezone = convert_to_utc_if_iso8061(end, timezone)
7792
search_query.append({
7893
"type": "data_row_last_activity_at",
7994
"value": {
@@ -87,6 +102,7 @@ def _get_timezone() -> str:
87102
})
88103
elif (start is not None):
89104
validate_datetime(start)
105+
start, timezone = convert_to_utc_if_iso8061(start, timezone)
90106
search_query.append({
91107
"type": "data_row_last_activity_at",
92108
"value": {
@@ -97,6 +113,7 @@ def _get_timezone() -> str:
97113
})
98114
elif (end is not None):
99115
validate_datetime(end)
116+
end, timezone = convert_to_utc_if_iso8061(end, timezone)
100117
search_query.append({
101118
"type": "data_row_last_activity_at",
102119
"value": {
@@ -108,15 +125,17 @@ def _get_timezone() -> str:
108125

109126
label_created_at = filters.get("label_created_at")
110127
if label_created_at:
111-
if timezone is None:
112-
timezone = _get_timezone()
128+
timezone = _get_timezone()
113129
start, end = label_created_at
114130
if (start is not None and end is not None):
115131
[validate_datetime(date) for date in label_created_at]
132+
start, timezone = convert_to_utc_if_iso8061(start, timezone)
133+
end, timezone = convert_to_utc_if_iso8061(end, timezone)
116134
search_query.append({
117135
"type": "labeled_at",
118136
"value": {
119137
"operator": "BETWEEN",
138+
"timezone": timezone,
120139
"value": {
121140
"min": start,
122141
"max": end
@@ -125,19 +144,23 @@ def _get_timezone() -> str:
125144
})
126145
elif (start is not None):
127146
validate_datetime(start)
147+
start, timezone = convert_to_utc_if_iso8061(start, timezone)
128148
search_query.append({
129149
"type": "labeled_at",
130150
"value": {
131151
"operator": "GREATER_THAN_OR_EQUAL",
152+
"timezone": timezone,
132153
"value": start
133154
}
134155
})
135156
elif (end is not None):
136157
validate_datetime(end)
158+
end, timezone = convert_to_utc_if_iso8061(end, timezone)
137159
search_query.append({
138160
"type": "labeled_at",
139161
"value": {
140162
"operator": "LESS_THAN_OR_EQUAL",
163+
"timezone": timezone,
141164
"value": end
142165
}
143166
})
@@ -155,5 +178,4 @@ def _get_timezone() -> str:
155178
"operator": "is",
156179
"type": "data_row_id"
157180
})
158-
159-
return search_query
181+
return search_query
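
One detail worth checking when normalizing an offset-aware ISO 8601 string to UTC: `datetime.replace(tzinfo=...)` only relabels the offset and leaves the clock time unchanged, whereas `datetime.astimezone(...)` shifts the clock time so the instant is preserved. The sketch below uses only the standard library; `ISO_FORMAT` mirrors the pattern of `ISO_8061_FORMAT` from the diff, and `to_utc_string` is a hypothetical helper, not part of the SDK.

```python
from datetime import datetime, timezone

# Same strptime pattern as ISO_8061_FORMAT in the diff.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def to_utc_string(value: str) -> str:
    """Convert an offset-aware ISO 8601 string to the same instant in UTC."""
    parsed = datetime.strptime(value, ISO_FORMAT)
    return parsed.astimezone(timezone.utc).strftime(ISO_FORMAT)


local = "2023-05-23T14:30:00+0530"
parsed = datetime.strptime(local, ISO_FORMAT)

# replace() swaps the tzinfo label but keeps 14:30 on the clock.
relabelled = parsed.replace(tzinfo=timezone.utc).strftime(ISO_FORMAT)

# astimezone() subtracts the +05:30 offset, giving 09:00 UTC.
converted = to_utc_string(local)

print(relabelled)  # 2023-05-23T14:30:00+0000
print(converted)   # 2023-05-23T09:00:00+0000
```

If the export backend interprets the value together with a `"timezone": "UTC"` field, the converted form is the one that refers to the moment the caller wrote.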

tests/integration/conftest.py

Lines changed: 0 additions & 1 deletion
@@ -661,7 +661,6 @@ def run_project_export_v2_task(cls,
661661
time.sleep(5)
662662
else:
663663
break
664-
665664
return task.result
666665

667666
@classmethod
