Commit 997a826

committed
new conversational notebook
1 parent b175d35 commit 997a826

2 files changed

+389
-1
lines changed

Lines changed: 388 additions & 0 deletions
@@ -0,0 +1,388 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "26b9c486",
6+
"metadata": {},
7+
"source": [
8+
"<td>\n",
9+
" <a target=\"_blank\" href=\"https://labelbox.com\" ><img src=\"https://labelbox.com/blog/content/images/2021/02/logo-v4.svg\" width=256/></a>\n",
10+
"</td>"
11+
]
12+
},
13+
{
14+
"cell_type": "markdown",
15+
"id": "51eb4b54",
16+
"metadata": {},
17+
"source": [
18+
"<td>\n",
19+
"<a href=\"https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/annotation_import/conversational.ipynb\" target=\"_blank\"><img\n",
20+
"src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
21+
"</td>\n",
22+
"\n",
23+
"<td>\n",
24+
"<a href=\"https://github.com/Labelbox/labelbox-python/tree/develop/examples/annotation_import/conversational.ipynb\" target=\"_blank\"><img\n",
25+
"src=\"https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white\" alt=\"GitHub\"></a>\n",
26+
"</td>"
27+
]
28+
},
29+
{
30+
"cell_type": "markdown",
31+
"id": "27d147e7",
32+
"metadata": {},
33+
"source": [
34+
"# Conversational Text Annotation Import\n",
35+
"* This notebook provides examples of each supported annotation type for conversational text assets. It covers the following:\n",
36+
" * Model-Assisted Labeling (MAL): provides pre-annotated data for your labelers, reducing the total time needed to label your assets. MAL does not submit labels automatically; a labeler must review and submit each one."
37+
]
38+
},
39+
{
40+
"cell_type": "markdown",
41+
"id": "19b346e2",
42+
"metadata": {},
43+
"source": [
44+
"* For information on what types of annotations are supported per data type, refer to this documentation:\n",
45+
" * https://docs.labelbox.com/docs/model-assisted-labeling#option-1-import-via-python-annotation-types-recommended"
46+
]
47+
},
48+
{
49+
"cell_type": "markdown",
50+
"id": "f4375aef",
51+
"metadata": {},
52+
"source": [
53+
"* Notes:\n",
54+
" * Wait until the import job is complete before opening the Editor to make sure all annotations are imported properly."
55+
]
56+
},
57+
{
58+
"cell_type": "code",
59+
"execution_count": 1,
60+
"id": "00ad1e27",
61+
"metadata": {},
62+
"outputs": [],
63+
"source": [
64+
"!pip install -q 'labelbox[data]'"
65+
]
66+
},
67+
{
68+
"cell_type": "markdown",
69+
"id": "ccc4c3c3",
70+
"metadata": {},
71+
"source": [
72+
"# Imports"
73+
]
74+
},
75+
{
76+
"cell_type": "code",
77+
"execution_count": 2,
78+
"id": "f0de1cde",
79+
"metadata": {},
80+
"outputs": [],
81+
"source": [
82+
"from labelbox.schema.ontology import OntologyBuilder, Tool, Classification, Option\n",
83+
"from labelbox import Client, LabelingFrontend, MALPredictionImport\n",
84+
"from labelbox.data.annotation_types import (\n",
85+
" Label, ImageData, ObjectAnnotation, \n",
86+
" TextEntity,\n",
87+
" Radio, Checklist, Text,\n",
88+
" ClassificationAnnotation, ClassificationAnswer\n",
89+
")\n",
90+
"from labelbox.data.serialization import NDJsonConverter\n",
91+
"from labelbox.schema.media_type import MediaType\n",
92+
"import uuid\n",
93+
"import json"
94+
]
95+
},
96+
{
97+
"cell_type": "markdown",
98+
"id": "54a028dd",
99+
"metadata": {},
100+
"source": [
101+
"# API Key and Client\n",
102+
"Provide a valid API key below to connect to the Labelbox client."
103+
]
104+
},
105+
{
106+
"cell_type": "code",
107+
"execution_count": 3,
108+
"id": "4aab38e2",
109+
"metadata": {},
110+
"outputs": [],
111+
"source": [
112+
"# Add your API key\n",
113+
"API_KEY = \"YOUR API KEY\"\n",
114+
"# Keep real keys out of the notebook, e.g. load them from an environment variable\n",
115+
"client = Client(api_key=API_KEY)"
116+
]
117+
},
118+
{
119+
"cell_type": "markdown",
120+
"id": "c1763e44",
121+
"metadata": {},
122+
"source": [
123+
"---- \n",
124+
"### Steps\n",
125+
"1. Make sure the project is set up\n",
126+
"2. Collect annotations\n",
127+
"3. Upload"
128+
]
129+
},
130+
{
131+
"cell_type": "markdown",
132+
"id": "d30024a7",
133+
"metadata": {},
134+
"source": [
135+
"First, we create an ontology with all the possible tools and classifications supported for conversational text. The official list of supported annotations to import can be found here:\n",
136+
"- [Model-Assisted Labeling](https://docs.labelbox.com/docs/model-assisted-labeling) (annotations/labels are not submitted)\n",
137+
"- [Conversational Text Annotations](https://docs.labelbox.com/docs/conversational-annotations)"
138+
]
139+
},
140+
{
141+
"cell_type": "code",
142+
"execution_count": 4,
143+
"id": "ae6f0919",
144+
"metadata": {},
145+
"outputs": [],
146+
"source": [
147+
"ontology_builder = OntologyBuilder(\n",
148+
" tools=[ \n",
149+
" Tool( # NER tool given the name \"ner\"\n",
150+
" tool=Tool.Type.NER, \n",
151+
" name=\"ner\")], \n",
152+
" classifications=[ \n",
153+
" Classification( # Text classification given the name \"text\"\n",
154+
" class_type=Classification.Type.TEXT,\n",
155+
" scope=Classification.Scope.INDEX, \n",
156+
" instructions=\"text\"), \n",
157+
" Classification( # Checklist classification given the name \"checklist\" with two options: \"first_checklist_answer\" and \"second_checklist_answer\"\n",
158+
" class_type=Classification.Type.CHECKLIST, \n",
159+
" scope=Classification.Scope.INDEX, \n",
160+
" instructions=\"checklist\", \n",
161+
" options=[\n",
162+
" Option(value=\"first_checklist_answer\"),\n",
163+
" Option(value=\"second_checklist_answer\") \n",
164+
" ]\n",
165+
" ), \n",
166+
" Classification( # Radio classification given the name \"radio\" with two options: \"first_radio_answer\" and \"second_radio_answer\"\n",
167+
" class_type=Classification.Type.RADIO, \n",
168+
" instructions=\"radio\", \n",
169+
" scope=Classification.Scope.INDEX, \n",
170+
" options=[\n",
171+
" Option(value=\"first_radio_answer\"),\n",
172+
" Option(value=\"second_radio_answer\")\n",
173+
" ]\n",
174+
" )\n",
175+
" ]\n",
176+
")"
177+
]
178+
},
179+
{
180+
"cell_type": "code",
181+
"execution_count": 5,
182+
"id": "b95935a7",
183+
"metadata": {},
184+
"outputs": [
185+
{
186+
"data": {
187+
"text/plain": [
188+
"OntologyBuilder(tools=[Tool(tool=<Type.NER: 'named-entity'>, name='ner', required=False, color=None, classifications=[], schema_id=None, feature_schema_id=None)], classifications=[Classification(class_type=<Type.TEXT: 'text'>, instructions='text', required=False, options=[], schema_id=None, feature_schema_id=None, scope=<Scope.INDEX: 'index'>), Classification(class_type=<Type.CHECKLIST: 'checklist'>, instructions='checklist', required=False, options=[Option(value='first_checklist_answer', label='first_checklist_answer', schema_id=None, feature_schema_id=None, options=[]), Option(value='second_checklist_answer', label='second_checklist_answer', schema_id=None, feature_schema_id=None, options=[])], schema_id=None, feature_schema_id=None, scope=<Scope.INDEX: 'index'>), Classification(class_type=<Type.RADIO: 'radio'>, instructions='radio', required=False, options=[Option(value='first_radio_answer', label='first_radio_answer', schema_id=None, feature_schema_id=None, options=[]), Option(value='second_radio_answer', label='second_radio_answer', schema_id=None, feature_schema_id=None, options=[])], schema_id=None, feature_schema_id=None, scope=<Scope.INDEX: 'index'>)])"
189+
]
190+
},
191+
"execution_count": 5,
192+
"metadata": {},
193+
"output_type": "execute_result"
194+
}
195+
],
196+
"source": [
197+
"ontology_builder"
198+
]
199+
},
200+
{
201+
"cell_type": "code",
202+
"execution_count": 6,
203+
"id": "6b6403a1",
204+
"metadata": {},
205+
"outputs": [],
206+
"source": [
207+
"# Create Labelbox project\n",
208+
"mal_project = client.create_project(name=\"conversational_mal_project\", media_type=MediaType.Conversational)\n",
209+
"\n",
210+
"# Create one Labelbox dataset\n",
211+
"dataset = client.create_dataset(name=\"conversational_annotation_import_demo_dataset\")\n",
212+
"\n",
213+
"# Grab an example asset and create a Labelbox data row\n",
214+
"data_row = dataset.create_data_row(\n",
215+
" external_id = \"conversation-1\",\n",
216+
" row_data = \"https://storage.googleapis.com/labelbox-developer-testing-assets/conversational_text/1000-conversations/conversation-1.json\"\n",
217+
")\n",
218+
"\n",
219+
"# Set up your ontology / labeling editor\n",
220+
"editor = next(client.get_labeling_frontends(where=LabelingFrontend.name == \"Editor\")) # Unless using a custom editor, do not modify this\n",
221+
"\n",
222+
"mal_project.setup(editor, ontology_builder.asdict()) # Connect your ontology and editor to your MAL project\n",
223+
"mal_project.datasets.connect(dataset) # Connect your dataset to your MAL project"
224+
]
225+
},
226+
{
227+
"cell_type": "markdown",
228+
"id": "f4d3694e",
229+
"metadata": {},
230+
"source": [
231+
"### Object Annotations"
232+
]
233+
},
234+
{
235+
"cell_type": "code",
236+
"execution_count": 7,
237+
"id": "551ca09a",
238+
"metadata": {},
239+
"outputs": [],
240+
"source": [
241+
"# Message-based NER annotation\n",
242+
"ner_annotation = { \n",
243+
" \"uuid\": str(uuid.uuid4()),\n",
244+
" \"name\": \"ner\",\n",
245+
" \"dataRow\": {\"id\": data_row.uid},\n",
246+
" \"location\": { \n",
247+
" \"start\": 0, \n",
248+
" \"end\": 8 \n",
249+
" },\n",
250+
" \"messageId\": \"4\"\n",
251+
" }"
252+
]
253+
},
254+
{
255+
"cell_type": "markdown",
256+
"id": "1deaf1f1",
257+
"metadata": {},
258+
"source": [
259+
"### Classification Annotations"
260+
]
261+
},
262+
{
263+
"cell_type": "code",
264+
"execution_count": 51,
265+
"id": "9c5d93de",
266+
"metadata": {},
267+
"outputs": [],
268+
"source": [
269+
"# Message-based classification annotations\n",
270+
"text_annotation = {\n",
271+
" 'name': 'text',\n",
272+
" 'answer': 'the answer to the text questions right here',\n",
273+
" 'uuid': str(uuid.uuid4()),\n",
274+
" \"dataRow\": {\"id\": data_row.uid},\n",
275+
" \"messageId\": \"0\",\n",
276+
"}\n",
277+
"checklist_annotation = {\n",
278+
" 'name': 'checklist',\n",
279+
" 'uuid': str(uuid.uuid4()),\n",
280+
" 'answers': [\n",
281+
" {'name': 'first_checklist_answer'},\n",
282+
" {'name': 'second_checklist_answer'},\n",
283+
" ],\n",
284+
" \"dataRow\": {\"id\": data_row.uid},\n",
285+
" \"messageId\": \"2\",\n",
286+
"}\n",
287+
"\n",
288+
"radio_annotation = {\n",
289+
" 'name': 'radio',\n",
290+
" 'uuid': str(uuid.uuid4()), \n",
291+
" \"dataRow\": {\"id\": data_row.uid},\n",
292+
" 'answer': {\n",
293+
" 'name': 'first_radio_answer'\n",
294+
" },\n",
295+
" \"messageId\": \"0\",\n",
296+
"}"
297+
]
298+
},
299+
{
300+
"cell_type": "code",
301+
"execution_count": 56,
302+
"id": "762db1d2",
303+
"metadata": {},
304+
"outputs": [],
305+
"source": [
306+
"annotations = [\n",
307+
" ner_annotation,\n",
308+
" text_annotation,\n",
309+
" checklist_annotation,\n",
310+
" radio_annotation\n",
311+
"]"
312+
]
313+
},
314+
{
315+
"cell_type": "markdown",
316+
"id": "55be64cf",
317+
"metadata": {},
318+
"source": [
319+
"### Model-Assisted Labeling"
320+
]
321+
},
322+
{
323+
"cell_type": "code",
324+
"execution_count": 54,
325+
"id": "10a1f924",
326+
"metadata": {},
327+
"outputs": [],
328+
"source": [
329+
"# Upload our label using Model-Assisted Labeling\n",
330+
"upload_job = MALPredictionImport.create_from_objects(\n",
331+
" client = client, \n",
332+
" project_id = mal_project.uid, \n",
333+
" name=f\"mal_job-{str(uuid.uuid4())}\", \n",
334+
" predictions=annotations)"
335+
]
336+
},
337+
{
338+
"cell_type": "code",
339+
"execution_count": 55,
340+
"id": "b17f6ba9",
341+
"metadata": {},
342+
"outputs": [
343+
{
344+
"name": "stdout",
345+
"output_type": "stream",
346+
"text": [
347+
"Errors: []\n"
348+
]
349+
}
350+
],
351+
"source": [
352+
"# Errors will appear for each annotation that failed.\n",
353+
"# Empty list means that there were no errors\n",
354+
"# Reading upload_job.errors waits for the upload job to finish, so this cell does not need to be rerun\n",
355+
"print(\"Errors:\", upload_job.errors)"
356+
]
357+
},
358+
{
359+
"cell_type": "code",
360+
"execution_count": null,
361+
"id": "7ee6bc98",
362+
"metadata": {},
363+
"outputs": [],
364+
"source": []
365+
}
366+
],
367+
"metadata": {
368+
"kernelspec": {
369+
"display_name": "Python 3",
370+
"language": "python",
371+
"name": "python3"
372+
},
373+
"language_info": {
374+
"codemirror_mode": {
375+
"name": "ipython",
376+
"version": 3
377+
},
378+
"file_extension": ".py",
379+
"mimetype": "text/x-python",
380+
"name": "python",
381+
"nbconvert_exporter": "python",
382+
"pygments_lexer": "ipython3",
383+
"version": "3.8.8"
384+
}
385+
},
386+
"nbformat": 4,
387+
"nbformat_minor": 5
388+
}
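
Review note: the notebook builds the four message-based payloads (NER, text, checklist, radio) as hand-written dicts that repeat the same `uuid`, `name`, `dataRow` and `messageId` fields. The sketch below shows one way those dicts could be produced by small helpers. The helper names (`ner_payload`, `text_payload`, `checklist_payload`, `radio_payload`), the `"DATA_ROW_ID"` placeholder and the offset check are assumptions made for illustration and are not part of the Labelbox SDK; only the dict keys are taken from the notebook cells.

```python
import uuid
from typing import Any, Dict, List


def _base(name: str, data_row_id: str, message_id: str) -> Dict[str, Any]:
    """Fields shared by every message-based annotation in the notebook."""
    return {
        "uuid": str(uuid.uuid4()),
        "name": name,
        "dataRow": {"id": data_row_id},
        "messageId": message_id,
    }


def ner_payload(name: str, data_row_id: str, message_id: str,
                start: int, end: int) -> Dict[str, Any]:
    """NER annotation on one message, located by start/end offsets."""
    # Assumption: offsets are non-negative and ordered; the notebook does not say so.
    if start < 0 or end < start:
        raise ValueError(f"invalid location: start={start}, end={end}")
    payload = _base(name, data_row_id, message_id)
    payload["location"] = {"start": start, "end": end}
    return payload


def text_payload(name: str, data_row_id: str, message_id: str,
                 answer: str) -> Dict[str, Any]:
    """Free-text classification on one message."""
    payload = _base(name, data_row_id, message_id)
    payload["answer"] = answer
    return payload


def checklist_payload(name: str, data_row_id: str, message_id: str,
                      answers: List[str]) -> Dict[str, Any]:
    """Checklist classification: one {"name": ...} entry per selected option."""
    payload = _base(name, data_row_id, message_id)
    payload["answers"] = [{"name": option} for option in answers]
    return payload


def radio_payload(name: str, data_row_id: str, message_id: str,
                  answer: str) -> Dict[str, Any]:
    """Radio classification: exactly one selected option."""
    payload = _base(name, data_row_id, message_id)
    payload["answer"] = {"name": answer}
    return payload


# "DATA_ROW_ID" is a placeholder; in the notebook this would be data_row.uid.
annotations = [
    ner_payload("ner", "DATA_ROW_ID", "4", start=0, end=8),
    text_payload("text", "DATA_ROW_ID", "0",
                 "the answer to the text questions right here"),
    checklist_payload("checklist", "DATA_ROW_ID", "2",
                      ["first_checklist_answer", "second_checklist_answer"]),
    radio_payload("radio", "DATA_ROW_ID", "0", "first_radio_answer"),
]
```

The resulting `annotations` list has the same shape as the one the notebook passes as `predictions` to `MALPredictionImport.create_from_objects`. Each call generates a fresh `uuid`, so payloads are not reused across uploads.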

examples/annotation_import/pdf_mal.ipynb renamed to examples/annotation_import/pdf.ipynb

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@
7474
},
7575
{
7676
"cell_type": "code",
77-
"execution_count": 2,
77+
"execution_count": 1,
7878
"id": "e3522d4b",
7979
"metadata": {},
8080
"outputs": [],
