Merge pull request #722 from Labelbox/jtso/convonb

jtsodapop · web-flow · commit ec8f95e3dcf4 · 2022-10-14T13:44:56.000-04:00
[AL-3764] Conversational Text Example MAL Notebook
diff --git a/examples/annotation_import/conversational.ipynb b/examples/annotation_import/conversational.ipynb
@@ -0,0 +1,388 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "26b9c486",
+   "metadata": {},
+   "source": [
+    "<td>\n",
+    "   <a target=\"_blank\" href=\"https://labelbox.com\" ><img src=\"https://labelbox.com/blog/content/images/2021/02/logo-v4.svg\" width=256/></a>\n",
+    "</td>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "51eb4b54",
+   "metadata": {},
+   "source": [
+    "<td>\n",
+    "<a href=\"https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/annotation_import/conversational.ipynb\" target=\"_blank\"><img\n",
+    "src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
+    "</td>\n",
+    "\n",
+    "<td>\n",
+    "<a href=\"https://github.com/Labelbox/labelbox-python/tree/develop/examples/annotation_import/conversational.ipynb\" target=\"_blank\"><img\n",
+    "src=\"https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white\" alt=\"GitHub\"></a>\n",
+    "</td>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "27d147e7",
+   "metadata": {},
+   "source": [
+    "# Conversational Text Annotation Import\n",
+    "* This notebook will provide examples of each supported annotation type for conversational text assets. It will cover the following:\n",
+    "    * Model-Assisted Labeling (MAL) - used to provide pre-annotated data for your labelers. This will enable a reduction in the total amount of time to properly label your assets. Model-assisted labeling does not submit the labels automatically, and will need to be reviewed by a labeler for submission."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "19b346e2",
+   "metadata": {},
+   "source": [
+    "* For information on what types of annotations are supported per data type, refer to this documentation:\n",
+    "    * https://docs.labelbox.com/docs/model-assisted-labeling#option-1-import-via-python-annotation-types-recommended"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f4375aef",
+   "metadata": {},
+   "source": [
+    "* Notes:\n",
+    "    * Wait until the import job is complete before opening the Editor to make sure all annotations are imported properly."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "00ad1e27",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install -q 'labelbox[data]'"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ccc4c3c3",
+   "metadata": {},
+   "source": [
+    "# Imports"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "f0de1cde",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from labelbox.schema.ontology import OntologyBuilder, Tool, Classification, Option\n",
+    "from labelbox import Client, LabelingFrontend, MALPredictionImport\n",
+    "from labelbox.data.annotation_types import (\n",
+    "    Label, ImageData, ObjectAnnotation, \n",
+    "    TextEntity,\n",
+    "    Radio, Checklist, Text,\n",
+    "    ClassificationAnnotation, ClassificationAnswer\n",
+    ")\n",
+    "from labelbox.data.serialization import NDJsonConverter\n",
+    "from labelbox.schema.media_type import MediaType\n",
+    "import uuid\n",
+    "import json"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "54a028dd",
+   "metadata": {},
+   "source": [
+    "# API Key and Client\n",
+    "Provide a valid api key below in order to properly connect to the Labelbox Client."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "4aab38e2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Add your api key\n",
+    "API_KEY = \"YOUR API KEY\"\n",
+    "API_KEY = \"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VySWQiOiJja2NjOWZtbXc0aGNkMDczOHFpeWM2YW54Iiwib3JnYW5pemF0aW9uSWQiOiJja2N6NmJ1YnVkeWZpMDg1NW8xZHQxZzlzIiwiYXBpS2V5SWQiOiJja2V2cDF2enAwdDg0MDc1N3I2ZWZldGgzIiwiaWF0IjoxNTk5Njc0NzY0LCJleHAiOjIyMzA4MjY3NjR9.iyqPpEWNpfcjcTid5WVkXLi51g22e_l3FrK-DlFJ2mM\"\n",
+    "client = Client(api_key=API_KEY)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c1763e44",
+   "metadata": {},
+   "source": [
+    "---- \n",
+    "### Steps\n",
+    "1. Make sure project is setup\n",
+    "2. Collect annotations\n",
+    "3. Upload"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d30024a7",
+   "metadata": {},
+   "source": [
+    "First, we create an ontology with all the possible tools and classifications supported for PDF. The official list of supported annotations to import can be found here:\n",
+    "- [Model-Assisted Labeling](https://docs.labelbox.com/docs/model-assisted-labeling) (annotations/labels are not submitted)\n",
+    "- [Conversational Text Annotations](https://docs.labelbox.com/docs/conversational-annotations)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "ae6f0919",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ontology_builder = OntologyBuilder(\n",
+    "  tools=[ \n",
+    "    Tool( # NER tool given the name \"ner\"\n",
+    "      tool=Tool.Type.NER, \n",
+    "      name=\"ner\")], \n",
+    "  classifications=[ \n",
+    "    Classification( # Text classification given the name \"text\"\n",
+    "      class_type=Classification.Type.TEXT,\n",
+    "      scope=Classification.Scope.INDEX,          \n",
+    "      instructions=\"text\"), \n",
+    "    Classification( # Checklist classification given the name \"text\" with two options: \"first_checklist_answer\" and \"second_checklist_answer\"\n",
+    "      class_type=Classification.Type.CHECKLIST, \n",
+    "      scope=Classification.Scope.INDEX,                     \n",
+    "      instructions=\"checklist\", \n",
+    "      options=[\n",
+    "        Option(value=\"first_checklist_answer\"),\n",
+    "        Option(value=\"second_checklist_answer\")            \n",
+    "      ]\n",
+    "    ), \n",
+    "    Classification( # Radio classification given the name \"text\" with two options: \"first_radio_answer\" and \"second_radio_answer\"\n",
+    "      class_type=Classification.Type.RADIO, \n",
+    "      instructions=\"radio\", \n",
+    "      scope=Classification.Scope.INDEX,          \n",
+    "      options=[\n",
+    "        Option(value=\"first_radio_answer\"),\n",
+    "        Option(value=\"second_radio_answer\")\n",
+    "      ]\n",
+    "    )\n",
+    "  ]\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "b95935a7",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "OntologyBuilder(tools=[Tool(tool=<Type.NER: 'named-entity'>, name='ner', required=False, color=None, classifications=[], schema_id=None, feature_schema_id=None)], classifications=[Classification(class_type=<Type.TEXT: 'text'>, instructions='text', required=False, options=[], schema_id=None, feature_schema_id=None, scope=<Scope.INDEX: 'index'>), Classification(class_type=<Type.CHECKLIST: 'checklist'>, instructions='checklist', required=False, options=[Option(value='first_checklist_answer', label='first_checklist_answer', schema_id=None, feature_schema_id=None, options=[]), Option(value='second_checklist_answer', label='second_checklist_answer', schema_id=None, feature_schema_id=None, options=[])], schema_id=None, feature_schema_id=None, scope=<Scope.INDEX: 'index'>), Classification(class_type=<Type.RADIO: 'radio'>, instructions='radio', required=False, options=[Option(value='first_radio_answer', label='first_radio_answer', schema_id=None, feature_schema_id=None, options=[]), Option(value='second_radio_answer', label='second_radio_answer', schema_id=None, feature_schema_id=None, options=[])], schema_id=None, feature_schema_id=None, scope=<Scope.INDEX: 'index'>)])"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "ontology_builder"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "6b6403a1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create Labelbox project\n",
+    "mal_project = client.create_project(name=\"conversational_mal_project\", media_type=MediaType.Document)\n",
+    "\n",
+    "# Create one Labelbox dataset\n",
+    "dataset = client.create_dataset(name=\"conversational_annotation_import_demo_dataset\")\n",
+    "\n",
+    "# Grab an example asset and create a Labelbox data row\n",
+    "data_row = dataset.create_data_row(\n",
+    "    external_id = \"conversation-1\",\n",
+    "    row_data = \"https://storage.googleapis.com/labelbox-developer-testing-assets/conversational_text/1000-conversations/conversation-1.json\"\n",
+    ")\n",
+    "\n",
+    "# Setup your ontology / labeling editor\n",
+    "editor = next(client.get_labeling_frontends(where=LabelingFrontend.name == \"Editor\")) # Unless using a custom editor, do not modify this\n",
+    "\n",
+    "mal_project.setup(editor, ontology_builder.asdict()) # Connect your ontology and editor to your MAL project\n",
+    "mal_project.datasets.connect(dataset) # Connect your dataset to your MAL project"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f4d3694e",
+   "metadata": {},
+   "source": [
+    "### Object Annotations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "551ca09a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# message based ner\n",
+    "ner_annotation = { \n",
+    "        \"uuid\": str(uuid.uuid4()),\n",
+    "        \"name\": \"ner\",\n",
+    "        \"dataRow\": {\"id\": data_row.uid},\n",
+    "        \"location\": { \n",
+    "            \"start\": 0, \n",
+    "            \"end\": 8 \n",
+    "        },\n",
+    "        \"messageId\": \"4\"\n",
+    "    }"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1deaf1f1",
+   "metadata": {},
+   "source": [
+    "### Classification Annotations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 51,
+   "id": "9c5d93de",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# message based classifications\n",
+    "text_annotation = {\n",
+    "    'name': 'text',\n",
+    "    'answer': 'the answer to the text questions right here',\n",
+    "    'uuid': str(uuid.uuid4()),\n",
+    "    \"dataRow\": {\"id\": data_row.uid},\n",
+    "    \"messageId\": \"0\",\n",
+    "}\n",
+    "checklist_annotation = {\n",
+    "    'name': 'checklist',\n",
+    "    'uuid': str(uuid.uuid4()),\n",
+    "    'answers': [\n",
+    "        {'name': 'first_checklist_answer'},\n",
+    "        {'name': 'second_checklist_answer'},\n",
+    "    ],\n",
+    "    \"dataRow\": {\"id\": data_row.uid},\n",
+    "    \"messageId\": \"2\",\n",
+    "}\n",
+    "\n",
+    "radio_annotation = {\n",
+    "    'name': 'radio',\n",
+    "    'uuid': str(uuid.uuid4()),    \n",
+    "    \"dataRow\": {\"id\": data_row.uid},\n",
+    "    'answer': {\n",
+    "        'name': 'first_radio_answer'\n",
+    "    },\n",
+    "    \"messageId\": \"0\",\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 56,
+   "id": "762db1d2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "annotations = [\n",
+    "    ner_annotation,\n",
+    "    text_annotation,\n",
+    "    checklist_annotation,\n",
+    "    radio_annotation\n",
+    "]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "55be64cf",
+   "metadata": {},
+   "source": [
+    "### Model Assisted Labeling "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 54,
+   "id": "10a1f924",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Upload our label using Model-Assisted Labeling\n",
+    "upload_job = MALPredictionImport.create_from_objects(\n",
+    "    client = client, \n",
+    "    project_id = mal_project.uid, \n",
+    "    name=f\"mal_job-{str(uuid.uuid4())}\", \n",
+    "    predictions=annotations)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 55,
+   "id": "b17f6ba9",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Errors: []\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Errors will appear for each annotation that failed.\n",
+    "# Empty list means that there were no errors\n",
+    "# This will provide information only after the upload_job is complete, so we do not need to worry about having to rerun\n",
+    "print(\"Errors:\", upload_job.errors)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7ee6bc98",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/examples/annotation_import/pdf.ipynb b/examples/annotation_import/pdf.ipynb
@@ -74,7 +74,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 1,
    "id": "e3522d4b",
    "metadata": {},
    "outputs": [],

Original file line number	Diff line number	Diff line change
`@@ -74,7 +74,7 @@`
`74`	`74`	`},`
`75`	`75`	`{`
`76`	`76`	`"cell_type": "code",`
`77`		`- "execution_count": 2,`
	`77`	`+ "execution_count": 1,`
`78`	`78`	`"id": "e3522d4b",`
`79`	`79`	`"metadata": {},`
`80`	`80`	`"outputs": [],`