Commit 0cbc00e

Merge pull request #590 from Labelbox/yapfupdate
Yapf
2 parents f5173ae + 00372fd commit 0cbc00e

1 file changed

+78
-78
lines changed

examples/integrations/databricks/labelbox_databricks_example.py

Lines changed: 78 additions & 78 deletions
@@ -7,31 +7,31 @@
77
# MAGIC %md
88
# MAGIC #### Pre-requisites
99
# MAGIC 1. This tutorial notebook requires a Labelbox API Key. Please log in to your [Labelbox Account](app.labelbox.com) and generate an [API Key](https://app.labelbox.com/account/api-keys)
10-
# MAGIC 2. A few cells below will install the Labelbox SDK and Connector Library. This install is notebook-scoped and will not affect the rest of your cluster.
11-
# MAGIC 3. Please make sure you are running at least the latest LTS version of Databricks.
12-
# MAGIC
10+
# MAGIC 2. A few cells below will install the Labelbox SDK and Connector Library. This install is notebook-scoped and will not affect the rest of your cluster.
11+
# MAGIC 3. Please make sure you are running at least the latest LTS version of Databricks.
12+
# MAGIC
1313
# MAGIC #### Notebook Preview
14-
# MAGIC This notebook will guide you through these steps:
15-
# MAGIC 1. Connect to Labelbox via the SDK
14+
# MAGIC This notebook will guide you through these steps:
15+
# MAGIC 1. Connect to Labelbox via the SDK
1616
# MAGIC 2. Create a labeling dataset from a table of unstructured data in Databricks
1717
# MAGIC 3. Programmatically set up an ontology and labeling project in Labelbox
18-
# MAGIC 4. Load Bronze and Silver annotation tables from an example labeled project
19-
# MAGIC 5. Additional cells describe how to handle video annotations and use Labelbox Diagnostics and Catalog
20-
# MAGIC
18+
# MAGIC 4. Load Bronze and Silver annotation tables from an example labeled project
19+
# MAGIC 5. Additional cells describe how to handle video annotations and use Labelbox Diagnostics and Catalog
20+
# MAGIC
2121
# MAGIC Additional documentation links are provided at the end of the notebook.
2222

2323
# COMMAND ----------
2424

2525
# MAGIC %md
26-
# MAGIC Thanks for trying out the Databricks and Labelbox Connector! You or someone from your organization signed up for a Labelbox trial through Databricks Partner Connect. This notebook was loaded into your Shared directory to help illustrate how Labelbox and Databricks can be used together to power unstructured data workflows.
27-
# MAGIC
28-
# MAGIC Labelbox can be used to rapidly annotate a variety of unstructured data from your Data Lake ([images](https://labelbox.com/product/image), [video](https://labelbox.com/product/video), [text](https://labelbox.com/product/text), and [geospatial tiled imagery](https://docs.labelbox.com/docs/tiled-imagery-editor)) and the Labelbox Connector for Databricks makes it easy to bring the annotations back into your Lakehouse environment for AI/ML and analytical workflows.
29-
# MAGIC
30-
# MAGIC If you would like to watch a video of the workflow, check out our [Data & AI Summit Demo](https://databricks.com/session_na21/productionizing-unstructured-data-for-ai-and-analytics).
31-
# MAGIC
32-
# MAGIC
26+
# MAGIC Thanks for trying out the Databricks and Labelbox Connector! You or someone from your organization signed up for a Labelbox trial through Databricks Partner Connect. This notebook was loaded into your Shared directory to help illustrate how Labelbox and Databricks can be used together to power unstructured data workflows.
27+
# MAGIC
28+
# MAGIC Labelbox can be used to rapidly annotate a variety of unstructured data from your Data Lake ([images](https://labelbox.com/product/image), [video](https://labelbox.com/product/video), [text](https://labelbox.com/product/text), and [geospatial tiled imagery](https://docs.labelbox.com/docs/tiled-imagery-editor)) and the Labelbox Connector for Databricks makes it easy to bring the annotations back into your Lakehouse environment for AI/ML and analytical workflows.
29+
# MAGIC
30+
# MAGIC If you would like to watch a video of the workflow, check out our [Data & AI Summit Demo](https://databricks.com/session_na21/productionizing-unstructured-data-for-ai-and-analytics).
31+
# MAGIC
32+
# MAGIC
3333
# MAGIC <img src="https://labelbox.com/static/images/partnerships/collab-chart.svg" alt="example-workflow" width="800"/>
34-
# MAGIC
34+
# MAGIC
3535
# MAGIC <h5>Questions or comments? Reach out to us at [ecosystem+databricks@labelbox.com](mailto:ecosystem+databricks@labelbox.com)
3636

3737
# COMMAND ----------
@@ -41,22 +41,23 @@
4141

4242
# COMMAND ----------
4343

44-
#This will import Koalas or Pandas-on-Spark based on your DBR version.
44+
#This will import Koalas or Pandas-on-Spark based on your DBR version.
4545
from pyspark import SparkContext
4646
from packaging import version
47+
4748
sc = SparkContext.getOrCreate()
4849
if version.parse(sc.version) < version.parse("3.2.0"):
49-
import databricks.koalas as pd
50-
needs_koalas = True
50+
import databricks.koalas as pd
51+
needs_koalas = True
5152
else:
52-
import pyspark.pandas as pd
53-
needs_koalas = False
53+
import pyspark.pandas as pd
54+
needs_koalas = False
5455

5556
# COMMAND ----------
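The cell above compares versions with `packaging.version.parse` instead of comparing the raw strings. A minimal stdlib-only sketch of why that matters follows; `release_tuple` is a hypothetical helper that is not part of the notebook, and it only handles plain dotted numeric versions:

```python
def release_tuple(version_string):
    """Turn a dotted version such as "3.2.0" into a tuple of ints."""
    return tuple(int(part) for part in version_string.split("."))


# Lexicographic string comparison misorders multi-digit components:
# "1" sorts before "2", so "3.10.0" looks older than "3.2.0".
string_says_older = "3.10.0" < "3.2.0"

# Comparing the numeric components gives the intended ordering.
tuple_says_older = release_tuple("3.10.0") < release_tuple("3.2.0")

print(string_says_older, tuple_says_older)
```

The notebook's `version.parse` call serves the same purpose and also copes with suffixes that this sketch does not handle.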
5657

5758
# MAGIC %md
5859
# MAGIC ## Configure the SDK
59-
# MAGIC
60+
# MAGIC
6061
# MAGIC Now that Labelbox and the Databricks libraries have been installed, you will need to configure the SDK. You will need an API key that you can create through the app [here](https://app.labelbox.com/account/api-keys). You can also store the key using Databricks Secrets API. The SDK will attempt to use the env var `LABELBOX_API_KEY`
6162

6263
# COMMAND ----------
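The text above mentions that the key can come from the `LABELBOX_API_KEY` environment variable as well as from a hard-coded value. A minimal sketch of that fallback, assuming a hypothetical helper named `resolve_api_key` (it is not part of the Labelbox SDK or of this notebook):

```python
import os


def resolve_api_key(explicit_key="", environ=None):
    """Return the explicit key if set, else LABELBOX_API_KEY from the environment."""
    environ = os.environ if environ is None else environ
    key = explicit_key or environ.get("LABELBOX_API_KEY", "")
    if not key:
        # Same message the notebook raises when API_KEY is left empty.
        raise ValueError("Go to Labelbox to get an API key")
    return key


# Example with a stand-in environment mapping (placeholder value, not a real key).
print(resolve_api_key(environ={"LABELBOX_API_KEY": "placeholder-key"}))
```

On Databricks, the explicit value could instead be read from a secret scope, which keeps the key out of the notebook source.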
@@ -65,25 +66,26 @@
6566
from labelbox.schema.ontology import OntologyBuilder, Tool, Classification, Option
6667
import labelspark
6768

68-
API_KEY = ""
69+
API_KEY = ""
70+
71+
if not (API_KEY):
72+
raise ValueError("Go to Labelbox to get an API key")
6973

70-
if not(API_KEY):
71-
raise ValueError("Go to Labelbox to get an API key")
72-
7374
client = Client(API_KEY)
7475

7576
# COMMAND ----------
7677

7778
# MAGIC %md
7879
# MAGIC ## Fetch seed data
79-
# MAGIC
80+
# MAGIC
8081
# MAGIC Next we'll load a demo dataset into a Spark table so you can see how to easily load assets into Labelbox via URL. For simplicity, you can get a Dataset ID from Labelbox and we'll load those URLs into a Spark table for you (so you don't need to worry about finding data to get this demo notebook to run). Below we'll grab the "Example Nature Dataset" included in Labelbox trials.
81-
# MAGIC
82+
# MAGIC
8283
# MAGIC Also, Labelbox has native support for AWS, Azure, and GCP cloud storage. You can connect Labelbox to your storage via [Delegated Access](https://docs.labelbox.com/docs/iam-delegated-access) and easily load those assets for annotation. For more information, you can watch this [video](https://youtu.be/wlWo6EmPDV4).
8384

8485
# COMMAND ----------
8586

86-
sample_dataset = next(client.get_datasets(where=(Dataset.name == "Example Nature Dataset")))
87+
sample_dataset = next(
88+
client.get_datasets(where=(Dataset.name == "Example Nature Dataset")))
8789
sample_dataset.uid
8890

8991
# COMMAND ----------
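Calling `next(...)` on an iterator that yields nothing raises `StopIteration`, so the lookup above fails with an unhelpful error if no dataset has the expected name. A minimal sketch of a more forgiving pattern, using a plain Python iterator as a stand-in for the query result; `first_or_none` is a hypothetical helper:

```python
def first_or_none(iterable):
    """Return the first item of an iterable, or None if it is empty."""
    return next(iter(iterable), None)


# Stand-in for a query that matched no datasets.
no_matches = iter([])
dataset = first_or_none(no_matches)
if dataset is None:
    print("No dataset named 'Example Nature Dataset' was found")
```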
@@ -94,15 +96,13 @@
9496
tblList = spark.catalog.listTables()
9597

9698
if not any([table.name == SAMPLE_TABLE for table in tblList]):
97-
98-
df = pd.DataFrame([
99-
{
100-
"external_id": dr.external_id,
101-
"row_data": dr.row_data
102-
} for dr in sample_dataset.data_rows()
103-
]).to_spark()
104-
df.registerTempTable(SAMPLE_TABLE)
105-
print(f"Registered table: {SAMPLE_TABLE}")
99+
100+
df = pd.DataFrame([{
101+
"external_id": dr.external_id,
102+
"row_data": dr.row_data
103+
} for dr in sample_dataset.data_rows()]).to_spark()
104+
df.registerTempTable(SAMPLE_TABLE)
105+
print(f"Registered table: {SAMPLE_TABLE}")
106106

107107
# COMMAND ----------
108108

@@ -117,19 +117,19 @@
117117

118118
# MAGIC %md
119119
# MAGIC ## Create a Labeling Project
120-
# MAGIC
120+
# MAGIC
121121
# MAGIC Projects are where teams create labels. A project requires a dataset of assets to be labeled and an ontology to configure the labeling interface.
122-
# MAGIC
122+
# MAGIC
123123
# MAGIC ### Step 1: Create a dataset
124-
# MAGIC
124+
# MAGIC
125125
# MAGIC The [Labelbox Connector for Databricks](https://pypi.org/project/labelspark/) expects a Spark table with two columns: the first column is "external_id" and the second is "row_data".
126-
# MAGIC
126+
# MAGIC
127127
# MAGIC external_id is a filename, like "birds.jpg" or "my_video.mp4"
128-
# MAGIC
129-
# MAGIC row_data is the URL path to the file. Labelbox renders assets locally on your users' machines when they label, so your labeler will need permission to access that asset.
130-
# MAGIC
131-
# MAGIC Example:
132-
# MAGIC
128+
# MAGIC
129+
# MAGIC row_data is the URL path to the file. Labelbox renders assets locally on your users' machines when they label, so your labeler will need permission to access that asset.
130+
# MAGIC
131+
# MAGIC Example:
132+
# MAGIC
133133
# MAGIC | external_id | row_data |
134134
# MAGIC |-------------|--------------------------------------|
135135
# MAGIC | image1.jpg | https://url_to_your_asset/image1.jpg |
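The two-column layout described above can be sketched in plain Python, assuming the asset URLs are already known. `to_rows` is a hypothetical helper that derives `external_id` from the filename at the end of each URL:

```python
from posixpath import basename
from urllib.parse import urlparse


def to_rows(urls):
    """Build {"external_id", "row_data"} records, using each URL's filename as the ID."""
    return [{
        "external_id": basename(urlparse(url).path),
        "row_data": url
    } for url in urls]


rows = to_rows(["https://url_to_your_asset/image1.jpg"])
print(rows)
```

A list of records like this has the same shape as the one the notebook passes to `pd.DataFrame(...)` before calling `.to_spark()`.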
@@ -140,7 +140,9 @@
140140

141141
unstructured_data = spark.table(SAMPLE_TABLE)
142142

143-
demo_dataset = labelspark.create_dataset(client, unstructured_data, name = "Databricks Demo Dataset")
143+
demo_dataset = labelspark.create_dataset(client,
144+
unstructured_data,
145+
name="Databricks Demo Dataset")
144146

145147
# COMMAND ----------
146148

@@ -151,9 +153,9 @@
151153

152154
# MAGIC %md
153155
# MAGIC ### Step 2: Create a project
154-
# MAGIC
156+
# MAGIC
155157
# MAGIC You can use the Labelbox SDK to build your ontology (we'll do that next). You can also set your project up entirely through our website at app.labelbox.com.
156-
# MAGIC
158+
# MAGIC
157159
# MAGIC Check out our [ontology creation documentation.](https://docs.labelbox.com/docs/configure-ontology)
158160

159161
# COMMAND ----------
@@ -165,33 +167,31 @@
165167
ontology = OntologyBuilder()
166168

167169
tools = [
168-
Tool(tool=Tool.Type.BBOX, name="Frog"),
169-
Tool(tool=Tool.Type.BBOX, name="Flower"),
170-
Tool(tool=Tool.Type.BBOX, name="Fruit"),
171-
Tool(tool=Tool.Type.BBOX, name="Plant"),
172-
Tool(tool=Tool.Type.SEGMENTATION, name="Bird"),
173-
Tool(tool=Tool.Type.SEGMENTATION, name="Person"),
174-
Tool(tool=Tool.Type.SEGMENTATION, name="Sleep"),
175-
Tool(tool=Tool.Type.SEGMENTATION, name="Yak"),
176-
Tool(tool=Tool.Type.SEGMENTATION, name="Gemstone"),
170+
Tool(tool=Tool.Type.BBOX, name="Frog"),
171+
Tool(tool=Tool.Type.BBOX, name="Flower"),
172+
Tool(tool=Tool.Type.BBOX, name="Fruit"),
173+
Tool(tool=Tool.Type.BBOX, name="Plant"),
174+
Tool(tool=Tool.Type.SEGMENTATION, name="Bird"),
175+
Tool(tool=Tool.Type.SEGMENTATION, name="Person"),
176+
Tool(tool=Tool.Type.SEGMENTATION, name="Sleep"),
177+
Tool(tool=Tool.Type.SEGMENTATION, name="Yak"),
178+
Tool(tool=Tool.Type.SEGMENTATION, name="Gemstone"),
177179
]
178-
for tool in tools:
179-
ontology.add_tool(tool)
180+
for tool in tools:
181+
ontology.add_tool(tool)
180182

181183
conditions = ["clear", "overcast", "rain", "other"]
182184

183185
weather_classification = Classification(
184186
class_type=Classification.Type.RADIO,
185-
instructions="what is the weather?",
186-
options=[Option(value=c) for c in conditions]
187-
)
187+
instructions="what is the weather?",
188+
options=[Option(value=c) for c in conditions])
188189
ontology.add_classification(weather_classification)
189190

190-
191191
# Setup editor
192192
for editor in client.get_labeling_frontends():
193193
if editor.name == 'Editor':
194-
project_demo.setup(editor, ontology.asdict())
194+
project_demo.setup(editor, ontology.asdict())
195195

196196
print("Project Setup is complete.")
197197

@@ -213,7 +213,7 @@
213213

214214
# MAGIC %md
215215
# MAGIC ##Exporting labels/annotations
216-
# MAGIC
216+
# MAGIC
217217
# MAGIC After creating labels in Labelbox you can export them to use in Databricks for model training and analysis.
218218

219219
# COMMAND ----------
@@ -230,38 +230,38 @@
230230

231231
# MAGIC %md
232232
# MAGIC ## Other features of Labelbox
233-
# MAGIC
233+
# MAGIC
234234
# MAGIC <h3> [Model Assisted Labeling](https://docs.labelbox.com/docs/model-assisted-labeling) </h3>
235235
# MAGIC Once you train a model on your initial set of unstructured data, you can plug that model into Labelbox to support a Model Assisted Labeling workflow. Review the outputs of your model, make corrections, and retrain with ease! You can reduce future labeling costs by >50% by leveraging model assisted labeling.
236-
# MAGIC
236+
# MAGIC
237237
# MAGIC <img src="https://files.readme.io/4c65e12-model-assisted-labeling.png" alt="MAL" width="800"/>
238-
# MAGIC
238+
# MAGIC
239239
# MAGIC <h3> [Catalog](https://docs.labelbox.com/docs/catalog) </h3>
240-
# MAGIC Once you've created datasets and annotations in Labelbox, you can easily browse your datasets and curate new ones in Catalog. Use your model embeddings to find images by similarity search.
241-
# MAGIC
240+
# MAGIC Once you've created datasets and annotations in Labelbox, you can easily browse your datasets and curate new ones in Catalog. Use your model embeddings to find images by similarity search.
241+
# MAGIC
242242
# MAGIC <img src="https://files.readme.io/14f82d4-catalog-marketing.jpg" alt="Catalog" width="800"/>
243-
# MAGIC
243+
# MAGIC
244244
# MAGIC <h3> [Model Diagnostics](https://labelbox.com/product/model-diagnostics) </h3>
245-
# MAGIC Labelbox complements your MLFlow experiment tracking with the ability to easily visualize experiment predictions at scale. Model Diagnostics helps you quickly identify areas where your model is weak so you can collect the right data and refine the next model iteration.
246-
# MAGIC
245+
# MAGIC Labelbox complements your MLFlow experiment tracking with the ability to easily visualize experiment predictions at scale. Model Diagnostics helps you quickly identify areas where your model is weak so you can collect the right data and refine the next model iteration.
246+
# MAGIC
247247
# MAGIC <img src="https://images.ctfassets.net/j20krz61k3rk/4LfIELIjpN6cou4uoFptka/20cbdc38cc075b82f126c2c733fb7082/identify-patterns-in-your-model-behavior.png" alt="Diagnostics" width="800"/>
248248

249249
# COMMAND ----------
250250

251251
# DBTITLE 1,More Info
252252
# MAGIC %md
253-
# MAGIC While using the Labelbox Connector for Databricks, you will likely use the Labelbox SDK (e.g. for programmatic ontology creation). These resources will help familiarize you with the Labelbox Python SDK:
253+
# MAGIC While using the Labelbox Connector for Databricks, you will likely use the Labelbox SDK (e.g. for programmatic ontology creation). These resources will help familiarize you with the Labelbox Python SDK:
254254
# MAGIC * [Visit our docs](https://labelbox.com/docs/python-api) to learn how the SDK works
255255
# MAGIC * Check out our [notebook examples](https://github.com/Labelbox/labelspark/tree/master/notebooks) to follow along with interactive tutorials
256256
# MAGIC * View our [API reference](https://labelbox.com/docs/python-api/api-reference).
257-
# MAGIC
257+
# MAGIC
258258
# MAGIC <h4>Questions or comments? Reach out to us at [ecosystem+databricks@labelbox.com](mailto:ecosystem+databricks@labelbox.com)
259259

260260
# COMMAND ----------
261261

262262
# MAGIC %md
263263
# MAGIC Copyright Labelbox, Inc. 2021. The source in this notebook is provided subject to the [Labelbox Terms of Service](https://docs.labelbox.com/page/terms-of-service). All included or referenced third party libraries are subject to the licenses set forth below.
264-
# MAGIC
264+
# MAGIC
265265
# MAGIC |Library Name|Library license | Library License URL | Library Source URL |
266266
# MAGIC |---|---|---|---|
267267
# MAGIC |Labelbox Python SDK|Apache-2.0 License |https://github.com/Labelbox/labelbox-python/blob/develop/LICENSE|https://github.com/Labelbox/labelbox-python
