Commit c6c20d9

Merge pull request #530 from Labelbox/batches-notebook
Created Batches notebook.
2 parents 2fe156f + b16c52d commit c6c20d9

1 file changed

+329
-0
lines changed

examples/basics/batches.ipynb

Lines changed: 329 additions & 0 deletions
@@ -0,0 +1,329 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "db768cda",
6+
"metadata": {
7+
"id": "db768cda"
8+
},
9+
"source": [
10+
"<td>\n",
11+
" <a target=\"_blank\" href=\"https://labelbox.com\" ><img src=\"https://labelbox.com/blog/content/images/2021/02/logo-v4.svg\" width=256/></a>\n",
12+
"</td>"
13+
]
14+
},
15+
{
16+
"cell_type": "markdown",
17+
"id": "cb5611d0",
18+
"metadata": {
19+
"id": "cb5611d0"
20+
},
21+
"source": [
22+
"<td>\n",
23+
"<a href=\"https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/basics/batches.ipynb\" target=\"_blank\"><img\n",
24+
"src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
25+
"</td>\n",
26+
"\n",
27+
"<td>\n",
28+
"<a href=\"https://github.com/Labelbox/labelbox-python/tree/develop/examples/basics/batches.ipynb\" target=\"_blank\"><img\n",
29+
"src=\"https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white\" alt=\"GitHub\"></a>\n",
30+
"</td>"
31+
]
32+
},
33+
{
34+
"cell_type": "markdown",
35+
"source": [
36+
"## Batches (*Currently in Public Beta*)"
37+
],
38+
"metadata": {
39+
"id": "Lup2QNWjaxKg"
40+
},
41+
"id": "Lup2QNWjaxKg"
42+
},
43+
{
44+
"cell_type": "markdown",
45+
"source": [
46+
"* A Batch is a collection of Data Rows picked out of a Dataset.\n",
47+
"* A Datarow cannot be part of more than one batch in a project.\n",
48+
"* Batches work for all data types, but there should only be one data type per batch.\n",
49+
"* Batches may not be shared between projects.\n",
50+
"* Batches may have Datarows from multiple Datasets.\n",
51+
"* Datarows can only be attached to a Project as part of a single Batch.\n",
52+
"* You can set priority for each Batch."
53+
],
54+
"metadata": {
55+
"id": "KONWmRQkadPf"
56+
},
57+
"id": "KONWmRQkadPf"
58+
},
59+
{
60+
"cell_type": "code",
61+
"execution_count": null,
62+
"metadata": {
63+
"id": "HoW5ypnyzpqb"
64+
},
65+
"outputs": [],
66+
"source": [
67+
"!pip install labelbox[data]"
68+
],
69+
"id": "HoW5ypnyzpqb"
70+
},
71+
{
72+
"cell_type": "code",
73+
"execution_count": null,
74+
"metadata": {
75+
"id": "6-Us9Gj1zpqc"
76+
},
77+
"outputs": [],
78+
"source": [
79+
"from labelbox import Client\n",
80+
"import random"
81+
],
82+
"id": "6-Us9Gj1zpqc"
83+
},
84+
{
85+
"cell_type": "markdown",
86+
"metadata": {
87+
"id": "qQiozm-dzpqd"
88+
},
89+
"source": [
90+
"Fill in the following cell with your own values to run this notebook:"
91+
],
92+
"id": "qQiozm-dzpqd"
93+
},
94+
{
95+
"cell_type": "code",
96+
"execution_count": null,
97+
"metadata": {
98+
"id": "84Zna5c0zpqd"
99+
},
100+
"outputs": [],
101+
"source": [
102+
"PROJECT_NAME = \"Batch Queue Demo\" #text project\n",
103+
"DATASET_NAME = \"Batch Queue Demo Data\""
104+
],
105+
"id": "84Zna5c0zpqd"
106+
},
107+
{
108+
"cell_type": "markdown",
109+
"id": "b0b09aee",
110+
"metadata": {
111+
"id": "b0b09aee"
112+
},
113+
"source": [
114+
"# API Key and Client\n",
115+
"Provide a valid API key below to connect to the Labelbox Client."
116+
]
117+
},
118+
{
119+
"cell_type": "code",
120+
"execution_count": null,
121+
"metadata": {
122+
"id": "Ge-dfNh-zpqe"
123+
},
124+
"outputs": [],
125+
"source": [
126+
"# Add your API key\n",
127+
"API_KEY = None\n",
128+
"client = Client(api_key=API_KEY)"
129+
],
130+
"id": "Ge-dfNh-zpqe"
131+
},
132+
{
133+
"cell_type": "code",
134+
"execution_count": null,
135+
"metadata": {
136+
"id": "nMVtBYQmzpqe"
137+
},
138+
"outputs": [],
139+
"source": [
140+
"dataset = client.create_dataset(name=DATASET_NAME)\n",
141+
"\n",
142+
"uploads = []\n",
143+
"for i in range(10):\n",
144+
"    uploads.append({\n",
145+
"        'external_id': str(i),  # external IDs are stored as strings\n",
146+
"        'row_data': 'https://picsum.photos/200/300'\n",
147+
"    })\n",
148+
"dataset.create_data_rows(uploads)"
149+
],
150+
"id": "nMVtBYQmzpqe"
151+
},
152+
{
153+
"cell_type": "markdown",
154+
"source": [
155+
"# Ensure project is in batch mode:"
156+
],
157+
"metadata": {
158+
"id": "61CvCD3C7qv6"
159+
},
160+
"id": "61CvCD3C7qv6"
161+
},
162+
{
163+
"cell_type": "code",
164+
"source": [
165+
"project = client.create_project(name=PROJECT_NAME)\n",
166+
"project.update(queue_mode=project.QueueMode.Batch)"
167+
],
168+
"metadata": {
169+
"id": "tqtT4q31787T"
170+
},
171+
"id": "tqtT4q31787T",
172+
"execution_count": null,
173+
"outputs": []
174+
},
175+
{
176+
"cell_type": "markdown",
177+
"source": [
178+
"# Collect Data Row IDs:"
179+
],
180+
"metadata": {
181+
"id": "Xti9AoZWELrq"
182+
},
183+
"id": "Xti9AoZWELrq"
184+
},
185+
{
186+
"cell_type": "markdown",
187+
"source": [
188+
"### Select All Data Rows from dataset."
189+
],
190+
"metadata": {
191+
"id": "9JVLsXdevywS"
192+
},
193+
"id": "9JVLsXdevywS"
194+
},
195+
{
196+
"cell_type": "code",
197+
"source": [
198+
"data_rows = [dr.uid for dr in dataset.export_data_rows()]"
199+
],
200+
"metadata": {
201+
"id": "U4C1ZyJ2EgTS"
202+
},
203+
"id": "U4C1ZyJ2EgTS",
204+
"execution_count": null,
205+
"outputs": []
206+
},
207+
{
208+
"cell_type": "markdown",
209+
"source": [
210+
"### Randomly sample\n",
211+
"\n",
212+
"Rather than selecting all of the data, we sample 5 data rows at random."
213+
],
214+
"metadata": {
215+
"id": "B0UqO_O1V8ei"
216+
},
217+
"id": "B0UqO_O1V8ei"
218+
},
219+
{
220+
"cell_type": "code",
221+
"source": [
222+
"sample = random.sample(data_rows, 5)"
223+
],
224+
"metadata": {
225+
"id": "WJAXBf1bV-td"
226+
},
227+
"id": "WJAXBf1bV-td",
228+
"execution_count": null,
229+
"outputs": []
230+
},
231+
{
232+
"cell_type": "markdown",
233+
"source": [
234+
"# Batch Manipulation"
235+
],
236+
"metadata": {
237+
"id": "UPdaTqkgYyvt"
238+
},
239+
"id": "UPdaTqkgYyvt"
240+
},
241+
{
242+
"cell_type": "markdown",
243+
"source": [
244+
"### Create a Batch:"
245+
],
246+
"metadata": {
247+
"id": "Al-K1lBBEjtb"
248+
},
249+
"id": "Al-K1lBBEjtb"
250+
},
251+
{
252+
"cell_type": "code",
253+
"source": [
254+
"batch = project.create_batch(\n",
255+
"    \"first batch\",  # Each batch in a project must have a unique name\n",
255+
"    sample,  # A list of data rows or data row ids\n",
256+
"    5  # Priority from 1 (highest) to 5 (lowest)\n",
258+
")"
259+
],
260+
"metadata": {
261+
"id": "resH3xqeErVv"
262+
},
263+
"id": "resH3xqeErVv",
264+
"execution_count": null,
265+
"outputs": []
266+
},
267+
{
268+
"cell_type": "code",
269+
"source": [
270+
"# number of data rows in the batch\n",
271+
"batch.size"
272+
],
273+
"metadata": {
274+
"id": "gFio7ONOWYdJ"
275+
},
276+
"id": "gFio7ONOWYdJ",
277+
"execution_count": null,
278+
"outputs": []
279+
},
280+
{
281+
"cell_type": "markdown",
282+
"metadata": {
283+
"id": "8Cj64Isxzpqe"
284+
},
285+
"source": [
286+
"### List DataRows in a Batch (Not supported yet)\n",
287+
"Note: You can view your batch in the Data Row table of the project."
288+
],
289+
"id": "8Cj64Isxzpqe"
290+
},
291+
{
292+
"cell_type": "markdown",
293+
"metadata": {
294+
"id": "rU7iddSQzpqg"
295+
},
296+
"source": [
297+
"### Remove queued data rows by batch (Not supported yet)\n",
298+
"Note: You can do this through the batch management pane on the data rows tab of the project"
299+
],
300+
"id": "rU7iddSQzpqg"
301+
}
302+
],
303+
"metadata": {
304+
"kernelspec": {
305+
"display_name": "Python 3",
306+
"language": "python",
307+
"name": "python3"
308+
},
309+
"language_info": {
310+
"codemirror_mode": {
311+
"name": "ipython",
312+
"version": 3
313+
},
314+
"file_extension": ".py",
315+
"mimetype": "text/x-python",
316+
"name": "python",
317+
"nbconvert_exporter": "python",
318+
"pygments_lexer": "ipython3",
319+
"version": "3.8.5"
320+
},
321+
"colab": {
322+
"name": "Batches.ipynb",
323+
"provenance": [],
324+
"collapsed_sections": []
325+
}
326+
},
327+
"nbformat": 4,
328+
"nbformat_minor": 5
329+
}
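
The notebook states three rules for batches: a data row belongs to at most one batch per project, batch names are unique within a project, and priority runs from 1 (highest) to 5 (lowest). These can be checked locally before any call to `project.create_batch`. The following is a minimal sketch under stated assumptions, not part of the Labelbox SDK: `plan_batches` and all of its parameters are hypothetical names, and it operates only on plain lists of data row ID strings.

```python
import random
from typing import Dict, List, Sequence


def plan_batches(data_row_ids: Sequence[str],
                 batch_size: int,
                 priority: int = 5,
                 name_prefix: str = "batch",
                 seed: int = 0) -> List[Dict]:
    """Split data row IDs into non-overlapping batch specs.

    Hypothetical helper for illustration; it never contacts Labelbox.
    """
    if not 1 <= priority <= 5:
        raise ValueError("priority must be between 1 (highest) and 5 (lowest)")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    # Drop duplicate IDs (keeping first occurrences) so that no data row
    # can land in more than one batch.
    unique_ids = list(dict.fromkeys(data_row_ids))

    # Shuffle with a fixed seed so the random split is reproducible.
    random.Random(seed).shuffle(unique_ids)

    # Consecutive slices give disjoint batches; the running index keeps
    # every batch name unique.
    return [
        {
            "name": f"{name_prefix}-{start // batch_size + 1}",
            "data_rows": unique_ids[start:start + batch_size],
            "priority": priority,
        }
        for start in range(0, len(unique_ids), batch_size)
    ]


# Ten placeholder IDs split into batches of at most four.
plans = plan_batches([f"id-{n}" for n in range(10)], batch_size=4)
```

Each resulting plan could then be passed to the call shown in the notebook, for example `project.create_batch(plan["name"], plan["data_rows"], plan["priority"])`, assuming the positional argument order used there.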
