Commit 34a701f

gustavocidornelas and whoseoyster authored and committed
Introduce classic ML quickstart notebook
1 parent 478cb46 commit 34a701f

1 file changed: 320 additions, 0 deletions

@@ -0,0 +1,320 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ef55abc9",
   "metadata": {},
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openlayer-ai/examples-gallery/blob/main/tabular-classification/quickstart/tabular-quickstart.ipynb)\n",
    "\n",
    "\n",
    "# <a id=\"top\">Development quickstart</a>\n",
    "\n",
    "This notebook illustrates a typical development flow with Openlayer: creating a project, uploading datasets and a model, and finally committing and pushing these changes to the platform.\n",
    "\n",
    "\n",
    "## <a id=\"toc\">Table of contents</a>\n",
    "\n",
    "1. [**Creating a project**](#project)\n",
    "\n",
    "2. [**Uploading datasets**](#dataset)\n",
    "\n",
    "3. [**Uploading a model**](#model)\n",
    "\n",
    "4. [**Committing and pushing**](#push)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ccf87aeb",
   "metadata": {},
   "source": [
    "## <a id=\"project\"> 1. Creating a project</a>\n",
    "\n",
    "[Back to top](#top)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1c132263",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install openlayer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2ea07b37",
   "metadata": {},
   "outputs": [],
   "source": [
    "import openlayer\n",
    "from openlayer.tasks import TaskType\n",
    "\n",
    "client = openlayer.OpenlayerClient(\"YOUR_API_KEY_HERE\")\n",
    "\n",
    "project = client.create_or_load_project(\n",
    "    name=\"Churn Prediction\",\n",
    "    task_type=TaskType.TabularClassification,\n",
    ")\n",
    "\n",
    "# Or, to load an existing project by name:\n",
    "# project = client.load_project(name=\"Your project name here\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79f8626c",
   "metadata": {},
   "source": [
    "## <a id=\"dataset\"> 2. Uploading datasets</a>\n",
    "\n",
    "[Back to top](#top)\n",
    "\n",
    "### <a id=\"download-datasets\"> Downloading the training and validation sets</a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1069378",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "# Download each file only if it is not already in the working directory\n",
    "if [ ! -e \"churn_train.csv\" ]; then\n",
    "    curl \"https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/tabular-classification/documentation/churn_train.csv\" --output \"churn_train.csv\"\n",
    "fi\n",
    "\n",
    "if [ ! -e \"churn_val.csv\" ]; then\n",
    "    curl \"https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/tabular-classification/documentation/churn_val.csv\" --output \"churn_val.csv\"\n",
    "fi"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31eda871",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "train_df = pd.read_csv(\"./churn_train.csv\")\n",
    "val_df = pd.read_csv(\"./churn_val.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35ae1754",
   "metadata": {},
   "source": [
    "Now, imagine that we have trained a model on this training set and used it to get predictions for both the training and validation sets. Let's add these predictions to each dataset as an extra column called `predictions`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17535385",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Each CSV is expected to hold a single column of predictions, in the same row order as its dataset\n",
    "train_df[\"predictions\"] = pd.read_csv(\"https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/tabular-classification/documentation/training_preds.csv\")\n",
    "val_df[\"predictions\"] = pd.read_csv(\"https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/tabular-classification/documentation/validation_preds.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9ee86be7",
   "metadata": {},
   "outputs": [],
   "source": [
    "val_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0410ce56",
   "metadata": {},
   "source": [
    "### <a id=\"upload-datasets\"> Uploading the datasets to Openlayer</a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b2a3f87",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset_config = {\n",
    "    \"categoricalFeatureNames\": [\"Gender\", \"Geography\"],\n",
    "    \"classNames\": [\"Retained\", \"Exited\"],\n",
    "    \"featureNames\": [\n",
    "        \"CreditScore\",\n",
    "        \"Geography\",\n",
    "        \"Gender\",\n",
    "        \"Age\",\n",
    "        \"Tenure\",\n",
    "        \"Balance\",\n",
    "        \"NumOfProducts\",\n",
    "        \"HasCrCard\",\n",
    "        \"IsActiveMember\",\n",
    "        \"EstimatedSalary\",\n",
    "        \"AggregateRate\",\n",
    "        \"Year\"\n",
    "    ],\n",
    "    \"labelColumnName\": \"Exited\",\n",
    "    \"label\": \"training\",  # This becomes 'validation' for the validation set\n",
    "    \"predictionsColumnName\": \"predictions\"\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7271d81b",
   "metadata": {},
   "outputs": [],
   "source": [
    "project.add_dataframe(\n",
    "    dataset_df=train_df,\n",
    "    dataset_config=dataset_config\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e126c53",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Reuse the same config for the validation set, changing only its label\n",
    "dataset_config[\"label\"] = \"validation\"\n",
    "\n",
    "project.add_dataframe(\n",
    "    dataset_df=val_df,\n",
    "    dataset_config=dataset_config\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "719fb373",
   "metadata": {},
   "source": [
    "## <a id=\"model\"> 3. Uploading a model</a>\n",
    "\n",
    "[Back to top](#top)\n",
    "\n",
    "Since we added predictions to the datasets above, we also need to specify the model that produced them. Here, we only provide a model config with metadata about that model; refer to the Openlayer documentation for the other model upload options."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "04806952",
   "metadata": {},
   "outputs": [],
   "source": [
    "model_config = {\n",
    "    \"metadata\": {  # Can add anything here, as long as it is a dict\n",
    "        \"model_type\": \"Gradient Boosting Classifier\",\n",
    "        \"regularization\": \"None\",\n",
    "        \"encoder_used\": \"One Hot\",\n",
    "        \"imputation\": \"Imputed with the training set's mean\"\n",
    "    },\n",
    "    \"classNames\": dataset_config[\"classNames\"],\n",
    "    \"featureNames\": dataset_config[\"featureNames\"],\n",
    "    \"categoricalFeatureNames\": dataset_config[\"categoricalFeatureNames\"],\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab674332",
   "metadata": {},
   "outputs": [],
   "source": [
    "project.add_model(\n",
    "    model_config=model_config\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3215b297",
   "metadata": {},
   "source": [
    "## <a id=\"push\"> 4. Committing and pushing</a>\n",
    "\n",
    "[Back to top](#top)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "929f8fa9",
   "metadata": {},
   "outputs": [],
   "source": [
    "project.commit(\"Initial commit!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c2e2004",
   "metadata": {},
   "outputs": [],
   "source": [
    "project.status()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c3c43ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "project.push()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
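The notebook's `%%bash` cell downloads each CSV only if it is not already present. Below is a minimal pure-Python sketch of the same idea for environments without a shell. `download_if_missing` and its `fetch` parameter are hypothetical names used only for illustration. The sketch writes to a temporary `.part` file first, so a failed or interrupted download does not leave behind a file that would make later runs skip the download. Also, unlike `curl` without `--fail`, `urllib.request.urlretrieve` raises an exception on HTTP errors instead of saving the error page.

```python
from pathlib import Path
from typing import Callable
import urllib.request

# Base URL copied from the notebook's download cell
BASE_URL = (
    "https://openlayer-static-assets.s3.us-west-2.amazonaws.com/"
    "examples-datasets/tabular-classification/documentation/"
)


def download_if_missing(
    url: str,
    dest: Path,
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> bool:
    """Download `url` to `dest` unless `dest` already exists.

    Returns True if a download happened and False if the file was already there.
    """
    if dest.exists():
        return False
    # Write to a temporary ".part" file and rename it only after fetch()
    # returns, so an interrupted download never looks like a finished one.
    tmp = dest.with_name(dest.name + ".part")
    fetch(url, str(tmp))  # urlretrieve raises on HTTP errors such as 404
    tmp.replace(dest)
    return True
```

For example, `download_if_missing(BASE_URL + "churn_train.csv", Path("churn_train.csv"))` fetches the training set on the first run and does nothing on later runs. The `fetch` parameter exists only so the function can be exercised without network access.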
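The notebook's predictions cell assigns each predictions CSV directly as a new column. This relies on the CSV having exactly one column (otherwise the assignment fails) and on its rows lining up with the dataset's rows. A length mismatch is not an error there: pandas aligns on the index, filling missing rows with NaN and dropping extras. The standard-library sketch below makes both assumptions explicit. `read_single_column` and `attach_column` are hypothetical helpers, and the sketch assumes, as `pd.read_csv` does by default, that each CSV starts with a header row.

```python
import csv
import io
from typing import Dict, List


def read_single_column(text: str) -> List[str]:
    """Parse CSV text that has a header row followed by one value per line."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if len(header) != 1:
        raise ValueError(f"expected 1 column, got {len(header)}: {header}")
    # Skip blank lines, which csv.reader yields as empty lists
    return [row[0] for row in reader if row]


def attach_column(rows: List[Dict[str, str]], values: List[str], name: str) -> None:
    """Add `values` to `rows` in place under the key `name`, matched by position."""
    if len(values) != len(rows):
        raise ValueError(f"got {len(values)} values for {len(rows)} rows")
    for row, value in zip(rows, values):
        row[name] = value
```

With pandas, the equivalent safeguard is to compare the number of predictions with `len(train_df)` (or `len(val_df)`) before assigning the column.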
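Before calling `project.add_dataframe`, it can help to check that every name in `dataset_config` refers to a real column, since a typo in `featureNames` is easy to miss. The function below is a hypothetical pre-upload check, not part of the `openlayer` package. The rules it applies are our own assumptions about a sensible config, not Openlayer's documented validation: every feature, label, and predictions column must exist; categorical features must also appear in `featureNames`; and the label column must not double as a feature.

```python
from typing import Any, Dict, Iterable, List


def check_dataset_config(config: Dict[str, Any], columns: Iterable[str]) -> List[str]:
    """Return the problems found in `config`; an empty list means none were found.

    Hypothetical helper for illustration; not part of the openlayer package.
    """
    cols = set(columns)
    problems = []
    features = config.get("featureNames", [])
    for name in features:
        if name not in cols:
            problems.append(f"feature {name!r} is not a column")
    for name in config.get("categoricalFeatureNames", []):
        if name not in features:
            problems.append(f"categorical feature {name!r} is not in featureNames")
    for key in ("labelColumnName", "predictionsColumnName"):
        name = config.get(key)
        if name is not None and name not in cols:
            problems.append(f"{key} {name!r} is not a column")
    label_col = config.get("labelColumnName")
    if label_col in features:
        problems.append(f"label column {label_col!r} is also listed as a feature")
    return problems
```

In the notebook, this would be `check_dataset_config(dataset_config, train_df.columns)`, run after the `predictions` column has been added.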
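The notebook reuses one `dataset_config` dict for both splits and switches its `"label"` in place. That works when the cells run top to bottom, but re-running the training upload cell after the validation cell would upload the training set labeled `"validation"`. A sketch of deriving a separate config per split instead; `config_for_split` is a hypothetical helper:

```python
import copy
from typing import Any, Dict


def config_for_split(base: Dict[str, Any], label: str) -> Dict[str, Any]:
    """Return a deep copy of `base` with its "label" set to `label`.

    The base config is never modified, so notebook cells can be re-run in any order.
    """
    config = copy.deepcopy(base)
    config["label"] = label
    return config
```

Each upload then gets its own config, e.g. `project.add_dataframe(dataset_df=val_df, dataset_config=config_for_split(dataset_config, "validation"))`. A deep copy is used so that nested lists such as `featureNames` are not shared between the two configs either.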
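`classNames` is shared by the dataset and model configs. Assuming the label column `Exited` stores integer class indices into that list (0 for `"Retained"`, 1 for `"Exited"`), which is how we read the config rather than something the notebook states, a quick check for out-of-range or non-integer labels could look like this. `check_labels` is a hypothetical helper:

```python
import numbers
from typing import Iterable, List, Sequence


def check_labels(labels: Iterable[object], class_names: Sequence[str]) -> List[object]:
    """Return the labels that are not valid integer indices into `class_names`.

    Assumes label i stands for class_names[i]; an empty result means all labels are valid.
    """
    valid = range(len(class_names))
    bad = []
    for value in labels:
        # bool is a subclass of int, so reject it explicitly;
        # numbers.Integral also accepts NumPy integer types
        if isinstance(value, bool) or not isinstance(value, numbers.Integral):
            bad.append(value)
        elif value not in valid:
            bad.append(value)
    return bad
```

In the notebook, `check_labels(train_df["Exited"], dataset_config["classNames"])` would return an empty list if every label is a valid index.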
