Introduce classic ML quickstart notebook

gustavocidornelas · whoseoyster · commit 34a701f27ab0 · 2023-10-09T20:34:37.000-07:00
diff --git a/examples/tabular-classification/quickstart/tabular-quickstart.ipynb b/examples/tabular-classification/quickstart/tabular-quickstart.ipynb
@@ -0,0 +1,320 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ef55abc9",
+   "metadata": {},
+   "source": [
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openlayer-ai/examples-gallery/blob/main/tabular-classification/quickstart/tabular-quickstart.ipynb)\n",
+    "\n",
+    "\n",
+    "# <a id=\"top\">Development quickstart</a>\n",
+    "\n",
+    "This notebook illustrates a typical development flow using Openlayer.\n",
+    "\n",
+    "\n",
+    "## <a id=\"toc\">Table of contents</a>\n",
+    "\n",
+    "1. [**Creating a project**](#project)   \n",
+    "\n",
+    "2. [**Uploading datasets**](#dataset)\n",
+    "\n",
+    "3. [**Uploading a model**](#model)\n",
+    "\n",
+    "4. [**Committing and pushing**](#push)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ccf87aeb",
+   "metadata": {},
+   "source": [
+    "## <a id=\"project\"> 1. Creating a project</a>\n",
+    "\n",
+    "[Back to top](#top)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1c132263",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install openlayer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2ea07b37",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import openlayer\n",
+    "from openlayer.tasks import TaskType\n",
+    "\n",
+    "client = openlayer.OpenlayerClient(\"YOUR_API_KEY_HERE\")\n",
+    "\n",
+    "project = client.create_or_load_project(\n",
+    "    name=\"Churn Prediction\",\n",
+    "    task_type=TaskType.TabularClassification,\n",
+    ")\n",
+    "\n",
+    "# Or \n",
+    "# project = client.load_project(name=\"Your project name here\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "79f8626c",
+   "metadata": {},
+   "source": [
+    "## <a id=\"dataset\"> 2. Uploading datasets </a>\n",
+    "\n",
+    "[Back to top](#top)\n",
+    "\n",
+    "### <a id=\"download-datasets\"> Downloading the training and validation sets </a>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e1069378",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%bash\n",
+    "\n",
+    "if [ ! -e \"churn_train.csv\" ]; then\n",
+    "    curl \"https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/tabular-classification/documentation/churn_train.csv\" --output \"churn_train.csv\"\n",
+    "fi\n",
+    "\n",
+    "if [ ! -e \"churn_val.csv\" ]; then\n",
+    "    curl \"https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/tabular-classification/documentation/churn_val.csv\" --output \"churn_val.csv\"\n",
+    "fi"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "31eda871",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "train_df = pd.read_csv(\"./churn_train.csv\")\n",
+    "val_df = pd.read_csv(\"./churn_val.csv\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "35ae1754",
+   "metadata": {},
+   "source": [
+    "Now, imagine that we have trained a model using this training set. Then, we used the trained model to get the predictions for the training and validation sets. Let's add these predictions as an extra column called `predictions`: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "17535385",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_df[\"predictions\"] = pd.read_csv(\"https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/tabular-classification/documentation/training_preds.csv\") \n",
+    "val_df[\"predictions\"] = pd.read_csv(\"https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/tabular-classification/documentation/validation_preds.csv\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9ee86be7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "val_df.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0410ce56",
+   "metadata": {},
+   "source": [
+    "### <a id=\"upload-datasets\"> Uploading the datasets to Openlayer </a>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9b2a3f87",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dataset_config = {\n",
+    "    \"categoricalFeatureNames\": [\"Gender\", \"Geography\"],\n",
+    "    \"classNames\": [\"Retained\", \"Exited\"],\n",
+    "    \"featureNames\": [\n",
+    "        \"CreditScore\", \n",
+    "        \"Geography\",\n",
+    "        \"Gender\",\n",
+    "        \"Age\", \n",
+    "        \"Tenure\",\n",
+    "        \"Balance\",\n",
+    "        \"NumOfProducts\",\n",
+    "        \"HasCrCard\",\n",
+    "        \"IsActiveMember\",\n",
+    "        \"EstimatedSalary\",\n",
+    "        \"AggregateRate\",\n",
+    "        \"Year\"\n",
+    "    ],\n",
+    "    \"labelColumnName\": \"Exited\",\n",
+    "    \"label\": \"training\",  # This becomes 'validation' for the validation set\n",
+    "    \"predictionsColumnName\": \"predictions\"\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7271d81b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "project.add_dataframe(\n",
+    "    dataset_df=train_df,\n",
+    "    dataset_config=dataset_config\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8e126c53",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dataset_config[\"label\"] = \"validation\"\n",
+    "\n",
+    "project.add_dataframe(\n",
+    "    dataset_df=val_df,\n",
+    "    dataset_config=dataset_config\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "719fb373",
+   "metadata": {},
+   "source": [
+    "## <a id=\"model\"> 3. Uploading a model</a>\n",
+    "\n",
+    "[Back to top](#top)\n",
+    "\n",
+    "Since we added predictions to the datasets above, we also need to specify the model used to get them. Feel free to refer to the documentation for the other model upload options."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "04806952",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model_config = {\n",
+    "    \"metadata\": {  # Can add anything here, as long as it is a dict\n",
+    "        \"model_type\": \"Gradient Boosting Classifier\",\n",
+    "        \"regularization\": \"None\",\n",
+    "        \"encoder_used\": \"One Hot\",\n",
+    "        \"imputation\": \"Imputed with the training set's mean\"\n",
+    "    },\n",
+    "    \"classNames\": dataset_config[\"classNames\"],\n",
+    "    \"featureNames\": dataset_config[\"featureNames\"],\n",
+    "    \"categoricalFeatureNames\": dataset_config[\"categoricalFeatureNames\"],\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ab674332",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "project.add_model(\n",
+    "    model_config=model_config\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3215b297",
+   "metadata": {},
+   "source": [
+    "## <a id=\"push\"> 4. Committing and pushing</a>\n",
+    "\n",
+    "[Back to top](#top)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "929f8fa9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "project.commit(\"Initial commit!\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9c2e2004",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "project.status()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0c3c43ef",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "project.push()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "703d5326",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.13"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}