|
115 | 115 | "source": [ |
116 | 116 | "## Getting started\n", |
117 | 117 | "\n", |
118 | | - "First, install the required dependencies. \n", |
119 | | - "\n", |
120 | | - "```bash\n", |
121 | | - "pip install -r requirements.txt\n", |
122 | | - "```\n", |
| 118 | + "First, install the required dependencies." |
| 119 | + ] |
| 120 | + }, |
| 121 | + { |
| 122 | + "cell_type": "code", |
| 123 | + "execution_count": 1, |
| 124 | + "metadata": {}, |
| 125 | + "outputs": [], |
| 126 | + "source": [ |
| 127 | + "#!pip install -r requirements.txt\n", |
123 | 128 | "\n", |
| 129 | + "# In an environment like Google Colab, please use the absolute URL to the requirements.txt file.\n", |
| 130 | + "# Note: pip may report some dependency inconsistencies. They can usually be ignored.\n", |
| 131 | + "# Restart the runtime if Colab asks you to.\n", |
| 132 | + "#!pip install -r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/automl/requirements.txt" |
| 133 | + ] |
| 134 | + }, |
| 135 | + { |
| 136 | + "cell_type": "markdown", |
| 137 | + "metadata": {}, |
| 138 | + "source": [ |
124 | 139 | "**Note:** At the time of this writing, PyCaret requires Python 3.8, 3.9 or 3.10.\n", |
125 | 140 | "\n", |
126 | 141 | "Second, you will need a CrateDB instance to store and serve the data. The easiest\n", |
|
131 | 146 | "create an `.env` file with the following content:\n", |
132 | 147 | "\n", |
133 | 148 | "```env\n", |
134 | | - "CRATE_HOST=<your-crate-host> # set this to localhost if you're running crate locally\n", |
135 | | - "CRATE_USER=<your-crate-user> # set this to crate if you're running crate locally\n", |
136 | | - "CRATE_PASSWORD=<your-crate-password> # set this to \"\" if you're running crate locally\n", |
137 | | - "CRATE_SSL=true # set this to false if you're running crate locally\n", |
| 149 | + "# use this string for a connection to CrateDB Cloud\n", |
| 150 | + "CRATEDB_CONNECTION_STRING=crate://username:password@hostname/?ssl=true\n", |
| 151 | + "\n", |
| 152 | + "# use this string for a local connection to CrateDB\n", |
| 153 | + "# CRATEDB_CONNECTION_STRING=crate://crate@localhost/?ssl=false\n", |
138 | 154 | "```\n", |
139 | 155 | "\n", |
140 | 156 | "You can find your CrateDB credentials in the [CrateDB Cloud Console].\n", |
141 | 157 | "\n", |
142 | 158 | "[CrateDB Cloud Console]: https://cratedb.com/docs/cloud/en/latest/reference/overview.html#cluster\n", |
143 | | - "[deploy a cluster]: https://cratedb.com/docs/cloud/en/latest/tutorials/deploy/stripe.html#deploy-cluster\n", |
144 | | - "\n", |
145 | | - "### Creating demo data\n", |
| 159 | + "[deploy a cluster]: https://cratedb.com/docs/cloud/en/latest/tutorials/deploy/stripe.html#deploy-cluster" |
| 160 | + ] |
| 161 | + }, |
| 162 | + { |
| 163 | + "cell_type": "code", |
| 164 | + "execution_count": 2, |
| 165 | + "metadata": {}, |
| 166 | + "outputs": [], |
| 167 | + "source": [ |
| 168 | + "import os\n", |
146 | 169 | "\n", |
147 | | - "For convenience, this notebook comes with an accompanying CSV dataset which you\n", |
148 | | - "can quickly import into the database. Upload the CSV file to your CrateDB cloud\n", |
149 | | - "cluster, as described [here](https://cratedb.com/docs/cloud/en/latest/reference/overview.html#import).\n", |
150 | | - "To follow this notebook, choose `pycaret_churn` for your table name.\n", |
| 170 | + "# For CrateDB Cloud, use:\n", |
| 171 | + "CONNECTION_STRING = os.environ.get(\n", |
| 172 | + " \"CRATEDB_CONNECTION_STRING\",\n", |
| 173 | + " \"crate://username:password@hostname/?ssl=true\",\n", |
| 174 | + ")\n", |
151 | 175 | "\n", |
152 | | - "This will automatically create a new database table and import the data." |
| 176 | + "# For a self-deployed CrateDB, e.g. via Docker, please use:\n", |
| 177 | + "# CONNECTION_STRING = os.environ.get(\n", |
| 178 | + "# \"CRATEDB_CONNECTION_STRING\",\n", |
| 179 | + "# \"crate://crate@localhost/?ssl=false\",\n", |
| 180 | + "# )" |
153 | 181 | ] |
154 | 182 | }, |
155 | 183 | { |
156 | 184 | "cell_type": "markdown", |
157 | 185 | "metadata": {}, |
158 | 186 | "source": [ |
| 187 | + "### Creating demo data\n", |
| 188 | + "\n", |
| 189 | + "For convenience, this notebook comes with an accompanying CSV dataset which you\n", |
| 190 | + "can quickly import into the database. Upload the CSV file to your CrateDB Cloud\n", |
| 191 | + "cluster, as described [here](https://cratedb.com/docs/cloud/en/latest/reference/overview.html#import).\n", |
| 192 | + "To follow this notebook, choose `pycaret_churn` for your table name.\n", |
| 193 | + "\n", |
| 194 | + "This will automatically create a new database table and import the data.\n", |
| 195 | + "\n", |
159 | 196 | "### Alternative data import using code\n", |
160 | 197 | "\n", |
161 | 198 | "If you prefer to use code to import your data, please execute the following lines which read the CSV\n", |
|
175 | 212 | "if os.path.exists(\".env\"):\n", |
176 | 213 | " dotenv.load_dotenv(\".env\", override=True)\n", |
177 | 214 | "\n", |
178 | | - "dburi = f\"crate://{os.environ['CRATE_USER']}:{os.environ['CRATE_PASSWORD']}@{os.environ['CRATE_HOST']}:4200?ssl={os.environ['CRATE_SSL']}\"\n", |
179 | | - "engine = sa.create_engine(dburi, echo=os.environ.get('DEBUG'))\n", |
| 215 | + "engine = sa.create_engine(CONNECTION_STRING, echo=os.environ.get('DEBUG'))\n", |
180 | 216 | "df = pd.read_csv(\"https://github.com/crate/cratedb-datasets/raw/main/machine-learning/automl/churn-dataset.csv\")\n", |
181 | 217 | "\n", |
182 | 218 | "with engine.connect() as conn:\n", |
|
214 | 250 | "if os.path.exists(\".env\"):\n", |
215 | 251 | " dotenv.load_dotenv(\".env\", override=True)\n", |
216 | 252 | "\n", |
217 | | - "dburi = f\"crate://{os.environ['CRATE_USER']}:{os.environ['CRATE_PASSWORD']}@{os.environ['CRATE_HOST']}:4200?ssl={os.environ['CRATE_SSL']}\"\n", |
218 | | - "engine = sa.create_engine(dburi, echo=os.environ.get('DEBUG'))\n", |
| 253 | + "engine = sa.create_engine(CONNECTION_STRING, echo=os.environ.get('DEBUG'))\n", |
219 | 254 | "\n", |
220 | 255 | "with engine.connect() as conn:\n", |
221 | 256 | " with conn.execute(sa.text(\"SELECT * FROM pycaret_churn\")) as cursor:\n", |
|
224 | 259 | "# We set the MLFLOW_TRACKING_URI to our CrateDB instance. We'll see later why\n", |
225 | 260 | "os.environ[\n", |
226 | 261 | " \"MLFLOW_TRACKING_URI\"\n", |
227 | | - "] = f\"{dburi}&schema=mlflow\"" |
| 262 | + "] = f\"{CONNECTION_STRING}&schema=mlflow\"" |
228 | 263 | ] |
229 | 264 | }, |
230 | 265 | { |
|
966 | 1001 | "# - \"n_select\" defines how many models are selected.\n", |
967 | 1002 | "# - \"exclude\" defines which models are excluded from the comparison.\n", |
968 | 1003 | "\n", |
| 1004 | + "# Note: This is only relevant if we are executing automated tests\n", |
969 | 1005 | "if \"PYTEST_CURRENT_TEST\" in os.environ:\n", |
970 | 1006 | " best_models = compare_models(sort=\"AUC\", include=[\"lr\", \"knn\"], n_select=3)\n", |
| 1007 | + "# If we are not in an automated test, compare the available models\n", |
971 | 1008 | "else:\n", |
972 | 1009 | " # For production scenarios, it might be worth including \"lightgbm\" again.\n", |
973 | 1010 | " best_models = compare_models(sort=\"AUC\", exclude=[\"lightgbm\"], n_select=3)" |
|
3406 | 3443 | "source": [ |
3407 | 3444 | "os.environ[\n", |
3408 | 3445 | " \"MLFLOW_TRACKING_URI\"\n", |
3409 | | - "] = f\"crate://{os.environ['CRATE_USER']}:{os.environ['CRATE_PASSWORD']}@{os.environ['CRATE_HOST']}:4200?ssl={os.environ['CRATE_SSL']}&schema=mlflow\"" |
| 3446 | + "] = f\"{CONNECTION_STRING}&schema=mlflow\"" |
3410 | 3447 | ] |
3411 | 3448 | }, |
3412 | 3449 | { |
|
3484 | 3521 | ], |
3485 | 3522 | "metadata": { |
3486 | 3523 | "kernelspec": { |
3487 | | - "display_name": "crate", |
| 3524 | + "display_name": "Python 3 (ipykernel)", |
3488 | 3525 | "language": "python", |
3489 | 3526 | "name": "python3" |
3490 | 3527 | }, |
|
3498 | 3535 | "name": "python", |
3499 | 3536 | "nbconvert_exporter": "python", |
3500 | 3537 | "pygments_lexer": "ipython3", |
3501 | | - "version": "3.10.0" |
| 3538 | + "version": "3.11.4" |
3502 | 3539 | } |
3503 | 3540 | }, |
3504 | 3541 | "nbformat": 4, |
|