|
245 | 245 | "metadata": {}, |
246 | 246 | "outputs": [], |
247 | 247 | "source": [ |
248 | | - "# Add column \"TEMP_KELVIN\"\n" |
| 248 | + "# Add column \"TEMP_KELVIN\"" |
249 | 249 | ] |
250 | 250 | }, |
251 | 251 | { |
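A minimal sketch of how the `TEMP_KELVIN` column could be computed. It assumes, as the column names suggest, that `TEMP` holds degrees Fahrenheit and that `TEMP_CELSIUS` is derived from it. The small DataFrame is made-up sample data, not the notebook's actual `data`:

```python
import pandas as pd

# Made-up sample values; TEMP is assumed to be in degrees Fahrenheit
data = pd.DataFrame({"TEMP": [65.5, 68.0, 50.0]})

# Fahrenheit -> Celsius, then Celsius -> Kelvin (K = degrees C + 273.15)
data["TEMP_CELSIUS"] = (data["TEMP"] - 32) / 1.8
data["TEMP_KELVIN"] = data["TEMP_CELSIUS"] + 273.15

print(data)
```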
|
262 | 262 | "\n", |
263 | 263 | "**Selecting several rows:**\n", |
264 | 264 | "\n", |
265 | | - "One common way of selecting only specific rows from your DataFrame is done via **index slicing** to extract part of the DataFrame.\n", |
| 265 | + "A common way of selecting only specific rows from your DataFrame is **index slicing**, which extracts part of the DataFrame. Slicing in pandas works in a similar manner as with normal Python lists, i.e. you specify the index range you want to select inside square brackets: ``selection = dataframe[start_index:stop_index]``. As with lists, the row at `stop_index` is not included.\n", |
| 266 | + "\n", |
266 | 267 | "Let's select the first five rows and assign them to a variable called `selection`:" |
267 | 268 | ] |
268 | 269 | }, |
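A minimal sketch of integer slicing on a DataFrame with the default index. The values below are made up for illustration:

```python
import pandas as pd

# Made-up temperature values with the default integer index 0, 1, 2, ...
data = pd.DataFrame({"TEMP": [65.5, 65.8, 68.4, 57.5, 51.4, 52.2, 56.9]})

# Like a Python list slice: start is included, stop is excluded
selection = data[0:5]

print(selection)
print(len(selection))  # 5 rows: index 0-4
```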
|
279 | 280 | }, |
280 | 281 | "outputs": [], |
281 | 282 | "source": [ |
282 | | - "# Select first five rows of dataframe\n", |
| 283 | + "# Select first five rows of dataframe using index values\n", |
283 | 284 | "\n", |
284 | 285 | "\n" |
285 | 286 | ] |
|
291 | 292 | "editable": true |
292 | 293 | }, |
293 | 294 | "source": [ |
294 | | - "As you can see, slicing can be done in a similar manner as with normal Python lists, i.e. you specify index range you want to select inside the square brackets\n", |
295 | | - "``selection = dataframe[start_index:stop_index]``.\n" |
| 295 | + "**Note:** here we selected the first five rows (index 0-4) using the integer index.\n" |
296 | 296 | ] |
297 | 297 | }, |
298 | 298 | { |
|
305 | 305 | "**Selecting several rows and columns:**\n", |
306 | 306 | "\n", |
307 | 307 | "\n", |
308 | | - "It is also possible to control which columns are chosen, while selecting a subset of rows. Here, we select only temperature values (`TEMP`) between on rows index 0-5:\n" |
| 308 | + "It is also possible to control which columns are chosen when selecting a subset of rows. In this case, we will use [pandas.DataFrame.loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html), which selects data based on axis labels (row labels and column labels).\n", |
| 309 | + "\n", |
| 310 | + "Let's select temperature values (column `TEMP`) on rows 0-5:\n" |
309 | 311 | ] |
310 | 312 | }, |
311 | 313 | { |
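A minimal sketch contrasting an integer slice with a label-based `.loc` slice on the same made-up data. With the default index the labels happen to equal the positions, yet `.loc` includes the stop label:

```python
import pandas as pd

data = pd.DataFrame({"TEMP": [65.5, 65.8, 68.4, 57.5, 51.4, 52.2, 56.9]})

positional = data[0:5]            # integer slice: stop position 5 is excluded
by_label = data.loc[0:5, "TEMP"]  # label slice: stop label 5 is included

print(len(positional), len(by_label))  # 5 6
print(type(by_label).__name__)         # Series (a single column name was given)
```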
|
321 | 323 | }, |
322 | 324 | "outputs": [], |
323 | 325 | "source": [ |
324 | | - "# Select temp column values between indices 5 and 10\n", |
| 326 | + "# Select temp column values on rows 0-5\n", |
325 | 327 | "\n", |
326 | 328 | "\n" |
327 | 329 | ] |
328 | 330 | }, |
| 331 | + { |
| 332 | + "cell_type": "markdown", |
| 333 | + "metadata": {}, |
| 334 | + "source": [ |
| 335 | + "**Note:** in this case, we get six rows of data (index 0-5)! We are now doing the selection based on axis labels instead of the integer index, and a label-based slice includes the stop label." |
| 336 | + ] |
| 337 | + }, |
329 | 338 | { |
330 | 339 | "cell_type": "markdown", |
331 | 340 | "metadata": { |
332 | 341 | "deletable": true, |
333 | 342 | "editable": true |
334 | 343 | }, |
335 | 344 | "source": [ |
336 | | - "It is also possible to select multiple columns using those same indices. Here, we select `TEMP` and the `TEMP_CELSIUS` columns from a set of rows by passing them inside a list (`.loc[start_index:stop_index, list_of_columns]`):" |
| 345 | + "It is also possible to select multiple columns when using `loc`. Here, we select the `TEMP` and `TEMP_CELSIUS` columns from a set of rows by passing them inside a list (`.loc[start_index:stop_index, list_of_columns]`):" |
337 | 346 | ] |
338 | 347 | }, |
339 | 348 | { |
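A minimal sketch of `.loc` with a list of column names, using made-up values. `TEMP_CELSIUS` is computed here under the assumption that `TEMP` is in Fahrenheit:

```python
import pandas as pd

data = pd.DataFrame({"TEMP": [65.5, 65.8, 68.4, 57.5, 51.4, 52.2, 56.9]})
data["TEMP_CELSIUS"] = (data["TEMP"] - 32) / 1.8  # assumed Fahrenheit -> Celsius

# A list of column names returns a DataFrame instead of a Series
selection = data.loc[0:5, ["TEMP", "TEMP_CELSIUS"]]

print(selection)
print(selection.shape)  # (6, 2)
```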
|
349 | 358 | }, |
350 | 359 | "outputs": [], |
351 | 360 | "source": [ |
352 | | - "# Select temp and temp_celsius column values between indices 5 and 10\n", |
| 361 | + "# Select columns temp and temp_celsius on rows 0-5\n", |
353 | 362 | "\n", |
354 | 363 | "\n" |
355 | 364 | ] |
|
466 | 475 | "`.loc` and `.at` are based on the *axis labels* - the names of columns and rows. \n", |
467 | 476 | "`.iloc` is another indexing operator which is based on *integer values*. \n", |
468 | 477 | " \n", |
469 | | - "See pandas documentation for more information about [indexing and selecting data](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-and-selecting-data)\n", |
| 478 | + "See pandas documentation for more information about [indexing and selecting data](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-and-selecting-data).\n", |
470 | 479 | " \n", |
471 | 480 | "</div>" |
472 | 481 | ] |
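The difference between labels and integer positions is easiest to see when the row labels are not 0, 1, 2, ... A minimal sketch with a hypothetical index:

```python
import pandas as pd

# Hypothetical frame whose row labels are not 0, 1, 2, ...
data = pd.DataFrame({"TEMP": [65.5, 65.8, 68.4]}, index=[10, 20, 30])

print(data.loc[20, "TEMP"])  # row label 20 -> 65.8
print(data.at[20, "TEMP"])   # single value by labels -> 65.8
print(data.iloc[1, 0])       # row position 1, column position 0 -> 65.8

try:
    data.loc[0, "TEMP"]      # there is no row *labelled* 0
    found_label_0 = True
except KeyError:
    found_label_0 = False
    print("No row with label 0")
```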
|
483 | 492 | "execution_count": null, |
484 | 493 | "metadata": {}, |
485 | 494 | "outputs": [], |
486 | | - "source": [] |
| 495 | + "source": [ |
| 496 | + "data.iloc[0:5, 0:2]" |
| 497 | + ] |
487 | 498 | }, |
488 | 499 | { |
489 | 500 | "cell_type": "markdown", |
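A minimal sketch of slicing rows and columns by position with `.iloc`. The column names and values are hypothetical; only their positions matter here. A trailing colon, as in `0:5:`, merely leaves the step empty and selects the same rows as `0:5`:

```python
import pandas as pd

# Hypothetical columns; only their positions matter for .iloc
data = pd.DataFrame({
    "DATE": ["2016-06-01", "2016-06-02", "2016-06-03", "2016-06-04",
             "2016-06-05", "2016-06-06", "2016-06-07"],
    "TEMP": [65.5, 65.8, 68.4, 57.5, 51.4, 52.2, 56.9],
    "MAX": [70.2, 71.0, 72.5, 60.1, 55.0, 58.3, 60.0],
})

# Rows at positions 0-4 and columns at positions 0-1 (stop positions excluded)
subset = data.iloc[0:5, 0:2]

print(subset)
print(subset.shape)  # (5, 2)
```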
|
498 | 509 | "execution_count": null, |
499 | 510 | "metadata": {}, |
500 | 511 | "outputs": [], |
501 | | - "source": [] |
| 512 | + "source": [ |
| 513 | + "data.iloc[0,1]" |
| 514 | + ] |
| 515 | + }, |
| 516 | + { |
| 517 | + "cell_type": "markdown", |
| 518 | + "metadata": {}, |
| 519 | + "source": [ |
| 520 | + "We can also access individual rows using `iloc`. Let's check out the last row of data:" |
| 521 | + ] |
| 522 | + }, |
| 523 | + { |
| 524 | + "cell_type": "code", |
| 525 | + "execution_count": null, |
| 526 | + "metadata": {}, |
| 527 | + "outputs": [], |
| 528 | + "source": [ |
| 529 | + "data.iloc[-1]" |
| 530 | + ] |
502 | 531 | }, |
503 | 532 | { |
504 | 533 | "cell_type": "markdown", |
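A minimal sketch of single-value and single-row access with `.iloc`, on made-up data. A negative position counts from the end, just like with Python lists:

```python
import pandas as pd

data = pd.DataFrame({
    "DATE": ["2016-06-01", "2016-06-02", "2016-06-03"],
    "TEMP": [65.5, 65.8, 68.4],
})

first_temp = data.iloc[0, 1]  # row position 0, column position 1 -> 65.5
last_row = data.iloc[-1]      # negative positions count from the end

print(first_temp)
print(last_row)               # a Series indexed by the column names
```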
|
656 | 685 | "source": [ |
657 | 686 | "As you can see by looking at the table above (and the change in index values), we now have a DataFrame without the NoData values.\n", |
658 | 687 | "\n", |
659 | | - "Another option is to fill the NoData with some value using the `fillna()` function. Here we can fill the missing values in the with value 0. Note that we are not giving the `subset` parameter this time." |
| 688 | + "Another option is to fill the NoData values with some value using the `fillna()` function. Here we fill the missing values with the value -9999. Note that we are not giving the `subset` parameter this time." |
660 | 689 | ] |
661 | 690 | }, |
662 | 691 | { |
|
672 | 701 | }, |
673 | 702 | "outputs": [], |
674 | 703 | "source": [ |
675 | | - "# Fill na values with 0\n" |
| 704 | + "# Fill na values\n" |
676 | 705 | ] |
677 | 706 | }, |
678 | 707 | { |
|
682 | 711 | "editable": true |
683 | 712 | }, |
684 | 713 | "source": [ |
685 | | - "As a result we now have a DataFrame where NoData values are filled with the value 0.0." |
| 714 | + "As a result we now have a DataFrame where NoData values are filled with the value -9999." |
686 | 715 | ] |
687 | 716 | }, |
688 | 717 | { |
|
694 | 723 | "source": [ |
695 | 724 | "<div class=\"alert alert-warning\">\n", |
696 | 725 | "\n", |
697 | | - "**Warning:** In many cases filling the data with a specific value is dangerous because you end up modifying the actual data, which might affect the results of your analysis. For example, in the case above we would have dramatically changed the temperature difference columns because the 0 values not an actual temperature difference! Hence, use caution when filling missing values.\n", |
| 726 | + "**Warning:** \n", |
| 727 | + " \n", |
| 728 | + "In many cases filling the data with a specific value is dangerous because you end up modifying the actual data, which might affect the results of your analysis. For example, in the case above we would have dramatically changed the temperature difference columns because the -9999 values are not an actual temperature difference! Hence, use caution when filling missing values.\n", |
| 729 | + " \n", |
| 730 | + "You might have to fill in NoData values, for example, when working with GIS data. Always pay attention to potential NoData values when reading in data files and doing further analysis!\n", |
698 | 731 | "\n", |
699 | 732 | "</div>" |
700 | 733 | ] |
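A minimal sketch of why a fill value such as -9999 is risky, using made-up data. `mean()` skips NaN, but it treats -9999 as a real measurement:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"TEMP": [65.5, np.nan, 68.4, 57.5]})
filled = data.fillna(-9999)

# NaN is skipped by mean(), but -9999 is treated as a real measurement
print(data["TEMP"].mean())    # about 63.8
print(filled["TEMP"].mean())  # strongly negative

# Turning the placeholder back into NaN restores the original statistics
restored = filled.replace(-9999, np.nan)
print(restored["TEMP"].mean())  # about 63.8 again
```

When a file already uses -9999 as its NoData marker, passing `na_values=[-9999]` to `pd.read_csv()` converts those values to NaN while reading.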
|