# IPython/Jupyter Extensions

The `pyrasterframes.rf_ipython` module injects a number of visualization extensions into the IPython environment, enhancing the display of `DataFrame`s and `Tile`s.

By default, the result of the last expression in an IPython cell is passed to the `IPython.display.display` function. This function in turn relies on IPython's [`DisplayFormatter`](https://ipython.readthedocs.io/en/stable/api/generated/IPython.core.formatters.html#IPython.core.formatters.DisplayFormatter), which holds one formatter per MIME type. Each formatter looks up a printing function registered for the object's type (or a method such as `_repr_html_` on the object itself) and converts the instance into a display-appropriate representation. For example, an object may have a `text/plain` representation for the IPython shell and a `text/html` representation for a Jupyter Notebook.
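
As a minimal sketch of the object side of this mechanism, the hypothetical `Greeting` class below supplies both representations through IPython's rich display protocol: `__repr__` backs `text/plain`, and a `_repr_html_` method backs `text/html`. A module can instead register a function for an existing type with a formatter's `for_type` method; this section does not show which route `rf_ipython` takes.

```python
import html


class Greeting:
    """Hypothetical toy object exposing two representations to IPython."""

    def __init__(self, name: str):
        self.name = name

    def __repr__(self) -> str:
        # Used for the text/plain representation, e.g. in the IPython terminal.
        return f"Greeting({self.name!r})"

    def _repr_html_(self) -> str:
        # Used by IPython's text/html formatter, e.g. in a Jupyter Notebook.
        return f"<b>Hello, {html.escape(self.name)}!</b>"
```

Ending a Jupyter cell with `Greeting('tiles')` renders the bold HTML, while the terminal IPython shell, which only shows `text/plain`, prints `Greeting('tiles')`.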

```python imports, echo=False, results='hidden'
from pyrasterframes.all import *
from pyspark.sql.functions import col
spark = create_rf_spark_session()
```

## Initialize Sample

First we read in a sample image as tiles:

```python raster_read
uri = 'https://modis-pds.s3.amazonaws.com/MCD43A4.006/31/11/2017158/' \
      'MCD43A4.A2017158.h31v11.006.2017171203421_B01.TIF'

# Flatten the projected raster struct into separate tile, CRS, and extent columns
df = spark.read.raster(uri) \
    .withColumn('tile', rf_tile(col('proj_raster'))) \
    .withColumn('crs', rf_crs(col('proj_raster'))) \
    .withColumn('extent', rf_extent(col('proj_raster'))) \
    .drop('proj_raster')
```

Print the schema to confirm its shape:

```python schema
df.printSchema()
```

## Tile Display

Let's look at a single tile. A `pyrasterframes.rf_types.Tile` is rendered automatically in Jupyter or IPython.

```python single_tile
tile = df.select(df.tile).first()['tile']
tile
```
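
One step a tile renderer typically needs is mapping cell values onto a displayable range. The function below is a minimal sketch under stated assumptions: `to_grayscale` is a hypothetical helper, not the `rf_ipython` implementation, and it assumes the cells arrive as a NumPy masked array. It linearly rescales the valid cells to the 0-255 range of an 8-bit grayscale image and sets masked cells to 0.

```python
import numpy as np


def to_grayscale(cells: np.ma.MaskedArray) -> np.ndarray:
    """Hypothetical helper: rescale valid cells to 0-255; masked cells become 0."""
    data = np.ma.asarray(cells, dtype=np.float64)
    if data.count() == 0:
        # Nothing valid to scale; return an all-black image.
        return np.zeros(data.shape, dtype=np.uint8)
    lo, hi = data.min(), data.max()
    if hi > lo:
        scaled = (data - lo) / (hi - lo) * 255.0
    else:
        # Constant tile: avoid dividing by zero.
        scaled = data * 0.0
    # Fill masked cells with 0, then round to unsigned 8-bit integers.
    return np.ma.filled(scaled, 0.0).round().astype(np.uint8)
```

For example, `to_grayscale(np.ma.masked_equal(np.array([[0, 50], [100, -1]]), -1))` yields `[[0, 128], [255, 0]]`: the minimum maps to 0, the maximum to 255, and the masked `-1` cell to 0.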

Accessing the tile's `cells` returns the underlying NumPy `ndarray` (more specifically, in this case, a `numpy.ma.MaskedArray`).

```python cells
tile.cells
```
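
A masked array pairs the cell values with a boolean mask of cells to ignore; for tiles, those are presumably the NoData cells. The standalone sketch below uses made-up values and a hypothetical NoData value of `-9999` (neither taken from the MODIS sample) to show how NumPy's masked-array operations skip masked cells.

```python
import numpy as np

# Hypothetical NoData value and made-up cell values, for illustration only.
NODATA = -9999
raw = np.array([[1, 2, NODATA],
                [4, NODATA, 6]], dtype=np.int32)

# Mask every cell equal to NoData; values and mask travel together.
cells = np.ma.masked_equal(raw, NODATA)

valid_count = cells.count()  # number of unmasked cells
mean = cells.mean()          # masked cells are excluded: (1 + 2 + 4 + 6) / 4
filled = cells.filled(0)     # plain ndarray with masked cells replaced by 0
```

`filled` is useful when handing cells to a library that does not understand masks.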

If you just want the string representation of the Tile, use `str`:

```python tile_as_string
str(tile)
```

## pyspark.sql.DataFrame Display

Spark `DataFrame`s can also be rendered as HTML. The rendering work is done on the JVM, and the resulting HTML string is handed to IPython for display.

```python spark_dataframe
df.select('tile', 'extent')
```
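
The sketch below imitates that division of labor in pure Python, under stated assumptions: the hypothetical `table_html` function stands in for the JVM-side renderer by producing a complete, escaped HTML string up front, and the hypothetical `PrecomputedHTML` wrapper hands the finished string to IPython's `text/html` formatter through `_repr_html_`. IPython ships a ready-made wrapper for this, `IPython.display.HTML`; the toy version only avoids the dependency.

```python
import html
from typing import Any, Sequence


def table_html(columns: Sequence[str], rows: Sequence[Sequence[Any]]) -> str:
    """Hypothetical stand-in for the JVM renderer: build the whole table string."""
    head = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


class PrecomputedHTML:
    """Hypothetical wrapper that passes already-rendered HTML to IPython."""

    def __init__(self, markup: str):
        self.markup = markup

    def _repr_html_(self) -> str:
        return self.markup
```

Escaping every cell matters here because values such as a tile's `str` output may contain `<` or `>`.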

### Changing the Number of Rows

Because the `IPython.display.display` function provides no way to pass rendering options, such as the number of rows, through to a type's formatter, we need a different means of passing parameters to the rendering code. Pandas handles this with global settings via `set_option`/`get_option` (for example, `display.max_rows`). We take a more functional approach and have the user invoke an explicit `display` method:

```python custom_display
df.display(num_rows=1, truncate=True)
```
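
A minimal sketch of this pattern, using a hypothetical `ToyFrame` class rather than pyrasterframes' code: implicit display goes through `_repr_html_`, which IPython calls without arguments and which therefore always uses the defaults, while the explicit `display` method accepts `num_rows` and `truncate` and returns an object carrying the HTML rendered with them. The default of five rows, the 20-character truncation (modeled on Spark's `DataFrame.show()`), and returning an object instead of calling `IPython.display.display` directly are all assumptions.

```python
import html
from typing import Any, Sequence


class Rendered:
    """Carries finished HTML so IPython's text/html formatter can show it."""

    def __init__(self, markup: str):
        self.markup = markup

    def _repr_html_(self) -> str:
        return self.markup


class ToyFrame:
    """Hypothetical stand-in for a DataFrame with an explicit display() method."""

    def __init__(self, columns: Sequence[str], rows: Sequence[Sequence[Any]]):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]

    def to_html(self, num_rows: int = 5, truncate: bool = False) -> str:
        def fmt(value: Any) -> str:
            text = str(value)
            # Like Spark's show(), cut long values to 20 characters ending in "...".
            if truncate and len(text) > 20:
                text = text[:17] + "..."
            return html.escape(text)

        head = "".join(f"<th>{html.escape(c)}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{fmt(v)}</td>" for v in row) + "</tr>"
            for row in self.rows[:num_rows]
        )
        return f"<table><tr>{head}</tr>{body}</table>"

    def _repr_html_(self) -> str:
        # Implicit display: IPython passes no arguments, so the defaults apply.
        return self.to_html()

    def display(self, num_rows: int = 5, truncate: bool = False) -> Rendered:
        # Explicit display: the caller's options travel with the returned object.
        return Rendered(self.to_html(num_rows, truncate))
```

Ending a Jupyter cell with `ToyFrame(['id', 'path'], rows).display(num_rows=1, truncate=True)` would show only the first row, with long values shortened. The design keeps options local to one call, unlike a global setting that silently affects every later render.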

## pandas.DataFrame Display

The same works for a Pandas `DataFrame` that contains a column of `Tile`s.

```python pandas_dataframe
# Limit the data copied from Spark to a few tiles.
pandas_df = df.limit(4).toPandas()
pandas_df.drop(['proj_raster_path'], axis=1)
```
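
How a column of custom objects can become HTML in pandas is sketched below with `DataFrame.to_html`, whose `formatters` argument maps column names to one-argument functions returning strings and whose `escape=False` lets the returned markup through unescaped. The `ToyTile` class and `tile_cell_html` function are hypothetical stand-ins for a `Tile` and its renderer; this is not necessarily how `rf_ipython` formats pandas output.

```python
import pandas as pd


class ToyTile:
    """Hypothetical stand-in for a Tile that only knows its dimensions."""

    def __init__(self, rows: int, cols: int):
        self.rows = rows
        self.cols = cols


def tile_cell_html(tile: ToyTile) -> str:
    # Hypothetical renderer; a real one might embed an image instead.
    return f"<i>Tile {tile.rows}x{tile.cols}</i>"


toy_df = pd.DataFrame({"id": [1, 2], "tile": [ToyTile(2, 2), ToyTile(3, 4)]})

# Format only the "tile" column; escape=False keeps the <i> markup intact.
markup = toy_df.to_html(formatters={"tile": tile_cell_html}, escape=False)
```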