Commit 88f9af5

PR feedback fixes, including balancing content with raster-write.pymd.
1 parent 35feb73 commit 88f9af5

6 files changed: 145 additions and 175 deletions.

datasource/src/main/scala/org/locationtech/rasterframes/datasource/raster/RasterSourceRelation.scala

Lines changed: 1 addition & 1 deletion
@@ -153,6 +153,6 @@ case class RasterSourceRelation(
 .repartitionByRange(numParts,$"spatial_index")
 indexed.rdd
 }
-else df.rdd
+else df.repartition(numParts).rdd
 }
 }

docs/src/main/paradox/release-notes.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 ### 0.9.1
 
 * Upgraded to Spark 2.4.7
-* Added `pyspark.sql.DataFrame.display(num_rows, truncate)` extension method when `rf_ipython` is imported.
+* Added `pyspark.sql.DataFrame.display(num_rows:int, truncate:bool)` extension method when `rf_ipython` is imported.
 * Added users' manual section on IPython display enhancements.
 * Added `method_name` parameter to the `rf_resample` method.
 * __BREAKING__: In SQL, the function `rf_resample` now takes 3 arguments. You can use `rf_resample_nearest` with two arguments or refactor to `rf_resample(t, v, "nearest")`.

pyrasterframes/src/main/python/docs/ipython.pymd

Lines changed: 64 additions & 40 deletions
@@ -4,67 +4,91 @@ The `pyrasterframes.rf_ipython` module injects a number of visualization extensi
 
 By default, the last expression's result in a IPython cell is passed to the `IPython.display.display` function. This function in turn looks for a [`DisplayFormatter`](https://ipython.readthedocs.io/en/stable/api/generated/IPython.core.formatters.html#IPython.core.formatters.DisplayFormatter) associated with the type, which in turn converts the instance to a display-appropriate representation, based on MIME type. For example, each `DisplayFormatter` may `plain/text` version for the IPython shell, and a `text/html` version for a Jupyter Notebook.
 
-```python imports, echo=False, results='hidden'
-from pyrasterframes.all import *
-from pyspark.sql.functions import col
+This will be our setup for the following examples:
+
+```python setup
+from pyrasterframes import *
+from pyrasterframes.rasterfunctions import *
+from pyrasterframes.utils import create_rf_spark_session
+import pyrasterframes.rf_ipython
+from IPython.display import display
+import os.path
 spark = create_rf_spark_session()
+def scene(band):
+    b = str(band).zfill(2) # converts int 2 to '02'
+    return 'https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/' \
+        'MCD43A4.A2019059.h11v08.006.2019072203257_B{}.TIF'.format(b)
+rf = spark.read.raster(scene(2), tile_dimensions=(256, 256))
 ```
 
-## Initialize Sample
+## Tile Samples
 
-First we read in a sample image as tiles:
+We have some convenience methods to quickly visualize tiles (see discussion of the RasterFrame @ref:[schema](raster-read.md#single-raster) for orientation to the concept) when inspecting a subset of the data in a Notebook.
 
-```python raster_read
-uri = 'https://modis-pds.s3.amazonaws.com/MCD43A4.006/31/11/2017158/' \
-      'MCD43A4.A2017158.h31v11.006.2017171203421_B01.TIF'
+In an IPython or Jupyter interpreter, a `Tile` object will be displayed as an image with limited metadata.
 
-# here we flatten the projected raster structure
-df = spark.read.raster(uri) \
-    .withColumn('tile', rf_tile('proj_raster')) \
-    .withColumn('crs', rf_crs(col('proj_raster'))) \
-    .withColumn('extent', rf_extent(col('proj_raster'))) \
-    .drop('proj_raster')
+```python, sample_tile
+sample_tile = rf.select(rf_tile('proj_raster').alias('tile')).first()['tile']
+sample_tile # or `display(sample_tile)`
 ```
-
-Print the schema to confirm its "shape":
 
-```python schema
-df.printSchema()
+## DataFrame Samples
+
+Within an IPython or Jupyter interpreter, a Spark and Pandas DataFrames containing a column of _tiles_ will be rendered as the samples discussed above. Simply import the `rf_ipython` submodule to enable enhanced HTML rendering of these DataFrame types.
+
+```python display_samples
+rf # or `display(rf)`, or `rf.display()`
 ```
 
-# Tile Display
+### Changing Number of Rows
 
-Let's look at a single tile. A `pyrasterframes.rf_types.Tile` will automatically render nicely in Jupyter or IPython.
+By default the RasterFrame sample display renders 5 rows. Because the `IPython.display.display` function doesn't pass parameters to the underlying rendering functions, we have to provide a different means of passing parameters to the rendering code. Pandas approach to this is to use global settings via `set_option`/`get_option`. We take a more functional approach and have the user invoke an explicit `display` method:
 
-```python single_tile
-tile = df.select(df.tile).first()['tile']
-tile
-```
+```python custom_display, evaluate=False
+rf.display(num_rows=1, truncate=True)
+```
 
-## pyspark.sql.DataFrame Display
+```python custom_display_mime, echo=False
+rf.display(num_rows=1, truncate=True, mimetype='text/markdown')
+```
 
-There is also a capability for HTML rendering of the spark DataFrame.
+### Pandas
 
-```python spark_dataframe
-df.select('tile', 'extent')
-```
+There is similar rendering support injected into the Pandas by the `rf_ipython` module, for Pandas Dataframes having Tiles in them:
 
-### Changing number of rows
+```python pandas_dataframe
+# Limit copy of data from Spark to a few tiles.
+pandas_df = rf.select(rf_tile('proj_raster'), rf_extent('proj_raster')).limit(4).toPandas()
+pandas_df # or `display(pandas_df)`
+```
 
-Because the `IPython.display.display` function doesn't accept any parameters, we have to provide a different means of passing parameters to the rendering code. Pandas does it with global settings via `set_option`/`get_option`. We take a more functional approach and have the user invoke an explicit `display` method:
+## Sample Colorization
 
-```python custom_display
-df.display(num_rows=1, truncate=True)
-```
+RasterFrames uses the "Viridis" color ramp as the default color profile for tile column. There are other options for reasoning about how color should be applied in the results.
 
+### Color Composite
 
-## pandas.DataFrame Display
+As shown in @ref:[Writing Raster Data section](raster-write.md) section, composites can be constructed for visualization:
 
-The same thing works for Pandas DataFrame if it contains a column of `Tile`s.
+```python, png_color_composite
+from IPython.display import Image # For telling IPython how to interpret the PNG byte array
+# Select red, green, and blue, respectively
+three_band_rf = spark.read.raster(source=[[scene(1), scene(4), scene(3)]])
+composite_rf = three_band_rf.withColumn('png',
+    rf_render_png('proj_raster_0', 'proj_raster_1', 'proj_raster_2'))
+png_bytes = composite_rf.select('png').first()['png']
+Image(png_bytes)
+```
 
-```python pandas_dataframe
-# Limit copy of data from Spark to a few tiles.
-pandas_df = df.limit(4).toPandas()
-pandas_df.drop(['proj_raster_path'], axis=1)
+```python, png_render, echo=False
+from IPython.display import display_markdown
+display_markdown(pyrasterframes.rf_ipython.binary_to_html(png_bytes), raw=True)
 ```
 
+### Custom Color Ramp
+
+You can also apply a different color ramp to a single-channel Tile using the @ref[`rf_render_color_ramp_png`](reference.md#rf-render-color-ramp-png) function. See the function documentation for information about the available color maps.
+
+```python, color_map
+rf.select(rf_render_color_ramp_png('proj_raster', 'Magma'))
+```

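Reviewer note on the `ipython.pymd` text above: the display mechanism it describes (IPython picking a representation per MIME type, plus an explicit method for passing rendering parameters) can be sketched in plain Python. This is only an illustration under stated assumptions: `SampleTable` and `to_html` are hypothetical names, not part of pyrasterframes, and the sketch relies only on IPython's documented `_repr_html_` hook, which is called with no arguments.

```python
import html


class SampleTable:
    """Hypothetical stand-in for a DataFrame; not a pyrasterframes class."""

    def __init__(self, rows):
        self.rows = list(rows)

    def to_html(self, num_rows=5):
        # Explicit parameter on a method, rather than a global option.
        cells = "".join(
            "<tr><td>{}</td></tr>".format(html.escape(str(r)))
            for r in self.rows[:num_rows]
        )
        return "<table>{}</table>".format(cells)

    def __repr__(self):
        # Plain-text form, e.g. for the terminal IPython shell.
        return "SampleTable({} rows)".format(len(self.rows))

    def _repr_html_(self):
        # IPython calls this hook without arguments, so the defaults apply.
        return self.to_html()


table = SampleTable(range(10))
default_html = table._repr_html_()       # what a notebook would render
one_row_html = table.to_html(num_rows=1)  # caller-selected row count
```

Because the hook takes no arguments, any customization has to go through a separate call such as `to_html(num_rows=1)`, which mirrors the reasoning the documentation gives for `rf.display(num_rows=1, truncate=True)`.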
pyrasterframes/src/main/python/docs/raster-read.pymd

Lines changed: 5 additions & 0 deletions
@@ -287,6 +287,11 @@ In the initial examples on this page, you may have noticed that the realized (no
 
 ## Spatial Indexing and Partitioning
 
+@@@ warning
+This is an experimental feature, and may be removed.
+@@@
+
+
 It's often desirable to take extra steps in ensuring your data is effectively distributed over your computing resources. One way of doing that is using something called a ["space filling curve"](https://en.wikipedia.org/wiki/Space-filling_curve), which turns an N-dimensional value into a one dimensional value, with properties that favor keeping entities near each other in N-space near each other in index space. In particular RasterFrames support space-filling curves mapping the geographic location of _tiles_ to a one-dimensional index space called [`xz2`](https://www.geomesa.org/documentation/user/datastores/index_overview.html). To have RasterFrames add a spatial index based partitioning on a raster reads, use the `spatial_index_partitions` parameter. By default it will use the same number of partitions as configured in [`spark.sql.shuffle.partitions`](https://spark.apache.org/docs/latest/sql-performance-tuning.html#other-configuration-options).
 
 ```python, spatial_indexing

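Reviewer note on the space-filling-curve paragraph in the hunk above: the idea of mapping a 2-D location to a single index can be sketched with a plain Z-order (Morton) curve. This is not the `xz2` curve that RasterFrames uses; it is a simpler, swapped-in curve chosen only to show how bit interleaving keeps nearby grid cells close in index space. The function name and the tile coordinates are hypothetical.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order index."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return z


# Hypothetical tile grid coordinates as (column, row) pairs.
tiles = [(7, 7), (1, 1), (0, 0), (2, 0), (0, 1), (1, 0)]

# Sorting by the 1-D index groups spatially adjacent tiles together,
# which is what a range partitioner would then split into partitions.
ordered = sorted(tiles, key=lambda t: interleave_bits(*t))
```

Range-partitioning on such an index tends to place neighboring tiles in the same partition, which is the motivation the documentation gives for the `spatial_index_partitions` parameter.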