Commit abb8f4e: Misc documentation tweaks.
1 parent: 20cf328

4 files changed: +35 -36 lines changed

pyrasterframes/src/main/python/docs/aggregation.pymd (10 additions, 12 deletions)

@@ -20,11 +20,10 @@ We can illustrate aggregate differences by computing an aggregate mean. First, w
 ```python, sql_dataframe
 import pyspark.sql.functions as F

-rf = spark.sql("""
-SELECT 1 as id, rf_make_ones_tile(5, 5, 'float32') as tile
-UNION
-SELECT 2 as id, rf_local_multiply(rf_make_ones_tile(5, 5, 'float32'), 3) as tile
-""")
+df1 = spark.range(1).select('id', rf_make_ones_tile(5, 5, 'float32').alias('tile'))
+df2 = spark.range(1).select('id', rf_local_multiply(rf_make_ones_tile(5, 5, 'float32'), F.lit(3)).alias('tile'))
+
+rf = df1.union(df2)

 tiles = rf.select("tile").collect()
 print(tiles[0]['tile'].cells)

@@ -93,14 +92,13 @@ stats
 The @ref:[`rf_agg_local_stats`](reference.md#rf-agg-local-stats) function computes the element-wise local aggregate statistical summary as shown below. The DataFrame used in the previous two code blocks has unequal _tile_ dimensions, so a different DataFrame is used in this code block to avoid a runtime error.

 ```python, agg_local_stats
-rf = spark.sql("""
-SELECT 1 as id, rf_make_ones_tile(5, 5, 'float32') as tile
-UNION
-SELECT 2 as id, rf_make_constant_tile(3, 5, 5, 'float32') as tile
-UNION
-SELECT 3 as id, rf_make_constant_tile(5, 5, 5, 'float32') as tile
-""").agg(rf_agg_local_stats('tile').alias('stats'))
+df1 = spark.range(1).select('id', rf_make_ones_tile(5, 5, 'float32').alias('tile'))
+df2 = spark.range(1).select('id', rf_make_constant_tile(3, 5, 5, 'float32').alias('tile'))
+df3 = spark.range(1).select('id', rf_make_constant_tile(5, 5, 5, 'float32').alias('tile'))

+rf = df1.union(df2).union(df3) \
+    .agg(rf_agg_local_stats('tile').alias('stats'))
+
 agg_local_stats = rf.select('stats.min', 'stats.max', 'stats.mean', 'stats.variance').collect()

 for r in agg_local_stats:
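
Assembled end to end, the new `agg_local_stats` block runs as below: a minimal sketch, assuming a SparkSession `spark` with RasterFrames enabled. Only the explicit imports are added beyond what the diff shows.

```python
from pyrasterframes.rasterfunctions import (
    rf_make_ones_tile, rf_make_constant_tile, rf_agg_local_stats
)

# One single-row DataFrame per constant-valued 5x5 float32 tile.
df1 = spark.range(1).select('id', rf_make_ones_tile(5, 5, 'float32').alias('tile'))
df2 = spark.range(1).select('id', rf_make_constant_tile(3, 5, 5, 'float32').alias('tile'))
df3 = spark.range(1).select('id', rf_make_constant_tile(5, 5, 5, 'float32').alias('tile'))

# rf_agg_local_stats aggregates cell-wise down the column: each cell of
# the result summarizes the corresponding cell of every input tile.
rf = df1.union(df2).union(df3) \
    .agg(rf_agg_local_stats('tile').alias('stats'))

for r in rf.select('stats.min', 'stats.max', 'stats.mean', 'stats.variance').collect():
    print(r)
```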

pyrasterframes/src/main/python/docs/local-algebra.pymd (7 additions, 5 deletions)

@@ -34,13 +34,15 @@ We will apply the @ref:[catalog pattern](raster-catalogs.md) for defining the da
 This form of `(x - y) / (x + y)` is common in remote sensing and is called a normalized difference. It is used with other band pairs to highlight water, snow, and other phenomena.

 ```python, read_rasters
-bands = {4: 'red', 8: 'nir'}
+from pyspark.sql import Row
 uri_pattern = 'https://s22s-test-geotiffs.s3.amazonaws.com/luray_snp/B0{}.tif'
-catalog_df = pd.DataFrame([
-    {bands[b_num]: uri_pattern.format(b_num) for b_num in bands.keys()}
+catalog_df = spark.createDataFrame([
+    Row(red=uri_pattern.format(4), nir=uri_pattern.format(8))
 ])
-df = spark.read.raster(catalog=catalog_df.to_csv(index=None),
-                       catalog_col_names=list(catalog_df.columns))
+df = spark.read.raster(
+    catalog=catalog_df,
+    catalog_col_names=['red', 'nir']
+)
 df.printSchema()
 ```
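The `(x - y) / (x + y)` form mentioned in the context line can then be computed cell-wise over the `red` and `nir` columns read above. A sketch using RasterFrames' local-algebra functions; the `ndvi` column name is illustrative and not part of the commit:

```python
from pyrasterframes.rasterfunctions import (
    rf_local_add, rf_local_subtract, rf_local_divide
)

# Cell-wise (nir - red) / (nir + red) over each pair of tiles; `df` is
# the catalog read from the block above.
df_nd = df.withColumn('ndvi', rf_local_divide(
    rf_local_subtract(df.nir, df.red),
    rf_local_add(df.nir, df.red)))
df_nd.printSchema()
```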
pyrasterframes/src/main/python/docs/raster-read.pymd (16 additions, 12 deletions)

@@ -95,7 +95,7 @@ modis_catalog = spark.read \
     .option("header", "true") \
     .load(SparkFiles.get(cat_filename)) \
     .withColumn('base_url',
-        F.concat(F.regexp_replace('download_url', 'index.html$', ''), 'gid',)
+        F.concat(F.regexp_replace('download_url', 'index.html$', ''), 'gid')
     ) \
     .drop('download_url') \
     .withColumn('red' , F.concat('base_url', F.lit("_B01.TIF"))) \

@@ -126,15 +126,17 @@ Now that we have prepared our catalog, we simply pass the DataFrame or CSV strin
 ```python, read_catalog
 rf = spark.read.raster(
     catalog=equator,
-    catalog_col_names=['red', 'nir'],
+    catalog_col_names=['red', 'nir']
 )
 rf.printSchema()
 ```

 Observe the schema of the resulting DataFrame has a projected raster struct for each column passed in `catalog_col_names`. For reference, the URI is now in a column appended with `_path`. Taking a quick look at the representation of the data, we see again each row contains an arbitrary portion of the entire scene coverage. We also see that for two-D catalogs, each row contains the same spatial extent for all tiles in that row.

 ```python, cat_read_sample
-sample = rf.select('gid', rf_extent('red'), rf_extent('nir'), rf_tile('red'), rf_tile('nir'))
+sample = rf \
+    .select('gid', rf_extent('red'), rf_extent('nir'), rf_tile('red'), rf_tile('nir')) \
+    .where(~rf_is_no_data_tile('red'))
 sample.limit(3)
 ```

@@ -168,9 +170,10 @@ When reading a multiband raster or a _catalog_ describing multiband rasters, you
 For example, we can read a four-band (red, green, blue, and near-infrared) image as follows. The individual rows of the resulting DataFrame still represent distinct spatial extents, with a projected raster column for each band specified by `band_indexes`.

 ```python, multiband
-mb = spark.read.raster('s3://s22s-test-geotiffs/naip/m_3807863_nw_17_1_20160620.tif',
-                       band_indexes=[0, 1, 2, 3],
-)
+mb = spark.read.raster(
+    's3://s22s-test-geotiffs/naip/m_3807863_nw_17_1_20160620.tif',
+    band_indexes=[0, 1, 2, 3],
+)
 mb.printSchema()
 ```

@@ -184,14 +187,15 @@ Here is a trivial example with a _catalog_ over multiband rasters. We specify tw
 import pandas as pd
 mb_cat = pd.DataFrame([
     {'foo': 's3://s22s-test-geotiffs/naip/m_3807863_nw_17_1_20160620.tif',
-     'bar': 's3://s22s-test-geotiffs/naip/m_3807863_nw_17_1_20160620.tif',
+     'bar': 's3://s22s-test-geotiffs/naip/m_3807863_nw_17_1_20160620.tif'
     },
 ])
-mb2 = spark.read.raster(catalog=spark.createDataFrame(mb_cat),
-                        catalog_col_names=['foo', 'bar'],
-                        band_indexes=[0, 1],
-                        tile_dimensions=(64,64)
-)
+mb2 = spark.read.raster(
+    catalog=spark.createDataFrame(mb_cat),
+    catalog_col_names=['foo', 'bar'],
+    band_indexes=[0, 1],
+    tile_dimensions=(64,64)
+)
 mb2.printSchema()
 ```
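Combining the catalog-read and sampling hunks, the full pattern looks roughly like this: a sketch assuming a RasterFrames-enabled `spark` session. The URIs are the public test scenes used on the local-algebra page, and the `red_path` column name follows the `_path` convention described in the prose above.

```python
from pyspark.sql import Row
from pyrasterframes.rasterfunctions import rf_extent, rf_tile, rf_is_no_data_tile

# A one-row, two-column catalog: each cell holds a raster URI.
uri_pattern = 'https://s22s-test-geotiffs.s3.amazonaws.com/luray_snp/B0{}.tif'
cat = spark.createDataFrame([Row(red=uri_pattern.format(4), nir=uri_pattern.format(8))])

# Each catalog column becomes a projected raster column; the source URI
# travels along in an appended `<name>_path` column.
rf = spark.read.raster(catalog=cat, catalog_col_names=['red', 'nir'])

# Rows cover arbitrary portions of the scenes; skip tiles that are
# entirely no-data before sampling, as the cat_read_sample block does.
sample = rf \
    .select('red_path', rf_extent('red'), rf_tile('red'), rf_tile('nir')) \
    .where(~rf_is_no_data_tile('red'))
sample.show(3)
```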
pyrasterframes/src/main/python/docs/vector-data.pymd (2 additions, 7 deletions)

@@ -86,7 +86,7 @@ As documented in the @ref:[function reference](reference.md), various user-defin
 ```python, native_centroid
 from pyrasterframes.rasterfunctions import st_centroid
 df = df.withColumn('centroid', st_centroid(df.geometry))
-centroids = df.select('name', 'geometry', 'naive_centroid', 'centroid')
+centroids = df.select('geometry', 'name', 'naive_centroid', 'centroid')
 centroids.limit(3)
 ```

@@ -101,14 +101,9 @@ l8 = l8.withColumn('geom', st_geometry(l8.bounds_wgs84))
 l8 = l8.withColumn('paducah', st_point(lit(-88.6275), lit(37.072222)))

 l8_filtered = l8.filter(st_intersects(l8.geom, st_bufferPoint(l8.paducah, lit(500000.0))))
+l8_filtered.select('product_id', 'entity_id', 'acquisition_date', 'cloud_cover_pct')
 ```

-```python, evaluate=False, echo=False
-# suppressed due to run time.
-l8_filtered.count()
-```
-
-
 [GeoPandas]: http://geopandas.org
 [OGR]: https://gdal.org/drivers/vector/index.html
 [Shapely]: https://shapely.readthedocs.io/en/latest/manual.html
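
For reference, the spatial filter in the second hunk reads end to end as follows: a sketch assuming the Landsat 8 scene index DataFrame `l8` built earlier on that page, and assuming these st_* functions are exposed through pyrasterframes.rasterfunctions the way st_centroid is in the first hunk.

```python
from pyspark.sql.functions import lit
from pyrasterframes.rasterfunctions import (
    st_geometry, st_point, st_bufferPoint, st_intersects
)

# Scene footprints as geometries, plus a fixed query point (Paducah, KY).
l8 = l8.withColumn('geom', st_geometry(l8.bounds_wgs84)) \
       .withColumn('paducah', st_point(lit(-88.6275), lit(37.072222)))

# st_bufferPoint buffers by a geodesic distance in meters, so this keeps
# scenes whose footprint intersects a roughly 500 km radius of the point.
l8_filtered = l8.filter(
    st_intersects(l8.geom, st_bufferPoint(l8.paducah, lit(500000.0))))
l8_filtered.select('product_id', 'entity_id', 'acquisition_date', 'cloud_cover_pct')
```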
