
Commit 37f6dc6

Incremental backup commit.

1 parent b821445 commit 37f6dc6

21 files changed: +253 -575 lines changed


.gitignore

Lines changed: 1 addition & 0 deletions

```diff
@@ -27,3 +27,4 @@ tour/*.tiff
 scoverage-report*
 
 zz-*
+rf-notebook/src/main/notebooks/.ipython
```

README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -62,6 +62,6 @@ Additional, Python sepcific build instruction may be found at [pyrasterframes/sr
 
 ## Copyright and License
 
-RasterFrames is released under the Apache 2.0 License, copyright Astraea, Inc. 2017-2019.
+RasterFrames is released under the Apache 2.0 License, copyright Astraea, Inc. 2017-2020.
 
 
```

datasource/src/it/scala/org/locationtech/rasterframes/datasource/raster/RaterSourceDataSourceIT.scala

Lines changed: 1 addition & 1 deletion

```diff
@@ -31,7 +31,7 @@ class RaterSourceDataSourceIT extends TestEnvironment with TestData {
     // A regression test.
     val rf = spark.read.raster
       .withSpatialIndex()
-      .load("https://s22s-test-geotiffs.s3.amazonaws.com/water_class/seasonality_90W_50N.tif")
+      .load("https://rasterframes.s3.amazonaws.com/samples/water_class/seasonality_90W_50N.tif")
 
     val target_rf =
       rf.select(rf_extent($"proj_raster").alias("extent"), rf_crs($"proj_raster").alias("crs"), rf_tile($"proj_raster").alias("target"))
```

Lines changed: 50 additions & 0 deletions

```diff
@@ -0,0 +1,50 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Set everything to be logged to the console
+log4j.rootCategory=INFO, console
+log4j.appender.console=org.apache.log4j.ConsoleAppender
+log4j.appender.console.target=System.err
+log4j.appender.console.layout=org.apache.log4j.PatternLayout
+log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
+
+# Set the default spark-shell log level to WARN. When running the spark-shell, the
+# log level for this class is used to overwrite the root logger's log level, so that
+# the user can have different defaults for the shell and regular Spark apps.
+log4j.logger.org.apache.spark.repl.Main=WARN
+
+
+log4j.logger.org.apache=ERROR
+log4j.logger.com.amazonaws=WARN
+log4j.logger.geotrellis=WARN
+
+# Settings to quiet third party logs that are too verbose
+log4j.logger.org.spark_project.jetty=WARN
+log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
+log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
+log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
+log4j.logger.org.locationtech.rasterframes=DEBUG
+log4j.logger.org.locationtech.rasterframes.ref=DEBUG
+log4j.logger.org.apache.parquet.hadoop.ParquetRecordReader=OFF
+
+# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
+log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
+log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
+
+log4j.logger.org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator=ERROR
+log4j.logger.org.apache.spark.sql.execution.WholeStageCodegenExec=ERROR
+log4j.logger.geotrellis.raster.gdal=ERROR
```
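The `log4j.logger.*` entries in this file form a hierarchy: a level set on a category such as `org.apache` applies to every descendant logger unless a more specific entry overrides it (as `org.apache.spark.repl.Main=WARN` does here). Python's standard `logging` module resolves effective levels the same way, so the cascade can be sketched like this (logger names borrowed from the config purely for illustration):

```python
import logging

# Mirror two of the log4j categories above in Python's logger hierarchy.
logging.getLogger("org.apache").setLevel(logging.ERROR)
logging.getLogger("org.apache.spark.repl.Main").setLevel(logging.WARNING)

# A child with no level of its own inherits from its nearest configured ancestor.
parquet_level = logging.getLogger("org.apache.parquet").getEffectiveLevel()

# A more specific category overrides the inherited one.
repl_level = logging.getLogger("org.apache.spark.repl.Main").getEffectiveLevel()
```

Here `parquet_level` resolves to `ERROR` (inherited from `org.apache`) while `repl_level` is `WARNING`, matching how log4j applies the categories above.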

docs/src/main/paradox/index.md

Lines changed: 13 additions & 11 deletions

```diff
@@ -29,18 +29,20 @@ The source code can be found on GitHub at [locationtech/rasterframes](https://gi
 
 ## Detailed Contents
 
-@@ toc { depth=4 }
+@@ toc { depth=3 }
 
 @@@ index
-* [Overview](description.md)
-* [Getting Started](getting-started.md)
-* [Concepts](concepts.md)
-* [Raster Data I/O](raster-io.md)
-* [Vector Data](vector-data.md)
-* [Raster Processing](raster-processing.md)
-* [Numpy and Pandas](numpy-pandas.md)
-* [Scala and SQL](languages.md)
-* [Function Reference](reference.md)
-* [Release Notes](release-notes.md)
+* @ref:[Overview](description.md)
+* @ref:[Getting Started](getting-started.md)
+* @ref:[Concepts](concepts.md)
+* @ref:[Raster Data I/O](raster-io.md)
+* @ref:[Vector Data](vector-data.md)
+* @ref:[Raster Processing](raster-processing.md)
+* @ref:[Machine Learning](machine-learning.md)
+* @ref:[Numpy and Pandas](numpy-pandas.md)
+* @ref:[IPython Extensions](ipython.md)
+* @ref:[Scala and SQL](languages.md)
+* @ref:[Function Reference](reference.md)
+* @ref:[Release Notes](release-notes.md)
 @@@
 
```

docs/src/main/paradox/raster-processing.md

Lines changed: 0 additions & 1 deletion

```diff
@@ -9,7 +9,6 @@
 * @ref:[Aggregation](aggregation.md)
 * @ref:[Time Series](time-series.md)
 * @ref:[Raster Join](raster-join.md)
-* @ref:[Machine Learning](machine-learning.md)
 
 @@@
 
```

docs/src/main/paradox/release-notes.md

Lines changed: 2 additions & 1 deletion

```diff
@@ -6,12 +6,13 @@
 
 * Upgraded to Spark 2.4.7
 * Added `pyspark.sql.DataFrame.display(num_rows, truncate)` extension method when `rf_ipython` is imported.
+* Added users' manual section on IPython display enhancements.
 * Added `method_name` parameter to the `rf_resample` method.
 * __BREAKING__: In SQL, the function `rf_resample` now takes 3 arguments. You can use `rf_resample_nearest` with two arguments or refactor to `rf_resample(t, v, "nearest")`.
 * Added resample method parameter to SQL and Python APIs. @ref:[See updated docs](raster-join.md).
 * Upgraded many of the pyrasterframes dependencies, including:
   `descartes`, `fiona`, `folium`, `geopandas`, `matplotlib`, `numpy`, `pandas`, `rasterio`, `shapely`
-
+* Changed `rasterframes.prefer-gdal` configuration parameter to default to `False`, as JVM GeoTIFF performs just as well for COGs as the GDAL one.
 
 ### 0.9.0
 
```
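The release notes above introduce `rf_resample(t, v, "nearest")` for nearest-neighbor resampling; the RasterFrames function reference is the authoritative source for its signature. As a conceptual sketch only — plain numpy, not the library call — nearest-neighbor resampling of a tile by a scale factor looks like:

```python
import numpy as np

def resample_nearest(tile: np.ndarray, factor: float) -> np.ndarray:
    """Rescale a 2-D cell grid by `factor`, picking the nearest source cell."""
    out_rows = int(tile.shape[0] * factor)
    out_cols = int(tile.shape[1] * factor)
    # Map each output index back to its nearest source index.
    rows = (np.arange(out_rows) / factor).astype(int)
    cols = (np.arange(out_cols) / factor).astype(int)
    return tile[np.ix_(rows, cols)]

src = np.array([[1, 2],
                [3, 4]])
up = resample_nearest(src, 2.0)  # each source cell becomes a 2x2 block
```

Upsampling by 2 turns each source cell into a 2x2 block of the same value, which is why "nearest" preserves categorical cell values (a property the other resampling methods do not share).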
pyrasterframes/src/main/python/docs/aggregation.pymd

Lines changed: 3 additions & 3 deletions

````diff
@@ -71,7 +71,7 @@ rf.agg(rf_agg_local_mean('tile')) \
 We can also count the total number of data and NoData cells over all the _tiles_ in a DataFrame using @ref:[`rf_agg_data_cells`](reference.md#rf-agg-data-cells) and @ref:[`rf_agg_no_data_cells`](reference.md#rf-agg-no-data-cells). There are ~3.8 million data cells and ~1.9 million NoData cells in this DataFrame. See the section on @ref:["NoData" handling](nodata-handling.md) for additional discussion on handling missing data.
 
 ```python, cell_counts
-rf = spark.read.raster('https://s22s-test-geotiffs.s3.amazonaws.com/MCD43A4.006/11/05/2018233/MCD43A4.A2018233.h11v05.006.2018242035530_B02.TIF')
+rf = spark.read.raster('https://rasterframes.s3.amazonaws.com/samples/MCD43A4.006/11/05/2018233/MCD43A4.A2018233.h11v05.006.2018242035530_B02.TIF')
 stats = rf.agg(rf_agg_data_cells('proj_raster'), rf_agg_no_data_cells('proj_raster'))
 stats
 ```
@@ -83,7 +83,7 @@ The statistical summary functions return a summary of cell values: number of dat
 The @ref:[`rf_tile_stats`](reference.md#rf-tile-stats) function computes summary statistics separately for each row in a _tile_ column as shown below.
 
 ```python, tile_stats
-rf = spark.read.raster('https://s22s-test-geotiffs.s3.amazonaws.com/luray_snp/B02.tif')
+rf = spark.read.raster('https://rasterframes.s3.amazonaws.com/samples/luray_snp/B02.tif')
 stats = rf.select(rf_tile_stats('proj_raster').alias('stats'))
 
 stats.printSchema()
@@ -125,7 +125,7 @@ The @ref:[`rf_tile_histogram`](reference.md#rf-tile-histogram) function computes
 ```python, tile_histogram
 import matplotlib.pyplot as plt
 
-rf = spark.read.raster('https://s22s-test-geotiffs.s3.amazonaws.com/MCD43A4.006/11/05/2018233/MCD43A4.A2018233.h11v05.006.2018242035530_B02.TIF')
+rf = spark.read.raster('https://rasterframes.s3.amazonaws.com/samples/MCD43A4.006/11/05/2018233/MCD43A4.A2018233.h11v05.006.2018242035530_B02.TIF')
 
 hist_df = rf.select(rf_tile_histogram('proj_raster')['bins'].alias('bins'))
 hist_df.printSchema()
````

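The `rf_tile_histogram` call in the hunk above returns `bins` of cell values with their counts, computed over data cells only. Outside Spark the same bookkeeping can be sketched with numpy — a conceptual analogy, not RasterFrames' implementation; the tile values and NoData sentinel here are made up:

```python
import numpy as np

# A small "tile" with a NoData sentinel, masked out before counting.
nodata = -9999
cells = np.array([[1, 2, 2, nodata],
                  [3, 2, 1, nodata]])
data = np.ma.masked_equal(cells, nodata).compressed()  # data cells only

# Bin as (value, count) pairs, like the histogram's `bins` column.
values, counts = np.unique(data, return_counts=True)
bins = list(zip(values.tolist(), counts.tolist()))
```

With these values, `bins` comes out as `[(1, 2), (2, 3), (3, 1)]` — the two NoData cells contribute to no bin.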
Lines changed: 82 additions & 0 deletions

````diff
@@ -0,0 +1,82 @@
+# IPython/Jupyter Extensions
+
+The `pyrasterframes.rf_ipython` module injects a number of visualization extensions into the IPython environment, enhancing visualization of `DataFrame`s and `Tile`s.
+
+By default, the last expression's result in a IPython cell is passed to the `IPython.display.display` function. This function in turn looks for a [`DisplayFormatter`](https://ipython.readthedocs.io/en/stable/api/generated/IPython.core.formatters.html#IPython.core.formatters.DisplayFormatter) associated with the type, which in turn converts the instance to a display-appropriate representation, based on MIME type. For example, each `DisplayFormatter` may `plain/text` version for the IPython shell, and a `text/html` version for a Jupyter Notebook.
+
+```python imports, echo=False, results='hidden'
+from pyrasterframes.all import *
+from pyspark.sql.functions import col
+spark = create_rf_spark_session()
+```
+
+## Initialize Sample
+
+First we read in a sample image as tiles:
+
+```python raster_read
+uri = 'https://modis-pds.s3.amazonaws.com/MCD43A4.006/31/11/2017158/' \
+      'MCD43A4.A2017158.h31v11.006.2017171203421_B01.TIF'
+
+# here we flatten the projected raster structure
+df = spark.read.raster(uri) \
+    .withColumn('tile', rf_tile('proj_raster')) \
+    .withColumn('crs', rf_crs(col('proj_raster'))) \
+    .withColumn('extent', rf_extent(col('proj_raster'))) \
+    .drop('proj_raster')
+```
+
+Print the schema to confirm it's "shape":
+
+```python schema
+df.printSchema()
+```
+
+# Tile Display
+
+Let's look at a single tile. A `pyrasterframes.rf_types.Tile` will automatically render nicely in Jupyter or IPython.
+
+```python single_tile
+tile = df.select(df.tile).first()['tile']
+tile
+```
+
+If you access the tile's `cells` you get the underlying numpy ndarray (more specifically in this case, `numpy.ma.MaskedArray`).
+
+```python cells
+tile.cells
+```
+
+If you just want the string representation of the Tile, use `str`:
+
+```python tile_as_string
+str(tile)
+```
+
+## pyspark.sql.DataFrame Display
+
+There is also a capability for HTML rendering of the spark DataFrame. Rendering work is done on the JVM and the HTML string representation is provided for IPython to display.
+
+```python spark_dataframe
+df.select('tile', 'extent')
+```
+
+### Changing number of rows
+
+Because the `IPython.display.display` function doesn't accept any parameters, we have to provide a different means of passing parameters to the rendering code. Pandas does it with global settings via `set_option`/`get_option`. We take a more functional approach and have the user invoke an explicit `display` method:
+
+```python custom_display
+df.display(num_rows=1, truncate=True)
+```
+
+
+## pandas.DataFrame Display
+
+The same thing works for Pandas DataFrame if it contains a column of `Tile`s.
+
+```python pandas_dataframe
+# Limit copy of data from Spark to a few tiles.
+pandas_df = df.limit(4).toPandas()
+pandas_df.drop(['proj_raster_path'], axis=1)
+```
````

pyrasterframes/src/main/python/docs/local-algebra.pymd

Lines changed: 1 addition & 1 deletion

````diff
@@ -35,7 +35,7 @@ This form of `(x - y) / (x + y)` is common in remote sensing and is called a nor
 
 ```python, read_rasters
 from pyspark.sql import Row
-uri_pattern = 'https://s22s-test-geotiffs.s3.amazonaws.com/luray_snp/B0{}.tif'
+uri_pattern = 'https://rasterframes.s3.amazonaws.com/samples/luray_snp/B0{}.tif'
 catalog_df = spark.createDataFrame([
     Row(red=uri_pattern.format(4), nir=uri_pattern.format(8))
 ])
````
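The `(x - y) / (x + y)` form mentioned in the hunk header above is the normalized difference behind indices like NDVI, which this page computes from the red and NIR bands in the catalog. The arithmetic itself can be sketched with plain numpy — hypothetical band values, independent of the rasters referenced above:

```python
import numpy as np

def normalized_difference(nir, red):
    """Local algebra (x - y) / (x + y); with NIR and red bands this is NDVI."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

# Values near +1 suggest vegetation; near 0 or below, bare ground or water.
ndvi = normalized_difference([0.5, 0.6], [0.1, 0.2])
```

By construction the result is bounded in [-1, 1] whenever both bands are non-negative, which is what makes the normalized form comparable across scenes.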
