
Commit bdcfb39

Merge pull request #290 from s22s/docs/editorial-review-1
Application of documentation edits and other tweaks.
2 parents 18ed679 + 6558b0b commit bdcfb39

26 files changed (+241, -181 lines)

.travis.yml

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ addons:
   - pandoc

 install:
-  - pip install rasterio shapely pandas numpy
+  - pip install rasterio shapely pandas numpy pweave
   - wget -O - https://piccolo.link/sbt-1.2.8.tgz | tar xzf -

 script:
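The notable change here is the addition of `pweave` to the Python dependencies. Pweave is the literate-programming tool that evaluates the Python chunks embedded in the `.pymd` documentation sources edited later in this commit (such as `aggregation.pymd`). As a rough sketch of how such a file might be woven into plain Markdown (the CLI flags and input path are assumptions for illustration, not taken from this commit):

```python
# Hypothetical sketch: weave a .pymd source into Markdown with the pweave CLI.
# The "-f markdown" format flag and the input path are assumptions.
import subprocess

subprocess.run(
    ["pweave", "-f", "markdown", "pyrasterframes/src/main/python/docs/aggregation.pymd"],
    check=True,
)
```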

docs/build.sbt

Lines changed: 3 additions & 4 deletions
@@ -41,9 +41,9 @@ makePDF := {
   work.mkdirs()

   val prepro = files.zipWithIndex.map { case (f, i)
-    val dest = work / f"$i%02d.md"
+    val dest = work / f"$i%02d-${f.getName}%s"
     // Filter cross links and add a newline
-    (Seq("sed", "-e", """s/@ref:\[\([^]]*\)\](.*)/_\1_/g;s/@@.*//g""", f.toString) #> dest).!
+    (Seq("sed", "-e", """s/@ref://g;s/@@.*//g""", f.toString) #> dest).!
     // Add newline at the end of the file so as to make pandoc happy
     ("echo" #>> dest).!
     ("echo \\pagebreak" #>> dest).!
@@ -55,7 +55,7 @@ makePDF := {
   val header = (Compile / sourceDirectory).value / "latex" / "header.latex"

   val args = "pandoc" ::
-    "--from=markdown" ::
+    "--from=markdown+pipe_tables" ::
     "--to=pdf" ::
     "-t" :: "latex" ::
     "-s" ::
@@ -64,7 +64,6 @@ makePDF := {
     "-V" :: "author:Astraea, Inc." ::
     "-V" :: "geometry:margin=0.75in" ::
     "-V" :: "papersize:letter" ::
-    "-V" :: "links-as-notes" ::
     "--include-in-header" :: header.toString ::
     "-o" :: output.toString ::
     prepro.map(_.toString).toList
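To illustrate what the revised `sed` filter does: the old rule rewrote Paradox cross-links of the form `@ref:[label](target)` as emphasized plain text, while the new rule simply strips the `@ref:` prefix (and, as before, drops `@@` directive lines), leaving an ordinary Markdown link for pandoc to render. A small Python sketch of both substitutions, using an invented sample line:

```python
import re

# Invented sample line for illustration.
line = "See the @ref:[tile aggregate functions](reference.md#tile-statistics)."

# Old rule, s/@ref:\[\([^]]*\)\](.*)/_\1_/g: collapse the link to emphasis.
print(re.sub(r"@ref:\[([^]]*)\]\(.*\)", r"_\1_", line))
# -> See the _tile aggregate functions_.

# New rule, s/@ref://g: keep the Markdown link, drop only the Paradox prefix.
print(re.sub(r"@ref:", "", line))
# -> See the [tile aggregate functions](reference.md#tile-statistics).
```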

docs/src/main/latex/header.latex

Lines changed: 8 additions & 1 deletion
@@ -1,2 +1,9 @@
 \DeclareUnicodeCharacter{2218}{$\circ$}
-\DeclareUnicodeCharacter{2714}{$\checkmark}
+\DeclareUnicodeCharacter{2714}{$\checkmark$}
+\DeclareUnicodeCharacter{21A9}{$\newline$}
+\hypersetup{
+  colorlinks=true,
+  linkcolor=blue,
+  allbordercolors={0 0 0},
+  pdfborderstyle={/S/U/W 1}
+}

docs/src/main/paradox/_template/page.st

Lines changed: 3 additions & 1 deletion
@@ -31,6 +31,8 @@
 .md-left { float: left; }
 .md-right { float: right; }
 .md-clear { clear: both; }
+table { font-size: 80%; }
+code { font-size: 0.75em !important; }
 </style>
 </head>

@@ -132,7 +134,7 @@
 <div class="row site-footer-content">
   <div class="small-12 text-center large-9 column">
     <div class="copyright">
-      <span class="text">&copy; $page.properties.("date.year")$
+      <span class="text">Copyright &copy; $page.properties.("date.year")$
       <a href="http://www.astraea.earth/">Astraea, Inc.</a></span>
     </div>
   </div>

experimental/src/main/scala/org/locationtech/rasterframes/experimental/datasource/awspds/MODISCatalogDataSource.scala

Lines changed: 1 addition & 0 deletions
@@ -93,6 +93,7 @@ object MODISCatalogDataSource extends LazyLogging with ResourceCacheSupport {
     "2018-03-12",
     "2018-03-13",
     "2018-03-14",
+    "2018-03-15",
     "2018-05-16",
     "2018-05-17",
     "2018-05-18",

pyrasterframes/README.md

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ To manually initialize PyRasterFrames in a `pyspark` shell, prepare to call pysp

 ```

-Then in the pyspark shell or app, import the module and call `withRasterFrames` on the SparkSession.
+Then in the PySpark shell or script, import the module and call `withRasterFrames` on the SparkSession.

 ```python
 from pyrasterframes.utils import create_rf_spark_session
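For context, the manual-initialization path described in that paragraph might look roughly like the following sketch; the builder configuration is an assumption for illustration, and it presumes the RasterFrames assembly is already on the `pyspark` classpath:

```python
# A minimal sketch, not copied from the README: importing pyrasterframes is
# what makes withRasterFrames available on the SparkSession.
from pyspark.sql import SparkSession
import pyrasterframes

spark = SparkSession.builder.appName("rasterframes-example").getOrCreate()
spark = spark.withRasterFrames()
```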

pyrasterframes/src/main/python/docs/aggregation.pymd

Lines changed: 20 additions & 17 deletions
@@ -11,13 +11,13 @@ import os
 spark = create_rf_spark_session()
 ```

-There are 3 types of aggregate functions: _tile_ aggregate, DataFrame aggregate, and element-wise local aggregate. In the @ref:[tile aggregate functions](reference.md#tile-statistics), we are computing a statistical summary per row of a _tile_ column in a DataFrame. In the @ref:[DataFrame aggregate functions](reference.md#aggregate-tile-statistics), we are computing statistical summaries over all of the cell values *and* across all of the rows in the DataFrame or group. In the @ref:[element-wise local aggregate functions](reference.md#tile-local-aggregate-statistics), we are computing the element-wise statistical summary across a DataFrame or group of _tiles_.
+There are three types of aggregate functions: _tile_ aggregate, DataFrame aggregate, and element-wise local aggregate. In the @ref:[tile aggregate functions](reference.md#tile-statistics), we are computing a statistical summary per row of a _tile_ column in a DataFrame. In the @ref:[DataFrame aggregate functions](reference.md#aggregate-tile-statistics), we are computing statistical summaries over all of the cell values *and* across all of the rows in the DataFrame or group. In the @ref:[element-wise local aggregate functions](reference.md#tile-local-aggregate-statistics), we are computing the element-wise statistical summary across a DataFrame or group of _tiles_.

 ## Tile Mean Example

-We can illustrate these differences in computing an aggregate mean. First, we create a sample DataFrame of 2 _tiles_ where the first _tile_ is composed of 25 values of 1.0 and the second _tile_ is composed of 25 values of 3.0.
+We can illustrate aggregate differences by computing an aggregate mean. First, we create a sample DataFrame of 2 _tiles_ where the first _tile_ is composed of 25 values of 1.0 and the second _tile_ is composed of 25 values of 3.0.

-```python, sql_dataframe, results='raw'
+```python, sql_dataframe
 import pyspark.sql.functions as F

 rf = spark.sql("""
@@ -26,33 +26,36 @@ UNION
 SELECT 2 as id, rf_local_multiply(rf_make_ones_tile(5, 5, 'float32'), 3) as tile
 """)

-rf.select("id", rf_render_matrix("tile")).show(truncate=False)
+tiles = rf.select("tile").collect()
+print(tiles[0]['tile'].cells)
+print(tiles[1]['tile'].cells)
 ```

-
-In this code block, we are using the @ref:[`rf_tile_mean`](reference.md#rf-tile-mean) function to compute the _tile_ aggregate mean of cells in each row of column `tile`. The mean of each _tile_ is computed separately, so the first mean is 1.0 and the second mean is 3.0. Notice that the number of rows in the DataFrame is the same before and after the aggregation.
+We use the @ref:[`rf_tile_mean`](reference.md#rf-tile-mean) function to compute the _tile_ aggregate mean of cells in each row of column `tile`. The mean of each _tile_ is computed separately, so the first mean is 1.0 and the second mean is 3.0. Notice that the number of rows in the DataFrame is the same before and after the aggregation.

 ```python, tile_mean, results='raw'
-rf.select(F.col('id'), rf_tile_mean(F.col('tile'))).show(truncate=False)
+rf.select(F.col('id'), rf_tile_mean(F.col('tile'))).show()
 ```

-In this code block, we are using the @ref:[`rf_agg_mean`](reference.md#rf-agg-mean) function to compute the DataFrame aggregate, which averages 25 values of 1.0 and 25 values of 3.0, across the fifty cells in two rows. Note that only a single row is returned since the average is computed over the full DataFrame.
+We use the @ref:[`rf_agg_mean`](reference.md#rf-agg-mean) function to compute the DataFrame aggregate, which averages 25 values of 1.0 and 25 values of 3.0, across the fifty cells in two rows. Note that only a single row is returned since the average is computed over the full DataFrame.

 ```python, agg_mean, results='raw'
 rf.agg(rf_agg_mean(F.col('tile'))).show()
 ```

-In this code block, we are using the @ref:[`rf_agg_local_mean`](reference.md#rf-agg-local-mean) function to compute the element-wise local aggregate mean across the two rows. In this example it is computing the mean of one value of 1.0 and one value of 3.0 to arrive at the element-wise mean, but doing so twenty-five times, one for each position in the _tile_.
+We use the @ref:[`rf_agg_local_mean`](reference.md#rf-agg-local-mean) function to compute the element-wise local aggregate mean across the two rows. For this aggregation, we are computing the mean of one value of 1.0 and one value of 3.0 to arrive at the element-wise mean, but doing so twenty-five times, one for each position in the _tile_.

-To compute an element-wise local aggregate, _tiles_ need have the same dimensions as in the example below where both _tiles_ have 5 rows and 5 columns. If we tried to compute an element-wise local aggregate over the DataFrame without equal _tile_ dimensions, we would get a runtime error.
+To compute an element-wise local aggregate, _tiles_ need to have the same dimensions. In this case, both _tiles_ have 5 rows and 5 columns. If we tried to compute an element-wise local aggregate over the DataFrame without equal _tile_ dimensions, we would get a runtime error.

-```python, local_mean, results='raw'
-rf.agg(rf_agg_local_mean(F.col('tile')).alias("local_mean")).select(rf_render_matrix("local_mean")).show(truncate=False)
+```python, local_mean
+t = rf.agg(rf_agg_local_mean(F.col('tile')).alias('local_mean')) \
+    .collect()[0]['local_mean']
+print(t.cells)
 ```

 ## Cell Counts Example

-We can also count the total number of data and NoData cells over all the _tiles_ in a DataFrame using @ref:[`rf_agg_data_cells`](reference.md#rf-agg-data-cells) and @ref:[`rf_agg_no_data_cells`](reference.md#rf-agg-no-data-cells). There are 3,842,290 data cells and 1,941,734 NoData cells in this DataFrame. See section on @ref:["NoData" handling](nodata-handling.md) for additional discussion on handling missing data.
+We can also count the total number of data and NoData cells over all the _tiles_ in a DataFrame using @ref:[`rf_agg_data_cells`](reference.md#rf-agg-data-cells) and @ref:[`rf_agg_no_data_cells`](reference.md#rf-agg-no-data-cells). There are ~3.8 million data cells and ~1.9 million NoData cells in this DataFrame. See the section on @ref:["NoData" handling](nodata-handling.md) for additional discussion on handling missing data.

 ```python, cell_counts, results='raw'
 rf = spark.read.raster('https://s22s-test-geotiffs.s3.amazonaws.com/MCD43A4.006/11/05/2018233/MCD43A4.A2018233.h11v05.006.2018242035530_B02.TIF')
@@ -86,7 +89,7 @@ rf.agg(rf_agg_stats('proj_raster').alias('stats')) \
   .show()
 ```

-The @ref:[`rf_agg_local_stats`](reference.md#rf-agg-local-stats) function computes the element-wise local aggregate statistical summary as shown below. The DataFrame used in the previous two code blocks, has unequal _tile_ dimensions, so a different DataFrame is used in this code block to avoid a runtime error.
+The @ref:[`rf_agg_local_stats`](reference.md#rf-agg-local-stats) function computes the element-wise local aggregate statistical summary as shown below. The DataFrame used in the previous two code blocks has unequal _tile_ dimensions, so a different DataFrame is used in this code block to avoid a runtime error.

 ```python, agg_local_stats
 rf = spark.sql("""
@@ -106,7 +109,7 @@ for r in agg_local_stats:

 ## Histogram

-The @ref:[`rf_tile_histogram`](reference.md#rf-tile-histogram) function computes a count of cell values within each row of _tile_ and outputs a `bins` array with the schema below. In the graph below, we have plotted `value` on the x-axis and `count` on the y-axis to create the histogram. There are 100 rows of _tile_ in this DataFrame, but this histogram is just computed for the _tile_ in the first row.
+The @ref:[`rf_tile_histogram`](reference.md#rf-tile-histogram) function computes a count of cell values within each row of _tile_ and outputs a `bins` array with the schema below. In the graph below, we have plotted each bin's `value` on the x-axis and `count` on the y-axis for the _tile_ in the first row of the DataFrame.


 ```python, tile_histogram
@@ -118,8 +121,8 @@ hist_df = rf.select(rf_tile_histogram('proj_raster')['bins'].alias('bins'))
 hist_df.printSchema()

 bins_row = hist_df.first()
-values = [int(row['value']) for row in bins_row.bins]
-counts = [int(row['count']) for row in bins_row.bins]
+values = [int(bin['value']) for bin in bins_row.bins]
+counts = [int(bin['count']) for bin in bins_row.bins]

 plt.hist(values, weights=counts, bins=100)
 plt.show()
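The distinction among the three aggregation types discussed in this document can be mirrored in plain NumPy, using the same pair of 5x5 tiles of 1.0 and 3.0; this analogy is ours, for illustration only:

```python
import numpy as np

t1 = np.full((5, 5), 1.0)  # first tile: 25 values of 1.0
t2 = np.full((5, 5), 3.0)  # second tile: 25 values of 3.0

# Tile aggregate (cf. rf_tile_mean): one mean per tile; row count is unchanged.
print(t1.mean(), t2.mean())        # 1.0 3.0

# DataFrame aggregate (cf. rf_agg_mean): one scalar over all fifty cells.
print(np.mean([t1, t2]))           # 2.0

# Element-wise local aggregate (cf. rf_agg_local_mean): a 5x5 tile of means.
print(np.mean([t1, t2], axis=0))   # 5x5 array of 2.0
```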

pyrasterframes/src/main/python/docs/concepts.md

Lines changed: 3 additions & 2 deletions
@@ -8,7 +8,7 @@ There are a number of Earth-observation (EO) concepts that crop up in the discus

 ## Raster

-A raster is a regular grid of numeric values. A raster can be thought of as an image, as is the case if the values in the grid represent brightness along a greyscale. More generally a raster can measure many different phenomena or encode a variety of different discrete classifications.
+A raster is a regular grid of numeric values. A raster can be thought of as an image, as is the case if the values in the grid represent brightness along a greyscale. More generally, a raster can measure many different phenomena or encode a variety of different discrete classifications.

 ## Cell

@@ -17,6 +17,7 @@ A cell is a single row and column intersection in the raster grid. It is a singl
 ## Cell Type

 A numeric cell value may be encoded in a number of different computer numeric formats. There are typically three characteristics used to describe a cell type:
+
 * word size (bit-width)
 * signed vs unsigned
 * integral vs floating-point
@@ -47,7 +48,7 @@ A scene (or granule) is a discrete instance of EO @ref:[raster data](concepts.md

 ## Band

-A @ref:[scene](concepts.md#scene) frequently defines many different measurements captured a the same date-time, over the same extent, and meant to be processed together. These different measurements are referred to as bands. The name comes from the varying bandwidths of light and electromagnetic radiation measured in many EO datasets.
+A @ref:[scene](concepts.md#scene) frequently defines many different measurements captured at the same date-time, over the same extent, and meant to be processed together. These different measurements are referred to as bands. The name comes from the varying bandwidths of light and electromagnetic radiation measured in many EO datasets.

 ## Coordinate Reference System (CRS)

pyrasterframes/src/main/python/docs/description.md

Lines changed: 4 additions & 4 deletions
@@ -1,16 +1,16 @@
 # Overview

-RasterFrames® provides a DataFrame-centric view over arbitrary Earth-observation (EO) data, enabling spatiotemporal queries, map algebra raster operations, and compatibility with the ecosystem of [Apache Spark](https://spark.apache.org/docs/latest/) [ML](https://spark.apache.org/docs/latest/ml-guide.html) algorithms. It provides APIs in @ref:[Python, SQL, and Scala](languages.md), and can scale from a laptop to a large distributed cluster, enabling _global_ analysis with satellite imagery in a wholly new, flexible and convenient way.
+RasterFrames® provides a DataFrame-centric view over arbitrary Earth-observation (EO) data, enabling spatiotemporal queries, map algebra raster operations, and compatibility with the ecosystem of [Apache Spark](https://spark.apache.org/docs/latest/) [ML](https://spark.apache.org/docs/latest/ml-guide.html) algorithms. It provides APIs in @ref:[Python, SQL, and Scala](languages.md), and can scale from a laptop computer to a large distributed cluster, enabling _global_ analysis with satellite imagery in a wholly new, flexible, and convenient way.

 ## Context

-We have a millennia-long history of organizing information in tabular form. Typically, rows represent independent events or observations, and columns represent attributes and measurements from the observations. The forms have evolved, from hand-written agricultural records and transaction ledgers, to the advent of spreadsheets on the personal computer, and on to the creation of the _DataFrame_ data structure as found in [R Data Frames][R] and [Python Pandas][Pandas]. The table-oriented data structure remains a common and critical component of organizing data across industries, and is the mental model employed by many data scientists across diverse forms of modeling and analysis.
+We have a millennia-long history of organizing information in tabular form. Typically, rows represent independent events or observations, and columns represent attributes and measurements from the observations. The forms have evolved, from hand-written agricultural records and transaction ledgers, to the advent of spreadsheets on the personal computer, and on to the creation of the _DataFrame_ data structure as found in [R Data Frames][R] and [Python Pandas][Pandas]. The table-oriented data structure remains a common and critical component of organizing data across industries, and—most importantly—it is the mental model employed by data scientists across diverse forms of modeling and analysis.

 The evolution of the DataFrame form has continued with [Spark SQL](https://spark.apache.org/docs/latest/sql-programming-guide.html), which brings DataFrames to the big data distributed compute space. Through several novel innovations, Spark SQL enables data scientists to work with DataFrames too large for the memory of a single computer. As suggested by the name, these DataFrames are manipulatable via standard SQL, as well as the more general-purpose programming languages Python, R, Java, and Scala.

-RasterFrames, an incubating Eclipse Foundation LocationTech project, brings together EO data access, cloud computing, and DataFrame-based data science. The recent explosion of EO data from public and private satellite operators presents both a huge opportunity as well as a challenge to the data analysis community. It is _Big Data_ in the truest sense, and its footprint is rapidly getting bigger. According to a World Bank document on assets for post-disaster situation awareness[^1]:
+RasterFrames, an incubating Eclipse Foundation LocationTech project, brings together EO data access, cloud computing, and DataFrame-based data science. The recent explosion of EO data from public and private satellite operators presents both a huge opportunity and a huge challenge to the data analysis community. It is _Big Data_ in the truest sense, and its footprint is rapidly getting bigger. According to a World Bank document on assets for post-disaster situation awareness[^1]:

-> Of the 1,738 operational satellites currently orbiting the earth (as of 9/[20]17), 596 are earth observation satellites and 477 of these are non-military assets (ie available to civil society including commercial entities and governments for earth observation, according to the Union of Concerned Scientists). This number is expected to increase significantly over the next ten years. The 200 or so planned remote sensing satellites have a value of over 27 billion USD (Forecast International). This estimate does not include the burgeoning fleets of smallsats as well as micro, nano and even smaller satellites... All this enthusiasm has, not unexpectedly, led to a veritable fire-hose of remotely sensed data which is becoming difficult to navigate even for seasoned experts.
+> Of the 1,738 operational satellites currently orbiting the earth (as of 9/[20]17), 596 are earth observation satellites and 477 of these are non-military assets (i.e. available to civil society including commercial entities and governments for earth observation, according to the Union of Concerned Scientists). This number is expected to increase significantly over the next ten years. The 200 or so planned remote sensing satellites have a value of over 27 billion USD (Forecast International). This estimate does not include the burgeoning fleets of smallsats as well as micro, nano and even smaller satellites... All this enthusiasm has, not unexpectedly, led to a veritable fire-hose of remotely sensed data which is becoming difficult to navigate even for seasoned experts.

 ## Benefit
