
Commit 148c1a7

Add getting started with Apache Ozone (#2853)

* Add getting started with Apache Ozone

Use Apache Ozone as an example S3 impl. that does not have STS.

* fix typo in MinIO readme

1 parent d86e4e0 commit 148c1a7


4 files changed, +277 -1 lines changed

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

ENDPOINT=$1
# "invalidKey" in combination with SigV4 means "public" access
KEY_ID=${2:-"invalidKey"}
SECRET=${3:-"secret"}
SLEEP=${4:-"1"}

if [ -z "$ENDPOINT" ]; then
  echo "Endpoint must be provided"
  exit 1
fi

# Make up to 30 attempts to list buckets. Success means the service is available.
for i in $(seq 1 30); do
  echo "Listing buckets at $ENDPOINT"
  # Test curl's exit status directly; this also works in shells without [[ ]], such as dash.
  if curl --user "$KEY_ID:$SECRET" --aws-sigv4 "aws:amz:us-west-1:s3" "$ENDPOINT"; then
    echo
    echo "$ENDPOINT is available"
    break
  fi
  echo "Sleeping $SLEEP ..."
  sleep "$SLEEP"
done
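The wait script above always exits with status 0, even when all 30 attempts fail, so a caller cannot tell a timeout from success by its exit status. A minimal sketch of a stricter alternative, assuming a POSIX `sh` such as the BusyBox shell in the `alpine/curl` image: the hypothetical `retry` function below is not part of this commit; it reruns any command and returns non-zero once its attempts are used up.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Polaris repository.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs are used up.
# Returns 0 on the first success and 1 if every attempt fails.
# Variables are prefixed with _retry_ because POSIX sh has no `local`.
retry() {
  _retry_attempts=$1
  _retry_delay=$2
  shift 2
  _retry_i=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$_retry_i" -ge "$_retry_attempts" ]; then
      echo "All $_retry_attempts attempts failed: $*" >&2
      return 1
    fi
    echo "Attempt $_retry_i/$_retry_attempts failed; sleeping ${_retry_delay}s" >&2
    sleep "$_retry_delay"
    _retry_i=$((_retry_i + 1))
  done
}

# Example (assumes the same endpoint and "invalidKey" convention as the script above):
# retry 30 1 curl --silent --output /dev/null --user "invalidKey:secret" \
#   --aws-sigv4 "aws:amz:us-west-1:s3" http://ozone-s3g:9878/ || exit 1
```

Unlike the loop above, this does not sleep after the final failed attempt. Note that `curl` without `--fail` still exits with 0 on HTTP error responses, so this keeps the original script's notion of "available" (the endpoint answered at all).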

getting-started/minio/README.md

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ bin/spark-sql \
   --conf spark.sql.catalog.polaris.client.region=irrelevant
 ```
 
-Note: `s3cr3t` is defined as the password for the `root` users in the `docker-compose.yml` file.
+Note: `s3cr3t` is defined as the password for the `root` user in the `docker-compose.yml` file.
 
 Note: The `client.region` configuration is required for the AWS S3 client to work, but it is not used in this example
 since MinIO does not require a specific region.

getting-started/ozone/README.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Getting Started with Apache Polaris and Apache Ozone

## Overview

This example uses [Apache Ozone](https://ozone.apache.org/) as a storage provider with Polaris.

Spark is used as a query engine. This example assumes a local Spark installation.
See the [Spark Notebooks Example](../spark/README.md) for a more advanced Spark setup.

## Starting the Example

Start the Docker Compose group by running the following command from the root of the repository:

```shell
docker compose -f getting-started/ozone/docker-compose.yml up
```

Note: this example pulls the `apache/polaris:latest` image, but assumes the image is `1.2.0-incubating` or later.

## Connecting From Spark

```shell
bin/spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.polaris.type=rest \
  --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
  --conf spark.sql.catalog.polaris.token-refresh-enabled=false \
  --conf spark.sql.catalog.polaris.warehouse=quickstart_catalog \
  --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \
  --conf spark.sql.catalog.polaris.credential=root:s3cr3t \
  --conf spark.sql.catalog.polaris.client.region=irrelevant
```

Note: `s3cr3t` is defined as the password for the `root` user in the `docker-compose.yml` file.

Note: The `client.region` configuration is required for the AWS S3 client to work, but it is not used in
this example since Ozone does not require a specific region.

## Running Queries

Run inside the Spark SQL shell:

```
spark-sql (default)> use polaris;
Time taken: 0.837 seconds

spark-sql ()> create namespace ns;
Time taken: 0.374 seconds

spark-sql ()> create table ns.t1 as select 'abc';
Time taken: 2.192 seconds

spark-sql ()> select * from ns.t1;
abc
Time taken: 0.579 seconds, Fetched 1 row(s)
```

## Lack of Credential Vending

Notice that the Spark configuration does not contain an `X-Iceberg-Access-Delegation` header.
This is because Ozone does not support the STS API and consequently cannot produce session
credentials to be vended to Polaris clients.

The lack of an STS API is represented in the catalog storage configuration by the
`stsUnavailable=true` property.
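For contrast, when the storage does support STS, a client normally requests vended credentials by sending that header. Iceberg's REST catalog client lets Spark set extra request headers through `header.`-prefixed catalog properties, so a sketch of the extra argument for the `bin/spark-sql` command above would be the line below (placed before the final `--conf` line). It is shown only to illustrate what is absent here: with `stsUnavailable` set, Polaris has no session credentials to vend.

```shell
  --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
```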

## S3 Credentials

In this example, Ozone does not require credentials for accessing its S3 API. Therefore, neither
Polaris nor Spark uses any S3 access keys.

If Ozone were configured to require credentials, Spark and Polaris would each need their own
S3 access key / secret properties, because credential vending is not possible with Ozone 2.0.0.
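A minimal sketch of what the Spark side of that could look like, assuming Iceberg's `S3FileIO`, which reads static keys from the `s3.access-key-id` and `s3.secret-access-key` properties; Spark passes them to the catalog when they are prefixed with `spark.sql.catalog.<name>.`. The hypothetical `s3_key_conf` function only prints the corresponding `--conf` arguments; the catalog name and key values are placeholders. Polaris would need its own keys separately, for example through the AWS SDK environment variables (`AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`) on the `polaris` service; that wiring is an assumption, not part of this example.

```shell
#!/bin/sh
# Hypothetical helper, not part of this example: print the spark-sql
# arguments that give the client-side Iceberg S3FileIO its own static keys.
# Usage: s3_key_conf CATALOG_NAME ACCESS_KEY_ID SECRET_ACCESS_KEY
s3_key_conf() {
  _catalog=$1
  _key_id=$2
  _secret=$3
  # printf repeats the format for each argument, so each word lands on its own line.
  printf '%s\n' "--conf" "spark.sql.catalog.$_catalog.s3.access-key-id=$_key_id"
  printf '%s\n' "--conf" "spark.sql.catalog.$_catalog.s3.secret-access-key=$_secret"
}

# Example (placeholder values): bin/spark-sql ... $(s3_key_conf polaris "$MY_KEY_ID" "$MY_SECRET")
```

The unquoted `$(...)` in the usage line relies on word splitting, which is fine for typical access keys but would break on values containing whitespace or glob characters.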

## S3 Endpoints

Note that the catalog configuration defined in the `docker-compose.yml` file contains
different endpoints for the Polaris server and the client (Spark). Specifically,
the client endpoint is `http://localhost:9878`, but `endpointInternal` is `http://ozone-s3g:9878`.

This is necessary because clients running on `localhost` cannot normally resolve service
names (such as `ozone-s3g`) that are internal to the Docker Compose environment.
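To make the split concrete, the hypothetical `storage_config_info` function below prints the same storage configuration that the `polaris-setup` service exports as `STORAGE_CONFIG_INFO` (compacted onto one line), with the client-facing and server-internal endpoints as parameters. It is only an illustration; the example itself hard-codes this JSON in `docker-compose.yml`.

```shell
#!/bin/sh
# Hypothetical helper: print the S3 storage configuration JSON used when
# creating the catalog.
#   $1 = "endpoint": handed to clients such as Spark (reachable from the host)
#   $2 = "endpointInternal": used by the Polaris server inside the Compose network
# "stsUnavailable":true tells Polaris there is no STS to mint session credentials;
# "pathStyleAccess":true requests http://host/bucket style URLs.
storage_config_info() {
  printf '{"storageType":"S3","endpoint":"%s","endpointInternal":"%s","stsUnavailable":true,"pathStyleAccess":true}\n' \
    "$1" "$2"
}

# Mirrors the values used in docker-compose.yml:
storage_config_info http://localhost:9878 http://ozone-s3g:9878
```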
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

services:

  ozone-datanode:
    image: &ozone-image apache/ozone:2.0.0
    ports:
      - 9864
    command: ["ozone","datanode"]
    environment:
      &ozone-common-config
      OZONE-SITE.XML_hdds.datanode.dir: "/data/hdds"
      OZONE-SITE.XML_ozone.metadata.dirs: "/data/metadata"
      OZONE-SITE.XML_ozone.om.address: "ozone-om"
      OZONE-SITE.XML_ozone.om.http-address: "ozone-om:9874"
      OZONE-SITE.XML_ozone.recon.address: "ozone-recon:9891"
      OZONE-SITE.XML_ozone.recon.db.dir: "/data/metadata/recon"
      OZONE-SITE.XML_ozone.replication: "1"
      OZONE-SITE.XML_ozone.scm.block.client.address: "ozone-scm"
      OZONE-SITE.XML_ozone.scm.client.address: "ozone-scm"
      OZONE-SITE.XML_ozone.scm.datanode.id.dir: "/data/metadata"
      OZONE-SITE.XML_ozone.scm.names: "ozone-scm"
      no_proxy: "ozone-om,ozone-recon,ozone-scm,ozone-s3g,localhost,127.0.0.1"
  ozone-om:
    image: *ozone-image
    ports:
      - 9874:9874
    environment:
      <<: *ozone-common-config
      CORE-SITE.XML_hadoop.proxyuser.hadoop.hosts: "*"
      CORE-SITE.XML_hadoop.proxyuser.hadoop.groups: "*"
      ENSURE_OM_INITIALIZED: /data/metadata/om/current/VERSION
      WAITFOR: ozone-scm:9876
    command: ["ozone","om"]
  ozone-scm:
    image: *ozone-image
    ports:
      - 9876:9876
    environment:
      <<: *ozone-common-config
      ENSURE_SCM_INITIALIZED: /data/metadata/scm/current/VERSION
    command: ["ozone","scm"]
  ozone-recon:
    image: *ozone-image
    ports:
      - 9888:9888
    environment:
      <<: *ozone-common-config
    command: ["ozone","recon"]
  ozone-s3g:
    image: *ozone-image
    ports:
      - 9878:9878
    environment:
      <<: *ozone-common-config
    command: ["ozone","s3g"]

  polaris:
    image: apache/polaris:latest
    ports:
      # API port
      - "8181:8181"
      # Optional, allows attaching a debugger to the Polaris JVM
      - "5005:5005"
    environment:
      JAVA_DEBUG: true
      JAVA_DEBUG_PORT: "*:5005"
      AWS_REGION: us-west-2
      AWS_ACCESS_KEY_ID: minio_root
      AWS_SECRET_ACCESS_KEY: m1n1opwd
      POLARIS_BOOTSTRAP_CREDENTIALS: POLARIS,root,s3cr3t
      polaris.realm-context.realms: POLARIS
      quarkus.otel.sdk.disabled: "true"
    healthcheck:
      test: ["CMD", "curl", "http://localhost:8182/q/health"]
      interval: 2s
      timeout: 10s
      retries: 10
      start_period: 10s

  polaris-setup:
    image: alpine/curl
    depends_on:
      polaris:
        condition: service_healthy
    environment:
      - CLIENT_ID=root
      - CLIENT_SECRET=s3cr3t
    volumes:
      - ../assets/:/assets/
    entrypoint: "/bin/sh"
    command:
      - "-c"
      - >-
        /assets/cloud_providers/await-s3.sh http://ozone-s3g:9878/ ;
        source /assets/polaris/obtain-token.sh;
        echo Creating bucket...;
        curl -X PUT --user "invalidKey:secret" --aws-sigv4 "aws:amz:us-west-1:s3" \
          http://ozone-s3g:9878/bucket123 ;
        echo Creating catalog...;
        export STORAGE_CONFIG_INFO='{"storageType":"S3",
          "endpoint":"http://localhost:9878",
          "endpointInternal":"http://ozone-s3g:9878",
          "stsUnavailable":true,
          "pathStyleAccess":true}';
        export STORAGE_LOCATION='s3://bucket123';
        /assets/polaris/create-catalog.sh POLARIS $$TOKEN;
        echo Extra grants...;
        curl -H "Authorization: Bearer $$TOKEN" -H 'Content-Type: application/json' \
          -X PUT \
          http://polaris:8181/api/management/v1/catalogs/quickstart_catalog/catalog-roles/catalog_admin/grants \
          -d '{"type":"catalog", "privilege":"CATALOG_MANAGE_CONTENT"}';
        echo Done.;
