Commit 2beb541

Updated build environment to use Databricks Runtime 13.3 LTS as baseline (#319)

* Updated build environment to use Databricks Runtime 13.3 LTS as baseline
* Removed duplicate column addition from test (previously undetected by earlier version of PySpark)
* Updated Pipfile dependencies
* Changed comment in changelog to reflect correct Python dependencies

1 parent 41fc07f, commit 2beb541

11 files changed: +50 / -47 lines

.github/workflows/push.yml

Lines changed: 2 additions & 2 deletions

@@ -31,10 +31,10 @@ jobs:
         sudo update-alternatives --set java /usr/lib/jvm/temurin-8-jdk-amd64/bin/java
         java -version
-      - name: Set up Python 3.9.21
+      - name: Set up Python 3.10.12
         uses: actions/setup-python@v5
         with:
-          python-version: '3.9.21'
+          python-version: '3.10.12'
           cache: 'pipenv'

       - name: Check Python version

.github/workflows/release.yml

Lines changed: 2 additions & 2 deletions

@@ -24,10 +24,10 @@ jobs:
         sudo update-alternatives --set java /usr/lib/jvm/temurin-8-jdk-amd64/bin/java
         java -version
-      - name: Set up Python 3.9.21
+      - name: Set up Python 3.10.12
         uses: actions/setup-python@v5
         with:
-          python-version: '3.9.21'
+          python-version: '3.10.12'
           cache: 'pipenv'

       - name: Check Python version

CHANGELOG.md

Lines changed: 2 additions & 1 deletion

@@ -9,7 +9,8 @@ All notable changes to the Databricks Labs Data Generator will be documented in
 * Updated build scripts to use Ubuntu 22.04 to correspond to environment in Databricks runtime

 #### Changed
-* Changed base Databricks runtime version to DBR 11.3 LTS (based on Apache Spark 3.3.0)
+* Changed base Databricks runtime version to DBR 13.3 LTS (based on Apache Spark 3.4.1) - minimum supported version
+  of Python is now 3.10.12

 #### Added
 * Added support for serialization to/from JSON format

CONTRIBUTING.md

Lines changed: 7 additions & 7 deletions

@@ -43,7 +43,7 @@ Our recommended mechanism for building the code is to use a `conda` or `pipenv`
 But it can be built with any Python virtualization environment.

 ### Spark dependencies
-The builds have been tested against Spark 3.3.0. This requires the OpenJDK 1.8.56 or later version of Java 8.
+The builds have been tested against Apache Spark 3.4.1.
 The Databricks runtimes use the Azul Zulu version of OpenJDK 8 and we have used these in local testing.
 These are not installed automatically by the build process, so you will need to install them separately.

@@ -72,7 +72,7 @@ To build with `pipenv`, perform the following commands:
 - Run `make dist` from the main project directory
 - The resulting wheel file will be placed in the `dist` subdirectory

-The resulting build has been tested against Spark 3.3.0
+The resulting build has been tested against Spark 3.4.1

 ## Creating the HTML documentation

@@ -158,19 +158,19 @@ See https://legacy.python.org/dev/peps/pep-0008/

 # Github expectations
 When running the unit tests on Github, the environment should use the same environment as the latest Databricks
-runtime latest LTS release. While compatibility is preserved on LTS releases from Databricks runtime 11.3 onwards,
+runtime latest LTS release. While compatibility is preserved on LTS releases from Databricks runtime 13.3 LTS onwards,
 unit tests will be run on the environment corresponding to the latest LTS release.

-Libraries will use the same versions as the earliest supported LTS release - currently 11.3 LTS
+Libraries will use the same versions as the earliest supported LTS release - currently 13.3 LTS

 This means for the current build:

-- Use of Ubuntu 22.04 for the test runner
+- Use of Ubuntu 22.04.2 LTS for the test runner
 - Use of Java 8
-- Use of Python 3.9.21 when testing / building the image
+- Use of Python 3.10.12 when testing / building the image

 See the following resources for more information
 - https://docs.databricks.com/en/release-notes/runtime/15.4lts.html
-- https://docs.databricks.com/en/release-notes/runtime/11.3lts.html
+- https://docs.databricks.com/aws/en/release-notes/runtime/13.3lts
 - https://github.com/actions/runner-images/issues/10636

Pipfile

Lines changed: 8 additions & 8 deletions

@@ -10,7 +10,7 @@ sphinx = ">=2.0.0,<3.1.0"
 nbsphinx = "*"
 numpydoc = "==0.8"
 pypandoc = "*"
-ipython = "==7.32.0"
+ipython = "==8.10.0"
 pydata-sphinx-theme = "*"
 recommonmark = "*"
 sphinx-markdown-builder = "*"

@@ -19,13 +19,13 @@ prospector = "*"

 [packages]
 numpy = "==1.22.0"
-pyspark = "==3.3.0"
-pyarrow = "==7.0.0"
-wheel = "==0.37.0"
-pandas = "==1.3.4"
-setuptools = "==58.0.4"
-pyparsing = "==3.0.4"
+pyspark = "==3.4.1"
+pyarrow = "==8.0.0"
+wheel = "==0.37.1"
+pandas = "==1.4.4"
+setuptools = "==63.4.1"
+pyparsing = "==3.0.9"
 jmespath = "==0.10.0"

 [requires]
-python_version = "3.9.21"
+python_version = "3.10.12"
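Since the `[requires]` block now pins the interpreter alongside the package pins, an environment can be sanity-checked against them at runtime. The sketch below is illustrative only: the `PINS` mapping and helper names are not part of the repository, and real dependency resolution is handled by pipenv itself.

```python
import sys

# Interpreter and package pins mirrored from the updated Pipfile (illustrative).
MIN_PYTHON = (3, 10, 12)
PINS = {"pyspark": "3.4.1", "pyarrow": "8.0.0", "pandas": "1.4.4"}

def parse_version(text):
    """Split a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))

def python_ok():
    """Return True when the running interpreter meets the pinned minimum."""
    return tuple(sys.version_info[:3]) >= MIN_PYTHON

def matches_pin(name, installed):
    """Return True when an installed version string equals the pinned version."""
    return parse_version(installed) == parse_version(PINS[name])
```

Tuple comparison keeps the check correct for multi-digit components (e.g. 3.10 sorts after 3.9, which a plain string comparison would get wrong).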

README.md

Lines changed: 10 additions & 7 deletions

@@ -83,23 +83,26 @@ The documentation [installation notes](https://databrickslabs.github.io/dbldatag
 contains details of installation using alternative mechanisms.

 ## Compatibility
-The Databricks Labs Data Generator framework can be used with Pyspark 3.3.0 and Python 3.9.21 or later. These are
-compatible with the Databricks runtime 11.3 LTS and later releases. For full Unity Catalog support,
-we recommend using Databricks runtime 13.2 or later (Databricks 13.3 LTS or above preferred)
+The Databricks Labs Data Generator framework can be used with Pyspark 3.4.1 and Python 3.10.12 or later. These are
+compatible with the Databricks runtime 13.3 LTS and later releases. This version also provides Unity Catalog
+compatibility.

 For full library compatibility for a specific Databricks Spark release, see the Databricks
 release notes for library compatibility

 - https://docs.databricks.com/release-notes/runtime/releases.html

-When using the Databricks Labs Data Generator on "Unity Catalog" enabled Databricks environments,
+In older releases, when using the Databricks Labs Data Generator on "Unity Catalog" enabled Databricks environments,
 the Data Generator requires the use of `Single User` or `No Isolation Shared` access modes when using Databricks
 runtimes prior to release 13.2. This is because some needed features are not available in `Shared`
 mode (for example, use of 3rd party libraries, use of Python UDFs) in these releases.
-Depending on settings, the `Custom` access mode may be supported.
+Depending on settings, the `Custom` access mode may be supported for those releases.

 The use of Unity Catalog `Shared` access mode is supported in Databricks runtimes from Databricks runtime release 13.2
-onwards.
+onwards.
+
+*This version of the data generator uses the Databricks runtime 13.3 LTS as the minimum supported
+version and alleviates these issues.*

 See the following documentation for more information:

@@ -155,7 +158,7 @@ The GitHub repository also contains further examples in the examples directory.

 ## Spark and Databricks Runtime Compatibility
 The `dbldatagen` package is intended to be compatible with recent LTS versions of the Databricks runtime, including
-older LTS versions at least from 11.3 LTS and later. It also aims to be compatible with Delta Live Table runtimes,
+older LTS versions at least from 13.3 LTS and later. It also aims to be compatible with Delta Live Table runtimes,
 including `current` and `preview`.

 While we don't specifically drop support for older runtimes, changes in Pyspark APIs or

makefile

Lines changed: 2 additions & 2 deletions

@@ -27,11 +27,11 @@ prepare: clean

 create-dev-env:
 	@echo "$(OK_COLOR)=> making conda dev environment$(NO_COLOR)"
-	conda create -n $(ENV_NAME) python=3.9.21
+	conda create -n $(ENV_NAME) python=3.10.12

 create-github-build-env:
 	@echo "$(OK_COLOR)=> making conda dev environment$(NO_COLOR)"
-	conda create -n pip_$(ENV_NAME) python=3.9.21
+	conda create -n pip_$(ENV_NAME) python=3.10.12

 install-dev-dependencies:
 	@echo "$(OK_COLOR)=> installing dev environment requirements$(NO_COLOR)"

python/dev_require.txt

Lines changed: 8 additions & 8 deletions

@@ -1,19 +1,19 @@
 # The following packages are used in building the test data generator framework.
 # All packages used are already installed in the Databricks runtime environment for version 6.5 or later
 numpy==1.22.0
-pandas==1.3.4
+pandas==1.4.4
 pickleshare==0.7.5
 py4j>=0.10.9.3
-pyarrow==7.0.0
-pyspark==3.3.0
+pyarrow==8.0.0
+pyspark==3.4.1
 python-dateutil==2.8.2
 six==1.16.0
-pyparsing==3.0.4
+pyparsing==3.0.9
 jmespath==0.10.0

 # The following packages are required for development only
-wheel==0.37.0
-setuptools==58.0.4
+wheel==0.37.1
+setuptools==63.4.1
 bumpversion
 pytest
 pytest-cov

@@ -28,9 +28,9 @@ sphinx_rtd_theme
 nbsphinx
 numpydoc==0.8
 pypandoc
-ipython==7.32.0
+ipython==8.10.0
 recommonmark
 sphinx-markdown-builder
-Jinja2 < 3.1
+Jinja2 < 3.1, >= 2.11.3
 sphinx-copybutton
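The widened `Jinja2` pin combines an upper and a lower bound. How such a compound specifier evaluates can be sketched with plain tuple comparison; this simplified stand-in ignores pre-release tags, and real tooling uses `packaging.specifiers` instead.

```python
def parse_version(text):
    """Split a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))

def jinja2_pin_ok(version, lower="2.11.3", upper="3.1"):
    """Mirror `Jinja2 < 3.1, >= 2.11.3`: accept lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)
```

Note that tuple comparison treats a shorter tuple as a prefix, so 3.1.0 correctly falls outside the `< 3.1` bound: (3, 1) sorts before (3, 1, 0).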

python/require.txt

Lines changed: 8 additions & 8 deletions

@@ -1,19 +1,19 @@
 # The following packages are used in building the test data generator framework.
 # All packages used are already installed in the Databricks runtime environment for version 6.5 or later
 numpy==1.22.0
-pandas==1.3.4
+pandas==1.4.4
 pickleshare==0.7.5
 py4j==0.10.9
-pyarrow==7.0.0
-pyspark==3.3.0
+pyarrow==8.0.0
+pyspark==3.4.1
 python-dateutil==2.8.2
 six==1.16.0
-pyparsing==3.0.4
+pyparsing==3.0.9
 jmespath==0.10.0

 # The following packages are required for development only
-wheel==0.37.0
-setuptools==58.0.4
+wheel==0.37.1
+setuptools==63.4.1
 bumpversion
 pytest
 pytest-cov

@@ -27,9 +27,9 @@ sphinx_rtd_theme
 nbsphinx
 numpydoc==0.8
 pypandoc
-ipython==7.32.0
+ipython==8.10.0
 recommonmark
 sphinx-markdown-builder
-Jinja2 < 3.1
+Jinja2 < 3.1, >= 2.11.3
 sphinx-copybutton

setup.py

Lines changed: 1 addition & 1 deletion

@@ -55,5 +55,5 @@
         "Intended Audience :: Developers",
         "Intended Audience :: System Administrators"
     ],
-    python_requires='>=3.9.21',
+    python_requires='>=3.10.12',
 )
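The `python_requires` bump means pip will refuse to install the package on interpreters older than 3.10.12. The gate can be approximated as below; `interpreter_ok` is an illustrative name, and pip's actual check uses `packaging` specifier matching against the full version string.

```python
import sys

# Mirrors python_requires='>=3.10.12' from the updated setup.py (illustrative).
REQUIRED = (3, 10, 12)

def interpreter_ok(version_info=None):
    """Approximate the install-time gate: interpreter must be >= 3.10.12."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:3]) >= REQUIRED
```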
