state this explicitly, by submitting any copyrighted material via pull request, or
other means you agree to license the material under the project's Databricks license and
warrant that you have the legal authority to do so.

# Development Setup

## Python Compatibility

The code supports Python 3.10 and later.
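
A quick way to confirm that the interpreter on your path meets this requirement:

```bash
# Check the active Python version (should report 3.10 or later)
python --version
```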

## Quick Start

```bash
# Install development dependencies
make dev

# Format and lint code
make fmt     # Format with ruff and fix issues
make lint    # Check code quality

# Run tests
make test    # Run the test suite

# Build package
make build   # Build the distributable package
```

## Development Tools

All development tools are configured in `pyproject.toml`.
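
The `make fmt` and `make lint` targets wrap the configured tooling. If you want to invoke it directly, a minimal sketch, assuming `ruff` is the configured formatter and linter (as the `make fmt` comment above suggests):

```bash
# Run the formatter and linter directly from the project root
ruff format .    # rewrite files in place
ruff check .     # report lint issues without modifying files
```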

## Dependencies

All dependencies are defined in `pyproject.toml`:

- `[project.dependencies]` lists the dependencies necessary to run the `dbldatagen` library
- `[tool.hatch.envs.default]` lists the default environment necessary to develop, test, and build the `dbldatagen` library
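
To see what the published package will require at runtime, one option is to read the `[project.dependencies]` table directly; a minimal sketch (assumes Python 3.11+, which ships `tomllib` in the standard library):

```bash
# Print the declared runtime dependencies from pyproject.toml
python -c "import tomllib; print(*tomllib.load(open('pyproject.toml','rb'))['project']['dependencies'], sep='\n')"
```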

## Spark Dependencies

The builds have been tested against Spark 3.4.1 and later. This requires OpenJDK 1.8.56 or a later version of Java 8.
The Databricks runtimes use the Azul Zulu version of OpenJDK 8.
These are not installed automatically by the build process, so you will need to install them separately.

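Before running Spark locally, you can confirm that a suitable JDK is on the path:

```bash
# Verify the active Java version (should report 1.8.x, e.g. Azul Zulu OpenJDK 8)
java -version
```
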
## Creating the HTML documentation

The main html document will be in the file (relative to the root of the build directory)
`./docs/docs/build/html/index.html`

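Once built, opening the entry point locally is enough to review the generated docs (`xdg-open` assumes a Linux desktop; use `open` on macOS):

```bash
# Open the generated documentation in the default browser
xdg-open ./docs/docs/build/html/index.html
```
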
## Building the Python wheel

```bash
make build    # Clean and build the package
```

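The resulting wheel file will be placed in the `dist` subdirectory. To try it out in the current environment (the wildcard avoids hard-coding the version in the filename):

```bash
# Install the locally built wheel
pip install dist/dbldatagen-*.whl
```
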
# Testing

```python
spark = dg.SparkSingleton.getLocalInstance("<name to flag spark instance>")
```

The name used to flag the spark instance should be the test module or test class name.

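For example, a Spark instance created inside a hypothetical module `tests/test_generation.py` would be flagged with that module's name, and the module can be run on its own (the path and name are illustrative):

```bash
# Run a single test module; its Spark instance should be flagged "test_generation"
python -m pytest tests/test_generation.py
```
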
## Running Tests

```bash
# Run all tests
make test
```

If using an environment with multiple Python versions, make sure to use a virtual environment or similar so that the correct Python version is picked up.

If necessary, set `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` to point to the correct version of Python, as in the sketch below.

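A local test run pinned to a specific interpreter might look like this (the paths are illustrative):

```bash
# Point both the Spark workers and the driver at the same interpreter
export PYSPARK_PYTHON=/usr/bin/python3.10
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.10
make test
```
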
# Using the Databricks Labs data generator
The recommended method for installation is to install from the PyPI package

For example, the following code downloads the release V0.2.1:

```
%pip install https://github.com/databrickslabs/dbldatagen/releases/download/v021/dbldatagen-0.2.1-py3-none-any.whl
```

# Code Quality and Style

## Automated Formatting

Code can be automatically formatted and linted with the following commands:

```bash
# Format code and fix issues automatically
make fmt

# Check code quality without making changes
make lint
```

## Coding Conventions

The code follows PySpark coding conventions:
- Python PEP8 standards, with some PySpark-specific adaptations
- Method and argument names use mixed case starting with a lowercase letter (following PySpark conventions)
- Line length limit of 120 characters

See the [Python PEP8 Guide](https://peps.python.org/pep-0008/) for general Python style guidelines.

# GitHub expectations
When running the unit tests on GitHub, the environment should match the latest Databricks
runtime LTS release. While compatibility is preserved on LTS releases from Databricks runtime 13.3 LTS onwards,
unit tests will be run on the environment corresponding to the latest LTS release.

Libraries will use the same versions as the earliest supported LTS release - currently 13.3 LTS.

This means for the current build:

- Use of Ubuntu 22.04 for the test runner
- Use of Java 8
- Use of Python 3.10.12 when testing / building the image

See the following resources for more information:
- https://docs.databricks.com/en/release-notes/runtime/15.4lts.html
- https://docs.databricks.com/en/release-notes/runtime/11.3lts.html
- https://github.com/actions/runner-images/issues/10636