@@ -13,6 +13,7 @@ This includes:
1313- Whether the generated data set will be a streaming or batch data set
1414- How the column data should be generated and what the dependencies for each column are
1515- How random and pseudo-random data is generated
16+ - The structure for structured columns and JSON valued columns
1617
1718.. seealso::
1819 See the following links for more details:
@@ -22,6 +23,7 @@ This includes:
2223 * Controlling how existing columns are generated - :data:`~dbldatagen.data_generator.DataGenerator.withColumnSpec`
2324 * Adding column generation specs in bulk - :data:`~dbldatagen.data_generator.DataGenerator.withColumnSpecs`
2425 * Options for column generation - :doc:`options_and_features`
26+ * Generating JSON and complex data - :doc:`generating_json_data`
2527
2628Column data is generated for all columns whether imported from a schema or explicitly added
2729to a data specification. However, column data can be omitted from the final output, allowing columns to be used
@@ -35,15 +37,16 @@ These control the data generation process.
3537
3638The data generation process itself is deferred until the data generation instance ``build`` method is executed.
3739
38- So until the ``build`` method is invoked, the data generation specification is in initialization mode.
40+ So until the :data:`~dbldatagen.data_generator.DataGenerator.build` method is invoked, the data generation
41+ specification is in initialization mode.
3942
4043Once ``build`` has been invoked, the data generation instance holds state about the data set generated.
4144
4245While ``build`` can be invoked a subsequent time, making further modifications to the definition post build before
4346calling ``build`` again is not recommended. We recommend the use of the ``clone`` method to make a new data generation
4447specification similar to an existing one if further modifications are needed.
4548
46- See :data:`~dbldatagen.data_generator.DataGenerator.clone` for further information.
49+ See the method :data:`~dbldatagen.data_generator.DataGenerator.clone` for further information.
4750
4851Adding columns to a data generation spec
4952----------------------------------------
@@ -55,18 +58,21 @@ specification.
5558When building the data generation spec, the ``withSchema`` method may be used to add columns from an existing schema.
5659This does *not* prevent the use of ``withColumn`` to add new columns.
5760
58- Use ``withColumn`` to define a new column. This method takes a parameter to specify the data type.
59- See :data:`~dbldatagen.data_generator.DataGenerator.withColumn`.
61+ | Use ``withColumn`` to define a new column. This method takes a parameter to specify the data type.
62+ | See the method :data:`~dbldatagen.data_generator.DataGenerator.withColumn` for further details.
6063
6164Use ``withColumnSpec`` to define how a column previously defined in a schema should be generated. This method does not
6265take a data type property, but uses the data type information defined in the schema.
63- See :data:`~dbldatagen.data_generator.DataGenerator.withColumnSpec`.
66+ See the method :data:`~dbldatagen.data_generator.DataGenerator.withColumnSpec` for further details.
6467
65- Use ``withColumnSpecs`` to define how multiple columns imported from a schema should be generated.
66- As the pattern matching may inadvertently match an unintended column, it is permitted to override the specification
67- added through this method by a subsequent call to ``withColumnSpec`` to change the definition of how a specific column
68- should be generated
69- See :data:`~dbldatagen.data_generator.DataGenerator.withColumnSpecs`.
68+ | Use ``withColumnSpecs`` to define how multiple columns imported from a schema should be generated.
69+ As the pattern matching may inadvertently match an unintended column, it is permitted to override the specification
70+ added through this method by a subsequent call to ``withColumnSpec`` to change the definition of how a specific column
71+ should be generated.
72+ | See the method :data:`~dbldatagen.data_generator.DataGenerator.withColumnSpecs` for further details.
73+
74+ Use the method :data:`~dbldatagen.data_generator.DataGenerator.withStructColumn` for simpler creation of struct and
75+ JSON valued columns.
7076
7177By default all columns are marked as being dependent on an internal ``id`` seed column.
7278Use the ``baseColumn`` attribute to mark a column as being dependent on another column or set of columns.
@@ -85,6 +91,7 @@ Use of the base column attribute has several effects:
8591
8692 If you need to generate a field with the same name as the seed column (by default ``id``), you may override
8793 the default seed column name in the constructor of the data generation spec through the use of the
94+ ``seedColumnName`` parameter.
8895
8996
9097 Note that Spark SQL is case insensitive with respect to column names.
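
Because column names are compared without regard to case, a user column named ``ID`` or ``Id`` would be expected to clash with the default ``id`` seed column. The helper below is a minimal pure-Python sketch of a pre-check for such clashes. It is not part of dbldatagen: the function name ``seed_name_collisions`` and the alternative seed name ``_id`` are hypothetical and used only for illustration.

```python
def seed_name_collisions(column_names, seed_column="id"):
    """Return the column names that match the seed column name,
    ignoring case (hypothetical helper, not a dbldatagen API)."""
    seed = seed_column.casefold()
    return [name for name in column_names if name.casefold() == seed]


# With the default seed column name, "ID" and "Id" both collide.
print(seed_name_collisions(["ID", "name", "Id"]))            # ['ID', 'Id']

# Choosing a different seed column name (here the made-up "_id")
# removes the clash with a user column called "ID".
print(seed_name_collisions(["ID", "name"], seed_column="_id"))  # []
```

If the check reports a clash, overriding the seed column name in the constructor, as described above, is the way to free up the name.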
@@ -127,6 +134,12 @@ For example, the following code will generate rows with varying numbers of synth
127134
128135 df = ds.build()
129136
137+ | The helper method ``withStructColumn`` of the ``DataGenerator`` class enables simpler definition of structured
138+ and JSON valued columns.
139+ | See the documentation for the method :data:`~dbldatagen.data_generator.DataGenerator.withStructColumn` for
140+ further details.
141+
142+
130143The mechanics of column data generation
131144---------------------------------------
132145The data set is generated when the ``build`` method is invoked on the data generation instance.
@@ -168,3 +181,4 @@ This has several implications:
168181 However it does not reorder the building sequence if there is a reference to a column that will be built later in the
169182 SQL expression.
170183 To enforce the dependency, you must use the ``baseColumn`` attribute to indicate the dependency.
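
The ordering rule above can be pictured with a small pure-Python model. This is only an illustrative sketch and not dbldatagen's implementation: ``build_order`` and the ``specs`` dictionary layout are hypothetical, and the model assumes that columns are otherwise built in declaration order. Only dependencies declared through ``baseColumn`` influence the result. Column names that appear solely inside an SQL expression are ignored.

```python
def build_order(specs, seed_column="id"):
    """Order columns so that declared base columns are built first.

    `specs` maps a column name to a dict with optional keys
    "baseColumn" (a name or a list of names) and "expr" (SQL text).
    The "expr" text is deliberately never inspected.
    """
    ordered = []
    placed = set()

    def declared_bases(name):
        base = specs[name].get("baseColumn", seed_column)
        names = [base] if isinstance(base, str) else list(base)
        # The seed column is always available, so it needs no ordering.
        return [n for n in names if n != seed_column]

    def place(name, in_progress):
        if name not in specs:
            raise KeyError(f"unknown base column {name!r}")
        if name in placed:
            return
        if name in in_progress:
            raise ValueError(f"circular baseColumn dependency involving {name!r}")
        for base in declared_bases(name):
            place(base, in_progress | {name})
        placed.add(name)
        ordered.append(name)

    for name in specs:
        place(name, frozenset())
    return ordered


# "total" refers to price and qty only inside its expression:
# the model leaves it first, ahead of the columns it needs.
undeclared = {"total": {"expr": "price * qty"}, "price": {}, "qty": {}}
print(build_order(undeclared))  # ['total', 'price', 'qty']

# Declaring the dependency moves "total" after its base columns.
declared = {
    "total": {"expr": "price * qty", "baseColumn": ["price", "qty"]},
    "price": {},
    "qty": {},
}
print(build_order(declared))    # ['price', 'qty', 'total']
```

The second call shows why the dependency has to be declared: the expression text is identical in both cases, and only the ``baseColumn`` entry changes the order.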
184+