Feature store allows you to define expectations on data being materialized into a dataset instance. With a ``Dataset`` instance, you can save the expectation details using ``with_expectation_suite()`` with the following parameters:

- ``expectation_suite: ExpectationSuite``. ``ExpectationSuite`` of Great Expectations
- ``expectation_type: ExpectationType``. Type of expectation

  - ``ExpectationType.STRICT``: Fail the job if the expectation is not met
  - ``ExpectationType.LENIENT``: Pass the job even if the expectation is not met
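To make the two modes concrete, here is a minimal, self-contained sketch; the ``ExpectationType`` enum below is a stand-in for illustration, not the feature store's own class:

```python
from enum import Enum

# Illustrative stand-in for the feature store's ExpectationType enum.
class ExpectationType(Enum):
    STRICT = "STRICT"
    LENIENT = "LENIENT"

def job_should_succeed(validation_passed: bool, mode: ExpectationType) -> bool:
    # STRICT: the ingestion job fails whenever validation fails.
    if mode is ExpectationType.STRICT:
        return validation_passed
    # LENIENT: the job passes even when validation fails.
    return True

print(job_should_succeed(False, ExpectationType.STRICT))   # False
print(job_should_succeed(False, ExpectationType.LENIENT))  # True
```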
.. note::

    Great Expectations is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
.. image:: figures/validation.png
.. code-block:: python3

        .with_query(f"SELECT * FROM `{entity_id}`.{feature_group_name}")
        .with_expectation_suite(
            expectation_suite=expectation_suite,
            expectation_type=ExpectationType.STRICT,
        )
    )
You can call the ``get_validation_output()`` method of the Dataset instance to fetch validation results for a specific ingestion job.
The ``get_validation_output()`` method takes the following optional parameter:

- ``job_id: string``. Id of the dataset job
``get_validation_output().to_pandas()`` outputs the validation results for each expectation as a pandas dataframe.
.. image:: figures/dataset_validation_results.png
``get_validation_output().to_summary()`` outputs the overall summary of validation as a pandas dataframe.
.. image:: figures/dataset_validation_summary.png
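As a rough illustration of what the validation output and its summary convey, the snippet below builds a small pandas dataframe; the column names are assumptions for illustration, not the exact schema returned by ``to_pandas()``:

```python
import pandas as pd

# Hypothetical, simplified shape of per-expectation validation results.
results = pd.DataFrame({
    "expectation_type": [
        "expect_column_values_to_not_be_null",
        "expect_column_values_to_be_between",
    ],
    "success": [True, False],
})

# A summary in the spirit of to_summary(): overall pass rate across expectations.
pass_rate = results["success"].mean()
print(pass_rate)  # 0.5
```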
.. seealso::

    :ref:`Feature Validation`
Statistics Computation
======================

During the materialization, feature store performs computation of statistical metrics for all the features by default. This can be configured using the ``StatisticsConfig`` object, which can be passed at the creation of the dataset or updated later.

.. note::

    PyDeequ is a Python API for Deequ, a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

You can call the ``get_statistics()`` method of the Dataset instance to fetch the feature statistics results of a dataset job.
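The kind of per-feature metrics computed during materialization can be sketched with plain pandas; this is an illustration only, not the feature store's implementation:

```python
import pandas as pd

# Toy feature data; the feature names are made up for illustration.
features = pd.DataFrame({
    "age": [25, 32, 47, 32],
    "income": [40.0, 52.0, 61.0, 47.0],
})

# Metrics of the kind a feature store computes per feature.
stats = features.agg(["count", "mean", "min", "max"])
print(stats.loc["mean", "age"])  # 34.0
```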
.. ads/feature_store/docs/source/feature_group.rst
With a ``FeatureGroup`` instance, you can save the expectation details using ``with_expectation_suite()``.
You can call the ``get_validation_output()`` method of the FeatureGroup instance to fetch validation results for a specific ingestion job.
The ``get_validation_output()`` method takes the following optional parameter:

- ``job_id: string``. Id of the feature group job
``get_validation_output().to_pandas()`` outputs the validation results for each expectation as a pandas dataframe.
.. image:: figures/validation_results.png
``get_validation_output().to_summary()`` outputs the overall summary of validation as a pandas dataframe.
.. image:: figures/validation_summary.png
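One common use of the results dataframe is isolating the expectations that failed. The sketch below assumes a simplified shape with hypothetical column names, not the exact schema of ``to_pandas()``:

```python
import pandas as pd

# Assumed, simplified shape of per-expectation validation results.
results = pd.DataFrame({
    "column": ["user_id", "email", "age"],
    "success": [True, False, True],
})

# Keep only the rows where the expectation was not met.
failed = results[~results["success"]]
print(list(failed["column"]))  # ['email']
```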
.. seealso::

    :ref:`Feature Validation`
Statistics Computation
======================

During the materialization, feature store performs computation of statistical metrics for all the features by default. This can be configured using the ``StatisticsConfig`` object, which can be passed at the creation of the feature group, or it can be updated later as well.
.. code-block:: python3

    # Define statistics configuration for selected features
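As a loose, hypothetical sketch of what such a configuration expresses, the snippet below uses a plain dict; the key names are assumptions for illustration, not the actual ``StatisticsConfig`` API:

```python
# Hypothetical sketch only: key names are assumptions, not the real API.
statistics_config = {
    "is_enabled": True,            # toggle metric computation at materialization
    "columns": ["age", "income"],  # restrict statistics to selected features
}
print(statistics_config["is_enabled"])  # True
```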
Feature validation is the process of checking the quality and accuracy of the features used in a machine learning model. This is important because features that are not accurate or reliable can lead to poor model performance.

Feature store allows you to define expectations on the data that is being materialized into a feature group or dataset. This is achieved using the open source library Great Expectations.

.. note::

    `Great Expectations <https://docs.greatexpectations.io/docs/0.15.50/>`_ is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
Expectations
============

An Expectation is a verifiable assertion about your data. You can define an expectation as below:

.. code-block:: python3

    from great_expectations.core.expectation_configuration import ExpectationConfiguration
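Conceptually, each expectation pairs an expectation type with keyword arguments. The sketch below mirrors that declarative shape in plain Python; the column name ``feature_a`` is a made-up placeholder, and the checker is a minimal stand-in, not Great Expectations itself:

```python
# Conceptual sketch: an expectation as declarative data.
expectation = {
    "expectation_type": "expect_column_values_to_not_be_null",
    "kwargs": {"column": "feature_a"},  # hypothetical column name
}

def values_not_null(values):
    # Minimal stand-in for the not-null expectation's semantics.
    return all(v is not None for v in values)

print(values_not_null([1, 2, 3]))   # True
print(values_not_null([1, None]))   # False
```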