Feature store allows you to define expectations on data being materialized into a feature group instance. With a ``FeatureGroup`` instance, we can save the expectation entity using ``save_expectation()``.

With a ``Dataset`` instance, we can also save the expectation entity using ``save_expectation()``.

.. note::

   Great Expectations is a Python-based open-source library for validating, documenting, and profiling your data. It helps you maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.

.. image:: figures/validation.png

The ``.save_expectation()`` method takes the following optional parameters:

- ``expectation: Expectation``. An expectation defined with Great Expectations.
- ``expectation_type: ExpectationType``. The type of expectation:

  - ``ExpectationType.STRICT``: Fail the job if the expectation is not met.
  - ``ExpectationType.LENIENT``: Pass the job even if the expectation is not met.
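
The difference between the two expectation types can be illustrated with a small, self-contained sketch. It is hypothetical and does not reproduce the ADS implementation: the ``ExpectationType`` enum below is a local stand-in for the real one, and ``handle_validation_result``, ``ExpectationFailedError`` and the returned status strings are made-up names that only show how a job could react to a failed validation in each mode.

.. code-block:: python3

   from enum import Enum


   class ExpectationType(Enum):
       # Hypothetical local stand-in for the ADS ExpectationType enum.
       STRICT = "STRICT"
       LENIENT = "LENIENT"


   class ExpectationFailedError(Exception):
       """Raised when a STRICT expectation is not met (illustrative only)."""


   def handle_validation_result(success: bool, expectation_type: ExpectationType) -> str:
       """Decide a job outcome from a validation result (hypothetical helper)."""
       if success:
           return "SUCCEEDED"
       if expectation_type is ExpectationType.STRICT:
           # STRICT: an unmet expectation fails the job.
           raise ExpectationFailedError("expectation not met; failing the job")
       # LENIENT: the job passes even though the expectation was not met.
       return "SUCCEEDED_WITH_WARNINGS"


   print(handle_validation_result(False, ExpectationType.LENIENT))

With ``LENIENT`` the failed check is reported but the job still completes, whereas ``STRICT`` stops it by raising an error.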

During materialization, the feature store computes statistical metrics for all features by default. This behavior can be configured using a ``StatisticsConfig`` object, which can be passed when the dataset is created or updated later.

.. code-block:: python3

   # Define statistics configuration for selected features
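
As a rough illustration of what restricting statistics to selected features means, the sketch below computes a few descriptive metrics with Python's ``statistics`` module, for every feature by default or only for the features passed in. It does not use ``StatisticsConfig`` and does not reproduce the metrics Feature Store actually computes; ``compute_feature_statistics`` is a hypothetical helper and the sample data is made up.

.. code-block:: python3

   import statistics
   from typing import Dict, List, Optional


   def compute_feature_statistics(
       data: Dict[str, List[float]],
       columns: Optional[List[str]] = None,
   ) -> Dict[str, Dict[str, float]]:
       """Compute descriptive statistics per feature (hypothetical helper).

       All features are summarized by default; pass ``columns`` to restrict
       the computation to selected features.
       """
       selected = list(data) if columns is None else columns
       result = {}
       for name in selected:
           values = data[name]
           result[name] = {
               "count": len(values),
               "mean": statistics.mean(values),
               "min": min(values),
               "max": max(values),
           }
       return result


   # Made-up sample data for illustration only.
   features = {"age": [23.0, 35.0, 41.0], "income": [40.0, 52.5, 61.0]}
   print(compute_feature_statistics(features, columns=["age"]))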

You can call the ``get_statistics()`` method of the ``Dataset`` instance to fetch the feature statistics results of a dataset job.

.. code-block:: python3

   from ads.feature_store.dataset import Dataset

.. note::

   PyDeequ is a Python API for Deequ, a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Save expectation entity
=======================
Feature validation is the process of checking the quality and accuracy of the features used in a machine learning model. This is important because features that are not accurate or reliable can lead to poor model performance.

Feature store allows you to define expectations on the data being materialized into feature groups and datasets. This is achieved using the open-source library Great Expectations.

.. note::

   `Great Expectations <https://docs.greatexpectations.io/docs/>`_ is a Python-based open-source library for validating, documenting, and profiling your data. It helps you maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.

Expectations
============

An Expectation is a verifiable assertion about your data. You can define an expectation as follows:

.. code-block:: python3

   from great_expectations.core.expectation_configuration import ExpectationConfiguration
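
To make "verifiable assertion" concrete without requiring a Great Expectations installation, here is a stdlib-only sketch loosely modeled on the built-in ``expect_column_values_to_not_be_null`` expectation. It is not how Great Expectations evaluates expectations: ``check_not_null`` is a hypothetical helper, and its result keys only loosely mirror the success flag and unexpected-value counts found in a validation result.

.. code-block:: python3

   from typing import Any, Dict, List


   def check_not_null(rows: List[Dict[str, Any]], column: str) -> Dict[str, Any]:
       """Assert that no row has a missing value in ``column`` (hypothetical helper)."""
       # Collect the positions of rows whose value is missing or None.
       unexpected = [i for i, row in enumerate(rows) if row.get(column) is None]
       return {
           "success": not unexpected,
           "unexpected_count": len(unexpected),
           "unexpected_index_list": unexpected,
       }


   # Made-up sample rows for illustration only.
   rows = [{"id": 1, "city": "Austin"}, {"id": 2, "city": None}, {"id": 3, "city": "Boston"}]
   print(check_not_null(rows, "id"))
   print(check_not_null(rows, "city"))

The assertion either holds for every row or reports exactly which rows violate it, which is what makes it verifiable.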

Feature Store provides functionality to compute statistics for feature groups and datasets and persist them along with the metadata. These statistics can help you derive insights about data quality. The statistical metrics are computed at materialization time and persisted with the other metadata.

.. note::

   Feature Store utilizes MLM Insights, a Python API that helps evaluate and monitor data across the entire ML observability lifecycle. It performs data summarization, which reduces a dataset to a set of descriptive statistics.

The statistical metrics computed by the feature store depend on the feature type.
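
The idea that the computed metrics depend on the feature type can be sketched as a simple dispatch on the type of the values. The metric names below are illustrative choices, not the list that MLM Insights or Feature Store actually produces, and ``summarize_feature`` is a hypothetical helper.

.. code-block:: python3

   from collections import Counter
   from typing import Any, Dict, List


   def summarize_feature(values: List[Any]) -> Dict[str, Any]:
       """Pick summary metrics based on the feature's type (hypothetical helper)."""
       present = [v for v in values if v is not None]
       summary: Dict[str, Any] = {"count": len(values), "missing": len(values) - len(present)}
       is_numeric = present and all(
           isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
       )
       if is_numeric:
           # Numeric feature: range and central tendency.
           summary.update(min=min(present), max=max(present), mean=sum(present) / len(present))
       else:
           # Categorical feature: cardinality and most frequent value.
           counts = Counter(present)
           summary["distinct"] = len(counts)
           if counts:
               summary["top"] = counts.most_common(1)[0][0]
       return summary


   print(summarize_feature([1, 2, None, 3]))
   print(summarize_feature(["a", "b", "a"]))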