Commit 9e2ab5b

review comment changes
1 parent dd5734c commit 9e2ab5b

6 files changed

+113
-125
lines changed

ads/feature_store/docs/source/dataset.rst

Lines changed: 65 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -124,6 +124,71 @@ With a Dataset instance, we can get the last dataset job details using ``get_las
124124
df = dataset_job.get_validation_output().to_dataframe()
125125
df.show()
126126
127+
Save expectation entity
128+
=======================
129+
Feature store allows you to define expectations on data being materialized into a feature group instance. With a ``FeatureGroup`` instance, we can save the expectation entity using ``save_expectation()``.
130+
131+
132+
.. image:: figures/validation.png
133+
134+
The ``.save_expectation()`` method takes the following optional parameters:
135+
136+
- ``expectation: Expectation``. An expectation from Great Expectations
137+
- ``expectation_type: ExpectationType``. Type of expectation
138+
- ``ExpectationType.STRICT``: Fail the job if the expectation is not met
139+
- ``ExpectationType.LENIENT``: Pass the job even if the expectation is not met
140+
141+
.. code-block:: python3
142+
143+
feature_group.save_expectation(expectation_suite, expectation_type="STRICT")
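As an illustration only, the difference between the two expectation types can be sketched in plain Python. The ``ExpectationTypeSketch`` enum and the ``job_passes`` function are hypothetical stand-ins written for this example. They are not the feature store's implementation, and the real ``ExpectationType`` should come from the library.

```python
from enum import Enum


class ExpectationTypeSketch(Enum):
    """Hypothetical stand-in for the library's ExpectationType."""

    STRICT = "STRICT"
    LENIENT = "LENIENT"


def job_passes(validation_succeeded: bool, expectation_type: ExpectationTypeSketch) -> bool:
    """Return True if a job is allowed to pass under the semantics described above."""
    if validation_succeeded:
        # Expectation met: the job passes regardless of the expectation type.
        return True
    # Expectation not met: STRICT fails the job, LENIENT lets it pass.
    return expectation_type is ExpectationTypeSketch.LENIENT
```

Under this sketch, a failed validation stops a ``STRICT`` job but not a ``LENIENT`` one.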
144+
145+
For more details on expectations, please refer to :ref:`Feature Validation`.
146+
147+
Statistics Computation
148+
========================
149+
During materialization, the feature store computes statistical metrics for all features by default. This can be configured using a ``StatisticsConfig`` object, which can be passed when creating the
150+
dataset or updated later.
151+
152+
.. code-block:: python3
153+
154+
# Define statistics configuration for selected features
155+
stats_config = StatisticsConfig().with_is_enabled(True).with_columns(["column1", "column2"])
156+
157+
158+
The ``stats_config`` object can be used with a dataset instance.
159+
160+
.. code-block:: python3
161+
162+
from ads.feature_store.dataset import Dataset
163+
164+
dataset = (
165+
Dataset()
166+
.with_name("<dataset_name>")
167+
.with_entity_id(<entity_id>)
168+
.with_feature_store_id("<feature_store_id>")
169+
.with_description("<dataset_description>")
170+
.with_compartment_id("<compartment_id>")
171+
.with_dataset_ingestion_mode(DatasetIngestionMode.SQL)
172+
.with_query('SELECT col FROM <entity_id>.<feature_group_name>')
173+
.with_statistics_config(stats_config)
174+
)
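To make the effect of selecting columns concrete, here is a minimal plain-Python sketch that summarizes only the chosen columns. The ``summarize_columns`` helper and the metrics it computes (count, mean, min, max) are assumptions made for illustration. The metrics the feature store actually computes are not listed here, and this is not how the library implements them.

```python
import statistics


def summarize_columns(rows, columns):
    """Compute simple descriptive statistics for the selected columns only."""
    summary = {}
    for column in columns:
        # Skip missing values so they do not distort the metrics.
        values = [row[column] for row in rows if row.get(column) is not None]
        summary[column] = {
            "count": len(values),
            "mean": statistics.mean(values) if values else None,
            "min": min(values) if values else None,
            "max": max(values) if values else None,
        }
    return summary


# Hypothetical sample data; "column3" is deliberately left out of the selection.
rows = [
    {"column1": 1.0, "column2": 10.0, "column3": 100.0},
    {"column1": 3.0, "column2": None, "column3": 300.0},
]
result = summarize_columns(rows, ["column1", "column2"])
```

Columns that are not selected, such as ``column3`` here, do not appear in the result.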
175+
176+
You can call the ``get_statistics()`` method of the dataset to fetch metrics for a specific ingestion job.
177+
178+
The ``get_statistics()`` method takes the following optional parameter:
179+
180+
- ``job_id: string``. ID of the dataset job
181+
182+
.. code-block:: python3
183+
184+
# Fetch stats results for a dataset job
185+
df = dataset.get_statistics(job_id).to_pandas()
186+
187+
.. image:: figures/stats_1.png
188+
189+
For more details on statistics computation, please refer to :ref:`Statistics`.
190+
191+
127192
Get features
128193
============
129194
You can call the ``get_features_dataframe()`` method of the Dataset instance to fetch features in a dataset.

ads/feature_store/docs/source/feature_group.rst

Lines changed: 41 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -150,61 +150,69 @@ Feature store provides an API similar to Pandas to join feature groups together
150150
.join(feature_group_c.select(), left_on=['b_1'], right_on=['c_1'])
151151
query.show(5)
152152
153-
<<<<<<< Updated upstream
154153
Save expectation entity
155154
=======================
156-
With a ``FeatureGroup`` instance, You can save the expectation details using ``with_expectation_suite()`` with parameters
155+
Feature store allows you to define expectations on data being materialized into a feature group instance. With a ``FeatureGroup`` instance, we can save the expectation entity using ``save_expectation()``.
157156

158-
- ``expectation_suite: ExpectationSuite``. ExpectationSuit of great expectation
157+
158+
.. image:: figures/validation.png
159+
160+
The ``.save_expectation()`` method takes the following optional parameters:
161+
162+
- ``expectation: Expectation``. An expectation from Great Expectations
159163
- ``expectation_type: ExpectationType``. Type of expectation
160164
- ``ExpectationType.STRICT``: Fail the job if the expectation is not met
161165
- ``ExpectationType.LENIENT``: Pass the job even if the expectation is not met
162166

163-
.. note::
167+
.. code-block:: python3
164168
165-
Great Expectations is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
169+
feature_group.save_expectation(expectation_suite, expectation_type="STRICT")
166170
167-
.. image:: figures/validation.png
171+
For more details on expectations, please refer to :ref:`Feature Validation`.
172+
173+
174+
Statistics Computation
175+
========================
176+
During materialization, the feature store computes statistical metrics for all features by default. This can be configured using a ``StatisticsConfig`` object, which can be passed when creating the
177+
feature group or updated later.
168178

169179
.. code-block:: python3
170180
171-
expectation_suite = ExpectationSuite(
172-
expectation_suite_name="expectation_suite_name"
173-
)
174-
expectation_suite.add_expectation(
175-
ExpectationConfiguration(
176-
expectation_type="expect_column_values_to_not_be_null",
177-
kwargs={"column": "<column>"},
178-
)
181+
# Define statistics configuration for selected features
182+
stats_config = StatisticsConfig().with_is_enabled(True).with_columns(["column1", "column2"])
179183
180-
feature_group_resource = (
181-
FeatureGroup()
182-
.with_feature_store_id(feature_store.id)
183-
.with_primary_keys(["<key>"])
184-
.with_name("<name>")
185-
.with_entity_id(entity.id)
186-
.with_compartment_id(<compartment_id>)
187-
.with_schema_details_from_dataframe(<datframe>)
188-
.with_expectation_suite(
189-
expectation_suite=expectation_suite,
190-
expectation_type=ExpectationType.STRICT,
191-
)
192-
)
193184
194-
You can call the ``get_validation_output()`` method of the FeatureGroup instance to fetch validation results for a specific ingestion job.
185+
The ``stats_config`` object can be used with a feature group instance.
186+
187+
.. code-block:: python3
188+
189+
# Create a feature group that uses the statistics configuration
190+
from ads.feature_store.feature_group import FeatureGroup
195191
196-
Statistics Results
197-
==================
198-
You can call the ``get_statistics()`` method of the FeatureGroup instance to fetch statistics for a specific ingestion job.
192+
feature_group_resource = (
193+
FeatureGroup()
194+
.with_feature_store_id(feature_store.id)
195+
.with_primary_keys(["<key>"])
196+
.with_name("<name>")
197+
.with_entity_id(entity.id)
198+
.with_compartment_id(<compartment_id>)
199+
.with_schema_details_from_dataframe(<dataframe>)
200+
.with_statistics_config(stats_config)
)
201+
202+
You can call the ``get_statistics()`` method of the feature group to fetch metrics for a specific ingestion job.
203+
204+
The ``get_statistics()`` method takes the following optional parameter:
205+
206+
- ``job_id: string``. ID of the feature group job
199207

200208
.. code-block:: python3
201209
202210
# Fetch stats results for a feature group job
203211
df = feature_group.get_statistics(job_id).to_pandas()
204212
205213
.. image:: figures/stats_1.png
206-
=======
207-
>>>>>>> Stashed changes
214+
215+
For more details on statistics computation, please refer to :ref:`Statistics`.
208216

209217
Get last feature group job
210218
==========================
Lines changed: 3 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -1,25 +1,9 @@
1+
.. _Feature Validation:
2+
13
Feature Validation
24
******************
35

4-
Save expectation entity
5-
=======================
6-
With a ``FeatureGroup`` or ``Dataset`` instance, we can save the expectation entity using ``save_expectation()``
7-
86
.. note::
7+
`Great Expectations <https://docs.greatexpectations.io/docs/>`_ is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
98

10-
Great Expectations is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
11-
12-
.. image:: figures/validation.png
13-
14-
The ``.save_expectation()`` method takes the following optional parameter:
15-
16-
- ``expectation: Expectation``. Expectation of great expectation
17-
- ``expectation_type: ExpectationType``. Type of expectation
18-
- ``ExpectationType.STRICT``: Fail the job if expectation not met
19-
- ``ExpectationType.LENIENT``: Pass the job even if expectation not met
20-
21-
.. code-block:: python3
22-
23-
feature_group.save_expectation(expectation_suite, expectation_type="STRICT")
24-
dataset.save_expectation(expectation_suite, expectation_type="STRICT")
259

ads/feature_store/docs/source/overview.rst

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,3 +33,5 @@ Oracle feature store is a stack based solution that is deployed in the customer
3333
- .. image:: https://img.shields.io/badge/delta-2.0.1-blue?style=for-the-badge&logo=pypi&logoColor=white
3434
* - pyspark
3535
- .. image:: https://img.shields.io/badge/pyspark-3.2.1-blue?style=for-the-badge&logo=pypi&logoColor=white
36+
37+
Please contact #oci-feature-store_early-preview to get your tenancy whitelisted for early access to the feature store.
Lines changed: 2 additions & 69 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,5 @@
1+
.. _Statistics:
2+
13
Statistics
24
*************
35

@@ -7,72 +9,3 @@ to derive insights about the data quality.
79
.. note::
810

911
Feature Store utilizes MLM Insights, a Python API that helps evaluate and monitor data for the entirety of the ML observability lifecycle. It performs data summarization, which reduces a dataset into a set of descriptive statistics.
10-
11-
12-
Statistics Configuration
13-
========================
14-
Computation of statistical metrics happens by default for all the features but you can configure it using ``StatisticsConfig`` object. This object can be passed at the creation of
15-
feature group or dataset or it can be later updated as well.
16-
17-
.. code-block:: python3
18-
19-
# Define statistics configuration for selected features
20-
stats_config = StatisticsConfig().with_is_enabled(True).with_columns(["column1", "column2"])
21-
22-
23-
This can be used with feature group instance.
24-
25-
.. code-block:: python3
26-
27-
# Fetch stats results for a feature group job
28-
from ads.feature_store.feature_group import FeatureGroup
29-
30-
feature_group_resource = (
31-
FeatureGroup()
32-
.with_feature_store_id(feature_store.id)
33-
.with_primary_keys(["<key>"])
34-
.with_name("<name>")
35-
.with_entity_id(entity.id)
36-
.with_compartment_id(<compartment_id>)
37-
.with_schema_details_from_dataframe(<dataframe>)
38-
.with_statistics_config(stats_config)
39-
40-
Similarly for dataset instance.
41-
42-
.. code-block:: python3
43-
44-
from ads.feature_store.dataset import Dataset
45-
46-
dataset = (
47-
Dataset
48-
.with_name("<dataset_name>")
49-
.with_entity_id(<entity_id>)
50-
.with_feature_store_id("<feature_store_id>")
51-
.with_description("<dataset_description>")
52-
.with_compartment_id("<compartment_id>")
53-
.with_dataset_ingestion_mode(DatasetIngestionMode.SQL)
54-
.with_query('SELECT col FROM <entity_id>.<feature_group_name>')
55-
.with_statistics_config(stats_config)
56-
)
57-
58-
Statistics Results
59-
==================
60-
You can call the ``get_statistics()`` method of the FeatureGroup or Dataset instance to fetch validation results for a specific ingestion job.
61-
62-
The ``get_statistics()`` method takes the following optional parameter:
63-
64-
- ``job_id: string``. Id of feature group/dataset job
65-
66-
.. code-block:: python3
67-
68-
# Fetch stats results for a feature group job
69-
df = feature_group.get_statistics(job_id).to_pandas()
70-
71-
similarly for dataset instance
72-
73-
.. code-block:: python3
74-
75-
# Fetch stats results for a dataset job
76-
df = dataset.get_statistics(job_id).to_pandas()
77-
78-
.. image:: figures/stats_1.png

ads/feature_store/docs/source/terraform.rst

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -6,10 +6,6 @@ Oracle feature store is a stack based solution that is deployed in the customer
66
Customer can stand up the service with infrastructure in their own tenancy. The service consists of API in customer
77
tenancy using resource manager.
88

9-
.. note::
10-
11-
Please contact #oci-feature-store_early-preview for getting your tenancy whitelisted for early access of feature store.
12-
139
Below is the terraform stack deployment diagram of the feature store resources.
1410

1511
.. figure:: figures/feature_store_deployment.png
