Commit dd5734c

Update documentation for feature store
1 parent e9e76b0 commit dd5734c

7 files changed: +112 -44 lines changed

ads/feature_store/docs/source/dataset.rst

Lines changed: 0 additions & 44 deletions
Original file line numberDiff line numberDiff line change
@@ -124,50 +124,6 @@ With a Dataset instance, we can get the last dataset job details using ``get_las
124124
df = dataset_job.get_validation_output().to_dataframe()
125125
df.show()
126126
127-
128-
Save expectation entity
129-
=======================
130-
131-
With a Dataset instance, we can save the expectation entity using ``save_expectation()``
132-
133-
.. note::
134-
135-
Great Expectations is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
136-
137-
.. image:: figures/validation.png
138-
139-
140-
The ``.save_expectation()`` method takes the following optional parameter:
141-
142-
- ``expectation_suite: ExpectationSuite``. Expectation suite of great expectation
143-
- ``expectation_type: ExpectationType``. Type of expectation
144-
- ``ExpectationType.STRICT``: Fail the job if expectation not met
145-
- ``ExpectationType.LENIENT``: Pass the job even if expectation not met
146-
147-
.. code-block:: python3
148-
149-
dataset.save_expectation(expectation_suite, expectation_type="STRICT")
150-
151-
152-
Statistics Results
153-
==================
154-
You can call the ``get_statistics()`` method of the Dataset instance to fetch feature statistics results of a dataset job.
155-
156-
.. note::
157-
158-
PyDeequ is a Python API for Deequ, a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
159-
160-
161-
The ``.get_statistics()`` method takes the following optional parameter:
162-
163-
- ``job_id: string``. Id of dataset job
164-
165-
.. code-block:: python3
166-
167-
# Fetch stats results for a dataset job
168-
df = dataset.get_statistics(job_id).to_pandas()
169-
170-
171127
Get features
172128
============
173129
You can call the ``get_features_dataframe()`` method of the Dataset instance to fetch features in a dataset.

ads/feature_store/docs/source/feature_group.rst

Lines changed: 3 additions & 0 deletions
@@ -150,6 +150,7 @@ Feature store provides an API similar to Pandas to join feature groups together
150150
.join(feature_group_c.select(), left_on=['b_1'], right_on=['c_1'])
151151
query.show(5)
152152
153+
<<<<<<< Updated upstream
153154
Save expectation entity
154155
=======================
155156
With a ``FeatureGroup`` instance, You can save the expectation details using ``with_expectation_suite()`` with parameters
@@ -202,6 +203,8 @@ You can call the ``get_statistics()`` method of the FeatureGroup instance to fet
202203
df = feature_group.get_statistics(job_id).to_pandas()
203204
204205
.. image:: figures/stats_1.png
206+
=======
207+
>>>>>>> Stashed changes
205208

206209
Get last feature group job
207210
==========================
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
Feature Validation
2+
******************
3+
4+
Save expectation entity
5+
=======================
6+
With a ``FeatureGroup`` or ``Dataset`` instance, we can save the expectation entity using ``save_expectation()``
7+
8+
.. note::
9+
10+
Great Expectations is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
11+
12+
.. image:: figures/validation.png
13+
14+
The ``.save_expectation()`` method takes the following optional parameters:
15+
16+
- ``expectation: Expectation``. Expectation of great expectation
17+
- ``expectation_type: ExpectationType``. Type of expectation
18+
- ``ExpectationType.STRICT``: Fail the job if expectation not met
19+
- ``ExpectationType.LENIENT``: Pass the job even if expectation not met
20+
21+
.. code-block:: python3
22+
23+
feature_group.save_expectation(expectation_suite, expectation_type="STRICT")
24+
dataset.save_expectation(expectation_suite, expectation_type="STRICT")
25+

ads/feature_store/docs/source/index.rst

Lines changed: 2 additions & 0 deletions
@@ -16,6 +16,8 @@ Welcome to oci-feature-store's documentation!
1616
feature_group_job
1717
dataset
1818
dataset_job
19+
statistics
20+
feature_validation
1921
demo
2022
notebook
2123
release_notes
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
1+
Statistics
2+
*************
3+
4+
Feature Store provides functionality to compute statistics for feature groups & datasets and persist them along with the metadata. These statistics can help you
5+
to derive insights about the data quality.
6+
7+
.. note::
8+
9+
Feature Store uses MLM Insights, a Python API that helps evaluate and monitor data across the entire ML observability lifecycle. It performs data summarization, which reduces a dataset to a set of descriptive statistics.
10+
11+
12+
Statistics Configuration
13+
========================
14+
Statistical metrics are computed by default for all features, but you can configure this with a ``StatisticsConfig`` object. This object can be passed at the creation of
15+
a feature group or dataset, or it can be updated later.
16+
17+
.. code-block:: python3
18+
19+
# Define statistics configuration for selected features
20+
stats_config = StatisticsConfig().with_is_enabled(True).with_columns(["column1", "column2"])
21+
22+
23+
This configuration can be passed to a feature group instance:
24+
25+
.. code-block:: python3
26+
27+
# Create a feature group that uses the statistics configuration
28+
from ads.feature_store.feature_group import FeatureGroup
29+
30+
feature_group_resource = (
31+
FeatureGroup()
32+
.with_feature_store_id(feature_store.id)
33+
.with_primary_keys(["<key>"])
34+
.with_name("<name>")
35+
.with_entity_id(entity.id)
36+
.with_compartment_id(<compartment_id>)
37+
.with_schema_details_from_dataframe(<dataframe>)
38+
.with_statistics_config(stats_config)
)
39+
40+
Similarly, for a dataset instance:
41+
42+
.. code-block:: python3
43+
44+
from ads.feature_store.dataset import Dataset
45+
46+
dataset = (
47+
Dataset()
48+
.with_name("<dataset_name>")
49+
.with_entity_id(<entity_id>)
50+
.with_feature_store_id("<feature_store_id>")
51+
.with_description("<dataset_description>")
52+
.with_compartment_id("<compartment_id>")
53+
.with_dataset_ingestion_mode(DatasetIngestionMode.SQL)
54+
.with_query('SELECT col FROM <entity_id>.<feature_group_name>')
55+
.with_statistics_config(stats_config)
56+
)
57+
58+
Statistics Results
59+
==================
60+
You can call the ``get_statistics()`` method of the FeatureGroup or Dataset instance to fetch statistics results for a specific ingestion job.
61+
62+
The ``get_statistics()`` method takes the following optional parameter:
63+
64+
- ``job_id: string``. ID of the feature group or dataset job
65+
66+
.. code-block:: python3
67+
68+
# Fetch stats results for a feature group job
69+
df = feature_group.get_statistics(job_id).to_pandas()
70+
71+
Similarly, for a dataset instance:
72+
73+
.. code-block:: python3
74+
75+
# Fetch stats results for a dataset job
76+
df = dataset.get_statistics(job_id).to_pandas()
77+
78+
.. image:: figures/stats_1.png
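To make "reduces a dataset to a set of descriptive statistics" concrete, here is a minimal sketch using only the Python standard library. It is not how MLM Insights or the feature store computes statistics: ``summarize`` is a hypothetical helper, and its ``columns`` argument only mimics the idea of restricting statistics to selected features, as ``with_columns`` does in the configuration shown above.

```python
import statistics


def summarize(rows, columns=None):
    """Hypothetical helper: reduce a list of row dicts to per-column statistics.

    rows:    list of dicts mapping column name to a numeric value (or None).
    columns: optional subset of columns; defaults to every column seen.
    """
    if columns is None:
        columns = sorted({key for row in rows for key in row})

    summary = {}
    for col in columns:
        # Ignore missing values so they do not distort the statistics.
        values = [row[col] for row in rows if row.get(col) is not None]
        summary[col] = {
            "count": len(values),
            "mean": statistics.fmean(values) if values else None,
            "min": min(values) if values else None,
            "max": max(values) if values else None,
        }
    return summary


if __name__ == "__main__":
    data = [{"a": 1, "b": 10}, {"a": 3, "b": None}]
    # Restrict the summary to column "a" only.
    print(summarize(data, columns=["a"]))
```

Each column collapses to a handful of numbers regardless of how many rows were ingested, which is what makes such summaries cheap to persist alongside job metadata.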

ads/feature_store/docs/source/terraform.rst

Lines changed: 4 additions & 0 deletions
@@ -6,6 +6,10 @@ Oracle feature store is a stack based solution that is deployed in the customer
66
Customer can stand up the service with infrastructure in their own tenancy. The service consists of API in customer
77
tenancy using resource manager.
88

9+
.. note::
10+
11+
Please contact #oci-feature-store_early-preview to get your tenancy whitelisted for early access to the feature store.
12+
913
Below is the terraform stack deployment diagram of the feature store resources.
1014

1115
.. figure:: figures/feature_store_deployment.png
