Update documentation for feature store

guptadivyank · guptadivyank · commit dd5734c73d67 · 2023-07-10T14:09:19.000+05:30
diff --git a/ads/feature_store/docs/source/dataset.rst b/ads/feature_store/docs/source/dataset.rst
@@ -124,50 +124,6 @@ With a Dataset instance, we can get the last dataset job details using ``get_las
   df = dataset_job.get_validation_output().to_dataframe()
   df.show()
 
-
-Save expectation entity
-=======================
-
-With a Dataset instance, we can save the expectation entity using ``save_expectation()``
-
-.. note::
-
-  Great Expectations is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
-
-.. image:: figures/validation.png
-
-
-The ``.save_expectation()`` method takes the following optional parameter:
-
-- ``expectation_suite: ExpectationSuite``. Expectation suite of great expectation
-- ``expectation_type: ExpectationType``. Type of expectation
-        - ``ExpectationType.STRICT``: Fail the job if expectation not met
-        - ``ExpectationType.LENIENT``: Pass the job even if expectation not met
-
-.. code-block:: python3
-
-  dataset.save_expectation(expectation_suite, expectation_type="STRICT")
-
-
-Statistics Results
-==================
-You can call the ``get_statistics()`` method of the Dataset instance to fetch feature statistics results of a dataset job.
-
-.. note::
-
-  PyDeequ is a Python API for Deequ, a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
-
-
-The ``.get_statistics()`` method takes the following optional parameter:
-
-- ``job_id: string``. Id of dataset job
-
-.. code-block:: python3
-
-  # Fetch stats results for a dataset job
-  df = dataset.get_statistics(job_id).to_pandas()
-
-
 Get features
 ============
 You can call the ``get_features_dataframe()`` method of the Dataset instance to fetch features in a dataset.
diff --git a/ads/feature_store/docs/source/feature_group.rst b/ads/feature_store/docs/source/feature_group.rst
@@ -150,6 +150,7 @@ Feature store provides an API similar to Pandas to join feature groups together
                 .join(feature_group_c.select(), left_on=['b_1'], right_on=['c_1'])
   query.show(5)
 
 Save expectation entity
 =======================
 With a ``FeatureGroup`` instance, You can save the expectation details using ``with_expectation_suite()`` with parameters
@@ -202,6 +203,8 @@ You can call the ``get_statistics()`` method of the FeatureGroup instance to fet
   df = feature_group.get_statistics(job_id).to_pandas()
 
 .. image:: figures/stats_1.png
 
 Get last feature group job
 ==========================
diff --git a/ads/feature_store/docs/source/feature_validation.rst b/ads/feature_store/docs/source/feature_validation.rst
@@ -0,0 +1,25 @@
+Feature Validation
+******************
+
+Save expectation entity
+=======================
+With a ``FeatureGroup`` or ``Dataset`` instance, you can save the expectation entity using ``save_expectation()``.
+
+.. note::
+
+  Great Expectations is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
+
+.. image:: figures/validation.png
+
+The ``.save_expectation()`` method takes the following optional parameters:
+
+- ``expectation: Expectation``. The Great Expectations expectation to save.
+- ``expectation_type: ExpectationType``. Type of expectation.
+        - ``ExpectationType.STRICT``: Fail the job if the expectation is not met.
+        - ``ExpectationType.LENIENT``: Pass the job even if the expectation is not met.
+
+.. code-block:: python3
+
+  feature_group.save_expectation(expectation_suite, expectation_type="STRICT")
+  dataset.save_expectation(expectation_suite, expectation_type="STRICT")
+
diff --git a/ads/feature_store/docs/source/figures/stats_1.png b/ads/feature_store/docs/source/figures/stats_1.png
diff --git a/ads/feature_store/docs/source/index.rst b/ads/feature_store/docs/source/index.rst
@@ -16,6 +16,8 @@ Welcome to oci-feature-store's documentation!
     feature_group_job
     dataset
     dataset_job
+    statistics
+    feature_validation
     demo
     notebook
     release_notes
diff --git a/ads/feature_store/docs/source/statistics.rst b/ads/feature_store/docs/source/statistics.rst
@@ -0,0 +1,78 @@
+Statistics
+*************
+
+Feature Store can compute statistics for feature groups and datasets and persist them along with the metadata. These statistics can help you
+derive insights about data quality.
+
+.. note::
+
+  Feature Store uses MLM Insights, a Python API that helps evaluate and monitor data across the entire ML observability lifecycle. It performs data summarization, which reduces a dataset to a set of descriptive statistics.
+
+
+Statistics Configuration
+========================
+Statistical metrics are computed by default for all features, but you can configure this with a ``StatisticsConfig`` object. You can pass this object when you create a
+feature group or dataset, or update it later.
+
+.. code-block:: python3
+
+  # Define statistics configuration for selected features
+  stats_config = StatisticsConfig().with_is_enabled(True).with_columns(["column1", "column2"])
+
+
+You can pass this configuration when defining a feature group:
+
+.. code-block:: python3
+
+  from ads.feature_store.feature_group import FeatureGroup
+
+  feature_group_resource = (
+    FeatureGroup()
+    .with_feature_store_id(feature_store.id)
+    .with_primary_keys(["<key>"])
+    .with_name("<name>")
+    .with_entity_id(entity.id)
+    .with_compartment_id("<compartment_id>")
+    .with_schema_details_from_dataframe(<dataframe>)
+    .with_statistics_config(stats_config)  # attach the statistics configuration
+  )
+
+Similarly, for a dataset:
+
+.. code-block:: python3
+
+  from ads.feature_store.dataset import Dataset
+
+  dataset = (
+        Dataset()
+        .with_name("<dataset_name>")
+        .with_entity_id("<entity_id>")
+        .with_feature_store_id("<feature_store_id>")
+        .with_description("<dataset_description>")
+        .with_compartment_id("<compartment_id>")
+        .with_dataset_ingestion_mode(DatasetIngestionMode.SQL)
+        .with_query('SELECT col FROM <entity_id>.<feature_group_name>')
+        .with_statistics_config(stats_config)
+  )
+
+Statistics Results
+==================
+You can call the ``get_statistics()`` method of a ``FeatureGroup`` or ``Dataset`` instance to fetch the statistics results for a specific ingestion job.
+
+The ``get_statistics()`` method takes the following optional parameter:
+
+- ``job_id: string``. ID of the feature group or dataset job.
+
+.. code-block:: python3
+
+  # Fetch stats results for a feature group job
+  df = feature_group.get_statistics(job_id).to_pandas()
+
+Similarly, for a dataset instance:
+
+.. code-block:: python3
+
+  # Fetch stats results for a dataset job
+  df = dataset.get_statistics(job_id).to_pandas()
+
+.. image:: figures/stats_1.png
diff --git a/ads/feature_store/docs/source/terraform.rst b/ads/feature_store/docs/source/terraform.rst
@@ -6,6 +6,10 @@ Oracle feature store is a stack based solution that is deployed in the customer
 Customer can stand up the service with infrastructure in their own tenancy. The service consists of API in customer
 tenancy using resource manager.
 
+.. note::
+
+    To get your tenancy whitelisted for early access to the feature store, contact #oci-feature-store_early-preview.
+
 Below is the terraform stack deployment diagram of the feature store resources.
 
 .. figure:: figures/feature_store_deployment.png