Commit 5950ac1

committed
add more doc content
1 parent 16c0cbc commit 5950ac1

5 files changed

+99
-12
lines changed

ads/feature_store/docs/source/dataset.rst

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -142,7 +142,9 @@ The ``.save_expectation()`` method takes the following optional parameter:
142142
143143
feature_group.save_expectation(expectation_suite, expectation_type="STRICT")
144144
145-
For more details on expectation please refer :ref:`Feature Validation`
145+
.. seealso::
146+
147+
:ref:`Feature Validation`
146148

147149
Statistics Computation
148150
========================
@@ -177,7 +179,7 @@ You can call the ``get_statistics()`` method of the dataset to fetch metrics for
177179

178180
The ``get_statistics()`` method takes the following optional parameter:
179181

180-
- ``job_id: string``. Id of feature group job
182+
- ``job_id: string``. ID of the dataset job.
181183

182184
.. code-block:: python3
183185
@@ -186,7 +188,9 @@ The ``get_statistics()`` method takes the following optional parameter:
186188
187189
.. image:: figures/stats_1.png
188190

189-
For more details on statistics computation please refer :ref:`Statistics`
191+
.. seealso::
192+
193+
:ref:`Statistics`
190194

191195

192196
Get features

ads/feature_store/docs/source/feature_group.rst

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -168,7 +168,9 @@ The ``.save_expectation()`` method takes the following optional parameter:
168168
169169
feature_group.save_expectation(expectation_suite, expectation_type="STRICT")
170170
171-
For more details on expectation please refer :ref:`Feature Validation`
171+
.. seealso::
172+
173+
:ref:`Feature Validation`
172174

173175

174176
Statistics Computation
@@ -212,7 +214,9 @@ The ``get_statistics()`` method takes the following optional parameter:
212214
213215
.. image:: figures/stats_1.png
214216

215-
For more details on statistics computation please refer :ref:`Statistics`
217+
.. seealso::
218+
219+
:ref:`Statistics`
216220

217221
Get last feature group job
218222
==========================

ads/feature_store/docs/source/feature_validation.rst

Lines changed: 46 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,52 @@
33
Feature Validation
44
******************
55

6+
Feature validation is the process of checking the quality and accuracy of the features used in a machine learning model. This is important because features that are not accurate or reliable can lead to poor model performance.
7+
Feature Store lets you define expectations on the data being materialized into a feature group or dataset. This is achieved using the open-source library Great Expectations.
8+
69
.. note::
7-
`Great Expectations <https://docs.greatexpectations.io/docs/>_` is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
10+
`Great Expectations <https://docs.greatexpectations.io/docs/>`_ is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
11+
12+
13+
Expectations
14+
============
15+
An Expectation is a verifiable assertion about your data. You can define an expectation as follows:
16+
17+
.. code-block:: python3
18+
19+
  from great_expectations.core.expectation_configuration import ExpectationConfiguration
20+
21+
  # Create an Expectation
22+
  expect_config = ExpectationConfiguration(
23+
      # Name of expectation type being added
24+
      expectation_type="expect_table_columns_to_match_ordered_list",
25+
      # These are the arguments of the expectation
26+
      # The keys allowed in the dictionary are Parameters and
27+
      # Keyword Arguments of this Expectation Type
28+
      kwargs={
29+
          "column_list": [
30+
              "column1",
31+
              "column2",
32+
              "column3",
33+
              "column4",
34+
          ]
35+
      },
36+
      # This is how you can optionally add a comment about this expectation.
37+
      meta={
38+
          "notes": {
39+
              "format": "markdown",
40+
              "content": "details about this expectation. **Markdown** `Supported`",
41+
          }
42+
      },
43+
  )
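To make the assertion concrete, here is a minimal pure-Python sketch of the condition that ``expect_table_columns_to_match_ordered_list`` verifies: the table's columns must equal the given list, in the same order. The function name ``columns_match_ordered_list`` and the shape of the returned dictionary are hypothetical and only for illustration; this is not how Great Expectations implements or reports the check.

```python
def columns_match_ordered_list(actual_columns, column_list):
    """Illustrative check: do the columns equal the expected list, in order?

    The returned dict is a hypothetical shape, not a Great Expectations result.
    """
    actual = list(actual_columns)
    expected = list(column_list)
    return {
        "success": actual == expected,
        "observed_value": actual,
        "expected_value": expected,
    }


expected_columns = ["column1", "column2", "column3", "column4"]

# Same columns, same order: the assertion holds.
in_order = columns_match_ordered_list(
    ["column1", "column2", "column3", "column4"], expected_columns
)

# Same columns, different order: the assertion fails.
swapped = columns_match_ordered_list(
    ["column2", "column1", "column3", "column4"], expected_columns
)

print(in_order["success"], swapped["success"])
```

Because the comparison is order-sensitive, a table with the right columns in the wrong order still fails the expectation.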
44+
45+
Expectation Suite
46+
=================
47+
48+
An Expectation Suite is a collection of verifiable assertions, i.e. expectations, about your data. You can define an expectation suite as follows:
849

50+
.. code-block:: python3
951
52+
  # Create an Expectation Suite (`context` is assumed to be an existing Great Expectations Data Context)
53+
  suite = context.add_expectation_suite(expectation_suite_name="example_suite")
54+
  suite.add_expectation(expect_config)
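The idea of a suite as a named collection of assertions can be sketched in plain Python. The class ``SketchSuite``, its methods, and the two sample checks below are hypothetical and exist only to illustrate the concept; they are not part of Great Expectations or of the feature store API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class SketchSuite:
    """Hypothetical stand-in for an expectation suite: a named list of checks."""

    name: str
    checks: List[Callable[[List[Row]], bool]] = field(default_factory=list)

    def add_check(self, check: Callable[[List[Row]], bool]) -> None:
        self.checks.append(check)

    def run(self, rows: List[Row]) -> Dict[str, object]:
        # Evaluate every check; the suite succeeds only if all of them do.
        results = [check(rows) for check in self.checks]
        return {"suite": self.name, "results": results, "success": all(results)}


def ids_not_null(rows: List[Row]) -> bool:
    return all(row["id"] is not None for row in rows)


def amounts_not_null(rows: List[Row]) -> bool:
    return all(row["amount"] is not None for row in rows)


sketch_suite = SketchSuite(name="example_suite")
sketch_suite.add_check(ids_not_null)
sketch_suite.add_check(amounts_not_null)

rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
report = sketch_suite.run(rows)
print(report)
```

Here the second check fails because one ``amount`` is missing, so the suite as a whole reports failure even though the first check passes.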

ads/feature_store/docs/source/overview.rst

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,10 @@ Oracle feature store is a stack based solution that is deployed in the customer
1515
- ``Dataset``: A dataset is a collection of features that are used together to either train a model or perform model inference.
1616
- ``Dataset Job``: Dataset job is the execution instance of a dataset. Each dataset job will include validation results and statistics results.
1717

18+
.. important::
19+
20+
Prerequisite: Contact #oci-feature-store_early-preview to get your tenancy whitelisted for early access to the feature store.
21+
1822
.. important::
1923

2024
OCI Feature Store supports the following versions
@@ -32,6 +36,4 @@ Oracle feature store is a stack based solution that is deployed in the customer
3236
* - delta-spark
3337
- .. image:: https://img.shields.io/badge/delta-2.0.1-blue?style=for-the-badge&logo=pypi&logoColor=white
3438
* - pyspark
35-
- .. image:: https://img.shields.io/badge/pyspark-3.2.1-blue?style=for-the-badge&logo=pypi&logoColor=white
36-
37-
Please contact #oci-feature-store_early-preview for getting your tenancy whitelisted for early access of feature store.
39+
- .. image:: https://img.shields.io/badge/pyspark-3.2.1-blue?style=for-the-badge&logo=pypi&logoColor=white

ads/feature_store/docs/source/statistics.rst

Lines changed: 35 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -3,9 +3,41 @@
33
Statistics
44
*************
55

6-
Feature Store provides functionality to compute statistics for feature groups & datasets and persist them along with the metadata. These statistics can help you
7-
to derive insights about the data quality.
6+
Feature Store provides functionality to compute statistics for feature groups as well as datasets and persist them along with the metadata. These statistics can help you
7+
to derive insights about the data quality. These statistical metrics are computed at materialization time and persisted along with the other metadata.
88

99
.. note::
1010

11-
Feature Store utilizes MLM Insights which is a Python API that helps evaluate & monitor data for entirety of ML Observability lifecycle. It performs data summarization which reduces a dataset into a set of descriptive statistics.
11+
Feature Store uses MLM Insights, a Python API that helps evaluate and monitor data for the entirety of the ML observability lifecycle. It performs data summarization, which reduces a dataset to a set of descriptive statistics.
12+
13+
The statistical metrics that are computed by feature store depend on the feature type.
14+
15+
Metrics for categorical data
16+
17+
- Count
18+
- TopKFrequentElements
19+
- TypeMetric
20+
- DuplicateCount
21+
- Mode
22+
- DistinctCount
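
As a rough illustration of what several of the categorical metrics above measure, the following sketch computes them with the Python standard library. The definitions used here are assumptions made for illustration (in particular, ``DuplicateCount`` is taken to be the number of values beyond the first occurrence of each distinct value); MLM Insights may define or compute them differently.

```python
from collections import Counter

values = ["a", "b", "a", "c", "a", "b"]
frequencies = Counter(values)

categorical_metrics = {
    # Count: total number of values
    "Count": len(values),
    # DistinctCount: number of unique values
    "DistinctCount": len(frequencies),
    # DuplicateCount (assumed definition): values that repeat an earlier one
    "DuplicateCount": len(values) - len(frequencies),
    # Mode: the most frequent value
    "Mode": frequencies.most_common(1)[0][0],
    # TopKFrequentElements with k=2: the two most frequent values and their counts
    "TopKFrequentElements": frequencies.most_common(2),
}

print(categorical_metrics)
```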
23+
24+
Metrics for numerical data
25+
26+
- Skewness
27+
- StandardDeviation
28+
- Min
29+
- IsConstantFeature
30+
- IQR
31+
- Range
32+
- ProbabilityDistribution
33+
- Variance
34+
- TypeMetric
35+
- FrequencyDistribution
36+
- Count
37+
- Max
38+
- DistinctCount
39+
- Sum
40+
- IsQuasiConstantFeature
41+
- Quartiles
42+
- Mean
43+
- Kurtosis
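
A subset of the numerical metrics above can be approximated with the standard library, as in the sketch below. The choices made here are assumptions for illustration only: population (not sample) variance and standard deviation, the default quantile method of ``statistics.quantiles``, skewness as the third central moment divided by the variance raised to 1.5, and ``IsConstantFeature`` as "exactly one distinct value". MLM Insights may use different definitions.

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.fmean(values)
variance = statistics.pvariance(values)  # population variance (assumed)
std_dev = statistics.pstdev(values)      # population standard deviation (assumed)
q1, q2, q3 = statistics.quantiles(values, n=4)

# Population skewness: third central moment over variance ** 1.5 (assumed definition)
third_moment = sum((v - mean) ** 3 for v in values) / len(values)
skewness = third_moment / variance ** 1.5

numeric_metrics = {
    "Count": len(values),
    "Sum": sum(values),
    "Min": min(values),
    "Max": max(values),
    "Range": max(values) - min(values),
    "Mean": mean,
    "Variance": variance,
    "StandardDeviation": std_dev,
    "Quartiles": (q1, q2, q3),
    "IQR": q3 - q1,
    "Skewness": skewness,
    "DistinctCount": len(set(values)),
    "IsConstantFeature": len(set(values)) == 1,
}

print(numeric_metrics)
```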
