diff --git a/nemo/NeMo-Safe-Synthesizer/advanced/extrinsic_evaluation.ipynb b/nemo/NeMo-Safe-Synthesizer/advanced/extrinsic_evaluation.ipynb
new file mode 100644
index 00000000..83da15e5
--- /dev/null
+++ b/nemo/NeMo-Safe-Synthesizer/advanced/extrinsic_evaluation.ipynb
@@ -0,0 +1,504 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "630e3e17",
+   "metadata": {},
+   "source": [
+    "# 🎛️ NeMo Safe Synthesizer 101: Extrinsic Evaluation\n",
+    "\n",
+    "> ⚠️ **Warning**: NeMo Safe Synthesizer is in Early Access and not recommended for production use.\n",
+    "\n",
+    "In this notebook, we build on the foundational concepts from the *NeMo Safe Synthesizer 101: Data Generation* notebook. While the first notebook focused on *how* to generate synthetic data, this one focuses on **how to measure its quality and utility** for real-world applications.\n",
+    "\n",
+    "We'll do this using a common method called **extrinsic evaluation**, which involves testing the synthetic data's performance on a downstream machine learning task.\n",
+    "\n",
+    "---\n",
+    "\n",
+    "## 🎯 What is Extrinsic Evaluation?\n",
+    "\n",
+    "Extrinsic evaluation measures the **utility** of synthetic data by using it to train a model for a specific task. This contrasts with *intrinsic* evaluation, which might only measure the statistical similarity between the synthetic and real data.\n",
+    "\n",
+    "In this notebook, we'll use a **simple classification task** as our benchmark. The core idea is to answer the question:\n",
+    "\n",
+    "> \"Can a model trained **only** on our *synthetic data* achieve comparable performance to a model trained on the *real data*?\"\n",
+    "\n",
+    "If the answer is yes, it's a strong signal that our synthetic data has successfully captured the important patterns, relationships, and statistical properties of the original dataset. This is the \"Train-on-Synthetic, Test-on-Real\" (TSTR) approach."
+   ]
+  },
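+  {
+   "cell_type": "markdown",
+   "id": "1a2b3c4d",
+   "metadata": {},
+   "source": [
+    "Before running the full workflow, here is a minimal, self-contained sketch of the TSTR pattern on random toy arrays. It is purely illustrative (the arrays and scores are meaningless); the real comparison later in this notebook uses the clothing-reviews dataset.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5e6f7a8b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Minimal TSTR sketch on random toy data (illustrative only, not the tutorial dataset).\n",
+    "import numpy as np\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "from sklearn.metrics import roc_auc_score\n",
+    "\n",
+    "rng = np.random.default_rng(0)\n",
+    "X_real, y_real = rng.normal(size=(200, 3)), rng.integers(0, 2, size=200)\n",
+    "X_synth, y_synth = rng.normal(size=(200, 3)), rng.integers(0, 2, size=200)\n",
+    "X_eval, y_eval = rng.normal(size=(100, 3)), rng.integers(0, 2, size=100)\n",
+    "\n",
+    "# Train one model per training source, then evaluate both on the SAME real holdout.\n",
+    "auc_real = roc_auc_score(y_eval, LogisticRegression().fit(X_real, y_real).predict_proba(X_eval)[:, 1])\n",
+    "auc_synth = roc_auc_score(y_eval, LogisticRegression().fit(X_synth, y_synth).predict_proba(X_eval)[:, 1])\n",
+    "print(f\"Train-on-real AUC: {auc_real:.3f} | Train-on-synthetic AUC: {auc_synth:.3f}\")\n"
+   ]
+  },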
+  {
+   "cell_type": "markdown",
+   "id": "8be84f5d",
+   "metadata": {},
+   "source": [
+    "#### 💾 Install dependencies\n",
+    "\n",
+    "**IMPORTANT** 👉 Ensure you have a NeMo Microservices Platform deployment available. Follow the quickstart or Helm chart instructions in your environment's setup guide. You may need to restart your kernel after installing dependencies.\n"
+   ]
+  },
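+  {
+   "cell_type": "markdown",
+   "id": "9c0d1e2f",
+   "metadata": {},
+   "source": [
+    "The commented install command below is a sketch that assumes the SDK is published on PyPI as `nemo-microservices`; uncomment it and adjust the package name or index for your deployment.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3a4b5c6d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Assumed package name -- verify against your deployment's documentation.\n",
+    "# %pip install nemo-microservices pandas\n"
+   ]
+  },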
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9f5d6f5a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "from nemo_microservices import NeMoMicroservices\n",
+    "from nemo_microservices.beta.safe_synthesizer.builder import SafeSynthesizerBuilder\n",
+    "\n",
+    "# Keep log output quiet; httpx is the SDK's underlying HTTP client.\n",
+    "import logging\n",
+    "logging.basicConfig(level=logging.WARNING)\n",
+    "logging.getLogger(\"httpx\").setLevel(logging.WARNING)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "53bb2807",
+   "metadata": {},
+   "source": [
+    "### ⚙️ Initialize the NeMo Safe Synthesizer Client\n",
+    "\n",
+    "- The Python SDK provides a wrapper around the NeMo Microservices Platform APIs.\n",
+    "- `http://localhost:8080` is the default URL for the client's `base_url` in the quickstart.\n",
+    "- If using a managed or remote deployment, ensure correct base URLs and tokens.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8c15ab93",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "client = NeMoMicroservices(\n",
+    "    base_url=\"http://localhost:8080\",\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "74d72ef7",
+   "metadata": {},
+   "source": [
+    "NeMo DataStore is launched as one of the platform services, and we'll use it to manage storage for job artifacts, so we'll set the following:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ab037a3a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "datastore_config = {\n",
+    "    \"endpoint\": \"http://localhost:3000/v1/hf\",\n",
+    "    \"token\": \"placeholder\",\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2d66c819",
+   "metadata": {},
+   "source": [
+    "## 📥 Load input data\n",
+    "\n",
+    "Safe Synthesizer learns the patterns and correlations in your input dataset to produce synthetic data with similar properties. For this tutorial, we will use a small public sample dataset. Replace it with your own data if desired.\n",
+    "\n",
+    "The sample dataset used here is a set of women's clothing reviews, including age, product category, rating, and review text. Some of the reviews contain Personally Identifiable Information (PII), such as height, weight, age, and location."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "daa955b6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# !uv pip install kagglehub scikit-learn tabulate"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7204f213",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import kagglehub\n",
+    "import pandas as pd\n",
+    "\n",
+    "# Download the latest version of the dataset\n",
+    "path = kagglehub.dataset_download(\"nicapotato/womens-ecommerce-clothing-reviews\")\n",
+    "raw_df = pd.read_csv(f\"{path}/Womens Clothing E-Commerce Reviews.csv\", index_col=0)\n",
+    "raw_df.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c6c331b7",
+   "metadata": {},
+   "source": [
+    "We create a holdout dataset that will only be used for evaluating the final classifiers."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "162876c3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.model_selection import train_test_split\n",
+    "\n",
+    "df, test_df = train_test_split(raw_df, test_size=0.2, random_state=42)\n",
+    "\n",
+    "print(f\"Original df length: {len(raw_df)}\")\n",
+    "print(f\"Training df length: {len(df)}\")\n",
+    "print(f\"Testing df length: {len(test_df)}\")"
+   ]
+  },
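+  {
+   "cell_type": "markdown",
+   "id": "7e8f9a0b",
+   "metadata": {},
+   "source": [
+    "As an added sanity check (not part of the original flow), it's worth inspecting the balance of `Recommended IND`, the label our downstream classifier will predict. Review datasets like this one tend to skew heavily toward positive recommendations, so plain accuracy can look flattering on its own.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0c1d2e3f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Inspect the target class balance in both splits.\n",
+    "print(\"Train split:\")\n",
+    "print(df['Recommended IND'].value_counts(normalize=True).round(3))\n",
+    "print(\"\\nHoldout split:\")\n",
+    "print(test_df['Recommended IND'].value_counts(normalize=True).round(3))\n"
+   ]
+  },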
+  {
+   "cell_type": "markdown",
+   "id": "87d72c68",
+   "metadata": {},
+   "source": [
+    "## 🏗️ Create a Safe Synthesizer job\n",
+    "\n",
+    "The `SafeSynthesizerBuilder` provides a fluent interface to configure and submit jobs.\n",
+    "\n",
+    "The following code creates and submits a job:\n",
+    "- `SafeSynthesizerBuilder(client)`: initialize with the NeMo Microservices client.\n",
+    "- `.from_data_source(df)`: set the input data source.\n",
+    "- `.with_datastore(datastore_config)`: configure model artifact storage.\n",
+    "- `.with_replace_pii()`: enable automatic replacement of PII.\n",
+    "- `.synthesize()`: train and generate synthetic data.\n",
+    "- `.with_generate(num_records=15000)`: request 15,000 synthetic records.\n",
+    "- `.create_job()`: submit the job to the platform.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "85d9de56",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "job = (\n",
+    "    SafeSynthesizerBuilder(client)\n",
+    "    .from_data_source(df)\n",
+    "    .with_datastore(datastore_config)\n",
+    "    .with_replace_pii()\n",
+    "    .synthesize()\n",
+    "    .with_generate(num_records=15000)\n",
+    "    .create_job()\n",
+    ")\n",
+    "\n",
+    "print(f\"job_id = {job.job_id}\")\n",
+    "job.wait_for_completion()\n",
+    "\n",
+    "print(f\"Job finished with status {job.fetch_status()}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fa2eacb2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# If your notebook shuts down, that's okay: your job is still running on the microservices platform.\n",
+    "# You can get the same job object and interact with it again by uncommenting the following code\n",
+    "# snippet and modifying it with the job id from the previous cell output.\n",
+    "\n",
+    "# from nemo_microservices.beta.safe_synthesizer.sdk.job import SafeSynthesizerJob\n",
+    "# job = SafeSynthesizerJob(job_id=\"\", client=client)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "285d4a9d",
+   "metadata": {},
+   "source": [
+    "## 👀 View synthetic data\n",
+    "\n",
+    "After the job completes, fetch the generated synthetic dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7f25574a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Fetch the synthetic data created by the job\n",
+    "synthetic_df = job.fetch_data()\n",
+    "synthetic_df\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2b25f152",
+   "metadata": {},
+   "source": [
+    "## 📊 View evaluation report\n",
+    "\n",
+    "An evaluation comparing the synthetic data to the input data is performed automatically. You can:\n",
+    "\n",
+    "- **Inspect key scores**: overall synthetic data quality and privacy.\n",
+    "- **Download the full HTML report**: includes charts and detailed metrics.\n",
+    "- **Display the report inline**: useful when viewing in notebook environments.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7b691127",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Print selected information from the job summary\n",
+    "summary = job.fetch_summary()\n",
+    "print(\n",
+    "    f\"Synthetic data quality score (0-10, higher is better): {summary.synthetic_data_quality_score}\"\n",
+    ")\n",
+    "print(f\"Data privacy score (0-10, higher is better): {summary.data_privacy_score}\")\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "39e62ea9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Download the full evaluation report to your local machine\n",
+    "job.save_report(\"evaluation_report.html\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "45f7e22b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Fetch and display the full evaluation report inline\n",
+    "# job.display_report_in_notebook()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dd1e4925-3620-4b31-bc17-16f74d10fbb5",
+   "metadata": {},
+   "source": [
+    "## 🧪 Extrinsic Evaluation\n",
+    "\n",
+    "This section details the **extrinsic evaluation** process, where the quality of the synthetic data is assessed based on how well a model trained on it performs on a real-world task. This comparison is critical for validating the synthetic data's utility.\n",
+    "\n",
+    "- **Train Benchmark Model**: A model is trained on the **original training split** to establish a performance baseline.\n",
+    "- **Train Synthetic Model**: A second model, using the same structure, is trained on the **entire synthetic dataset**.\n",
+    "- **Compare Performance**: Both models are evaluated against the same **fixed holdout test set** ($X_{\\text{test}}, y_{\\text{test}}$).\n",
+    "- **Inspect Key Metrics**: The comparison focuses on metrics such as **ROC AUC**, **accuracy**, and per-class **precision and recall** to determine if the synthetic model performs comparably to the benchmark."
+   ]
+  },
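+  {
+   "cell_type": "markdown",
+   "id": "4f5a6b7c",
+   "metadata": {},
+   "source": [
+    "Before training, a quick sanity check (an addition to the original flow): the synthetic frame must expose the same columns as the real data, since we reuse one preprocessing pipeline for both, and its label balance should roughly match the real data's.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8d9e0f1a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Verify column parity between real and synthetic data, and compare label balance.\n",
+    "missing = set(df.columns) - set(synthetic_df.columns)\n",
+    "extra = set(synthetic_df.columns) - set(df.columns)\n",
+    "print(f\"Columns missing from synthetic data: {missing or 'none'}\")\n",
+    "print(f\"Unexpected extra columns: {extra or 'none'}\")\n",
+    "print(\"\\nLabel balance (real vs. synthetic):\")\n",
+    "print(df['Recommended IND'].value_counts(normalize=True).round(3))\n",
+    "print(synthetic_df['Recommended IND'].value_counts(normalize=True).round(3))\n"
+   ]
+  },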
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "37b6df30-6627-4a40-8604-e905ada571b7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Define a scikit-learn pipeline for the classification task:\n",
+    "# TF-IDF for the review text, scaling for numeric columns, one-hot for categoricals.\n",
+    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
+    "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
+    "from sklearn.compose import ColumnTransformer\n",
+    "from sklearn.pipeline import Pipeline\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "\n",
+    "X_train = df.drop('Recommended IND', axis=1)\n",
+    "y_train = df['Recommended IND']\n",
+    "\n",
+    "X_train['Review Text'] = X_train['Review Text'].fillna('')\n",
+    "X_train['Title'] = X_train['Title'].fillna('')\n",
+    "\n",
+    "X_test = test_df.drop('Recommended IND', axis=1)\n",
+    "y_test = test_df['Recommended IND']\n",
+    "\n",
+    "X_test['Review Text'] = X_test['Review Text'].fillna('')\n",
+    "X_test['Title'] = X_test['Title'].fillna('')\n",
+    "\n",
+    "text_features = ['Review Text']\n",
+    "numerical_features = ['Age', 'Rating', 'Positive Feedback Count']\n",
+    "categorical_features = ['Division Name', 'Department Name', 'Class Name']\n",
+    "\n",
+    "text_transformer = TfidfVectorizer(stop_words='english', max_features=5000)\n",
+    "numerical_transformer = StandardScaler()\n",
+    "categorical_transformer = OneHotEncoder(handle_unknown='ignore')\n",
+    "\n",
+    "preprocessor = ColumnTransformer(\n",
+    "    transformers=[\n",
+    "        # TfidfVectorizer expects a 1-D column of strings, so pass the column name, not a list.\n",
+    "        ('text', text_transformer, text_features[0]),\n",
+    "        ('num', numerical_transformer, numerical_features),\n",
+    "        ('cat', categorical_transformer, categorical_features)\n",
+    "    ],\n",
+    "    remainder='drop'\n",
+    ")\n",
+    "\n",
+    "model = LogisticRegression(solver='liblinear', random_state=42)\n",
+    "\n",
+    "full_pipeline = Pipeline(steps=[\n",
+    "    ('preprocessor', preprocessor),\n",
+    "    ('classifier', model)\n",
+    "])\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ee747c80-d42f-4ec5-b27b-2b2462436b92",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Train and evaluate a benchmark model pipeline, storing its performance metrics.\n",
+    "from sklearn.base import clone\n",
+    "from sklearn.metrics import classification_report, accuracy_score, roc_auc_score\n",
+    "\n",
+    "# Clone so the unfitted template pipeline can be reused for the synthetic model below.\n",
+    "original_pipeline = clone(full_pipeline)\n",
+    "print(f\"\\n--- Training Benchmark Model on Original Data ({len(X_train)} rows) ---\")\n",
+    "original_pipeline.fit(X_train, y_train)\n",
+    "\n",
+    "y_pred_original = original_pipeline.predict(X_test)\n",
+    "y_prob_original = original_pipeline.predict_proba(X_test)[:, 1]\n",
+    "\n",
+    "results = {}\n",
+    "results['Original'] = {\n",
+    "    'Accuracy': accuracy_score(y_test, y_pred_original),\n",
+    "    'ROC AUC': roc_auc_score(y_test, y_prob_original),\n",
+    "    'Classification Report': classification_report(y_test, y_pred_original, output_dict=True)\n",
+    "}\n",
+    "print(\"Benchmark training and evaluation complete.\")\n"
+   ]
+  },
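+  {
+   "cell_type": "markdown",
+   "id": "2b3c4d5e",
+   "metadata": {},
+   "source": [
+    "If you want more than the headline numbers, the full per-class report (including F1-scores) is already stored in `results`; printing the human-readable version is a one-liner. This optional cell is an addition to the original flow.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6f7a8b9c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional: show the benchmark's full per-class precision/recall/F1 breakdown.\n",
+    "print(classification_report(y_test, y_pred_original, digits=4))\n"
+   ]
+  },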
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cf3f1d59-8c46-4d84-b813-a4adf88a3422",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Train a new model pipeline on synthetic data and evaluate it against the test set.\n",
+    "from sklearn.base import clone\n",
+    "from sklearn.metrics import classification_report, accuracy_score, roc_auc_score\n",
+    "\n",
+    "X_synthetic = synthetic_df.drop('Recommended IND', axis=1).fillna({'Review Text': '', 'Title': ''})\n",
+    "y_synthetic = synthetic_df['Recommended IND']\n",
+    "\n",
+    "synthetic_pipeline = clone(full_pipeline)\n",
+    "\n",
+    "print(\"\\n--- Training Model on Synthetic Data ---\")\n",
+    "synthetic_pipeline.fit(X_synthetic, y_synthetic)\n",
+    "\n",
+    "y_pred_synthetic = synthetic_pipeline.predict(X_test)\n",
+    "y_prob_synthetic = synthetic_pipeline.predict_proba(X_test)[:, 1]\n",
+    "\n",
+    "results['Synthetic'] = {\n",
+    "    'Accuracy': accuracy_score(y_test, y_pred_synthetic),\n",
+    "    'ROC AUC': roc_auc_score(y_test, y_prob_synthetic),\n",
+    "    'Classification Report': classification_report(y_test, y_pred_synthetic, output_dict=True)\n",
+    "}\n",
+    "print(\"Synthetic training and evaluation complete.\")\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d83e681e-aac2-44d0-83cb-1d93002a725d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Compare the performance of the original and synthetic models and print a summary.\n",
+    "import pandas as pd\n",
+    "\n",
+    "print(\"\\n\" + \"=\"*60)\n",
+    "print(\" SIDE-BY-SIDE MODEL COMPARISON\")\n",
+    "print(f\" (Tested on {len(test_df)}-Row Holdout Set)\")\n",
+    "print(\"=\"*60)\n",
+    "\n",
+    "summary_data = {\n",
+    "    'Model': ['Original (Benchmark)', 'Synthetic'],\n",
+    "    'Train Size': [len(X_train), len(X_synthetic)],\n",
+    "    'Accuracy': [results['Original']['Accuracy'], results['Synthetic']['Accuracy']],\n",
+    "    'ROC AUC Score': [results['Original']['ROC AUC'], results['Synthetic']['ROC AUC']],\n",
+    "    'Precision (Class 1)': [results['Original']['Classification Report']['1']['precision'], results['Synthetic']['Classification Report']['1']['precision']],\n",
+    "    'Recall (Class 1)': [results['Original']['Classification Report']['1']['recall'], results['Synthetic']['Classification Report']['1']['recall']],\n",
+    "}\n",
+    "\n",
+    "summary_df = pd.DataFrame(summary_data).set_index('Model').T\n",
+    "summary_df.columns.name = 'Metric'\n",
+    "\n",
+    "print(summary_df.to_markdown(floatfmt=\".4f\"))\n",
+    "\n",
+    "print(\"\\n\" + \"=\"*60)\n",
+    "\n",
+    "print(\"Key Finding:\")\n",
+    "if results['Synthetic']['ROC AUC'] >= results['Original']['ROC AUC']:\n",
+    "    print(\"The Synthetic Model performs AS WELL AS OR BETTER than the Original Benchmark.\")\n",
+    "else:\n",
+    "    print(\"The Synthetic Model's performance is lower than the Original Benchmark's.\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "169f443d",
+   "metadata": {},
+   "source": [
+    "Your end result should look similar to this:\n",
+    "\n",
+    "| Metric              |   Original (Benchmark) |   Synthetic |\n",
+    "|:--------------------|-----------------------:|------------:|\n",
+    "| Train Size          |                 18,788 |      15,000 |\n",
+    "| Accuracy            |                 0.9404 |      0.9278 |\n",
+    "| ROC AUC Score       |                 0.9782 |      0.9762 |\n",
+    "| Precision (Class 1) |                 0.9626 |      0.9423 |\n",
+    "| Recall (Class 1)    |                 0.9646 |      0.9714 |\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c9b55961-ddac-4d91-aa4d-9646fb72c7be",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "My Virtual Env",
+   "language": "python",
+   "name": "myenv"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.14"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}