Commit e04f201

feat: enhance and fix README.md for embedding fine tuning tutorial (#386)
* feat: improvements for readme in embedding tutorial
* fix: typo
1 parent 32e592a commit e04f201

File tree

1 file changed: +47, -17 lines changed

  • nemo/data-flywheel/embedding-finetuning


nemo/data-flywheel/embedding-finetuning/README.md

Lines changed: 47 additions & 17 deletions
@@ -4,46 +4,71 @@

This guide shows how to fine-tune embedding models using the NVIDIA NeMo Microservices platform to improve performance on domain-specific tasks.

+### Why Fine-tune Embedding Models?
+
+Retrieval quality determines AI application quality. Better retrieval means more accurate RAG (Retrieval Augmented Generation) responses, smarter agents, and more relevant search results. Embedding models power this retrieval by converting text into semantic vectors, but pre-trained models aren't optimized for your domain's specific vocabulary and context.
+
+Fine-tuning adapts embedding models to your data (whether scientific literature, legal documents, or enterprise knowledge bases) to achieve measurably better retrieval performance. NeMo Microservices makes this practical by providing production-ready infrastructure that handles data preparation, training, deployment, and evaluation, letting you focus on improving models rather than building pipelines.
+
+> **New to NeMo Microservices?** Learn about Data Flywheel workflows in the [main repository README](../../../README.md#data-flywheel) or explore the [NeMo Microservices documentation](https://docs.nvidia.com/nemo/microservices/latest/about/index.html).
+
<div style="text-align: center;">
  <img src="./img/e2e-embedding-ft.png" alt="Embedding Fine-tuning workflow with NeMo Microservices" width="80%" />
-  <p><strong>Figure 1:</strong> Workflow for fine-tuning embedding models using NeMo Microservices</p>
+  <p><strong>Figure 1:</strong> End-to-end workflow for fine-tuning embedding models using NeMo Microservices</p>
</div>

+The diagram above shows the embedding fine-tuning workflow that NeMo Microservices orchestrates:
+
+1. **Data Preparation**: Download and format raw data locally into query-document triplets, then upload to the NeMo Data Store (see the sketch below).
+2. **Fine-tuning**: The NeMo Customizer service orchestrates training by launching a dedicated job that retrieves the base model and training data, performs supervised fine-tuning on GPU(s), and saves the fine-tuned weights to the Entity Store (model registry).
+3. **Deployment**: The Deployment Management Service deploys the fine-tuned model as an NVIDIA Inference Microservice (NIM). It retrieves the model weights from the Entity Store and starts the NIM inference service.
+4. **Evaluation**: The NeMo Evaluator service measures performance by querying the deployed NIM with benchmark tasks (such as Benchmarking Information Retrieval (BEIR) SciDocs) and calculating retrieval metrics like recall and NDCG.
+
+This modular architecture enables each component to be independently scaled and managed.
+
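
To make the data-preparation step concrete, here is a minimal sketch of the local formatting half of that step: shaping records into (query, positive, negative) triplets and writing them to JSONL before upload to the NeMo Data Store. The field names and file name are illustrative assumptions; the exact schema expected by NeMo Customizer is defined in the data-preparation notebook.

```python
# Minimal sketch: write (query, positive, negative) triplets as JSONL.
# Field names are illustrative only; see the data-preparation notebook
# for the schema that NeMo Customizer actually expects.
import json

triplets = [
    {
        "query": "Deep Residual Learning for Image Recognition",
        "pos_doc": "Identity Mappings in Deep Residual Networks",
        "neg_doc": "Attention Is All You Need",
    },
]

with open("training.jsonl", "w", encoding="utf-8") as f:
    for record in triplets:
        f.write(json.dumps(record) + "\n")
```

Uploading the resulting file to the NeMo Data Store, and the later fine-tuning, deployment, and evaluation steps, are covered in the tutorial notebooks.
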
## Objectives

-This tutorial shows how to leverage the NeMo Microservices platform for finetuning a [nvidia/llama-3.2-nv-embedqa-1b-v2](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2/modelcard) embedding model using the [SPECTER](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) dataset, then evaluating its accuracy on the (somewhat related) zero-shot [BeIR Scidocs](https://huggingface.co/datasets/BeIR/scidocs) benchmark.
+This tutorial shows how to use the NeMo Microservices platform to fine-tune the [nvidia/llama-3.2-nv-embedqa-1b-v2](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2/modelcard) embedding model using the [SPECTER](https://huggingface.co/datasets/sentence-transformers/specter) dataset, then evaluate its performance on the [BEIR Scidocs](https://huggingface.co/datasets/BeIR/scidocs) benchmark against baseline metrics.
+
+By the end of this tutorial, you will:
+- Fine-tune an embedding model on scientific domain data
+- Deploy the fine-tuned model as a NIM
+- Evaluate retrieval performance on the [BEIR Scidocs](https://huggingface.co/datasets/BeIR/scidocs) benchmark
+- Compare results against baseline model metrics to demonstrate measurable improvement

The tutorial covers the following steps:

1. [Download and prepare data for fine-tuning](./1_data_preparation.ipynb)
-2. [Fine-tune the embedding model with SFT](./2_finetuning_and_inference.ipynb)
+2. [Fine-tune the embedding model with SFT (Supervised Fine-Tuning)](./2_finetuning_and_inference.ipynb)
3. [Evaluate the model on a zero-shot Scidocs task](./3_evaluation.ipynb)
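
Step 3 reports retrieval metrics such as recall and NDCG on the BEIR SciDocs benchmark. For reference, a minimal NDCG@k computation is sketched below; this is just the standard formula, not the NeMo Evaluator implementation.

```python
# Standard NDCG@k over a ranked list of graded relevance scores (index 0 = top result).
import math

def ndcg_at_k(relevances, k):
    dcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# Example: relevant documents returned at ranks 1 and 3 out of the top 5.
print(ndcg_at_k([1, 0, 1, 0, 0], k=5))
```
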

-**Note:** A typical workflow involves creating query, positive document, and negative document triplets from a text corpus. This may include synthetic data generation (SDG) and hard-negative mining. For a quick demonstration, we use an existing open dataset from Hugging Face.
+### About the SPECTER Dataset

+The [SPECTER](https://huggingface.co/datasets/embedding-data/SPECTER) dataset contains approximately 684K triplets from the scientific domain designed for training embedding models. Each triplet consists of:

-### About NVIDIA NeMo Microservices
+- **Query**: A paper title representing a search query
+- **Positive**: A related paper that should be retrieved (e.g., papers that cite each other)
+- **Negative**: An unrelated paper that should not be retrieved

-NVIDIA NeMo is a modular, enterprise-ready software suite for managing the AI agent lifecycle, enabling enterprises to build, deploy, and optimize agentic systems.
-
-NVIDIA NeMo microservices, part of the [NVIDIA NeMo software suite](https://www.nvidia.com/en-us/ai-data-science/products/nemo/), are an API-first modular set of tools that you can use to customize, evaluate, and secure large language models (LLMs) and embedding models while optimizing AI applications across on-premises or cloud-based Kubernetes clusters.
-
-Refer to the [NVIDIA NeMo microservices documentation](https://docs.nvidia.com/nemo/microservices/latest/about/index.html) for further information.
-
-### About the SPECTER dataset
+**Example triplet:**
+```
+Query: "Deep Residual Learning for Image Recognition"
+Positive: "Identity Mappings in Deep Residual Networks"
+Negative: "Attention Is All You Need"
+```

-The [SPECTER](https://huggingface.co/datasets/embedding-data/SPECTER) dataset contains approximately 684K triplets pertaining to the scientific domain (titles of papers), which can be used to train embedding models. We will use the SPECTER data for finetuning.
+During fine-tuning, the model learns through **contrastive learning** to maximize the similarity between the query and positive document while minimizing similarity with negative documents. This trains the model to produce embeddings that effectively capture semantic relationships in the scientific literature domain.
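
To make the contrastive objective concrete, the sketch below shows a common triplet-style loss over (query, positive, negative) embeddings using cosine similarity. It illustrates the general technique only; the actual loss, margin, and training loop used by NeMo Customizer may differ, and the toy tensors stand in for real encoder outputs.

```python
# Illustrative triplet-style contrastive loss: pull the query toward the positive
# document and push it away from the negative one in embedding space.
import torch
import torch.nn.functional as F

def contrastive_triplet_loss(query_emb, pos_emb, neg_emb, margin=0.2):
    sim_pos = F.cosine_similarity(query_emb, pos_emb, dim=-1)
    sim_neg = F.cosine_similarity(query_emb, neg_emb, dim=-1)
    # Penalize cases where the negative is not at least `margin` less similar than the positive.
    return F.relu(margin - sim_pos + sim_neg).mean()

# Toy embeddings standing in for encoder outputs (batch of 4, dimension 8).
q, p, n = (torch.randn(4, 8) for _ in range(3))
print(contrastive_triplet_loss(q, p, n))
```
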

## Prerequisites

### Deploy NeMo Microservices

-To follow this tutorial, you will need at least two NVIDIA GPUs, which will be allocated as follows:
+To follow this tutorial, you will need at least two NVIDIA GPUs:

-- **Fine-tuning:** One GPU for fine-tuning the `llama-3.2-nv-embedqa-1b-v2` model using NeMo Customizer.
-- **Inference:** One GPU for deploying the `llama-3.2-nv-embedqa-1b-v2` NIM for inference.
+- **Fine-tuning:** One GPU for fine-tuning the `llama-3.2-nv-embedqa-1b-v2` model with NeMo Customizer.
+- **Inference:** One GPU for deploying the fine-tuned model as a NIM.

-Refer to the [platform prerequisites and installation guide](https://docs.nvidia.com/nemo/microservices/latest/get-started/platform-prereq.html) to deploy NeMo Microservices.
+If you're new to NeMo Microservices, follow the [Demo Cluster Setup on Minikube](https://docs.nvidia.com/nemo/microservices/latest/get-started/setup/minikube/index.html) guide to get started. For production deployments, refer to the [platform prerequisites and installation guide](https://docs.nvidia.com/nemo/microservices/latest/get-started/platform-prereq.html).

> **NOTE:** Fine-tuning for embedding models is supported starting with NeMo Microservices version 25.8.0. Please ensure you deploy NeMo Microservices Helm chart version 25.8.0 or later to use these notebooks.

@@ -89,6 +114,11 @@ Ensure you have access to:

3. Update the following variables in [config.py](./config.py) with your specific URLs and API keys.

+**How to obtain the required values:**
+
+- **NeMo Microservices URLs**: If you followed the [Demo Cluster Setup on Minikube](https://docs.nvidia.com/nemo/microservices/latest/get-started/setup/minikube/index.html) guide, run `cat /etc/hosts` on your deployment machine to view the configured service hostnames and IP addresses.
+- **Hugging Face Token**: Generate a token at [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) to download the SPECTER dataset (see the sketch below).
+
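
As a small illustration of the Hugging Face token in use, the sketch below authenticates and downloads the SPECTER dataset referenced above; the split and preprocessing actually used are defined in the data-preparation notebook.

```python
# Authenticate to Hugging Face and pull the SPECTER triplets.
from datasets import load_dataset
from huggingface_hub import login

login(token="hf_...")  # paste your token, or set the HF_TOKEN environment variable instead
specter = load_dataset("embedding-data/SPECTER")
print(specter)
```
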
```python
# (Required) NeMo Microservices URLs
NDS_URL = "http://data-store.test" # Data Store
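
# Illustrative aside (not part of config.py): once the URLs above are filled in,
# a plain HTTP request is a quick way to confirm the configured host is reachable
# from your machine. This assumes only the NDS_URL value shown above and checks
# basic connectivity, not API behaviour.
import requests

try:
    response = requests.get(NDS_URL, timeout=5)
    print(f"Data Store reachable (HTTP {response.status_code})")
except requests.RequestException as exc:
    print(f"Data Store not reachable: {exc}")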
