|
4 | 4 |
|
5 | 5 | This guide shows how to fine-tune embedding models using the NVIDIA NeMo Microservices platform to improve performance on domain-specific tasks. |
6 | 6 |
|
| 7 | +### Why Fine-tune Embedding Models? |
| 8 | + |
| 9 | +Retrieval quality determines AI application quality. Better retrieval means more accurate RAG (Retrieval Augmented Generation) responses, smarter agents, and more relevant search results. Embedding models power this retrieval by converting text into semantic vectors, but pre-trained models aren't optimized for your domain's specific vocabulary and context. |
| 10 | + |
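To make "semantic vectors" concrete, here is a toy sketch of how retrieval compares embeddings with cosine similarity. The 3-dimensional vectors are made up for illustration; a real embedding model such as `llama-3.2-nv-embedqa-1b-v2` produces vectors with thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for model output.
query = [0.9, 0.1, 0.2]      # e.g. a search query about residual networks
relevant = [0.8, 0.2, 0.25]  # a related paper: nearby in vector space
unrelated = [0.1, 0.9, 0.7]  # an off-topic paper: far away

# A retriever returns documents whose embeddings are closest to the query's.
assert cosine_similarity(query, relevant) > cosine_similarity(query, unrelated)
```

Fine-tuning shifts where domain-specific texts land in this vector space so that genuinely related documents end up closer to their queries.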
| 11 | +Fine-tuning adapts embedding models to your data (whether scientific literature, legal documents, or enterprise knowledge bases) to achieve measurably better retrieval performance. NeMo Microservices makes this practical by providing production-ready infrastructure that handles data preparation, training, deployment, and evaluation, letting you focus on improving models rather than building pipelines. |
| 12 | + |
| 13 | +> **New to NeMo Microservices?** Learn about Data Flywheel workflows in the [main repository README](../../../README.md#data-flywheel) or explore the [NeMo Microservices documentation](https://docs.nvidia.com/nemo/microservices/latest/about/index.html). |
| 14 | +
|
7 | 15 | <div style="text-align: center;"> |
8 | 16 | <img src="./img/e2e-embedding-ft.png" alt="Embedding Fine-tuning workflow with NeMo Microservices" width="80%" /> |
9 | | - <p><strong>Figure 1:</strong> Workflow for fine-tuning embedding models using NeMo Microservices</p> |
| 17 | + <p><strong>Figure 1:</strong> End-to-end workflow for fine-tuning embedding models using NeMo Microservices</p> |
10 | 18 | </div> |
11 | 19 |
|
| 20 | +The diagram above shows the embedding fine-tuning workflow that NeMo Microservices orchestrates: |
| 21 | + |
| 22 | +1. **Data Preparation**: Download and format raw data locally into query-document triplets, then upload to the NeMo Data Store. |
| 23 | +2. **Fine-tuning**: The NeMo Customizer service orchestrates training by launching a dedicated job that retrieves the base model and training data, performs supervised fine-tuning on GPU(s), and saves the fine-tuned weights to the Entity Store (model registry). |
| 24 | +3. **Deployment**: The Deployment Management Service deploys the fine-tuned model as a NVIDIA Inference Microservice (NIM). It retrieves the model weights from the Entity Store and starts the NIM inference service. |
| 25 | +4. **Evaluation**: The NeMo Evaluator service measures performance by querying the deployed NIM with benchmark tasks (such as Benchmarking Information Retrieval (BEIR) SciDocs) and calculating retrieval metrics like recall and NDCG. |
| 26 | + |
| 27 | +This modular architecture enables each component to be independently scaled and managed. |
| 28 | + |
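The retrieval metrics mentioned in step 4 can be sketched in a few lines. Below is a simplified, binary-relevance implementation of recall@k and NDCG@k for a single query; NeMo Evaluator computes such metrics across an entire benchmark, so this toy version only illustrates the underlying math.

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant documents found in the top-k results."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG@k: discounted gain normalized by the ideal ranking."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal

retrieved = ["d3", "d1", "d7", "d2", "d9"]  # ranking returned by the model
relevant = {"d1", "d2"}                     # ground-truth relevant docs

print(recall_at_k(retrieved, relevant, 5))            # 1.0 -- both relevant docs in top 5
print(round(ndcg_at_k(retrieved, relevant, 5), 3))    # 0.651 -- penalized for low ranks
```

NDCG rewards placing relevant documents near the top of the ranking, which is why it complements plain recall when comparing the base and fine-tuned models.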
12 | 29 | ## Objectives |
13 | 30 |
|
14 | | -This tutorial shows how to leverage the NeMo Microservices platform for finetuning a [nvidia/llama-3.2-nv-embedqa-1b-v2](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2/modelcard) embedding model using the [SPECTER](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) dataset, then evaluating its accuracy on the (somewhat related) zero-shot [BeIR Scidocs](https://huggingface.co/datasets/BeIR/scidocs) benchmark. |
| 31 | +This tutorial shows how to use the NeMo Microservices platform to fine-tune the [nvidia/llama-3.2-nv-embedqa-1b-v2](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2/modelcard) embedding model using the [SPECTER](https://huggingface.co/datasets/sentence-transformers/specter) dataset, then evaluate its performance on the [BEIR SciDocs](https://huggingface.co/datasets/BeIR/scidocs) benchmark against baseline metrics. |
| 32 | + |
| 33 | +By the end of this tutorial, you will: |
| 34 | +- Fine-tune an embedding model on scientific domain data |
| 35 | +- Deploy the fine-tuned model as a NIM |
| 36 | +- Evaluate retrieval performance on the [BEIR SciDocs](https://huggingface.co/datasets/BeIR/scidocs) benchmark |
| 37 | +- Compare results against baseline model metrics to demonstrate measurable improvement |
15 | 38 |
|
16 | 39 | The tutorial covers the following steps: |
17 | 40 |
|
18 | 41 | 1. [Download and prepare data for fine-tuning](./1_data_preparation.ipynb) |
19 | | -2. [Fine-tune the embedding model with SFT](./2_finetuning_and_inference.ipynb) |
| 42 | +2. [Fine-tune the embedding model with SFT (Supervised Fine-Tuning)](./2_finetuning_and_inference.ipynb) |
20 | 43 | 3. [Evaluate the model on a zero-shot SciDocs task](./3_evaluation.ipynb) |
21 | 44 |
|
22 | | -**Note:** A typical workflow involves creating query, positive document, and negative document triplets from a text corpus. This may include synthetic data generation (SDG) and hard-negative mining. For a quick demonstration, we use an existing open dataset from Hugging Face. |
| 45 | +### About the SPECTER Dataset |
23 | 46 |
|
| 47 | +The [SPECTER](https://huggingface.co/datasets/embedding-data/SPECTER) dataset contains approximately 684K triplets from the scientific domain designed for training embedding models. Each triplet consists of: |
24 | 48 |
|
25 | | -### About NVIDIA NeMo Microservices |
| 49 | +- **Query**: A paper title representing a search query |
| 50 | +- **Positive**: A related paper that should be retrieved (e.g., papers that cite each other) |
| 51 | +- **Negative**: An unrelated paper that should not be retrieved |
26 | 52 |
|
27 | | -NVIDIA NeMo is a modular, enterprise-ready software suite for managing the AI agent lifecycle, enabling enterprises to build, deploy, and optimize agentic systems. |
28 | | - |
29 | | -NVIDIA NeMo microservices, part of the [NVIDIA NeMo software suite](https://www.nvidia.com/en-us/ai-data-science/products/nemo/), are an API-first modular set of tools that you can use to customize, evaluate, and secure large language models (LLMs) and embedding models while optimizing AI applications across on-premises or cloud-based Kubernetes clusters. |
30 | | - |
31 | | -Refer to the [NVIDIA NeMo microservices documentation](https://docs.nvidia.com/nemo/microservices/latest/about/index.html) for further information. |
32 | | - |
33 | | -### About the SPECTER dataset |
| 53 | +**Example triplet:** |
| 54 | +``` |
| 55 | +Query: "Deep Residual Learning for Image Recognition" |
| 56 | +Positive: "Identity Mappings in Deep Residual Networks" |
| 57 | +Negative: "Attention Is All You Need" |
| 58 | +``` |
34 | 59 |
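Before upload to the NeMo Data Store, triplets like the one above are typically serialized as JSONL (one JSON object per line). The field names below (`query`, `pos_doc`, `neg_doc`) are illustrative assumptions, not the confirmed schema; check the NeMo Customizer documentation for the exact format your version expects.

```python
import json

# Hypothetical field names for illustration -- consult the NeMo Customizer
# docs for the exact training-data schema your deployment expects.
triplet = {
    "query": "Deep Residual Learning for Image Recognition",
    "pos_doc": "Identity Mappings in Deep Residual Networks",
    "neg_doc": ["Attention Is All You Need"],  # a list allows multiple negatives
}

# JSONL: one JSON object per line, one line per training example.
with open("training.jsonl", "w") as f:
    f.write(json.dumps(triplet) + "\n")
```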
|
35 | | -The [SPECTER](https://huggingface.co/datasets/embedding-data/SPECTER) dataset contains approximately 684K triplets pertaining to the scientific domain (titles of papers), which can be used to train embedding models. We will use the SPECTER data for finetuning. |
| 60 | +During fine-tuning, the model learns through **contrastive learning** to maximize the similarity between the query and the positive document while minimizing similarity to the negative document. This trains the model to produce embeddings that capture semantic relationships in the scientific literature domain. |
36 | 61 |
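As an illustration of this objective, here is a minimal InfoNCE-style contrastive loss over one triplet, using made-up similarity scores; the actual loss formulation used by NeMo Customizer may differ, so treat this as a sketch of the idea rather than the implementation.

```python
import math

def infonce_loss(sim_pos, sim_negs, temperature=0.05):
    """Contrastive loss for one query: drives the positive document's
    similarity above the negatives'. Lower loss = better separation."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    denom = sum(math.exp(x - m) for x in logits)
    return -(logits[0] - m - math.log(denom))

# Before fine-tuning: positive barely beats the negative -> high loss.
# After fine-tuning: clear separation -> loss near zero.
before = infonce_loss(sim_pos=0.52, sim_negs=[0.50])
after = infonce_loss(sim_pos=0.90, sim_negs=[0.20])
assert after < before
```

Minimizing this loss over hundreds of thousands of SPECTER triplets is what pulls related papers together and pushes unrelated ones apart in the embedding space.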
|
37 | 62 | ## Prerequisites |
38 | 63 |
|
39 | 64 | ### Deploy NeMo Microservices |
40 | 65 |
|
41 | | -To follow this tutorial, you will need at least two NVIDIA GPUs, which will be allocated as follows: |
| 66 | +To follow this tutorial, you will need at least two NVIDIA GPUs: |
42 | 67 |
|
43 | | -- **Fine-tuning:** One GPU for fine-tuning the `llama-3.2-nv-embedqa-1b-v2` model using NeMo Customizer. |
44 | | -- **Inference:** One GPU for deploying the `llama-3.2-nv-embedqa-1b-v2` NIM for inference. |
| 68 | +- **Fine-tuning:** One GPU for fine-tuning the `llama-3.2-nv-embedqa-1b-v2` model with NeMo Customizer. |
| 69 | +- **Inference:** One GPU for deploying the fine-tuned model as a NIM. |
45 | 70 |
|
46 | | -Refer to the [platform prerequisites and installation guide](https://docs.nvidia.com/nemo/microservices/latest/get-started/platform-prereq.html) to deploy NeMo Microservices. |
| 71 | +If you're new to NeMo Microservices, follow the [Demo Cluster Setup on Minikube](https://docs.nvidia.com/nemo/microservices/latest/get-started/setup/minikube/index.html) guide to get started. For production deployments, refer to the [platform prerequisites and installation guide](https://docs.nvidia.com/nemo/microservices/latest/get-started/platform-prereq.html). |
47 | 72 |
|
48 | 73 | > **NOTE:** Fine-tuning for embedding models is supported starting with NeMo Microservices version 25.8.0. Ensure that you deploy the NeMo Microservices Helm chart version 25.8.0 or later to use these notebooks. |
49 | 74 |
|
@@ -89,6 +114,11 @@ Ensure you have access to: |
89 | 114 |
|
90 | 115 | 3. Update the following variables in [config.py](./config.py) with your specific URLs and API keys. |
91 | 116 |
|
| 117 | + **How to obtain the required values:** |
| 118 | + |
| 119 | + - **NeMo Microservices URLs**: If you followed the [Demo Cluster Setup on Minikube](https://docs.nvidia.com/nemo/microservices/latest/get-started/setup/minikube/index.html) guide, run `cat /etc/hosts` on your deployment machine to view the configured service hostnames and IP addresses. |
| 120 | + - **Hugging Face Token**: Generate a token at [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) to download the SPECTER dataset. |
| 121 | + |
92 | 122 | ```python |
93 | 123 | # (Required) NeMo Microservices URLs |
94 | 124 | NDS_URL = "http://data-store.test" # Data Store |
|