# Overview

| Developed by | ML cube |
| ------------------- | --------------------- |
| Date of development | Sep 9, 2025 |
| Validator type | RAG Retrieved Context |
| Blog | |
| License | Apache 2 |
| Input/Output | RAG Retrieval |

## Description

This validator checks whether the retrieved context in a RAG (Retrieval-Augmented Generation) system relates to the user's query. It can be used in two ways:
- `RAG Context Relevance`: validates whether the retrieved context is relevant to the user's query. Relevant means that the context is related to the question, even if it does not directly contain the answer.
- `RAG Context Usefulness`: validates whether the retrieved context is useful for answering the user's query. Useful means that the context contains information that can help answer the question.

### Intended Use

This validator can be used in a RAG system to prevent the model from hallucinating or generating incorrect responses based on irrelevant retrieved context.

### Requirements

- Dependencies:
  - guardrails-ai >= 0.5.15
  - langchain[openai] >= 0.3.27

- OpenAI foundation model access keys:
  - OPENAI_API_KEY

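The dependencies above can be installed with pip, and the API key is read from the environment; a minimal setup sketch (the version pins come from the list above, the key value is a placeholder):

```shell
# Install the validator's dependencies (versions from the Requirements list).
pip install "guardrails-ai>=0.5.15" "langchain[openai]>=0.3.27"

# The validator's LLM judge reads the OpenAI key from the environment.
export OPENAI_API_KEY="sk-..."  # placeholder, substitute your own key
```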
## Examples

In this example, we apply the `MLcubeRagContextValidator` to validate the relevance of the retrieved context to the user's query.

```python
import openai

from validator.prompts.prompts import RagContextRelevancePrompt
from validator.main import MLcubeRagContextValidator
from guardrails import Guard

# Define the guard with the MLcubeRagContextValidator,
# specifying the relevance prompt generator to enable
# context relevance evaluation.
guard = Guard().use(
    MLcubeRagContextValidator(
        rag_context_eval_prompt=RagContextRelevancePrompt(),
        pass_threshold=1,
        model_name="gpt-4o-mini",
        on_fail="noop",
        on="prompt",
    )
)

# Sample metadata. The retrieved context is relevant to the user query.
metadata = {
    "user_input": "What's the weather in Milan, today?",
    "retrieved_context": "Milan, what a beautiful day. Sunny and warm.",
}

# Make a call to the LLM with the guardrail in place.
response = guard(
    llm_api=openai.chat.completions.create,
    prompt=metadata["user_input"],
    model="gpt-4o-mini",
    max_tokens=1024,
    temperature=0,
    metadata=metadata,
)

# The validation passes since the context is relevant.
assert response.validation_passed

# Now change the retrieved context so that it is irrelevant to the user query.
metadata["retrieved_context"] = "The capital of Italy is Rome."

response = guard(
    llm_api=openai.chat.completions.create,
    prompt=metadata["user_input"],
    model="gpt-4o-mini",
    max_tokens=1024,
    temperature=0,
    metadata=metadata,
)

# The validation fails since the context is irrelevant.
assert not response.validation_passed
```

In this example, we evaluate the usefulness of the retrieved context to the user's query. This time we call the guard's `parse` method directly.

```python
from validator.prompts.prompts import RagContextUsefulnessPrompt
from validator.main import MLcubeRagContextValidator
from guardrails import Guard

guard = Guard().use(
    MLcubeRagContextValidator(
        rag_context_eval_prompt=RagContextUsefulnessPrompt(),
        pass_threshold=1,
        model_name="gpt-4o-mini",
        on_fail="noop",
        on="prompt",
    )
)

# Sample metadata. The retrieved context is not useful to the user query
# since it talks about a different city.
metadata = {
    "user_input": "What's the weather in Milan, today?",
    "retrieved_context": "Roma, what a beautiful day. Sunny and warm.",
}

resp = guard.parse(
    metadata["user_input"],
    metadata=metadata,
)

# The validation fails since the context is not useful.
assert not resp.validation_passed
```

## Benchmark

We benchmark the validator on a subset of the [WikiQA](https://www.microsoft.com/en-us/research/project/wikiqa/) dataset. You can find the benchmark script, the dataset, and a summary of the results in the `benchmark` folder.

# API Reference

**`__init__(self, rag_context_eval_prompt, pass_threshold, model_name, on_fail="noop", default_min=0, default_max=1, **kwargs)`**

<ul>
Initializes a new instance of the MLcubeRagContextValidator class for evaluating RAG context.

**Parameters**

- **`rag_context_eval_prompt`** _(Ml3RagContextEvalBasePrompt)_: The prompt generator used to create evaluation prompts for the LLM judge.
- **`pass_threshold`** _(int)_: The minimum rating score required for the validation to pass.
- **`model_name`** _(str)_: The name of the LLM model to use for evaluation (e.g. `gpt-4o-mini`).
- **`default_min`** _(int)_: The default minimum value for the rating range. Default is `0`.
- **`default_max`** _(int)_: The default maximum value for the rating range. Default is `1`.
- **`on_fail`** _(str, Callable)_: The policy to enact when a validator fails. If `str`, must be one of `reask`, `fix`, `filter`, `refrain`, `noop`, `exception` or `fix_reask`. Otherwise, must be a function that is called when the validator fails.
- **`kwargs`** _(dict)_: Additional keyword arguments to pass to the base Validator class.
</ul>
<br/>

**`validate(self, value, metadata) -> ValidationResult`**

<ul>
Validates the retrieved context against the user query, using the specified prompt generator. The validator uses structured output to obtain a rating and an explanation from the LLM judge, then compares the rating against the pass threshold.

Note:

1. This method should not be called directly by the user. Instead, invoke `guard.parse(...)`, where this method will be called internally for each associated Validator.
2. When invoking `guard.parse(...)`, ensure to pass the appropriate `metadata` dictionary that includes the keys and values required by this validator (see below). If `guard` is associated with multiple validators, combine all necessary metadata into a single dictionary.

**Parameters**

- **`value`** _(Any)_: The input value to validate.
- **`metadata`** _(dict)_: A dictionary containing metadata required for validation. Keys and values must match the expectations of this validator.

  | Key | Type | Description | Default |
  | --- | --- | --- | --- |
  | `user_input` | String | The original user query passed into the RAG system. | N/A (required) |
  | `retrieved_context` | String | The context retrieved and used by the RAG system. | N/A (required) |
  | `min_range_value` | Integer | The minimum value for the rating range used by the LLM judge. | `0` (the validator class default) |
  | `max_range_value` | Integer | The maximum value for the rating range used by the LLM judge. | `1` (the validator class default) |
</ul>

**Returns**

**`ValidationResult`**: A `PassResult` if the LLM judge's rating meets or exceeds the pass threshold, or a `FailResult` with a detailed explanation if the rating is below the threshold.
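
Note 2 of the `validate` docs above asks for a single merged `metadata` dictionary when a guard carries several validators. A minimal sketch of that merge (the second validator's `sources` key is hypothetical, for illustration only):

```python
# Metadata for this validator (keys from the table above).
rag_metadata = {
    "user_input": "What's the weather in Milan, today?",
    "retrieved_context": "Milan, what a beautiful day. Sunny and warm.",
    # Optional custom rating range for the LLM judge.
    "min_range_value": 0,
    "max_range_value": 5,
}

# Hypothetical metadata required by a second validator on the same guard.
other_metadata = {"sources": ["weather-report.txt"]}

# Combine everything into one dictionary before calling guard.parse(...).
combined_metadata = {**rag_metadata, **other_metadata}
```

The merged dictionary is then passed once, as `guard.parse(..., metadata=combined_metadata)`, and each validator picks out the keys it needs.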