
Commit 1c2a534

Code, benchmark, README.md
1 parent ebc0f37

File tree

18 files changed: +3682 −142 lines


.env

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+OPENAI_API_KEY=sk-proj-<redacted>

.env-template

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+OPENAI_API_KEY=<your_api_key>
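For context, the key from `.env` is typically loaded into the process environment at startup. Projects often use a library such as python-dotenv for this; below is a minimal dependency-free sketch, where `load_env` is a hypothetical helper, not part of this commit:

```python
import os


def load_env(path: str = ".env") -> None:
    """Hypothetical helper: load KEY=VALUE pairs from a .env file
    into os.environ, without overwriting variables already set."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and malformed entries.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```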

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -6,4 +6,6 @@ build
 .pytest_cache
 .ruff_cache
 .vscode
-.idea
+.idea
+.env
+evaluation_dataset/

Makefile

Lines changed: 5 additions & 2 deletions
@@ -1,5 +1,8 @@
+
+# Use uv as dependency manager, pip should work anyway
 dev:
-	pip install -e ".[dev]"
+	# pip install -e ".[dev]"
+	uv sync --all-extras --no-cache

 lint:
 	ruff check .
@@ -13,4 +16,4 @@ type:
 qa:
 	make lint
 	make type
-	make test
+	make test

README.md

Lines changed: 123 additions & 65 deletions
@@ -1,110 +1,168 @@
 # Overview

-| Developed by | Guardrails AI |
-| --- | --- |
-| Date of development | Feb 15, 2024 |
-| Validator type | Format |
-| Blog | |
-| License | Apache 2 |
-| Input/Output | Output |
+| Developed by | ML cube |
+| ------------------- | ------------- |
+| Date of development | Sep 9, 2025 |
+| Validator type | RAG Retrieved Context |
+| Blog | |
+| License | Apache 2 |
+| Input/Output | RAG Retrieval |

 ## Description

-### Intended Use
-This validator is a template for creating other validators, but for demonstrative purposes it ensures that a generated output is the literal `pass`.
-
-### Requirements
+This validator checks whether the retrieved context in a RAG (Retrieval-Augmented Generation) system relates to the user's query. It can be used in two ways:
+- `RAG Context Relevance`: Validates whether the retrieved context is relevant to the user's query. Relevant means that the context is related to the question, even if it does not directly contain the answer.
+- `RAG Context Usefulness`: Validates whether the retrieved context is useful for answering the user's query. Useful means that the context contains information that can help answer the question.

-* Dependencies:
-    - guardrails-ai>=0.4.0
+### Intended Use

-* Foundation model access keys:
-    - OPENAI_API_KEY
+It can be used in a RAG system to prevent the model from hallucinating or generating incorrect responses based on irrelevant context.

-## Installation
+### Requirements

-```bash
-$ guardrails hub install hub://guardrails/validator_template
-```
+- Dependencies:
+  - guardrails-ai >= 0.5.15
+  - langchain[openai] >= 0.3.27

-## Usage Examples
+- OpenAI Foundation model access keys:
+  - OPENAI_API_KEY

-### Validating string output via Python
+## Examples

-In this example, we apply the validator to a string output generated by an LLM.
+In this example we apply the MLcubeRagContextValidator to validate the relevance of the retrieved context to the user's query.

 ```python
-# Import Guard and Validator
-from guardrails.hub import ValidatorTemplate
+import openai
+
+from validator.prompts.prompts import RagContextRelevancePrompt
+from validator.main import MLcubeRagContextValidator
 from guardrails import Guard

-# Setup Guard
+# Define the guard with the MLcubeRagContextValidator,
+# specifying the relevance prompt generator to enable
+# context relevance evaluation.
 guard = Guard().use(
-    ValidatorTemplate
+    MLcubeRagContextValidator(
+        rag_context_eval_prompt=RagContextRelevancePrompt(),
+        pass_threshold=1,
+        model_name="gpt-4o-mini",
+        on_fail="noop",
+        on="prompt",
+    )
 )

-guard.validate("pass")  # Validator passes
-guard.validate("fail")  # Validator fails
-```
+# Sample metadata. Retrieved context is relevant to the user query.
+metadata = {
+    "user_input": "What's the weather in Milan, today?",
+    "retrieved_context": "Milan, what a beautiful day. Sunny and warm.",
+}

-### Validating JSON output via Python
+# Make a call to the LLM with the guardrail in place.
+response = guard(
+    llm_api=openai.chat.completions.create,
+    prompt=metadata["user_input"],
+    model="gpt-4o-mini",
+    max_tokens=1024,
+    temperature=0,
+    metadata=metadata,
+)

-In this example, we apply the validator to a string field of a JSON output generated by an LLM.
+# Assert that the validation passed since the context is relevant.
+assert response.validation_passed

-```python
-# Import Guard and Validator
-from pydantic import BaseModel, Field
-from guardrails.hub import ValidatorTemplate
-from guardrails import Guard
+# We now change the retrieved context to be irrelevant to the user query.
+metadata["retrieved_context"] = "The capital of Italy is Rome."
+
+response = guard(
+    llm_api=openai.chat.completions.create,
+    prompt=metadata["user_input"],
+    model="gpt-4o-mini",
+    max_tokens=1024,
+    temperature=0,
+    metadata=metadata,
+)

-# Initialize Validator
-val = ValidatorTemplate()
+# We assert that the validation failed since the context is irrelevant.
+assert not response.validation_passed
+```

-# Create Pydantic BaseModel
-class Process(BaseModel):
-    process_name: str
-    status: str = Field(validators=[val])
+In this example we evaluate the usefulness of the retrieved context to the user's query. This time we call the `parse` method of the guard directly.

-# Create a Guard to check for valid Pydantic output
-guard = Guard.from_pydantic(output_class=Process)
+```python
+from validator.prompts.prompts import RagContextUsefulnessPrompt

-# Run LLM output generating JSON through guard
-guard.parse("""
-{
-    "process_name": "templating",
-    "status": "pass"
+guard = Guard().use(
+    MLcubeRagContextValidator(
+        rag_context_eval_prompt=RagContextUsefulnessPrompt(),
+        pass_threshold=1,
+        model_name="gpt-4o-mini",
+        on_fail="noop",
+        on="prompt",
+    )
+)
+
+# Sample metadata. Retrieved context is not useful to the user query
+# since it talks about a different city.
+metadata = {
+    "user_input": "What's the weather in Milan, today?",
+    "retrieved_context": "Roma, what a beautiful day. Sunny and warm.",
 }
-""")
+
+resp = guard.parse(
+    metadata["user_input"],
+    metadata=metadata,
+)
+
+# Assert that the validation failed since the context is not useful.
+assert not resp.validation_passed
 ```
+## Benchmark
+
+We benchmark the validator on a subset of the [WikiQA](https://www.microsoft.com/en-us/research/project/wikiqa/) dataset. You can find the benchmark script, the dataset, and a summary of the results in the `benchmark` folder.

 # API Reference

-**`__init__(self, on_fail="noop")`**
+**`__init__(self, rag_context_eval_prompt, pass_threshold, model_name, on_fail="noop", default_min=0, default_max=1, **kwargs)`**
+
 <ul>
-Initializes a new instance of the ValidatorTemplate class.
+Initializes a new instance of the MLcubeRagContextValidator class for evaluating RAG context.

 **Parameters**
-- **`arg_1`** *(str)*: A placeholder argument to demonstrate how to use init arguments.
-- **`arg_2`** *(str)*: Another placeholder argument to demonstrate how to use init arguments.
-- **`on_fail`** *(str, Callable)*: The policy to enact when a validator fails. If `str`, must be one of `reask`, `fix`, `filter`, `refrain`, `noop`, `exception` or `fix_reask`. Otherwise, must be a function that is called when the validator fails.
+
+- **`rag_context_eval_prompt`** _(Ml3RagContextEvalBasePrompt)_: The prompt generator used to create evaluation prompts for the LLM judge.
+- **`pass_threshold`** _(int)_: The minimum rating score required for the validation to pass.
+- **`model_name`** _(str)_: The name of the LLM model to use for evaluation (e.g. `gpt-4o-mini`).
+- **`default_min`** _(int)_: The default minimum value for the rating range. Default is `0`.
+- **`default_max`** _(int)_: The default maximum value for the rating range. Default is `1`.
+- **`on_fail`** _(str, Callable)_: The policy to enact when a validator fails. If `str`, must be one of `reask`, `fix`, `filter`, `refrain`, `noop`, `exception` or `fix_reask`. Otherwise, must be a function that is called when the validator fails.
+- **`kwargs`** _(dict)_: Additional keyword arguments to pass to the base Validator class.
 </ul>
 <br/>

 **`validate(self, value, metadata) -> ValidationResult`**
+
 <ul>
-Validates the given `value` using the rules defined in this validator, relying on the `metadata` provided to customize the validation process. This method is automatically invoked by `guard.parse(...)`, ensuring the validation logic is applied to the input data.
+Validates the retrieved context with respect to the user query and the specified prompt generator. The validator uses structured output to get a rating and explanation from the LLM, then compares the rating against the pass threshold.

 Note:

 1. This method should not be called directly by the user. Instead, invoke `guard.parse(...)` where this method will be called internally for each associated Validator.
-2. When invoking `guard.parse(...)`, ensure to pass the appropriate `metadata` dictionary that includes keys and values required by this validator. If `guard` is associated with multiple validators, combine all necessary metadata into a single dictionary.
+2. When invoking `guard.parse(...)`, ensure to pass the appropriate `metadata` dictionary that includes the keys and values required by this validator (see below). If `guard` is associated with multiple validators, combine all necessary metadata into a single dictionary.

 **Parameters**
-- **`value`** *(Any)*: The input value to validate.
-- **`metadata`** *(dict)*: A dictionary containing metadata required for validation. Keys and values must match the expectations of this validator.
-
-
-| Key | Type | Description | Default |
-| --- | --- | --- | --- |
-| `key1` | String | Description of key1's role. | N/A |
-</ul>
+
+- **`value`** _(Any)_: The input value to validate.
+- **`metadata`** _(dict)_: A dictionary containing metadata required for validation. Keys and values must match the expectations of this validator.
+
+| Key | Type | Description | Default |
+| --- | --- | --- | --- |
+| `user_input` | String | The original user query passed into the RAG system. | N/A (Required) |
+| `retrieved_context` | String | The context retrieved and used by the RAG system. | N/A (Required) |
+| `min_range_value` | Integer | The minimum value for the rating range used by the LLM judge. | 0 (the default of the validator class) |
+| `max_range_value` | Integer | The maximum value for the rating range used by the LLM judge. | 1 (the default of the validator class) |
+</ul>
+
+**Returns**
+
+**`ValidationResult`**: Returns a `PassResult` if the LLM judge's rating meets or exceeds the pass threshold, or a `FailResult` with a detailed explanation if the rating is below the threshold.
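The pass/fail decision described above reduces to a simple comparison of the judge's rating against the threshold within the configured rating range. A minimal sketch of that logic (not the validator's actual implementation; `rating_passes` and its range check are illustrative assumptions):

```python
def rating_passes(
    rating: float,
    pass_threshold: float,
    min_range_value: float = 0,
    max_range_value: float = 1,
) -> bool:
    """Illustrative sketch: a judge rating passes when it meets or
    exceeds the threshold within the configured rating range."""
    # Reject ratings outside the range the LLM judge was instructed to use.
    if not (min_range_value <= rating <= max_range_value):
        raise ValueError("rating outside the configured range")
    return rating >= pass_threshold
```

With the README's settings (range `[0, 1]`, `pass_threshold=1`), only a rating of exactly 1 passes, which matches the binary pass/fail behavior shown in the examples.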
