
Commit 1c2a534

Code, benchmark, README.md
1 parent ebc0f37

File tree

18 files changed: +3682 −142 lines


.env

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+OPENAI_API_KEY=sk-proj-<redacted>

.env-template

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+OPENAI_API_KEY=<your_api_key>
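For context, the key from `.env` is typically loaded into the process environment at startup. Projects often use a library such as python-dotenv for this; below is a minimal dependency-free sketch, where `load_env` is a hypothetical helper, not part of this commit:

```python
import os


def load_env(path: str = ".env") -> None:
    """Hypothetical helper: load KEY=VALUE pairs from a .env file
    into os.environ, without overwriting variables already set."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and malformed entries.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```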

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -6,4 +6,6 @@ build
 .pytest_cache
 .ruff_cache
 .vscode
-.idea
+.idea
+.env
+evaluation_dataset/

Makefile

Lines changed: 5 additions & 2 deletions
@@ -1,5 +1,8 @@
+
+# Use uv as dependency manager, pip should work anyway
 dev:
-	pip install -e ".[dev]"
+	# pip install -e ".[dev]"
+	uv sync --all-extras --no-cache

 lint:
 	ruff check .
@@ -13,4 +16,4 @@ type:
 qa:
 	make lint
 	make type
-	make test
+	make test

README.md

Lines changed: 123 additions & 65 deletions
@@ -1,110 +1,168 @@
 # Overview

-| Developed by | Guardrails AI |
-| --- | --- |
-| Date of development | Feb 15, 2024 |
-| Validator type | Format |
-| Blog | |
-| License | Apache 2 |
-| Input/Output | Output |
+| Developed by | ML cube |
+| ------------------- | ------------- |
+| Date of development | Sep 9, 2025 |
+| Validator type | RAG Retrieved Context |
+| Blog | |
+| License | Apache 2 |
+| Input/Output | RAG Retrieval |

 ## Description

-### Intended Use
-This validator is a template for creating other validators, but for demonstrative purposes it ensures that a generated output is the literal `pass`.
-
-### Requirements
+This validator checks whether the retrieved context in a RAG (Retrieval-Augmented Generation) system relates to the user's query. It can be used in two ways:
+- `RAG Context Relevance`: Validates whether the retrieved context is relevant to the user's query. Relevant means that the context is related to the question, even if it does not directly contain the answer.
+- `RAG Context Usefulness`: Validates whether the retrieved context is useful for answering the user's query. Useful means that the context contains information that can help answer the question.

-* Dependencies:
-    - guardrails-ai>=0.4.0
+### Intended Use

-* Foundation model access keys:
-    - OPENAI_API_KEY
+It can be used in a RAG system to prevent the model from hallucinating or generating incorrect responses based on irrelevant context.

-## Installation
+### Requirements

-```bash
-$ guardrails hub install hub://guardrails/validator_template
-```
+- Dependencies:
+  - guardrails-ai >= 0.5.15
+  - langchain[openai] >= 0.3.27

-## Usage Examples
+- OpenAI Foundation model access keys:
+  - OPENAI_API_KEY

-### Validating string output via Python
+## Examples

-In this example, we apply the validator to a string output generated by an LLM.
+In this example we apply the MLcubeRagContextValidator to validate the relevance of the retrieved context to the user's query.

 ```python
-# Import Guard and Validator
-from guardrails.hub import ValidatorTemplate
+import openai
+
+from validator.prompts.prompts import RagContextRelevancePrompt
+from validator.main import MLcubeRagContextValidator
 from guardrails import Guard

-# Setup Guard
+# Define the guard with the MLcubeRagContextValidator,
+# specifying the relevance prompt generator to enable
+# context relevance evaluation.
 guard = Guard().use(
-    ValidatorTemplate
+    MLcubeRagContextValidator(
+        rag_context_eval_prompt=RagContextRelevancePrompt(),
+        pass_threshold=1,
+        model_name="gpt-4o-mini",
+        on_fail="noop",
+        on="prompt",
+    )
 )

-guard.validate("pass")  # Validator passes
-guard.validate("fail")  # Validator fails
-```
+# Sample metadata. Retrieved context is relevant to the user query.
+metadata = {
+    "user_input": "What's the weather in Milan, today?",
+    "retrieved_context": "Milan, what a beautiful day. Sunny and warm.",
+}

-### Validating JSON output via Python
+# Make a call to the LLM with the guardrail in place.
+response = guard(
+    llm_api=openai.chat.completions.create,
+    prompt=metadata["user_input"],
+    model="gpt-4o-mini",
+    max_tokens=1024,
+    temperature=0,
+    metadata=metadata,
+)

-In this example, we apply the validator to a string field of a JSON output generated by an LLM.
+# Assert that the validation passed since the context is relevant.
+assert response.validation_passed

-```python
-# Import Guard and Validator
-from pydantic import BaseModel, Field
-from guardrails.hub import ValidatorTemplate
-from guardrails import Guard
+# We now change the retrieved context to be irrelevant to the user query.
+metadata["retrieved_context"] = "The capital of Italy is Rome."
+
+response = guard(
+    llm_api=openai.chat.completions.create,
+    prompt=metadata["user_input"],
+    model="gpt-4o-mini",
+    max_tokens=1024,
+    temperature=0,
+    metadata=metadata,
+)

-# Initialize Validator
-val = ValidatorTemplate()
+# We assert that the validation failed since the context is irrelevant.
+assert not response.validation_passed
+```

-# Create Pydantic BaseModel
-class Process(BaseModel):
-    process_name: str
-    status: str = Field(validators=[val])
+In this example we evaluate the usefulness of the retrieved context to the user's query. This time we call the `parse` method of the guard directly.

-# Create a Guard to check for valid Pydantic output
-guard = Guard.from_pydantic(output_class=Process)
+```python
+from validator.prompts.prompts import RagContextUsefulnessPrompt

-# Run LLM output generating JSON through guard
-guard.parse("""
-{
-    "process_name": "templating",
-    "status": "pass"
+guard = Guard().use(
+    MLcubeRagContextValidator(
+        rag_context_eval_prompt=RagContextUsefulnessPrompt(),
+        pass_threshold=1,
+        model_name="gpt-4o-mini",
+        on_fail="noop",
+        on="prompt",
+    )
+)
+
+# Sample metadata. Retrieved context is not useful to the user query
+# since it talks about a different city.
+metadata = {
+    "user_input": "What's the weather in Milan, today?",
+    "retrieved_context": "Roma, what a beautiful day. Sunny and warm.",
 }
-""")
+
+resp = guard.parse(
+    metadata["user_input"],
+    metadata=metadata,
+)
+
+# Assert that the validation failed since the context is not useful.
+assert not resp.validation_passed
 ```
+## Benchmark
+
+We benchmark the validator on a subset of the [WikiQA](https://www.microsoft.com/en-us/research/project/wikiqa/) dataset. You can find the benchmark script, the dataset, and a summary of the results in the `benchmark` folder.

 # API Reference

-**`__init__(self, on_fail="noop")`**
+**`__init__(self, rag_context_eval_prompt, pass_threshold, model_name, on_fail="noop", default_min=0, default_max=1, **kwargs)`**
+
 <ul>
-Initializes a new instance of the ValidatorTemplate class.
+Initializes a new instance of the MLcubeRagContextValidator class for evaluating RAG context.

 **Parameters**
-- **`arg_1`** *(str)*: A placeholder argument to demonstrate how to use init arguments.
-- **`arg_2`** *(str)*: Another placeholder argument to demonstrate how to use init arguments.
-- **`on_fail`** *(str, Callable)*: The policy to enact when a validator fails. If `str`, must be one of `reask`, `fix`, `filter`, `refrain`, `noop`, `exception` or `fix_reask`. Otherwise, must be a function that is called when the validator fails.
+
+- **`rag_context_eval_prompt`** _(Ml3RagContextEvalBasePrompt)_: The prompt generator used to create evaluation prompts for the LLM judge.
+- **`pass_threshold`** _(int)_: The minimum rating score required for the validation to pass.
+- **`model_name`** _(str)_: The name of the LLM model to use for evaluation (e.g. `gpt-4o-mini`).
+- **`default_min`** _(int)_: The default minimum value for the rating range. Default is `0`.
+- **`default_max`** _(int)_: The default maximum value for the rating range. Default is `1`.
+- **`on_fail`** _(str, Callable)_: The policy to enact when a validator fails. If `str`, must be one of `reask`, `fix`, `filter`, `refrain`, `noop`, `exception` or `fix_reask`. Otherwise, must be a function that is called when the validator fails.
+- **`kwargs`** _(dict)_: Additional keyword arguments to pass to the base Validator class.
 </ul>
 <br/>

 **`validate(self, value, metadata) -> ValidationResult`**
+
 <ul>
-Validates the given `value` using the rules defined in this validator, relying on the `metadata` provided to customize the validation process. This method is automatically invoked by `guard.parse(...)`, ensuring the validation logic is applied to the input data.
+Validates the retrieved context with respect to the user query and the specified prompt generator. The validator uses structured output to get a rating and explanation from the LLM, then compares the rating against the pass threshold.

 Note:

 1. This method should not be called directly by the user. Instead, invoke `guard.parse(...)` where this method will be called internally for each associated Validator.
-2. When invoking `guard.parse(...)`, ensure to pass the appropriate `metadata` dictionary that includes keys and values required by this validator. If `guard` is associated with multiple validators, combine all necessary metadata into a single dictionary.
+2. When invoking `guard.parse(...)`, ensure to pass the appropriate `metadata` dictionary that includes the keys and values required by this validator (see below). If `guard` is associated with multiple validators, combine all necessary metadata into a single dictionary.

 **Parameters**
-- **`value`** *(Any)*: The input value to validate.
-- **`metadata`** *(dict)*: A dictionary containing metadata required for validation. Keys and values must match the expectations of this validator.
-
-
-| Key | Type | Description | Default |
-| --- | --- | --- | --- |
-| `key1` | String | Description of key1's role. | N/A |
-</ul>
+
+- **`value`** _(Any)_: The input value to validate.
+- **`metadata`** _(dict)_: A dictionary containing metadata required for validation. Keys and values must match the expectations of this validator.
+
+| Key | Type | Description | Default |
+| --- | --- | --- | --- |
+| `user_input` | String | The original user query passed into the RAG system. | N/A (Required) |
+| `retrieved_context` | String | The context retrieved and used by the RAG system. | N/A (Required) |
+| `min_range_value` | Integer | The minimum value for the rating range used by the LLM judge. | 0 (the default of the validator class) |
+| `max_range_value` | Integer | The maximum value for the rating range used by the LLM judge. | 1 (the default of the validator class) |
+</ul>
+
+**Returns**
+
+**`ValidationResult`**: Returns a `PassResult` if the LLM judge's rating meets or exceeds the pass threshold, or a `FailResult` with a detailed explanation if the rating is below the threshold.
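The pass/fail decision described above reduces to a simple comparison of the judge's rating against the threshold within the configured rating range. A minimal sketch of that logic (not the validator's actual implementation; `rating_passes` and its range check are illustrative assumptions):

```python
def rating_passes(
    rating: float,
    pass_threshold: float,
    min_range_value: float = 0,
    max_range_value: float = 1,
) -> bool:
    """Illustrative sketch: a judge rating passes when it meets or
    exceeds the threshold within the configured rating range."""
    # Reject ratings outside the range the LLM judge was instructed to use.
    if not (min_range_value <= rating <= max_range_value):
        raise ValueError("rating outside the configured range")
    return rating >= pass_threshold
```

With the README's settings (range `[0, 1]`, `pass_threshold=1`), only a rating of exactly 1 passes, which matches the binary pass/fail behavior shown in the examples.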
