diff --git a/nemoguardrails/benchmark/Procfile b/nemoguardrails/benchmark/Procfile
new file mode 100644
index 000000000..f177f52be
--- /dev/null
+++ b/nemoguardrails/benchmark/Procfile
@@ -0,0 +1,8 @@
+# Procfile
+
+# NeMo Guardrails server
+gr: poetry run nemoguardrails server --config configs/guardrail_configs --default-config-id content_safety_colang1 --port 9000
+
+# Guardrails NIMs for inference
+app_llm: poetry run python mock_llm_server/run_server.py --workers 4 --port 8000 --config-file configs/mock_configs/meta-llama-3.3-70b-instruct.env
+cs_llm: poetry run python mock_llm_server/run_server.py --workers 4 --port 8001 --config-file configs/mock_configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env
diff --git a/nemoguardrails/benchmark/README.md b/nemoguardrails/benchmark/README.md
new file mode 100644
index 000000000..914d1b47c
--- /dev/null
+++ b/nemoguardrails/benchmark/README.md
@@ -0,0 +1,165 @@
+# Guardrails Benchmarking
+
+NeMo Guardrails includes benchmarking tools to help users capacity-test their Guardrails applications.
+Adding guardrails to an LLM-based application improves safety and security, but it also adds some latency. These benchmarks let users quantify the tradeoff between security and latency and make data-driven decisions.
+We currently provide a simple testbench that runs the Guardrails server with mocks standing in for the guardrail and application models. It can be used for performance testing on a laptop without any GPUs and completes in a few minutes.
+
+-----
+
+## Guardrails Core Benchmarking
+
+This benchmark measures the performance of the Guardrails application running on a CPU-only laptop or instance.
+It requires neither GPUs to run local models nor internet access to reach models hosted by providers.
+All models use the [Mock LLM Server](mock_llm_server), which is a simplified stand-in for an LLM used for inference.
+The aim of this benchmark is to detect performance regressions as quickly as running unit tests.
+
+## Quickstart: Running Guardrails with Mock LLMs
+To run Guardrails with mocks for both the content-safety and main LLMs, follow the steps below.
+All commands must be run in the `nemoguardrails/benchmark` directory.
+The steps assume you already have a working environment, set up by following [CONTRIBUTING.md](../../CONTRIBUTING.md).
+
+First, we need to install the `honcho` and `langchain-nvidia-ai-endpoints` packages.
+The `honcho` package is used to run Procfile-based applications, and is a Python port of [Foreman](https://github.com/ddollar/foreman).
+The `langchain-nvidia-ai-endpoints` package is used to communicate with the Mock LLMs via LangChain.
+
+```shell
+# Install dependencies
+$ poetry run pip install honcho langchain-nvidia-ai-endpoints
+...
+Successfully installed filetype-1.2.0 honcho-2.0.0 langchain-nvidia-ai-endpoints-0.3.19
+```
+
+Now we can start up the processes that are part of the [Procfile](Procfile).
+As the Procfile processes spin up, they log to the console with a prefix. The `system` prefix is used by Honcho, `app_llm` is the Application or Main LLM mock, `cs_llm` is the content-safety mock, and `gr` is the Guardrails service. We'll explore the Procfile in more detail below.
+Once the three 'Uvicorn running on ...' messages are printed, you can move to the next step. Note that these messages are likely not on consecutive lines.
+
+```
+# All commands must be run in the nemoguardrails/benchmark directory
+$ cd nemoguardrails/benchmark
+$ poetry run honcho start
+13:40:33 system | gr.1 started (pid=93634)
+13:40:33 system | app_llm.1 started (pid=93635)
+13:40:33 system | cs_llm.1 started (pid=93636)
+...
+13:40:41 app_llm.1 | INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
+...
+13:40:41 cs_llm.1 | INFO: Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)
+...
+13:40:45 gr.1 | INFO: Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)
+```
+
+Once Guardrails and the mock servers are up, we can use the `validate_mocks.py` script to check that they're healthy and serving the correct models.
+
+```shell
+$ cd nemoguardrails/benchmark
+$ poetry run python validate_mocks.py
+Starting LLM endpoint health check...
+
+--- Checking Port: 8000 ---
+Checking http://localhost:8000/health ...
+HTTP Request: GET http://localhost:8000/health "HTTP/1.1 200 OK"
+Health Check PASSED: Status is 'healthy'.
+Checking http://localhost:8000/v1/models for 'meta/llama-3.3-70b-instruct'...
+HTTP Request: GET http://localhost:8000/v1/models "HTTP/1.1 200 OK"
+Model Check PASSED: Found 'meta/llama-3.3-70b-instruct' in model list.
+--- Port 8000: ALL CHECKS PASSED ---
+
+--- Checking Port: 8001 ---
+Checking http://localhost:8001/health ...
+HTTP Request: GET http://localhost:8001/health "HTTP/1.1 200 OK"
+Health Check PASSED: Status is 'healthy'.
+Checking http://localhost:8001/v1/models for 'nvidia/llama-3.1-nemoguard-8b-content-safety'...
+HTTP Request: GET http://localhost:8001/v1/models "HTTP/1.1 200 OK"
+Model Check PASSED: Found 'nvidia/llama-3.1-nemoguard-8b-content-safety' in model list.
+--- Port 8001: ALL CHECKS PASSED ---
+
+--- Checking Port: 9000 (Rails Config) ---
+Checking http://localhost:9000/v1/rails/configs ...
+HTTP Request: GET http://localhost:9000/v1/rails/configs "HTTP/1.1 200 OK"
+HTTP Status PASSED: Got 200.
+Body Check PASSED: Response is an array with at least one entry.
+--- Port 9000: ALL CHECKS PASSED ---
+
+--- Final Summary ---
+Port 8000 (meta/llama-3.3-70b-instruct): PASSED
+Port 8001 (nvidia/llama-3.1-nemoguard-8b-content-safety): PASSED
+Port 9000 (Rails Config): PASSED
+---------------------
+Overall Status: All endpoints are healthy!
+```
+
+Once the mocks and Guardrails are running and the script passes, we can issue curl requests against the Guardrails `/v1/chat/completions` endpoint to generate a response and test the system end-to-end.
+
+```shell
+curl -s -X POST http://0.0.0.0:9000/v1/chat/completions \
+  -H 'Accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "model": "meta/llama-3.3-70b-instruct",
+    "messages": [
+      {
+        "role": "user",
+        "content": "what can you do for me?"
+      }
+    ],
+    "stream": false
+  }' | jq
+{
+  "messages": [
+    {
+      "role": "assistant",
+      "content": "I can provide information and help with a wide range of topics, from science and history to entertainment and culture. I can also help with language-related tasks, such as translation and text summarization. However, I can't assist with requests that involve harm or illegal activities."
+    }
+  ]
+}
+
+```
+
+------
+
+## Deep-Dive: Configuration
+
+In this section, we'll examine the configuration files used in the quickstart above. This gives more context on how the system works and how it can be extended.
+
+### Procfile
+
+The [Procfile](Procfile?raw=true) contains all the processes that make up the application.
+The Honcho package reads in this file, starts all the processes, and streams their combined logs to the console.
+The `gr` line runs the Guardrails server on port 9000 and sets [content_safety_colang1](configs/guardrail_configs/content_safety_colang1?raw=true) as the default Guardrails configuration.
+The `app_llm` line runs the Application or Main Mock LLM. Guardrails calls this LLM to generate a response to the user's query. This server uses 4 uvicorn workers and runs on port 8000. The configuration file here is a Mock LLM configuration, not a Guardrails configuration.
+The `cs_llm` line runs the Content-Safety Mock LLM. This uses 4 uvicorn workers and runs on port 8001.
+
+### Guardrails Configuration
+The [Guardrails Configuration](configs/guardrail_configs/content_safety_colang1/config.yml) is used by the Guardrails server.
+Under the `models` section, the `main` model is used to generate responses to user queries. The base URL for this model points to the `app_llm` Mock LLM from the Procfile, running on port 8000. The `model` field must match the Mock LLM model name.
+The `content_safety` model is configured for use in an input and output rail. The `type` field matches the `$model` used in the input and output flows.
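+
+For reference, a configuration of this shape typically looks like the sketch below. This is illustrative only: the checked-in [config.yml](configs/guardrail_configs/content_safety_colang1/config.yml) is the source of truth, and the engine and flow names shown here are assumptions based on the description above rather than a copy of that file.
+
+```yaml
+# Illustrative sketch -- see the checked-in config.yml for the actual values.
+models:
+  - type: main                                           # generates responses to user queries
+    engine: nim                                          # assumed engine, served via langchain-nvidia-ai-endpoints
+    model: meta/llama-3.3-70b-instruct                   # must match the Mock LLM model name
+    parameters:
+      base_url: http://localhost:8000/v1                 # app_llm mock from the Procfile
+  - type: content_safety                                 # referenced by $model in the rail flows
+    engine: nim
+    model: nvidia/llama-3.1-nemoguard-8b-content-safety
+    parameters:
+      base_url: http://localhost:8001/v1                 # cs_llm mock from the Procfile
+
+rails:
+  input:
+    flows:
+      - content safety check input $model=content_safety
+  output:
+    flows:
+      - content safety check output $model=content_safety
+```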
+
+### Mock LLM Endpoints
+The Mock LLM implements a subset of the OpenAI LLM API.
+There are two Mock LLM configurations, one for the Mock [main model](configs/mock_configs/meta-llama-3.3-70b-instruct.env), and another for the Mock [content-safety](configs/mock_configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env) model.
+The Mock LLM has the following OpenAI-compatible endpoints:
+
+* `/health`: Returns a JSON object with `status` set to `healthy` and a `timestamp` in seconds since the epoch. For example `{"status":"healthy","timestamp":1762781239}`
+* `/v1/models`: Returns the `MODEL` field from the Mock configuration (see below). For example `{"object":"list","data":[{"id":"meta/llama-3.3-70b-instruct","object":"model","created":1762781290,"owned_by":"system"}]}`
+* `/v1/completions`: Returns an [OpenAI completion object](https://platform.openai.com/docs/api-reference/completions/object) using the Mock configuration (see below).
+* `/v1/chat/completions`: Returns an [OpenAI chat completion object](https://platform.openai.com/docs/api-reference/chat/object) using the Mock configuration (see below).
+
+### Mock LLM Configuration
+Mock LLMs are configured using the `.env` file format. These files are passed to the Mock LLM using the `--config-file` argument.
+The Mock LLMs return either the `SAFE_TEXT` or the `UNSAFE_TEXT` response to `/v1/completions` or `/v1/chat/completions` inference requests.
+The probability of `UNSAFE_TEXT` being returned is given by `UNSAFE_PROBABILITY`.
+The latency of each response is also controllable, and works as follows:
+
+* Latency is first sampled from a normal distribution with mean `LATENCY_MEAN_SECONDS` and standard deviation `LATENCY_STD_SECONDS`.
+* If the sampled value is less than `LATENCY_MIN_SECONDS`, it is set to `LATENCY_MIN_SECONDS`.
+* If the sampled value is greater than `LATENCY_MAX_SECONDS`, it is set to `LATENCY_MAX_SECONDS`.
+
+The full list of configuration fields is shown below:
+* `MODEL`: The model name served by the Mock LLM. This is returned by the `/v1/models` endpoint.
+* `UNSAFE_PROBABILITY`: Probability of an unsafe response; must be in the range [0, 1].
+* `UNSAFE_TEXT`: String returned as an unsafe response.
+* `SAFE_TEXT`: String returned as a safe response.
+* `LATENCY_MIN_SECONDS`: Minimum latency in seconds. +* `LATENCY_MAX_SECONDS`: Maximum latency in seconds. +* `LATENCY_MEAN_SECONDS`: Normal distribution mean from which to sample latency. +* `LATENCY_STD_SECONDS`: Normal distribution standard deviation from which to sample latency. diff --git a/nemoguardrails/benchmark/mock_llm_server/configs/guardrail_configs/content_safety_colang1/config.yml b/nemoguardrails/benchmark/configs/guardrail_configs/content_safety_colang1/config.yml similarity index 100% rename from nemoguardrails/benchmark/mock_llm_server/configs/guardrail_configs/content_safety_colang1/config.yml rename to nemoguardrails/benchmark/configs/guardrail_configs/content_safety_colang1/config.yml diff --git a/nemoguardrails/benchmark/mock_llm_server/configs/guardrail_configs/content_safety_colang1/prompts.yml b/nemoguardrails/benchmark/configs/guardrail_configs/content_safety_colang1/prompts.yml similarity index 100% rename from nemoguardrails/benchmark/mock_llm_server/configs/guardrail_configs/content_safety_colang1/prompts.yml rename to nemoguardrails/benchmark/configs/guardrail_configs/content_safety_colang1/prompts.yml diff --git a/nemoguardrails/benchmark/mock_llm_server/configs/meta-llama-3.3-70b-instruct.env b/nemoguardrails/benchmark/configs/mock_configs/meta-llama-3.3-70b-instruct.env similarity index 100% rename from nemoguardrails/benchmark/mock_llm_server/configs/meta-llama-3.3-70b-instruct.env rename to nemoguardrails/benchmark/configs/mock_configs/meta-llama-3.3-70b-instruct.env diff --git a/nemoguardrails/benchmark/mock_llm_server/configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env b/nemoguardrails/benchmark/configs/mock_configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env similarity index 100% rename from nemoguardrails/benchmark/mock_llm_server/configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env rename to nemoguardrails/benchmark/configs/mock_configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env diff --git a/nemoguardrails/benchmark/mock_llm_server/run_server.py b/nemoguardrails/benchmark/mock_llm_server/run_server.py index 14e0be02f..eae8bc032 100644 --- a/nemoguardrails/benchmark/mock_llm_server/run_server.py +++ b/nemoguardrails/benchmark/mock_llm_server/run_server.py @@ -71,7 +71,12 @@ def parse_arguments(): parser.add_argument( "--config-file", help=".env file to configure model", required=True ) - + parser.add_argument( + "--workers", + type=int, + default=1, + help="Number of uvicorn worker processes (default: 1)", + ) return parser.parse_args() @@ -104,12 +109,13 @@ def main(): # pragma: no cover try: uvicorn.run( - "api:app", + "nemoguardrails.benchmark.mock_llm_server.api:app", host=args.host, port=args.port, reload=args.reload, log_level=args.log_level, env_file=config_file, + workers=args.workers, ) except KeyboardInterrupt: log.info("\nServer stopped by user") diff --git a/nemoguardrails/benchmark/validate_mocks.py b/nemoguardrails/benchmark/validate_mocks.py new file mode 100644 index 000000000..795f1c671 --- /dev/null +++ b/nemoguardrails/benchmark/validate_mocks.py @@ -0,0 +1,227 @@ +#!/usr/bin/env python3 + +# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +""" +A script to check the health and model IDs of local OpenAI-compatible endpoints. +Requires the 'httpx' library: pip install httpx +""" + +import json +import logging +import sys + +import httpx + +# --- Logging Setup --- +# Configure basic logging to print info-level messages +logging.basicConfig(level=logging.INFO, format="%(message)s") + + +def check_endpoint(port: int, expected_model: str): + """ + Checks the /health and /v1/models endpoints for a standard + OpenAI-compatible server. + Returns a tuple: (bool success, str summary) + """ + base_url = f"http://localhost:{port}" + all_ok = True + + logging.info("\n--- Checking Port: %s ---", port) + + # --- 1. Health Check --- + health_url = f"{base_url}/health" + logging.info("Checking %s ...", health_url) + try: + response = httpx.get(health_url, timeout=3) + + if response.status_code != 200: + logging.error("Health Check FAILED: Status code %s", response.status_code) + all_ok = False + else: + try: + data = response.json() + status = data.get("status") + if status == "healthy": + logging.info("Health Check PASSED: Status is 'healthy'.") + else: + logging.warning( + "Health Check FAILED: Expected 'healthy', got '%s'.", status + ) + all_ok = False + except json.JSONDecodeError: + logging.error("Health Check FAILED: Could not decode JSON response.") + all_ok = False + + except httpx.ConnectError: + logging.error("Health Check FAILED: No response from server on port %s.", port) + logging.error("--- Port %s: CHECKS FAILED ---", port) + return False, "Port %s (%s): FAILED (Connection Error)" % (port, expected_model) + except httpx.TimeoutException: + logging.error("Health Check FAILED: Connection timed out for port %s.", port) + logging.error("--- Port %s: CHECKS FAILED ---", port) + return False, "Port %s (%s): FAILED (Connection Timeout)" % ( + port, + expected_model, + ) + + # --- 2. 
Model Check --- + models_url = f"{base_url}/v1/models" + logging.info("Checking %s for '%s'...", models_url, expected_model) + try: + response = httpx.get(models_url, timeout=3) + + if response.status_code != 200: + logging.error("Model Check FAILED: Status code %s", response.status_code) + all_ok = False + else: + try: + data = response.json() + models = data.get("data", []) + model_ids = [model.get("id") for model in models] + + if expected_model in model_ids: + logging.info( + "Model Check PASSED: Found '%s' in model list.", expected_model + ) + else: + logging.warning( + "Model Check FAILED: Expected '%s', but it was NOT found.", + expected_model, + ) + logging.warning("Available models:") + for model_id in model_ids: + logging.warning(" - %s", model_id) + all_ok = False + except json.JSONDecodeError: + logging.error("Model Check FAILED: Could not decode JSON response.") + all_ok = False + except AttributeError: + logging.error( + "Model Check FAILED: Unexpected JSON structure in response from %s.", + models_url, + ) + all_ok = False + + except httpx.ConnectError: + logging.error("Model Check FAILED: No response from server on port %s.", port) + all_ok = False + except httpx.TimeoutException: + logging.error("Model Check FAILED: Connection timed out for port %s.", port) + all_ok = False + + # --- Final Status --- + if all_ok: + logging.info("--- Port %s: ALL CHECKS PASSED ---", port) + return True, "Port %s (%s): PASSED" % (port, expected_model) + else: + logging.error("--- Port %s: CHECKS FAILED ---", port) + return False, "Port %s (%s): FAILED" % (port, expected_model) + + +def check_rails_endpoint(port: int): + """ + Checks the /v1/rails/configs endpoint for a specific 200 status + and a non-empty list response. + Returns a tuple: (bool success, str summary) + """ + base_url = f"http://localhost:{port}" + endpoint = f"{base_url}/v1/rails/configs" + all_ok = True + + logging.info("\n--- Checking Port: %s (Rails Config) ---", port) + logging.info("Checking %s ...", endpoint) + + try: + response = httpx.get(endpoint, timeout=3) + + # --- 1. HTTP Status Check --- + if response.status_code == 200: + logging.info("HTTP Status PASSED: Got %s.", response.status_code) + else: + logging.warning( + "HTTP Status FAILED: Expected 200, got '%s'.", response.status_code + ) + all_ok = False + + # --- 2. Body Content Check --- + try: + data = response.json() + if isinstance(data, list) and len(data) > 0: + logging.info( + "Body Check PASSED: Response is an array with at least one entry." + ) + else: + logging.warning( + "Body Check FAILED: Response is not an array or is empty." 
+ ) + logging.debug( + "Response body (first 200 chars): %s", str(response.text)[:200] + ) + all_ok = False + except json.JSONDecodeError: + logging.error("Body Check FAILED: Could not decode JSON response.") + logging.debug( + "Response body (first 200 chars): %s", str(response.text)[:200] + ) + all_ok = False + + except httpx.ConnectError: + logging.error("Rails Check FAILED: No response from server on port %s.", port) + all_ok = False + except httpx.TimeoutException: + logging.error("Rails Check FAILED: Connection timed out for port %s.", port) + all_ok = False + + # --- Final Status --- + if all_ok: + logging.info("--- Port %s: ALL CHECKS PASSED ---", port) + return True, "Port %s (Rails Config): PASSED" % port + else: + logging.error("--- Port %s: CHECKS FAILED ---", port) + return False, "Port %s (Rails Config): FAILED" % port + + +def main(): + """Run all health checks.""" + logging.info("Starting LLM endpoint health check...") + + check_results = [ + check_endpoint(8000, "meta/llama-3.3-70b-instruct"), + check_endpoint(8001, "nvidia/llama-3.1-nemoguard-8b-content-safety"), + check_rails_endpoint(9000), + ] + + logging.info("\n--- Final Summary ---") + + all_passed = True + for success, summary in check_results: + logging.info(summary) + if not success: + all_passed = False + + logging.info("---------------------") + + if all_passed: + logging.info("Overall Status: All endpoints are healthy!") + sys.exit(0) + else: + logging.error("Overall Status: One or more checks FAILED.") + sys.exit(1) + + +if __name__ == "__main__": + main() # pragma: no cover diff --git a/tests/benchmark/test_validate_mocks.py b/tests/benchmark/test_validate_mocks.py new file mode 100644 index 000000000..d8a86c1fa --- /dev/null +++ b/tests/benchmark/test_validate_mocks.py @@ -0,0 +1,434 @@ +#!/usr/bin/env python3 + +# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +""" +Tests for validate_mocks.py script. 
+""" + +import json +from unittest.mock import MagicMock, patch + +import httpx +import pytest + +from nemoguardrails.benchmark.validate_mocks import ( + check_endpoint, + check_rails_endpoint, + main, +) + + +class TestCheckEndpoint: + """Tests for check_endpoint function.""" + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_success(self, mock_get): + """Test successful health and model checks.""" + # Mock health check response + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "healthy"} + + # Mock models check response + models_response = MagicMock() + models_response.status_code = 200 + models_response.json.return_value = { + "data": [ + {"id": "meta/llama-3.3-70b-instruct"}, + {"id": "other-model"}, + ] + } + + mock_get.side_effect = [health_response, models_response] + + success, summary = check_endpoint(8000, "meta/llama-3.3-70b-instruct") + + assert success + assert "PASSED" in summary + assert "8000" in summary + assert mock_get.call_count == 2 + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_health_check_failed_status(self, mock_get): + """Test health check with non-200 status code.""" + health_response = MagicMock() + health_response.status_code = 404 + + mock_get.return_value = health_response + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_health_check_unhealthy_status(self, mock_get): + """Test health check with unhealthy status.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "unhealthy"} + + models_response = MagicMock() + models_response.status_code = 200 + models_response.json.return_value = {"data": [{"id": "test-model"}]} + + mock_get.side_effect = [health_response, models_response] + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_health_check_json_decode_error(self, mock_get): + """Test health check with invalid JSON.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.side_effect = json.JSONDecodeError( + "Expecting value", "", 0 + ) + + mock_get.return_value = health_response + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_health_connection_error(self, mock_get): + """Test health check with connection error.""" + mock_get.side_effect = httpx.ConnectError("Connection failed") + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + assert "Connection Error" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_health_timeout(self, mock_get): + """Test health check with timeout.""" + mock_get.side_effect = httpx.TimeoutException("Request timed out") + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + assert "Connection Timeout" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_model_check_failed_status(self, mock_get): + """Test model check with non-200 
status code.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "healthy"} + + models_response = MagicMock() + models_response.status_code = 404 + + mock_get.side_effect = [health_response, models_response] + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_model_not_found(self, mock_get): + """Test model check when expected model is not in the list.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "healthy"} + + models_response = MagicMock() + models_response.status_code = 200 + models_response.json.return_value = { + "data": [ + {"id": "other-model-1"}, + {"id": "other-model-2"}, + ] + } + + mock_get.side_effect = [health_response, models_response] + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_model_check_json_decode_error(self, mock_get): + """Test model check with invalid JSON.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "healthy"} + + models_response = MagicMock() + models_response.status_code = 200 + models_response.json.side_effect = json.JSONDecodeError( + "Expecting value", "", 0 + ) + + mock_get.side_effect = [health_response, models_response] + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_model_check_unexpected_json_structure(self, mock_get): + """Test model check with unexpected JSON structure.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "healthy"} + + models_response = MagicMock() + models_response.status_code = 200 + # Return invalid structure that will cause AttributeError + models_response.json.return_value = "invalid" + + mock_get.side_effect = [health_response, models_response] + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_model_check_connection_error(self, mock_get): + """Test model check with connection error.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "healthy"} + + mock_get.side_effect = [ + health_response, + httpx.ConnectError("Connection failed"), + ] + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_model_check_timeout(self, mock_get): + """Test model check with timeout.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "healthy"} + + mock_get.side_effect = [ + health_response, + httpx.TimeoutException("Request timed out"), + ] + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + +class TestCheckRailsEndpoint: + """Tests for check_rails_endpoint function.""" + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def 
test_check_rails_endpoint_success(self, mock_get): + """Test successful rails config check.""" + response = MagicMock() + response.status_code = 200 + response.json.return_value = [ + {"id": "config1", "name": "Config 1"}, + {"id": "config2", "name": "Config 2"}, + ] + + mock_get.return_value = response + + success, summary = check_rails_endpoint(9000) + + assert success + assert "PASSED" in summary + assert "9000" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_rails_endpoint_non_200_status(self, mock_get): + """Test rails config check with non-200 status.""" + response = MagicMock() + response.status_code = 404 + response.json.return_value = [] + + mock_get.return_value = response + + success, summary = check_rails_endpoint(9000) + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_rails_endpoint_empty_list(self, mock_get): + """Test rails config check with empty list response.""" + response = MagicMock() + response.status_code = 200 + response.json.return_value = [] + + mock_get.return_value = response + + success, summary = check_rails_endpoint(9000) + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_rails_endpoint_not_a_list(self, mock_get): + """Test rails config check with non-list response.""" + response = MagicMock() + response.status_code = 200 + response.json.return_value = {"error": "invalid"} + + mock_get.return_value = response + + success, summary = check_rails_endpoint(9000) + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_rails_endpoint_json_decode_error(self, mock_get): + """Test rails config check with invalid JSON.""" + response = MagicMock() + response.status_code = 200 + response.text = "invalid json" + response.json.side_effect = json.JSONDecodeError("Expecting value", "", 0) + + mock_get.return_value = response + + success, summary = check_rails_endpoint(9000) + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_rails_endpoint_connection_error(self, mock_get): + """Test rails config check with connection error.""" + mock_get.side_effect = httpx.ConnectError("Connection failed") + + success, summary = check_rails_endpoint(9000) + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_rails_endpoint_timeout(self, mock_get): + """Test rails config check with timeout.""" + mock_get.side_effect = httpx.TimeoutException("Request timed out") + + success, summary = check_rails_endpoint(9000) + + assert not success + assert "FAILED" in summary + + +class TestMain: + """Tests for main function.""" + + @patch("nemoguardrails.benchmark.validate_mocks.check_rails_endpoint") + @patch("nemoguardrails.benchmark.validate_mocks.check_endpoint") + def test_main_all_passed(self, mock_check_endpoint, mock_check_rails_endpoint): + """Test main function when all checks pass.""" + mock_check_endpoint.side_effect = [ + (True, "Port 8000 (meta/llama-3.3-70b-instruct): PASSED"), + ( + True, + "Port 8001 (nvidia/llama-3.1-nemoguard-8b-content-safety): PASSED", + ), + ] + mock_check_rails_endpoint.return_value = ( + True, + "Port 9000 (Rails Config): PASSED", + ) + + with pytest.raises(SystemExit) as exc_info: + main() + + assert exc_info.value.code == 0 + 
assert mock_check_endpoint.call_count == 2 + assert mock_check_rails_endpoint.call_count == 1 + + @patch("nemoguardrails.benchmark.validate_mocks.check_rails_endpoint") + @patch("nemoguardrails.benchmark.validate_mocks.check_endpoint") + def test_main_one_failed(self, mock_check_endpoint, mock_check_rails_endpoint): + """Test main function when one check fails.""" + mock_check_endpoint.side_effect = [ + (False, "Port 8000 (meta/llama-3.3-70b-instruct): FAILED"), + ( + True, + "Port 8001 (nvidia/llama-3.1-nemoguard-8b-content-safety): PASSED", + ), + ] + mock_check_rails_endpoint.return_value = ( + True, + "Port 9000 (Rails Config): PASSED", + ) + + with pytest.raises(SystemExit) as exc_info: + main() + + assert exc_info.value.code == 1 + + @patch("nemoguardrails.benchmark.validate_mocks.check_rails_endpoint") + @patch("nemoguardrails.benchmark.validate_mocks.check_endpoint") + def test_main_all_failed(self, mock_check_endpoint, mock_check_rails_endpoint): + """Test main function when all checks fail.""" + mock_check_endpoint.side_effect = [ + (False, "Port 8000 (meta/llama-3.3-70b-instruct): FAILED"), + ( + False, + "Port 8001 (nvidia/llama-3.1-nemoguard-8b-content-safety): FAILED", + ), + ] + mock_check_rails_endpoint.return_value = ( + False, + "Port 9000 (Rails Config): FAILED", + ) + + with pytest.raises(SystemExit) as exc_info: + main() + + assert exc_info.value.code == 1 + + @patch("nemoguardrails.benchmark.validate_mocks.check_rails_endpoint") + @patch("nemoguardrails.benchmark.validate_mocks.check_endpoint") + def test_main_rails_failed(self, mock_check_endpoint, mock_check_rails_endpoint): + """Test main function when only rails check fails.""" + mock_check_endpoint.side_effect = [ + (True, "Port 8000 (meta/llama-3.3-70b-instruct): PASSED"), + ( + True, + "Port 8001 (nvidia/llama-3.1-nemoguard-8b-content-safety): PASSED", + ), + ] + mock_check_rails_endpoint.return_value = ( + False, + "Port 9000 (Rails Config): FAILED", + ) + + with pytest.raises(SystemExit) as exc_info: + main() + + assert exc_info.value.code == 1