diff --git a/nemoguardrails/benchmark/Procfile b/nemoguardrails/benchmark/Procfile
new file mode 100644
index 000000000..f177f52be
--- /dev/null
+++ b/nemoguardrails/benchmark/Procfile
@@ -0,0 +1,8 @@
+# Procfile
+
+# NeMo Guardrails server
+gr: poetry run nemoguardrails server --config configs/guardrail_configs --default-config-id content_safety_colang1 --port 9000
+
+# Guardrails NIMs for inference
+app_llm: poetry run python mock_llm_server/run_server.py --workers 4 --port 8000 --config-file configs/mock_configs/meta-llama-3.3-70b-instruct.env
+cs_llm: poetry run python mock_llm_server/run_server.py --workers 4 --port 8001 --config-file configs/mock_configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env
diff --git a/nemoguardrails/benchmark/README.md b/nemoguardrails/benchmark/README.md
new file mode 100644
index 000000000..914d1b47c
--- /dev/null
+++ b/nemoguardrails/benchmark/README.md
@@ -0,0 +1,165 @@
+# Guardrails Benchmarking
+
+NeMo Guardrails includes benchmarking tools to help users capacity-test their Guardrails applications.
+Adding guardrails to an LLM-based application improves safety and security, but it also adds some latency. These benchmarks let users quantify the tradeoff between security and latency and make data-driven decisions.
+We currently provide a simple testbench that runs the Guardrails server with mocks standing in for the guardrail and application models. It can be used for performance testing on a laptop without any GPUs and completes in a few minutes.
+
+-----
+
+## Guardrails Core Benchmarking
+
+This benchmark measures the performance of the Guardrails application running on a CPU-only laptop or instance.
+It requires neither GPUs to run local models nor internet access to reach models hosted by providers.
+All models use the [Mock LLM Server](mock_llm_server), which is a simplified stand-in for an LLM used for inference.
+The aim of this benchmark is to detect performance regressions as quickly as running unit tests.
+
+## Quickstart: Running Guardrails with Mock LLMs
+To run Guardrails with mocks for both the content-safety and main LLMs, follow the steps below.
+All commands must be run in the `nemoguardrails/benchmark` directory.
+The steps assume you already have a working environment, set up by following [CONTRIBUTING.md](../../CONTRIBUTING.md).
+
+First, we need to install the `honcho` and `langchain-nvidia-ai-endpoints` packages.
+The `honcho` package is used to run Procfile-based applications, and is a Python port of [Foreman](https://github.com/ddollar/foreman).
+The `langchain-nvidia-ai-endpoints` package is used to communicate with the Mock LLMs via LangChain.
+
+```shell
+# Install dependencies
+$ poetry run pip install honcho langchain-nvidia-ai-endpoints
+...
+Successfully installed filetype-1.2.0 honcho-2.0.0 langchain-nvidia-ai-endpoints-0.3.19
+```
+
+Now we can start up the processes that are part of the [Procfile](Procfile).
+As the Procfile processes spin up, they log to the console with a prefix. The `system` prefix is used by Honcho, `app_llm` is the Application or Main LLM mock, `cs_llm` is the content-safety mock, and `gr` is the Guardrails service. We'll explore the Procfile in more detail below.
+Once the three 'Uvicorn running on ...' messages are printed, you can move to the next step. Note that these messages are likely not on consecutive lines.
+
+```
+# All commands must be run in the nemoguardrails/benchmark directory
+$ cd nemoguardrails/benchmark
+$ poetry run honcho start
+13:40:33 system | gr.1 started (pid=93634)
+13:40:33 system | app_llm.1 started (pid=93635)
+13:40:33 system | cs_llm.1 started (pid=93636)
+...
+13:40:41 app_llm.1 | INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
+...
+13:40:41 cs_llm.1 | INFO: Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)
+...
+13:40:45 gr.1 | INFO: Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)
+```
+
+Once Guardrails and the mock servers are up, we can use the `validate_mocks.py` script to check that they're healthy and serving the correct models.
+
+```shell
+$ cd nemoguardrails/benchmark
+$ poetry run python validate_mocks.py
+Starting LLM endpoint health check...
+
+--- Checking Port: 8000 ---
+Checking http://localhost:8000/health ...
+HTTP Request: GET http://localhost:8000/health "HTTP/1.1 200 OK"
+Health Check PASSED: Status is 'healthy'.
+Checking http://localhost:8000/v1/models for 'meta/llama-3.3-70b-instruct'...
+HTTP Request: GET http://localhost:8000/v1/models "HTTP/1.1 200 OK"
+Model Check PASSED: Found 'meta/llama-3.3-70b-instruct' in model list.
+--- Port 8000: ALL CHECKS PASSED ---
+
+--- Checking Port: 8001 ---
+Checking http://localhost:8001/health ...
+HTTP Request: GET http://localhost:8001/health "HTTP/1.1 200 OK"
+Health Check PASSED: Status is 'healthy'.
+Checking http://localhost:8001/v1/models for 'nvidia/llama-3.1-nemoguard-8b-content-safety'...
+HTTP Request: GET http://localhost:8001/v1/models "HTTP/1.1 200 OK"
+Model Check PASSED: Found 'nvidia/llama-3.1-nemoguard-8b-content-safety' in model list.
+--- Port 8001: ALL CHECKS PASSED ---
+
+--- Checking Port: 9000 (Rails Config) ---
+Checking http://localhost:9000/v1/rails/configs ...
+HTTP Request: GET http://localhost:9000/v1/rails/configs "HTTP/1.1 200 OK"
+HTTP Status PASSED: Got 200.
+Body Check PASSED: Response is an array with at least one entry.
+--- Port 9000: ALL CHECKS PASSED ---
+
+--- Final Summary ---
+Port 8000 (meta/llama-3.3-70b-instruct): PASSED
+Port 8001 (nvidia/llama-3.1-nemoguard-8b-content-safety): PASSED
+Port 9000 (Rails Config): PASSED
+---------------------
+Overall Status: All endpoints are healthy!
+```
+
+Once the mocks and Guardrails are running and the script passes, we can issue curl requests against the Guardrails `/v1/chat/completions` endpoint to generate a response and test the system end-to-end.
+
+```shell
+curl -s -X POST http://0.0.0.0:9000/v1/chat/completions \
+  -H 'Accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "model": "meta/llama-3.3-70b-instruct",
+    "messages": [
+      {
+        "role": "user",
+        "content": "what can you do for me?"
+      }
+    ],
+    "stream": false
+  }' | jq
+{
+  "messages": [
+    {
+      "role": "assistant",
+      "content": "I can provide information and help with a wide range of topics, from science and history to entertainment and culture. I can also help with language-related tasks, such as translation and text summarization. However, I can't assist with requests that involve harm or illegal activities."
+    }
+  ]
+}
+
+```
+
+------
+
+## Deep-Dive: Configuration
+
+In this section, we'll examine the configuration files used in the quickstart above. This gives more context on how the system works and how it can be extended.
+
+### Procfile
+
+The [Procfile](Procfile?raw=true) contains all the processes that make up the application.
+The Honcho package reads in this file, starts all the processes, and streams their combined logs to the console.
+The `gr` line runs the Guardrails server on port 9000 and sets [content_safety_colang1](configs/guardrail_configs/content_safety_colang1?raw=true) as the default Guardrails configuration.
+The `app_llm` line runs the Application or Main Mock LLM. Guardrails calls this LLM to generate a response to the user's query. This server uses 4 uvicorn workers and runs on port 8000. The configuration file here is a Mock LLM configuration, not a Guardrails configuration.
+The `cs_llm` line runs the Content-Safety Mock LLM. This uses 4 uvicorn workers and runs on port 8001.
+
+### Guardrails Configuration
+The [Guardrails Configuration](configs/guardrail_configs/content_safety_colang1/config.yml) is used by the Guardrails server.
+Under the `models` section, the `main` model is used to generate responses to user queries. The base URL for this model points to the `app_llm` Mock LLM from the Procfile, running on port 8000. The `model` field must match the Mock LLM model name.
+The `content_safety` model is configured for use in an input and output rail. The `type` field matches the `$model` used in the input and output flows.
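+
+For reference, a configuration of this shape typically looks like the sketch below. This is illustrative only: the checked-in [config.yml](configs/guardrail_configs/content_safety_colang1/config.yml) is the source of truth, and the engine and flow names shown here are assumptions based on the description above rather than a copy of that file.
+
+```yaml
+# Illustrative sketch -- see the checked-in config.yml for the actual values.
+models:
+  - type: main                                           # generates responses to user queries
+    engine: nim                                          # assumed engine, served via langchain-nvidia-ai-endpoints
+    model: meta/llama-3.3-70b-instruct                   # must match the Mock LLM model name
+    parameters:
+      base_url: http://localhost:8000/v1                 # app_llm mock from the Procfile
+  - type: content_safety                                 # referenced by $model in the rail flows
+    engine: nim
+    model: nvidia/llama-3.1-nemoguard-8b-content-safety
+    parameters:
+      base_url: http://localhost:8001/v1                 # cs_llm mock from the Procfile
+
+rails:
+  input:
+    flows:
+      - content safety check input $model=content_safety
+  output:
+    flows:
+      - content safety check output $model=content_safety
+```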
+
+### Mock LLM Endpoints
+The Mock LLM implements a subset of the OpenAI LLM API.
+There are two Mock LLM configurations, one for the Mock [main model](configs/mock_configs/meta-llama-3.3-70b-instruct.env), and another for the Mock [content-safety](configs/mock_configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env) model.
+The Mock LLM has the following OpenAI-compatible endpoints:
+
+* `/health`: Returns a JSON object with `status` set to `healthy` and a `timestamp` in seconds since the epoch. For example `{"status":"healthy","timestamp":1762781239}`
+* `/v1/models`: Returns the `MODEL` field from the Mock configuration (see below). For example `{"object":"list","data":[{"id":"meta/llama-3.3-70b-instruct","object":"model","created":1762781290,"owned_by":"system"}]}`
+* `/v1/completions`: Returns an [OpenAI completion object](https://platform.openai.com/docs/api-reference/completions/object) using the Mock configuration (see below).
+* `/v1/chat/completions`: Returns an [OpenAI chat completion object](https://platform.openai.com/docs/api-reference/chat/object) using the Mock configuration (see below).
+
+### Mock LLM Configuration
+Mock LLMs are configured using the `.env` file format. These files are passed to the Mock LLM using the `--config-file` argument.
+The Mock LLMs return either the `SAFE_TEXT` or the `UNSAFE_TEXT` response to `/v1/completions` or `/v1/chat/completions` inference requests.
+The probability of `UNSAFE_TEXT` being returned is given by `UNSAFE_PROBABILITY`.
+The latency of each response is also controllable, and works as follows:
+
+* Latency is first sampled from a normal distribution with mean `LATENCY_MEAN_SECONDS` and standard deviation `LATENCY_STD_SECONDS`.
+* If the sampled value is less than `LATENCY_MIN_SECONDS`, it is set to `LATENCY_MIN_SECONDS`.
+* If the sampled value is greater than `LATENCY_MAX_SECONDS`, it is set to `LATENCY_MAX_SECONDS`.
+
+The full list of configuration fields is shown below:
+* `MODEL`: The model name served by the Mock LLM. This is returned by the `/v1/models` endpoint.
+* `UNSAFE_PROBABILITY`: Probability of an unsafe response; must be in the range [0, 1].
+* `UNSAFE_TEXT`: String returned as an unsafe response.
+* `SAFE_TEXT`: String returned as a safe response.
+* `LATENCY_MIN_SECONDS`: Minimum latency in seconds. +* `LATENCY_MAX_SECONDS`: Maximum latency in seconds. +* `LATENCY_MEAN_SECONDS`: Normal distribution mean from which to sample latency. +* `LATENCY_STD_SECONDS`: Normal distribution standard deviation from which to sample latency. diff --git a/nemoguardrails/benchmark/mock_llm_server/configs/guardrail_configs/content_safety_colang1/config.yml b/nemoguardrails/benchmark/configs/guardrail_configs/content_safety_colang1/config.yml similarity index 100% rename from nemoguardrails/benchmark/mock_llm_server/configs/guardrail_configs/content_safety_colang1/config.yml rename to nemoguardrails/benchmark/configs/guardrail_configs/content_safety_colang1/config.yml diff --git a/nemoguardrails/benchmark/mock_llm_server/configs/guardrail_configs/content_safety_colang1/prompts.yml b/nemoguardrails/benchmark/configs/guardrail_configs/content_safety_colang1/prompts.yml similarity index 100% rename from nemoguardrails/benchmark/mock_llm_server/configs/guardrail_configs/content_safety_colang1/prompts.yml rename to nemoguardrails/benchmark/configs/guardrail_configs/content_safety_colang1/prompts.yml diff --git a/nemoguardrails/benchmark/mock_llm_server/configs/meta-llama-3.3-70b-instruct.env b/nemoguardrails/benchmark/configs/mock_configs/meta-llama-3.3-70b-instruct.env similarity index 100% rename from nemoguardrails/benchmark/mock_llm_server/configs/meta-llama-3.3-70b-instruct.env rename to nemoguardrails/benchmark/configs/mock_configs/meta-llama-3.3-70b-instruct.env diff --git a/nemoguardrails/benchmark/mock_llm_server/configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env b/nemoguardrails/benchmark/configs/mock_configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env similarity index 100% rename from nemoguardrails/benchmark/mock_llm_server/configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env rename to nemoguardrails/benchmark/configs/mock_configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env diff --git a/nemoguardrails/benchmark/mock_llm_server/run_server.py b/nemoguardrails/benchmark/mock_llm_server/run_server.py index 14e0be02f..eae8bc032 100644 --- a/nemoguardrails/benchmark/mock_llm_server/run_server.py +++ b/nemoguardrails/benchmark/mock_llm_server/run_server.py @@ -71,7 +71,12 @@ def parse_arguments(): parser.add_argument( "--config-file", help=".env file to configure model", required=True ) - + parser.add_argument( + "--workers", + type=int, + default=1, + help="Number of uvicorn worker processes (default: 1)", + ) return parser.parse_args() @@ -104,12 +109,13 @@ def main(): # pragma: no cover try: uvicorn.run( - "api:app", + "nemoguardrails.benchmark.mock_llm_server.api:app", host=args.host, port=args.port, reload=args.reload, log_level=args.log_level, env_file=config_file, + workers=args.workers, ) except KeyboardInterrupt: log.info("\nServer stopped by user") diff --git a/nemoguardrails/benchmark/validate_mocks.py b/nemoguardrails/benchmark/validate_mocks.py new file mode 100644 index 000000000..795f1c671 --- /dev/null +++ b/nemoguardrails/benchmark/validate_mocks.py @@ -0,0 +1,227 @@ +#!/usr/bin/env python3 + +# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +""" +A script to check the health and model IDs of local OpenAI-compatible endpoints. +Requires the 'httpx' library: pip install httpx +""" + +import json +import logging +import sys + +import httpx + +# --- Logging Setup --- +# Configure basic logging to print info-level messages +logging.basicConfig(level=logging.INFO, format="%(message)s") + + +def check_endpoint(port: int, expected_model: str): + """ + Checks the /health and /v1/models endpoints for a standard + OpenAI-compatible server. + Returns a tuple: (bool success, str summary) + """ + base_url = f"http://localhost:{port}" + all_ok = True + + logging.info("\n--- Checking Port: %s ---", port) + + # --- 1. Health Check --- + health_url = f"{base_url}/health" + logging.info("Checking %s ...", health_url) + try: + response = httpx.get(health_url, timeout=3) + + if response.status_code != 200: + logging.error("Health Check FAILED: Status code %s", response.status_code) + all_ok = False + else: + try: + data = response.json() + status = data.get("status") + if status == "healthy": + logging.info("Health Check PASSED: Status is 'healthy'.") + else: + logging.warning( + "Health Check FAILED: Expected 'healthy', got '%s'.", status + ) + all_ok = False + except json.JSONDecodeError: + logging.error("Health Check FAILED: Could not decode JSON response.") + all_ok = False + + except httpx.ConnectError: + logging.error("Health Check FAILED: No response from server on port %s.", port) + logging.error("--- Port %s: CHECKS FAILED ---", port) + return False, "Port %s (%s): FAILED (Connection Error)" % (port, expected_model) + except httpx.TimeoutException: + logging.error("Health Check FAILED: Connection timed out for port %s.", port) + logging.error("--- Port %s: CHECKS FAILED ---", port) + return False, "Port %s (%s): FAILED (Connection Timeout)" % ( + port, + expected_model, + ) + + # --- 2. 
Model Check --- + models_url = f"{base_url}/v1/models" + logging.info("Checking %s for '%s'...", models_url, expected_model) + try: + response = httpx.get(models_url, timeout=3) + + if response.status_code != 200: + logging.error("Model Check FAILED: Status code %s", response.status_code) + all_ok = False + else: + try: + data = response.json() + models = data.get("data", []) + model_ids = [model.get("id") for model in models] + + if expected_model in model_ids: + logging.info( + "Model Check PASSED: Found '%s' in model list.", expected_model + ) + else: + logging.warning( + "Model Check FAILED: Expected '%s', but it was NOT found.", + expected_model, + ) + logging.warning("Available models:") + for model_id in model_ids: + logging.warning(" - %s", model_id) + all_ok = False + except json.JSONDecodeError: + logging.error("Model Check FAILED: Could not decode JSON response.") + all_ok = False + except AttributeError: + logging.error( + "Model Check FAILED: Unexpected JSON structure in response from %s.", + models_url, + ) + all_ok = False + + except httpx.ConnectError: + logging.error("Model Check FAILED: No response from server on port %s.", port) + all_ok = False + except httpx.TimeoutException: + logging.error("Model Check FAILED: Connection timed out for port %s.", port) + all_ok = False + + # --- Final Status --- + if all_ok: + logging.info("--- Port %s: ALL CHECKS PASSED ---", port) + return True, "Port %s (%s): PASSED" % (port, expected_model) + else: + logging.error("--- Port %s: CHECKS FAILED ---", port) + return False, "Port %s (%s): FAILED" % (port, expected_model) + + +def check_rails_endpoint(port: int): + """ + Checks the /v1/rails/configs endpoint for a specific 200 status + and a non-empty list response. + Returns a tuple: (bool success, str summary) + """ + base_url = f"http://localhost:{port}" + endpoint = f"{base_url}/v1/rails/configs" + all_ok = True + + logging.info("\n--- Checking Port: %s (Rails Config) ---", port) + logging.info("Checking %s ...", endpoint) + + try: + response = httpx.get(endpoint, timeout=3) + + # --- 1. HTTP Status Check --- + if response.status_code == 200: + logging.info("HTTP Status PASSED: Got %s.", response.status_code) + else: + logging.warning( + "HTTP Status FAILED: Expected 200, got '%s'.", response.status_code + ) + all_ok = False + + # --- 2. Body Content Check --- + try: + data = response.json() + if isinstance(data, list) and len(data) > 0: + logging.info( + "Body Check PASSED: Response is an array with at least one entry." + ) + else: + logging.warning( + "Body Check FAILED: Response is not an array or is empty." 
+ ) + logging.debug( + "Response body (first 200 chars): %s", str(response.text)[:200] + ) + all_ok = False + except json.JSONDecodeError: + logging.error("Body Check FAILED: Could not decode JSON response.") + logging.debug( + "Response body (first 200 chars): %s", str(response.text)[:200] + ) + all_ok = False + + except httpx.ConnectError: + logging.error("Rails Check FAILED: No response from server on port %s.", port) + all_ok = False + except httpx.TimeoutException: + logging.error("Rails Check FAILED: Connection timed out for port %s.", port) + all_ok = False + + # --- Final Status --- + if all_ok: + logging.info("--- Port %s: ALL CHECKS PASSED ---", port) + return True, "Port %s (Rails Config): PASSED" % port + else: + logging.error("--- Port %s: CHECKS FAILED ---", port) + return False, "Port %s (Rails Config): FAILED" % port + + +def main(): + """Run all health checks.""" + logging.info("Starting LLM endpoint health check...") + + check_results = [ + check_endpoint(8000, "meta/llama-3.3-70b-instruct"), + check_endpoint(8001, "nvidia/llama-3.1-nemoguard-8b-content-safety"), + check_rails_endpoint(9000), + ] + + logging.info("\n--- Final Summary ---") + + all_passed = True + for success, summary in check_results: + logging.info(summary) + if not success: + all_passed = False + + logging.info("---------------------") + + if all_passed: + logging.info("Overall Status: All endpoints are healthy!") + sys.exit(0) + else: + logging.error("Overall Status: One or more checks FAILED.") + sys.exit(1) + + +if __name__ == "__main__": + main() # pragma: no cover diff --git a/tests/benchmark/test_validate_mocks.py b/tests/benchmark/test_validate_mocks.py new file mode 100644 index 000000000..d8a86c1fa --- /dev/null +++ b/tests/benchmark/test_validate_mocks.py @@ -0,0 +1,434 @@ +#!/usr/bin/env python3 + +# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +""" +Tests for validate_mocks.py script. 
+""" + +import json +from unittest.mock import MagicMock, patch + +import httpx +import pytest + +from nemoguardrails.benchmark.validate_mocks import ( + check_endpoint, + check_rails_endpoint, + main, +) + + +class TestCheckEndpoint: + """Tests for check_endpoint function.""" + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_success(self, mock_get): + """Test successful health and model checks.""" + # Mock health check response + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "healthy"} + + # Mock models check response + models_response = MagicMock() + models_response.status_code = 200 + models_response.json.return_value = { + "data": [ + {"id": "meta/llama-3.3-70b-instruct"}, + {"id": "other-model"}, + ] + } + + mock_get.side_effect = [health_response, models_response] + + success, summary = check_endpoint(8000, "meta/llama-3.3-70b-instruct") + + assert success + assert "PASSED" in summary + assert "8000" in summary + assert mock_get.call_count == 2 + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_health_check_failed_status(self, mock_get): + """Test health check with non-200 status code.""" + health_response = MagicMock() + health_response.status_code = 404 + + mock_get.return_value = health_response + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_health_check_unhealthy_status(self, mock_get): + """Test health check with unhealthy status.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "unhealthy"} + + models_response = MagicMock() + models_response.status_code = 200 + models_response.json.return_value = {"data": [{"id": "test-model"}]} + + mock_get.side_effect = [health_response, models_response] + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_health_check_json_decode_error(self, mock_get): + """Test health check with invalid JSON.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.side_effect = json.JSONDecodeError( + "Expecting value", "", 0 + ) + + mock_get.return_value = health_response + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_health_connection_error(self, mock_get): + """Test health check with connection error.""" + mock_get.side_effect = httpx.ConnectError("Connection failed") + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + assert "Connection Error" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_health_timeout(self, mock_get): + """Test health check with timeout.""" + mock_get.side_effect = httpx.TimeoutException("Request timed out") + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + assert "Connection Timeout" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_model_check_failed_status(self, mock_get): + """Test model check with non-200 
status code.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "healthy"} + + models_response = MagicMock() + models_response.status_code = 404 + + mock_get.side_effect = [health_response, models_response] + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_model_not_found(self, mock_get): + """Test model check when expected model is not in the list.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "healthy"} + + models_response = MagicMock() + models_response.status_code = 200 + models_response.json.return_value = { + "data": [ + {"id": "other-model-1"}, + {"id": "other-model-2"}, + ] + } + + mock_get.side_effect = [health_response, models_response] + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_model_check_json_decode_error(self, mock_get): + """Test model check with invalid JSON.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "healthy"} + + models_response = MagicMock() + models_response.status_code = 200 + models_response.json.side_effect = json.JSONDecodeError( + "Expecting value", "", 0 + ) + + mock_get.side_effect = [health_response, models_response] + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_model_check_unexpected_json_structure(self, mock_get): + """Test model check with unexpected JSON structure.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "healthy"} + + models_response = MagicMock() + models_response.status_code = 200 + # Return invalid structure that will cause AttributeError + models_response.json.return_value = "invalid" + + mock_get.side_effect = [health_response, models_response] + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_model_check_connection_error(self, mock_get): + """Test model check with connection error.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "healthy"} + + mock_get.side_effect = [ + health_response, + httpx.ConnectError("Connection failed"), + ] + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_endpoint_model_check_timeout(self, mock_get): + """Test model check with timeout.""" + health_response = MagicMock() + health_response.status_code = 200 + health_response.json.return_value = {"status": "healthy"} + + mock_get.side_effect = [ + health_response, + httpx.TimeoutException("Request timed out"), + ] + + success, summary = check_endpoint(8000, "test-model") + + assert not success + assert "FAILED" in summary + + +class TestCheckRailsEndpoint: + """Tests for check_rails_endpoint function.""" + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def 
test_check_rails_endpoint_success(self, mock_get): + """Test successful rails config check.""" + response = MagicMock() + response.status_code = 200 + response.json.return_value = [ + {"id": "config1", "name": "Config 1"}, + {"id": "config2", "name": "Config 2"}, + ] + + mock_get.return_value = response + + success, summary = check_rails_endpoint(9000) + + assert success + assert "PASSED" in summary + assert "9000" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_rails_endpoint_non_200_status(self, mock_get): + """Test rails config check with non-200 status.""" + response = MagicMock() + response.status_code = 404 + response.json.return_value = [] + + mock_get.return_value = response + + success, summary = check_rails_endpoint(9000) + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_rails_endpoint_empty_list(self, mock_get): + """Test rails config check with empty list response.""" + response = MagicMock() + response.status_code = 200 + response.json.return_value = [] + + mock_get.return_value = response + + success, summary = check_rails_endpoint(9000) + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_rails_endpoint_not_a_list(self, mock_get): + """Test rails config check with non-list response.""" + response = MagicMock() + response.status_code = 200 + response.json.return_value = {"error": "invalid"} + + mock_get.return_value = response + + success, summary = check_rails_endpoint(9000) + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_rails_endpoint_json_decode_error(self, mock_get): + """Test rails config check with invalid JSON.""" + response = MagicMock() + response.status_code = 200 + response.text = "invalid json" + response.json.side_effect = json.JSONDecodeError("Expecting value", "", 0) + + mock_get.return_value = response + + success, summary = check_rails_endpoint(9000) + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_rails_endpoint_connection_error(self, mock_get): + """Test rails config check with connection error.""" + mock_get.side_effect = httpx.ConnectError("Connection failed") + + success, summary = check_rails_endpoint(9000) + + assert not success + assert "FAILED" in summary + + @patch("nemoguardrails.benchmark.validate_mocks.httpx.get") + def test_check_rails_endpoint_timeout(self, mock_get): + """Test rails config check with timeout.""" + mock_get.side_effect = httpx.TimeoutException("Request timed out") + + success, summary = check_rails_endpoint(9000) + + assert not success + assert "FAILED" in summary + + +class TestMain: + """Tests for main function.""" + + @patch("nemoguardrails.benchmark.validate_mocks.check_rails_endpoint") + @patch("nemoguardrails.benchmark.validate_mocks.check_endpoint") + def test_main_all_passed(self, mock_check_endpoint, mock_check_rails_endpoint): + """Test main function when all checks pass.""" + mock_check_endpoint.side_effect = [ + (True, "Port 8000 (meta/llama-3.3-70b-instruct): PASSED"), + ( + True, + "Port 8001 (nvidia/llama-3.1-nemoguard-8b-content-safety): PASSED", + ), + ] + mock_check_rails_endpoint.return_value = ( + True, + "Port 9000 (Rails Config): PASSED", + ) + + with pytest.raises(SystemExit) as exc_info: + main() + + assert exc_info.value.code == 0 + 
assert mock_check_endpoint.call_count == 2 + assert mock_check_rails_endpoint.call_count == 1 + + @patch("nemoguardrails.benchmark.validate_mocks.check_rails_endpoint") + @patch("nemoguardrails.benchmark.validate_mocks.check_endpoint") + def test_main_one_failed(self, mock_check_endpoint, mock_check_rails_endpoint): + """Test main function when one check fails.""" + mock_check_endpoint.side_effect = [ + (False, "Port 8000 (meta/llama-3.3-70b-instruct): FAILED"), + ( + True, + "Port 8001 (nvidia/llama-3.1-nemoguard-8b-content-safety): PASSED", + ), + ] + mock_check_rails_endpoint.return_value = ( + True, + "Port 9000 (Rails Config): PASSED", + ) + + with pytest.raises(SystemExit) as exc_info: + main() + + assert exc_info.value.code == 1 + + @patch("nemoguardrails.benchmark.validate_mocks.check_rails_endpoint") + @patch("nemoguardrails.benchmark.validate_mocks.check_endpoint") + def test_main_all_failed(self, mock_check_endpoint, mock_check_rails_endpoint): + """Test main function when all checks fail.""" + mock_check_endpoint.side_effect = [ + (False, "Port 8000 (meta/llama-3.3-70b-instruct): FAILED"), + ( + False, + "Port 8001 (nvidia/llama-3.1-nemoguard-8b-content-safety): FAILED", + ), + ] + mock_check_rails_endpoint.return_value = ( + False, + "Port 9000 (Rails Config): FAILED", + ) + + with pytest.raises(SystemExit) as exc_info: + main() + + assert exc_info.value.code == 1 + + @patch("nemoguardrails.benchmark.validate_mocks.check_rails_endpoint") + @patch("nemoguardrails.benchmark.validate_mocks.check_endpoint") + def test_main_rails_failed(self, mock_check_endpoint, mock_check_rails_endpoint): + """Test main function when only rails check fails.""" + mock_check_endpoint.side_effect = [ + (True, "Port 8000 (meta/llama-3.3-70b-instruct): PASSED"), + ( + True, + "Port 8001 (nvidia/llama-3.1-nemoguard-8b-content-safety): PASSED", + ), + ] + mock_check_rails_endpoint.return_value = ( + False, + "Port 9000 (Rails Config): FAILED", + ) + + with pytest.raises(SystemExit) as exc_info: + main() + + assert exc_info.value.code == 1