Dev/steven/nsfw docs (#30)

steven10a · web-flow · commit 12c4adddeeb7 · 2025-10-29T09:57:35.000-07:00
* Updated prompt injection check
* Formatting changes
* Removed legacy code
* Added NSFW docs
diff --git a/docs/benchmarking/NSFW_roc_curve.png b/docs/benchmarking/NSFW_roc_curve.png
diff --git a/docs/ref/checks/nsfw.md b/docs/ref/checks/nsfw.md
@@ -82,10 +82,12 @@ This benchmark evaluates model performance on a balanced set of social media pos
 
 | Model         | ROC AUC | Prec@R=0.80 | Prec@R=0.90 | Prec@R=0.95 | Recall@FPR=0.01 |
 |--------------|---------|-------------|-------------|-------------|-----------------|
-| gpt-4.1      | 0.989   | 0.976       | 0.962       | 0.962       | 0.717           |
-| gpt-4.1-mini (default) | 0.984   | 0.977       | 0.977       | 0.943       | 0.653           |
-| gpt-4.1-nano | 0.952   | 0.972       | 0.823       | 0.823       | 0.429           |
-| gpt-4o-mini  | 0.965   | 0.977       | 0.955       | 0.945       | 0.842           |
+| gpt-5        | 0.9532  | 0.9195      | 0.9096      | 0.9068      | 0.0339          |
+| gpt-5-mini   | 0.9629  | 0.9321      | 0.9168      | 0.9149      | 0.0998          |
+| gpt-5-nano   | 0.9600  | 0.9297      | 0.9216      | 0.9175      | 0.1078          |
+| gpt-4.1      | 0.9603  | 0.9312      | 0.9249      | 0.9192      | 0.0439          |
+| gpt-4.1-mini (default) | 0.9520  | 0.9180      | 0.9130      | 0.9049      | 0.0459          |
+| gpt-4.1-nano | 0.9502  | 0.9262      | 0.9094      | 0.9043      | 0.0379          |
 
 **Notes:**
 
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -58,13 +58,14 @@ nav:
     - "Streaming vs Blocking": streaming_output.md
     - Tripwires: tripwires.md
     - Checks:
-      - Prompt Injection Detection: ref/checks/prompt_injection_detection.md
       - Contains PII: ref/checks/pii.md
       - Custom Prompt Check: ref/checks/custom_prompt_check.md
       - Hallucination Detection: ref/checks/hallucination_detection.md
       - Jailbreak Detection: ref/checks/jailbreak.md
       - Moderation: ref/checks/moderation.md
+      - NSFW Text: ref/checks/nsfw.md
       - Off Topic Prompts: ref/checks/off_topic_prompts.md
+      - Prompt Injection Detection: ref/checks/prompt_injection_detection.md
       - URL Filter: ref/checks/urls.md
     - Evaluation Tool: evals.md
   - API Reference: