Commit 12c4add

Dev/steven/nsfw docs (#30)
* Updated prompt injection check
* Formatting changes
* Removed legacy code
* add nsfw docs
1 parent ab3f458 commit 12c4add

3 files changed: +8 -5 lines changed

docs/ref/checks/nsfw.md

Lines changed: 6 additions & 4 deletions
@@ -82,10 +82,12 @@ This benchmark evaluates model performance on a balanced set of social media pos
  | Model | ROC AUC | Prec@R=0.80 | Prec@R=0.90 | Prec@R=0.95 | Recall@FPR=0.01 |
  |--------------|---------|-------------|-------------|-------------|-----------------|
- | gpt-4.1 | 0.989 | 0.976 | 0.962 | 0.962 | 0.717 |
- | gpt-4.1-mini (default) | 0.984 | 0.977 | 0.977 | 0.943 | 0.653 |
- | gpt-4.1-nano | 0.952 | 0.972 | 0.823 | 0.823 | 0.429 |
- | gpt-4o-mini | 0.965 | 0.977 | 0.955 | 0.945 | 0.842 |
+ | gpt-5 | 0.9532 | 0.9195 | 0.9096 | 0.9068 | 0.0339 |
+ | gpt-5-mini | 0.9629 | 0.9321 | 0.9168 | 0.9149 | 0.0998 |
+ | gpt-5-nano | 0.9600 | 0.9297 | 0.9216 | 0.9175 | 0.1078 |
+ | gpt-4.1 | 0.9603 | 0.9312 | 0.9249 | 0.9192 | 0.0439 |
+ | gpt-4.1-mini (default) | 0.9520 | 0.9180 | 0.9130 | 0.9049 | 0.0459 |
+ | gpt-4.1-nano | 0.9502 | 0.9262 | 0.9094 | 0.9043 | 0.0379 |

  **Notes:**
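For readers unfamiliar with the column headers in the table above, here is a minimal sketch of how metrics such as ROC AUC, Prec@R and Recall@FPR are commonly computed from labels and classifier scores. It is an illustration only: the function names, the toy data, and the exact threshold-selection rules are assumptions and are not taken from this repository's evaluation tool, which may interpolate or break ties differently.

```python
from typing import List, Tuple


def sweep(labels: List[int], scores: List[float]) -> List[Tuple[int, int]]:
    """Return cumulative (true positives, false positives) at each distinct
    score threshold, visiting thresholds from highest to lowest."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example that shares this score before recording a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp, fp))
    return points


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative
    (ties count as half). Quadratic, so only suitable for small samples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_recall(labels: List[int], scores: List[float], min_recall: float) -> float:
    """Best precision among thresholds whose recall is at least min_recall."""
    n_pos = sum(labels)
    best = 0.0
    for tp, fp in sweep(labels, scores):
        if tp / n_pos >= min_recall:
            best = max(best, tp / (tp + fp))
    return best


def recall_at_fpr(labels: List[int], scores: List[float], max_fpr: float) -> float:
    """Best recall among thresholds whose false positive rate is at most max_fpr."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for tp, fp in sweep(labels, scores):
        if fp / n_neg <= max_fpr:
            best = max(best, tp / n_pos)
    return best


# Toy data (hypothetical): 1 = flagged content, 0 = benign.
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.40, 0.30, 0.10]

print(roc_auc(labels, scores))                    # 0.8125
print(precision_at_recall(labels, scores, 0.75))  # 0.75
print(recall_at_fpr(labels, scores, 0.01))        # 0.5
```

Recall@FPR=0.01 is a strict operating point: a single high-scoring negative caps the achievable recall, as the toy data shows.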

mkdocs.yml

Lines changed: 2 additions & 1 deletion
@@ -58,13 +58,14 @@ nav:
  - "Streaming vs Blocking": streaming_output.md
  - Tripwires: tripwires.md
  - Checks:
- - Prompt Injection Detection: ref/checks/prompt_injection_detection.md
  - Contains PII: ref/checks/pii.md
  - Custom Prompt Check: ref/checks/custom_prompt_check.md
  - Hallucination Detection: ref/checks/hallucination_detection.md
  - Jailbreak Detection: ref/checks/jailbreak.md
  - Moderation: ref/checks/moderation.md
+ - NSFW Text: ref/checks/nsfw.md
  - Off Topic Prompts: ref/checks/off_topic_prompts.md
+ - Prompt Injection Detection: ref/checks/prompt_injection_detection.md
  - URL Filter: ref/checks/urls.md
  - Evaluation Tool: evals.md
  - API Reference:
