Commit 7137cce

listing cleanup
1 parent abe0fd1 commit 7137cce

2 files changed, 15 additions and 15 deletions

tools/listing.py

Lines changed: 1 addition & 1 deletion
@@ -227,4 +227,4 @@ def generate_readme() -> None:
 
 
 if __name__ == "__main__":
-    generate_readme()
+    generate_readme()

tools/listing.yaml

Lines changed: 14 additions & 14 deletions
@@ -66,6 +66,19 @@
   tasks: ["cybench"]
   tags: ["Agent"]
 
+- title: "CyberSecEval_2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models"
+  description: |
+    Evaluates Large Language Models for risky capabilities in cybersecurity.
+  path: src/inspect_evals/cyberseceval_2
+  arxiv: https://arxiv.org/pdf/2404.13161
+  group: Cybersecurity
+  contributors: ["its-emile"]
+  tasks: [
+    "interpreter_abuse",
+    "prompt_injection",
+    "vulnerability_exploit"
+  ]
+
 - title: "InterCode: Capture the Flag"
   description: |
     Measure expertise in coding, cryptography (i.e. binary exploitation, forensics), reverse engineering, and recognizing security vulnerabilities. Demonstrates tool use and sandboxing untrusted model code.
@@ -352,17 +365,4 @@
     "agie_sat_en",
     "agie_sat_en_without_passage",
     "agie_sat_math",
-  ]
-
-- title: "CyberSecEval_2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models"
-  description: |
-    Evaluates Large Language Models for risky capabilities in cybersecurity.
-  path: src/inspect_evals/cyberseceval_2
-  arxiv: https://arxiv.org/pdf/2404.13161
-  group: Cybersecurity
-  contributors: ["its-emile"]
-  tasks: [
-    "interpreter_abuse",
-    "prompt_injection",
-    "vulnerability_exploit"
-  ]
+  ]
