
Commit 67834f2

Move SEvenLLM listing entry
1 parent f1ff37e commit 67834f2

File tree: 2 files changed (+20, −20 lines)


README.md

Lines changed: 11 additions & 11 deletions
@@ -113,17 +113,6 @@ Inspect supports many model providers including OpenAI, Anthropic, Google, Mistr
   inspect eval inspect_evals/cybench
   ```

-- ### [SEvenLLM: A benchmark to elicit, and improve cybersecurity incident analysis and response abilities in LLMs for Security Events.](src/inspect_evals/sevenllm)
-  Designed for analyzing cybersecurity incidents, which is comprised of two primary task categories: understanding and generation, with a further breakdown into 28 subcategories of tasks.
-  <sub><sup>Contributed by: [@kingroryg](https://github.com/kingroryg)</sub></sup>
-
-  ```bash
-  inspect eval inspect_evals/sevenllm_mcq_zh
-  inspect eval inspect_evals/sevenllm_mcq_en
-  inspect eval inspect_evals/sevenllm_qa_zh
-  inspect eval inspect_evals/sevenllm_qa_en
-  ```
-
 - ### [CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge](src/inspect_evals/cybermetric)
   Datasets containing 80, 500, 2000 and 10000 multiple-choice questions, designed to evaluate understanding across nine domains within cybersecurity
   <sub><sup>Contributed by: [@neilshaabi](https://github.com/neilshaabi)</sub></sup>
@@ -150,6 +139,17 @@ Inspect supports many model providers including OpenAI, Anthropic, Google, Mistr
   inspect eval inspect_evals/gdm_in_house_ctf
   ```

+- ### [SEvenLLM: A benchmark to elicit, and improve cybersecurity incident analysis and response abilities in LLMs for Security Events.](src/inspect_evals/sevenllm)
+  Designed for analyzing cybersecurity incidents, which is comprised of two primary task categories: understanding and generation, with a further breakdown into 28 subcategories of tasks.
+  <sub><sup>Contributed by: [@kingroryg](https://github.com/kingroryg)</sub></sup>
+
+  ```bash
+  inspect eval inspect_evals/sevenllm_mcq_zh
+  inspect eval inspect_evals/sevenllm_mcq_en
+  inspect eval inspect_evals/sevenllm_qa_zh
+  inspect eval inspect_evals/sevenllm_qa_en
+  ```
+
 - ### [SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security](src/inspect_evals/sec_qa)
   "Security Question Answering" dataset to assess LLMs' understanding and application of security principles.
   SecQA has “v1” and “v2” datasets of multiple-choice questions that aim to provide two levels of cybersecurity evaluation criteria.

tools/listing.yaml

Lines changed: 9 additions & 9 deletions
@@ -66,15 +66,6 @@
   tasks: ["cybench"]
   tags: ["Agent"]

-- title: "SEvenLLM: A benchmark to elicit, and improve cybersecurity incident analysis and response abilities in LLMs for Security Events."
-  description: |
-    Designed for analyzing cybersecurity incidents, which is comprised of two primary task categories: understanding and generation, with a further breakdown into 28 subcategories of tasks.
-  path: src/inspect_evals/sevenllm
-  group: Cybersecurity
-  contributors: ["kingroryg"]
-  arxiv: https://arxiv.org/abs/2405.03446
-  tasks: ["sevenllm_mcq_zh", "sevenllm_mcq_en", "sevenllm_qa_zh", "sevenllm_qa_en"]
-
 - title: "CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge"
   description: |
     Datasets containing 80, 500, 2000 and 10000 multiple-choice questions, designed to evaluate understanding across nine domains within cybersecurity
@@ -123,6 +114,15 @@
   tasks: ["gdm_in_house_ctf"]
   tags: ["Agent"]

+- title: "SEvenLLM: A benchmark to elicit, and improve cybersecurity incident analysis and response abilities in LLMs for Security Events."
+  description: |
+    Designed for analyzing cybersecurity incidents, which is comprised of two primary task categories: understanding and generation, with a further breakdown into 28 subcategories of tasks.
+  path: src/inspect_evals/sevenllm
+  group: Cybersecurity
+  contributors: ["kingroryg"]
+  arxiv: https://arxiv.org/abs/2405.03446
+  tasks: ["sevenllm_mcq_zh", "sevenllm_mcq_en", "sevenllm_qa_zh", "sevenllm_qa_en"]
+
 - title: "SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security"
   description: |
     "Security Question Answering" dataset to assess LLMs' understanding and application of security principles.
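Each entry in tools/listing.yaml ties a benchmark to the task names that `inspect eval` accepts, which is why the README's `bash` snippets track the entry's `tasks` list one-for-one. A minimal sketch of that mapping, using the SEvenLLM fields from the diff above as a plain dict (the `commands_for` helper is illustrative, not part of inspect_evals):

```python
# Listing entry fields copied from the tools/listing.yaml diff above.
sevenllm_entry = {
    "title": "SEvenLLM: A benchmark to elicit, and improve cybersecurity "
             "incident analysis and response abilities in LLMs for Security Events.",
    "path": "src/inspect_evals/sevenllm",
    "group": "Cybersecurity",
    "contributors": ["kingroryg"],
    "arxiv": "https://arxiv.org/abs/2405.03446",
    "tasks": ["sevenllm_mcq_zh", "sevenllm_mcq_en",
              "sevenllm_qa_zh", "sevenllm_qa_en"],
}

def commands_for(entry):
    """Build one `inspect eval` command line per registered task name."""
    return [f"inspect eval inspect_evals/{task}" for task in entry["tasks"]]

for cmd in commands_for(sevenllm_entry):
    print(cmd)
```

Keeping the README snippet generated from (or checked against) the `tasks` list is what makes a pure reordering commit like this one a +20/−20 move with no content change.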
