Commit f2acc98

Update README
1 parent 824f456 commit f2acc98

1 file changed: +6 -4 lines changed

README.md

Lines changed: 6 additions & 4 deletions
@@ -352,17 +352,19 @@ The questions were generated by GPT-4 based on the "Computer Systems Security: P
 - ### [MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning](src/inspect_evals/musr)
 Evaluating models on multistep soft reasoning tasks in the form of free text narratives.
 <sub><sup>Contributed by: [@farrelmahaztra](https://github.com/farrelmahaztra)</sub></sup>
-```
+
+```bash
 inspect eval inspect_evals/musr
 ```
 
 - ### [Needle in a Haystack (NIAH): In-Context Retrieval Benchmark for Long Context LLMs](src/inspect_evals/niah)
 NIAH evaluates in-context retrieval ability of long context LLMs by testing a model's ability to extract factual information from long-context inputs.
-```
+
+
+```bash
 inspect eval inspect_evals/niah
 ```
 
-
 - ### [PAWS: Paraphrase Adversaries from Word Scrambling](src/inspect_evals/paws)
 Evaluating models on the task of paraphrase detection by providing pairs of sentences that are either paraphrases or not.
 <sub><sup>Contributed by: [@meltemkenis](https://github.com/meltemkenis)</sub></sup>
@@ -441,4 +443,4 @@ The questions were generated by GPT-4 based on the "Computer Systems Security: P
 inspect eval inspect_evals/agie_lsat_lr
 ```
 
-<!-- /Eval Listing: Automatically Generated -->
+<!-- /Eval Listing: Automatically Generated -->
