tools/listing.yaml
1 addition & 2 deletions
@@ -118,7 +118,7 @@
 
 - title: "GSM8K: Training Verifiers to Solve Math Word Problems"
   description: |
-    Dataset of 8.5K high quality linguistically diverse grade school math word problems. Demostrates fewshot prompting.
+    Dataset of 8.5K high quality linguistically diverse grade school math word problems. Demonstrates fewshot prompting.
   path: src/inspect_evals/gsm8k
   arxiv: https://arxiv.org/abs/2110.14168
   group: Mathematics
@@ -180,7 +180,6 @@
   contributors: ["seddy-aisi"]
   tasks: ["piqa"]
 
-
 - title: "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens"
   description: |
     LLM benchmark featuring an average data length surpassing 100K tokens. Comprises synthetic and realistic tasks spanning diverse domains in English and Chinese.