Commit 1b0147c

Merge pull request #223 from ganler/evalplus-maintain
refactor(evalplus): maintain mbpp+ v0.2.0
2 parents 642c57f + ae6c309 commit 1b0147c

2 files changed, +4 -7 lines

bigcode_eval/tasks/humanevalplus.py

Lines changed: 3 additions & 3 deletions
@@ -29,11 +29,11 @@ class GeneralHumanEvalPlus(GeneralHumanEval):

     DATASET_PATH = "evalplus/humanevalplus"

-    def __init__(self, strip_prompt, k=[1, 10, 100], num_workers=16, timeout=10.0):
-        if timeout < 10.0:
+    def __init__(self, strip_prompt, k=[1, 10, 100], num_workers=16, timeout=20.0):
+        if timeout < 20.0:
             warn(
                 "It is suggested to have a longer timeout as HumanEval+ has lots of tests. "
-                f"The current timeout is {timeout}s while the suggested timeout is 10s."
+                f"The current timeout is {timeout}s while the suggested timeout is 20s."
             )
         super().__init__(strip_prompt, k, num_workers, timeout)
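
The constructor change raises the default `timeout` from 10 s to 20 s and warns whenever a caller passes a shorter value. The pattern can be sketched on its own as below; `BaseTask`, `PlusTask`, `SUGGESTED_TIMEOUT` and `count_warnings` are hypothetical names standing in for `GeneralHumanEval` and `GeneralHumanEvalPlus`, whose full code is not part of this diff.

```python
import warnings
from warnings import warn

# Threshold taken from the diff: timeouts below 20.0 s trigger a warning.
SUGGESTED_TIMEOUT = 20.0


class BaseTask:
    """Hypothetical stand-in for GeneralHumanEval; it only stores its arguments."""

    def __init__(self, strip_prompt, k, num_workers, timeout):
        self.strip_prompt = strip_prompt
        self.k = k
        self.num_workers = num_workers
        self.timeout = timeout


class PlusTask(BaseTask):
    """Hypothetical subclass reproducing the post-commit timeout check."""

    def __init__(self, strip_prompt, k=(1, 10, 100), num_workers=16,
                 timeout=SUGGESTED_TIMEOUT):
        if timeout < SUGGESTED_TIMEOUT:
            warn(
                "It is suggested to have a longer timeout as HumanEval+ has lots of tests. "
                f"The current timeout is {timeout}s while the suggested timeout is "
                f"{SUGGESTED_TIMEOUT:g}s."
            )
        # Copy k into a fresh list so instances never share one mutable default.
        super().__init__(strip_prompt, list(k), num_workers, timeout)


def count_warnings(**kwargs):
    """Build a PlusTask and return how many warnings its constructor emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        PlusTask(strip_prompt=True, **kwargs)
    return len(caught)


print(count_warnings())              # 0: the 20.0 s default is not below the threshold
print(count_warnings(timeout=10.0))  # 1: the old 10.0 s default now warns
```

The sketch takes `k` as a tuple and converts it to a list, which avoids sharing one mutable list across calls as a `k=[1, 10, 100]` default would; the warning logic is otherwise the same as in the diff.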

bigcode_eval/tasks/mbppplus.py

Lines changed: 1 addition & 4 deletions
@@ -4,7 +4,7 @@
 The MBPP+ dataset is created by the EvalPlus framework which extends the original MBPP dataset
 by adding more automatically generated test cases to each problem. Note MBPP+ only includes 399
 tasks which are a subset of the original MBPP dataset. The subset is selected from the sanitized
-MBPP (a subset of manually examined tasks by the original MBPP authors) and EvalPlus further
+MBPP (a subset of manually examined tasks by the original MBPP authors) and EvalPlus further
 removes low-quality and ill-formed tasks for benchmark quality control.

 Homepage: https://github.com/evalplus/evalplus
@@ -56,9 +56,6 @@ def get_reference(self, doc):
     def get_dataset(self):
         """Returns dataset for the task or an iterable of any object, that get_prompt can handle"""
         dataset = self.dataset["test"]
-        assert (
-            len(dataset) == 399
-        ), "MBPP+ only has 399 problems. Please retry by deleting its old cache"
         return dataset

     def process_results(self, generations, references):
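
The second hunk drops the hard-coded size check, so `get_dataset` now returns the `test` split whatever its length; presumably this lets a dataset release with a different problem count (such as the MBPP+ v0.2.0 named in the commit title) load without aborting. A minimal sketch contrasting the two behaviours, assuming a hypothetical `FakeTask` whose `dataset` attribute is a plain dict of lists rather than the real loaded dataset, and a hypothetical `accepts` helper:

```python
class FakeTask:
    """Hypothetical stand-in for the MBPP+ task; `dataset` mimics a dict of splits."""

    def __init__(self, test_rows):
        self.dataset = {"test": test_rows}

    def get_dataset_before(self):
        """Pre-commit behaviour: reject any test split that is not exactly 399 rows."""
        dataset = self.dataset["test"]
        assert (
            len(dataset) == 399
        ), "MBPP+ only has 399 problems. Please retry by deleting its old cache"
        return dataset

    def get_dataset_after(self):
        """Post-commit behaviour: return the test split whatever its size."""
        return self.dataset["test"]


def accepts(getter) -> bool:
    """Return True if getter() succeeds, False if it raises AssertionError."""
    try:
        getter()
    except AssertionError:
        return False
    return True


# 350 is an arbitrary size chosen only because it differs from 399.
task = FakeTask(list(range(350)))
print(accepts(task.get_dataset_before))  # False (with assertions enabled)
print(accepts(task.get_dataset_after))   # True
```

Besides being brittle across dataset releases, a bare `assert` is skipped entirely when Python runs with `-O`, so it was never a dependable data check.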
