Commit 5d4bc98

docs
1 parent 5d9a4c7 commit 5d4bc98

1 file changed: +11 -1 lines changed

docs/README.md

Lines changed: 11 additions & 1 deletion
@@ -216,7 +216,7 @@ The generation and evaluation follows the same approach as [MBPP](#mbpp). One on
 > The subset is selected from the sanitized MBPP (a subset of ~427 manually examined tasks by the original MBPP authors)
 > and EvalPlus further removes low-quality and ill-formed one for benchmark quality control to get MBPP+.
 
-```python
+```bash
 accelerate launch main.py \
   --model <MODEL_NAME> \
   --max_length_generation <MAX_LENGTH> \
@@ -227,6 +227,16 @@ accelerate launch main.py \
   --allow_code_execution
 ```
 
+By setting `MBBPPLUS_USE_MBPP_TESTS=1` when running MBPP+, one can run the 399 MBPP+ tasks (a subset of the 500 MBPP evaluation tasks) with the original MBPP base tests:
+
+```bash
+MBBPPLUS_USE_MBPP_TESTS=1 accelerate launch main.py \
+  --tasks mbppplus \
+  --allow_code_execution \
+  --load_generations_path generations_mbppplus.json \
+  --model <MODEL_NAME>
+```
+
 ### DS-1000
 [DS-1000](https://ds1000-code-gen.github.io/): Code generation benchmark with 1000 data science questions spanning seven Python libraries that (1) reflects diverse, realistic, and practical use cases, (2) has a reliable metric, (3) defends against memorization by perturbing questions.
 
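
The `MBBPPLUS_USE_MBPP_TESTS=1` prefix in the added command sets an environment variable for that single invocation. The following is a minimal sketch of how a task could read such a toggle to choose between two test sets. The helper `select_tests` and the field names `base_tests` and `plus_tests` are hypothetical and are not taken from the harness's code; only the variable name comes from the diff.

```python
import os


def select_tests(doc, environ=None):
    """Return the test source for one task (hypothetical helper).

    `doc` is assumed to carry two string fields, `base_tests` and
    `plus_tests`; both names are illustrative only.
    """
    environ = os.environ if environ is None else environ
    # In this sketch only the exact string "1" enables the base tests.
    use_base = environ.get("MBBPPLUS_USE_MBPP_TESTS", "0") == "1"
    return doc["base_tests"] if use_base else doc["plus_tests"]


# Illustrative task record, not real benchmark data.
example = {
    "base_tests": "assert f(1) == 2",
    "plus_tests": "assert f(1) == 2\nassert f(-1) == 0",
}
```

Passing the environment as an argument keeps the sketch testable without modifying the process environment; when it is omitted, the helper falls back to `os.environ`.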
