Commit a90b5d6

Update usage doc
1 parent 67c4409 commit a90b5d6

1 file changed: +22 -0 lines changed

docs/README.md

Lines changed: 22 additions & 0 deletions
@@ -404,6 +404,28 @@ accelerate launch main.py \
--allow_code_execution
```

## Mercury
[Mercury](https://huggingface.co/datasets/Elfsong/Mercury) is a benchmark for the computational efficiency of Code LLMs. It comprises 1,889 Python programming tasks stratified into three difficulty levels and is divided into two datasets, one for model evaluation and one for fine-tuning. For each evaluation task, we assign a test case generator to remedy the shortfall of test case coverage. More details can be found in the [paper](https://arxiv.org/abs/2402.07844).
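Before running the benchmark, it can help to inspect the data directly. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the dataset ID is `Elfsong/Mercury`, as in the link above. The helper names `count_rows` and `load_mercury` are illustrative and not part of the harness; the split names are whatever the Hub returns.

```python
from typing import Dict, Mapping, Sized


def count_rows(splits: Mapping[str, Sized]) -> Dict[str, int]:
    """Return the number of rows in each split."""
    return {name: len(rows) for name, rows in splits.items()}


def load_mercury():
    """Download Mercury from the Hugging Face Hub (needs network access)."""
    # Imported lazily so count_rows works even without `datasets` installed.
    from datasets import load_dataset

    # Dataset ID taken from the link above; returns a mapping of split -> rows.
    return load_dataset("Elfsong/Mercury")
```

Calling `count_rows(load_mercury())` prints the size of each split; if the Hub copy matches the description above, the sizes should sum to 1,889.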

```shell
# Install these libraries before running Mercury
pip install lctk sortedcontainers
```

```shell
accelerate launch main.py \
--model <MODEL_NAME> \
--load_in_4bit \
--max_length_generation 2048 \
--tasks mercury \
--n_samples 5 \
--temperature 0.2 \
--batch_size 5 \
--allow_code_execution \
--save_generations \
--metric_output_path <MODEL_NAME>.json
```
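When evaluating several models, it may be convenient to assemble the command above programmatically. The sketch below uses only the flags shown in that command; `build_mercury_command` is a hypothetical helper, not part of the harness, the model name is a placeholder, and replacing `/` in the output file name is an assumption made to avoid writing into a nonexistent subdirectory.

```python
import shlex
from typing import List


def build_mercury_command(
    model_name: str, n_samples: int = 5, batch_size: int = 5
) -> List[str]:
    """Build the argument list for the Mercury launch command shown above."""
    # Assumption: flatten "org/model" so the metrics file lands in the cwd.
    output_name = model_name.replace("/", "_") + ".json"
    return [
        "accelerate", "launch", "main.py",
        "--model", model_name,
        "--load_in_4bit",
        "--max_length_generation", "2048",
        "--tasks", "mercury",
        "--n_samples", str(n_samples),
        "--temperature", "0.2",
        "--batch_size", str(batch_size),
        "--allow_code_execution",
        "--save_generations",
        "--metric_output_path", output_name,
    ]


cmd = build_mercury_command("my-org/my-model")  # placeholder model name
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `main.py`.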
## Code generation benchmarks without unit tests

For these tasks, we generate a single sample per problem, compare the generated code against the reference solutions, and compute the BLEU score. For the following tasks, we use a two-shot setting where we include two inputs and their solutions in the prompt, all preceded by an instruction such as: `"Answer the following instructions in a one line SQL query:\n"`. The solutions consist of a single line, so we stop the generation when a newline is generated. Three languages are present: Python, SQL, and Java.
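The two-shot setup and the newline stop rule described above can be sketched as follows. This is an illustration only: the `Instruction:` / `Solution:` layout and the helper names `build_two_shot_prompt` and `truncate_at_newline` are assumptions, not necessarily the exact template the harness uses.

```python
from typing import List, Tuple

INSTRUCTION = "Answer the following instructions in a one line SQL query:\n"


def build_two_shot_prompt(
    examples: List[Tuple[str, str]], query: str, instruction: str = INSTRUCTION
) -> str:
    """Prepend the instruction and two solved examples to the new input."""
    if len(examples) != 2:
        raise ValueError("a two-shot prompt needs exactly two examples")
    # Assumed layout: each shot is an input followed by its one-line solution.
    shots = "".join(
        f"Instruction:\n{inp}\nSolution:\n{sol}\n" for inp, sol in examples
    )
    # The prompt ends right where the model should write its one-line answer.
    return f"{instruction}{shots}Instruction:\n{query}\nSolution:\n"


def truncate_at_newline(generation: str) -> str:
    """Keep only the first line, mimicking a newline stop sequence."""
    return generation.split("\n", 1)[0]
```

The truncated generation is what would then be compared against the one-line reference solution when computing BLEU.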
