
Commit 8d6a6a9: update readmes (1 parent: 7cfcf5d)

File tree: 3 files changed, 29 additions, 3 deletions


README.md

Lines changed: 3 additions & 0 deletions

````diff
@@ -35,6 +35,9 @@ Below are the features and tasks of this framework:
 - [CoNaLa](https://huggingface.co/datasets/neulab/conala) for **Python** code generation (2-shot setting and evaluation with BLEU score).
 - [Concode](https://huggingface.co/datasets/code_x_glue_tc_text_to_code) for **Java** code generation (2-shot setting and evaluation with BLEU score).
 - 3 multilingual downstream classification tasks: [Java Complexity prediction](https://huggingface.co/datasets/codeparrot/codecomplex), [Java code equivalence prediction](https://huggingface.co/datasets/code_x_glue_cc_clone_detection_big_clone_bench), [C code defect prediction](https://huggingface.co/datasets/code_x_glue_cc_defect_detection).
+- [SantaCoder-FIM](https://huggingface.co/datasets/bigcode/santacoder-fim-task) for evaluating FIM on **Python** code using Exact Match. Further details are described in [SantaCoder](https://arxiv.org/abs/2301.03988). Includes two tasks:
+  - `StarCoderFIM`: which uses the default FIM tokens `"<fim_prefix>", "<fim_middle>", "<fim_suffix>"`, and
+  - `SantaCoderFIM`: which uses SantaCoder FIM tokens `"<fim-prefix>", "<fim-middle>", "<fim-suffix>"`
 
 More details about each task can be found in the documentation in [`docs/README.md`](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/docs/README.md).
 ## Setup
````
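The bullet above notes that the right task depends on which FIM token style the model was trained with. A minimal sketch of that selection logic, assuming a hypothetical `pick_fim_task` helper (the harness itself selects tasks via the `--tasks` flag; this is illustrative only):

```python
def pick_fim_task(vocab: set) -> str:
    """Pick the FIM evaluation task matching the model's special tokens."""
    if "<fim_prefix>" in vocab:
        # Default underscore-style tokens, e.g. StarCoder models.
        return "StarCoderFIM"
    if "<fim-prefix>" in vocab:
        # Hyphen-style tokens used by SantaCoder.
        return "SantaCoderFIM"
    raise ValueError("model vocabulary has no known FIM tokens")

print(pick_fim_task({"<fim_prefix>", "<fim_middle>", "<fim_suffix>"}))  # StarCoderFIM
print(pick_fim_task({"<fim-prefix>", "<fim-middle>", "<fim-suffix>"}))  # SantaCoderFIM
```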

bigcode_eval/tasks/santacoder_fim.py

Lines changed: 3 additions & 3 deletions

````diff
@@ -121,9 +121,9 @@ class StarCoderFIM(SantaCoderFIM):
     DATASET_PATH = "bigcode/santacoder-fim-task"
 
     def __init__(self):
-        fim_prefix = ("<fim_prefix>",)
-        fim_middle = ("<fim_middle>",)
-        fim_suffix = ("<fim_suffix>",)
+        fim_prefix = "<fim_prefix>"
+        fim_middle = "<fim_middle>"
+        fim_suffix = "<fim_suffix>"
         stop_words = ["<|endoftext|>"]
         super().__init__(
             stop_words=stop_words,
````
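The change above fixes a classic Python pitfall: a trailing comma inside parentheses creates a one-element tuple, not a parenthesized string. A small sketch of why the old assignments were buggy:

```python
# Before the fix: the trailing comma makes this a tuple of length 1.
fim_prefix_old = ("<fim_prefix>",)
# After the fix: a plain string, as intended.
fim_prefix_new = "<fim_prefix>"

assert isinstance(fim_prefix_old, tuple) and len(fim_prefix_old) == 1
assert isinstance(fim_prefix_new, str)

# Concatenating the tuple into a prompt would raise a TypeError;
# the plain string composes as intended:
prompt = fim_prefix_new + "def add(a, b):"
print(prompt)  # <fim_prefix>def add(a, b):
```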

docs/README.md

Lines changed: 23 additions & 0 deletions

````diff
@@ -357,6 +357,29 @@ accelerate launch main.py \
 ```
 If you ever get index out-of-range errors try using a number of problems `limit` that is proportional to the number of devices you are using.
 
+### SantaCoder-FIM
+[SantaCoder-FIM](https://huggingface.co/datasets/bigcode/santacoder-fim-task): 4,792 tasks for FIM insertion described in [SantaCoder: don't reach for the stars!](https://arxiv.org/abs/2301.03988). The tasks are similar to other tasks without unit tests, with two key differences:
+1. Exact Match is used to score the generations instead of the BLEU score.
+2. A zero-shot setting is used instead of the 2-shot setting.
+
+SantaCoder-FIM includes 2 tasks:
+- `StarCoderFIM`: which uses the default FIM tokens `"<fim_prefix>", "<fim_middle>", "<fim_suffix>"`, and
+- `SantaCoderFIM`: which uses SantaCoder FIM tokens `"<fim-prefix>", "<fim-middle>", "<fim-suffix>"`
+Depending on the FIM tokens used to train the model, you will need to select the appropriate task for evaluation.
+
+We only do single generation (`n_samples=1`) and use the same generation settings as before.
+Below is the command to run the evaluation:
+```bash
+accelerate launch main.py \
+  --model <MODEL_NAME> \
+  --max_length_generation <MAX_LENGTH> \
+  --tasks <TASK> \
+  --n_samples 1 \
+  --temperature 0.2 \
+  --batch_size 1
+```
+If you ever get index out-of-range errors try using a number of problems `limit` that is proportional to the number of devices you are using.
+
 ## Documentation generation task
 Code to text task from [CodeXGLUE](https://huggingface.co/datasets/code_x_glue_ct_code_to_text): a benchmark for English documentation generation for 6 programming languages: Python, Go, Ruby, Java, JavaScript and PHP.
````
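As a rough sketch of what the evaluation added above does: a FIM prompt is assembled in prefix-suffix-middle order and the generated middle is compared to the reference by exact match. The helper names below (`build_fim_prompt`, `exact_match`) are illustrative assumptions, not the harness's actual API:

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    # StarCoderFIM-style tokens; SantaCoderFIM uses "<fim-prefix>" etc.
    # The model is asked to generate the span that belongs after <fim_middle>.
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

def exact_match(generation: str, canonical: str) -> bool:
    # Zero-shot scoring: the generated middle must match the reference exactly
    # (modulo surrounding whitespace), with no BLEU-style partial credit.
    return generation.strip() == canonical.strip()

prompt = build_fim_prompt("def add(a, b):\n    return ", "\n")
print(prompt)

assert exact_match("a + b", "a + b")
assert not exact_match("a - b", "a + b")
```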
