
Commit 8d6a6a9: update readmes (1 parent: 7cfcf5d)

File tree: 3 files changed, 29 additions, 3 deletions


README.md

Lines changed: 3 additions & 0 deletions

````diff
@@ -35,6 +35,9 @@ Below are the features and tasks of this framework:
 - [CoNaLa](https://huggingface.co/datasets/neulab/conala) for **Python** code generation (2-shot setting and evaluation with BLEU score).
 - [Concode](https://huggingface.co/datasets/code_x_glue_tc_text_to_code) for **Java** code generation (2-shot setting and evaluation with BLEU score).
 - 3 multilingual downstream classification tasks: [Java Complexity prediction](https://huggingface.co/datasets/codeparrot/codecomplex), [Java code equivalence prediction](https://huggingface.co/datasets/code_x_glue_cc_clone_detection_big_clone_bench), [C code defect prediction](https://huggingface.co/datasets/code_x_glue_cc_defect_detection).
+- [SantaCoder-FIM](https://huggingface.co/datasets/bigcode/santacoder-fim-task) for evaluating FIM on **Python** code using Exact Match. Further details are described in [SantaCoder](https://arxiv.org/abs/2301.03988). Includes two tasks:
+  - `StarCoderFIM`: which uses the default FIM tokens `"<fim_prefix>", "<fim_middle>", "<fim_suffix>"`, and
+  - `SantaCoderFIM`: which uses SantaCoder FIM tokens `"<fim-prefix>", "<fim-middle>", "<fim-suffix>"`
 
 More details about each task can be found in the documentation in [`docs/README.md`](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/docs/README.md).
 ## Setup
````
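The bullet above notes that the right task depends on which FIM token style the model was trained with. A minimal sketch of that selection logic, assuming a hypothetical `pick_fim_task` helper (the harness itself selects tasks via the `--tasks` flag; this is illustrative only):

```python
def pick_fim_task(vocab: set) -> str:
    """Pick the FIM evaluation task matching the model's special tokens."""
    if "<fim_prefix>" in vocab:
        # Default underscore-style tokens, e.g. StarCoder models.
        return "StarCoderFIM"
    if "<fim-prefix>" in vocab:
        # Hyphen-style tokens used by SantaCoder.
        return "SantaCoderFIM"
    raise ValueError("model vocabulary has no known FIM tokens")

print(pick_fim_task({"<fim_prefix>", "<fim_middle>", "<fim_suffix>"}))  # StarCoderFIM
print(pick_fim_task({"<fim-prefix>", "<fim-middle>", "<fim-suffix>"}))  # SantaCoderFIM
```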

bigcode_eval/tasks/santacoder_fim.py

Lines changed: 3 additions & 3 deletions

````diff
@@ -121,9 +121,9 @@ class StarCoderFIM(SantaCoderFIM):
     DATASET_PATH = "bigcode/santacoder-fim-task"
 
     def __init__(self):
-        fim_prefix = ("<fim_prefix>",)
-        fim_middle = ("<fim_middle>",)
-        fim_suffix = ("<fim_suffix>",)
+        fim_prefix = "<fim_prefix>"
+        fim_middle = "<fim_middle>"
+        fim_suffix = "<fim_suffix>"
         stop_words = ["<|endoftext|>"]
         super().__init__(
             stop_words=stop_words,
````
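The change above fixes a classic Python pitfall: a trailing comma inside parentheses creates a one-element tuple, not a parenthesized string. A small sketch of why the old assignments were buggy:

```python
# Before the fix: the trailing comma makes this a tuple of length 1.
fim_prefix_old = ("<fim_prefix>",)
# After the fix: a plain string, as intended.
fim_prefix_new = "<fim_prefix>"

assert isinstance(fim_prefix_old, tuple) and len(fim_prefix_old) == 1
assert isinstance(fim_prefix_new, str)

# Concatenating the tuple into a prompt would raise a TypeError;
# the plain string composes as intended:
prompt = fim_prefix_new + "def add(a, b):"
print(prompt)  # <fim_prefix>def add(a, b):
```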

docs/README.md

Lines changed: 23 additions & 0 deletions

````diff
@@ -357,6 +357,29 @@ accelerate launch main.py \
 ```
 If you ever get index out-of-range errors try using a number of problems `limit` that is proportional to the number of devices you are using.
 
+### SantaCoder-FIM
+[SantaCoder-FIM](https://huggingface.co/datasets/bigcode/santacoder-fim-task): 4,792 tasks for FIM insertion described in [SantaCoder: don't reach for the stars!](https://arxiv.org/abs/2301.03988). The tasks are similar to other tasks without unit tests, with two key differences:
+1. Exact Match is used to score the generations instead of the BLEU score.
+2. A zero-shot setting is used instead of the 2-shot setting.
+
+SantaCoder-FIM includes 2 tasks:
+- `StarCoderFIM`: which uses the default FIM tokens `"<fim_prefix>", "<fim_middle>", "<fim_suffix>"`, and
+- `SantaCoderFIM`: which uses SantaCoder FIM tokens `"<fim-prefix>", "<fim-middle>", "<fim-suffix>"`
+Depending on the FIM tokens used to train the model, you will need to select the appropriate task for evaluation.
+
+We only do single generation (`n_samples=1`) and use the same generation settings as before.
+Below is the command to run the evaluation:
+```bash
+accelerate launch main.py \
+  --model <MODEL_NAME> \
+  --max_length_generation <MAX_LENGTH> \
+  --tasks <TASK> \
+  --n_samples 1 \
+  --temperature 0.2 \
+  --batch_size 1
+```
+If you ever get index out-of-range errors try using a number of problems `limit` that is proportional to the number of devices you are using.
+
 ## Documentation generation task
 Code to text task from [CodeXGLUE](https://huggingface.co/datasets/code_x_glue_ct_code_to_text): a benchmark for English documentation generation for 6 programming languages: Python, Go, Ruby, Java, JavaScript and PHP.
````
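As a rough sketch of what the evaluation added above does: a FIM prompt is assembled in prefix-suffix-middle order and the generated middle is compared to the reference by exact match. The helper names below (`build_fim_prompt`, `exact_match`) are illustrative assumptions, not the harness's actual API:

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    # StarCoderFIM-style tokens; SantaCoderFIM uses "<fim-prefix>" etc.
    # The model is asked to generate the span that belongs after <fim_middle>.
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

def exact_match(generation: str, canonical: str) -> bool:
    # Zero-shot scoring: the generated middle must match the reference exactly
    # (modulo surrounding whitespace), with no BLEU-style partial credit.
    return generation.strip() == canonical.strip()

prompt = build_fim_prompt("def add(a, b):\n    return ", "\n")
print(prompt)

assert exact_match("a + b", "a + b")
assert not exact_match("a - b", "a + b")
```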
