Commit 51833a2

updating last lm_eval mentions

1 parent 92c81a0

7 files changed (+13, −13 lines)

bigcode_eval/tasks/codexglue_text_to_text.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -64,7 +64,7 @@ def get_dataset(self):
     def fewshot_examples(self):
         """Loads and returns the few-shot examples for the task if they exist."""
         with open(
-            "lm_eval/tasks/few_shot_examples/codexglue_text_to_text_few_shot_prompts.json",
+            "bigcode_eval/tasks/few_shot_examples/codexglue_text_to_text_few_shot_prompts.json",
             "r",
         ) as file:
             examples = json.load(file)
```
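The pattern touched in each of these task files is the same: open a JSON file of few-shot prompts and load it. A minimal sketch of that round-trip, using a temporary directory and a hypothetical prompts dict (the real JSON schema is task-specific and not shown in this commit):

```python
import json
import os
import tempfile

# Hypothetical few-shot prompts content; the real schema varies per task.
examples = {"prompt": "Summarize the function:", "solution": "..."}

# Round-trip the file the way fewshot_examples() does, but in a temp
# directory instead of bigcode_eval/tasks/few_shot_examples/.
path = os.path.join(tempfile.mkdtemp(), "demo_few_shot_prompts.json")
with open(path, "w") as f:
    json.dump(examples, f)

with open(path, "r") as file:
    loaded = json.load(file)

print(loaded == examples)  # True
```

Note the paths in the harness are relative, so `fewshot_examples()` assumes you run from the project root.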

bigcode_eval/tasks/conala.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -47,7 +47,7 @@ def get_dataset(self):
     def fewshot_examples(self):
         """Loads and returns the few-shot examples for the task if they exist."""
         with open(
-            "lm_eval/tasks/few_shot_examples/conala_few_shot_prompts.json", "r"
+            "bigcode_eval/tasks/few_shot_examples/conala_few_shot_prompts.json", "r"
         ) as file:
             examples = json.load(file)
         return examples
```

bigcode_eval/tasks/concode.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -47,7 +47,7 @@ def get_dataset(self):
     def fewshot_examples(self):
         """Loads and returns the few-shot examples for the task if they exist."""
         with open(
-            "lm_eval/tasks/few_shot_examples/concode_few_shot_prompts.json", "r"
+            "bigcode_eval/tasks/few_shot_examples/concode_few_shot_prompts.json", "r"
         ) as file:
             examples = json.load(file)
         return examples
```

bigcode_eval/tasks/gsm.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -105,7 +105,7 @@ def get_dataset(self):
     def fewshot_examples(self):
         """Loads and returns the few-shot examples for the task if they exist."""
         with open(
-            "lm_eval/tasks/few_shot_examples/gsm8k_few_shot_prompts.json",
+            "bigcode_eval/tasks/few_shot_examples/gsm8k_few_shot_prompts.json",
             "r",
         ) as file:
             examples = json.load(file)
```

docs/README.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -110,7 +110,7 @@ accelerate launch main.py \
 ```
 
 
-There is also a version to run the OpenAI API on HumanEvalPack at `lm_eval/tasks/humanevalpack_openai.py`. It requires the `openai` package that can be installed via `pip install openai`. You will need to set the environment variables `OPENAI_ORGANIZATION` and `OPENAI_API_KEY`. Then you may want to modify the global variables defined in the script, such as `LANGUAGE`. Finally, you can run it with `python lm_eval/tasks/humanevalpack_openai.py`.
+There is also a version to run the OpenAI API on HumanEvalPack at `bigcode_eval/tasks/humanevalpack_openai.py`. It requires the `openai` package that can be installed via `pip install openai`. You will need to set the environment variables `OPENAI_ORGANIZATION` and `OPENAI_API_KEY`. Then you may want to modify the global variables defined in the script, such as `LANGUAGE`. Finally, you can run it with `python bigcode_eval/tasks/humanevalpack_openai.py`.
 
 
 ### InstructHumanEval
````
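The README paragraph changed above describes a setup sequence that can be sketched as the following environment-configuration fragment; the organization and key values are placeholders you must replace with your own:

```shell
# Placeholder credentials -- substitute your real values.
export OPENAI_ORGANIZATION="org-xxxxxxxx"
export OPENAI_API_KEY="sk-xxxxxxxx"

# The openai package is required:  pip install openai
# Optionally edit globals such as LANGUAGE in the script, then run:
#   python bigcode_eval/tasks/humanevalpack_openai.py
```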

docs/guide.md

Lines changed: 7 additions & 7 deletions

````diff
@@ -16,10 +16,10 @@ pip install -r requirements.txt
 
 ## Creating Your Task File
 
-From the `bigcode-evaluation-harness` project root, copy over the `new_task.py` template to `lm_eval/tasks`.
+From the `bigcode-evaluation-harness` project root, copy over the `new_task.py` template to `bigcode_eval/tasks`.
 
 ```sh
-cp template/new_task.py lm_eval/tasks/<task-name>.py
+cp template/new_task.py bigcode_eval/tasks/<task-name>.py
 ```
 
 ## Task Heading
@@ -81,11 +81,11 @@ def get_prompt(self, doc):
     return ""
 ```
 
-If the prompt involves few-shot examples, you first need to save them in a JSON file `<task_name>_few_shot_prompts.json` in `lm_eval/tasks/few_shot_examples` and then load them in the `fewshot_examples` method like this:
+If the prompt involves few-shot examples, you first need to save them in a JSON file `<task_name>_few_shot_prompts.json` in `bigcode_eval/tasks/few_shot_examples` and then load them in the `fewshot_examples` method like this:
 
 ```python
 def fewshot_examples(self):
-    with open("lm_eval/tasks/few_shot_examples/<task_name>_few_shot_prompts.json", "r") as file:
+    with open("bigcode_eval/tasks/few_shot_examples/<task_name>_few_shot_prompts.json", "r") as file:
         examples = json.load(file)
     return examples
 ```
@@ -113,12 +113,12 @@ def process_results(self, generations, references):
     return {}
 ```
 
-You need to load your metric and run it. Check the Hugging Face `evaluate` [library](https://huggingface.co/docs/evaluate/index) for the available metrics. For example, [code_eval](https://huggingface.co/spaces/evaluate-metric/code_eval) for pass@k, [BLEU](https://huggingface.co/spaces/evaluate-metric/bleu) for BLEU score, and [apps_metric](https://huggingface.co/spaces/codeparrot/apps_metric) are implemented. If you cannot find your desired metric, you can either add it to the `evaluate` library or implement it in the `lm_eval/tasks/custom_metrics` folder and import it from there.
+You need to load your metric and run it. Check the Hugging Face `evaluate` [library](https://huggingface.co/docs/evaluate/index) for the available metrics. For example, [code_eval](https://huggingface.co/spaces/evaluate-metric/code_eval) for pass@k, [BLEU](https://huggingface.co/spaces/evaluate-metric/bleu) for BLEU score, and [apps_metric](https://huggingface.co/spaces/codeparrot/apps_metric) are implemented. If you cannot find your desired metric, you can either add it to the `evaluate` library or implement it in the `bigcode_eval/tasks/custom_metrics` folder and import it from there.
 
 
 ### Registering Your Task
 
-Now's a good time to register your task to expose it for usage. All you'll need to do is import your task module in `lm_eval/tasks/__init__.py` and provide an entry in the `TASK_REGISTRY` dictionary with the key as the name of your benchmark task (in the form it'll be referred to in the command line) and the value as the task class. See how it's done for other tasks in the [file](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/lm_eval/tasks/__init__.py).
+Now's a good time to register your task to expose it for usage. All you'll need to do is import your task module in `bigcode_eval/tasks/__init__.py` and provide an entry in the `TASK_REGISTRY` dictionary with the key as the name of your benchmark task (in the form it'll be referred to in the command line) and the value as the task class. See how it's done for other tasks in the [file](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/bigcode_eval/tasks/__init__.py).
 
 ## Task submission
 
@@ -136,7 +136,7 @@ Few-shot tasks are easier to conduct, but if you need to add the finetuning scri
 ## Code formatting
 You can format your changes and perform `black` standard checks
 ```sh
-black lm_eval/tasks/<task-name>.py
+black bigcode_eval/tasks/<task-name>.py
 ```
 ## Task documentation
 Please document your task with advised parameters for execution from the literature in the [docs](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/docs/README.md) like it's done for the other benchmarks.
````
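The "Registering Your Task" paragraph in the guide above can be sketched as follows. The `Task` stand-in and the `my-new-task` name are hypothetical, kept here so the snippet is self-contained; in the real harness the base class lives in `bigcode_eval/base.py` and the registry in `bigcode_eval/tasks/__init__.py`:

```python
class Task:
    """Minimal stand-in for the harness's Task base class."""
    def __init__(self, stop_words=None, requires_execution=False):
        self.stop_words = stop_words or []
        self.requires_execution = requires_execution

class MyNewTask(Task):
    """Hypothetical benchmark task; the real class would also implement
    get_dataset, get_prompt, get_reference, postprocess_generation,
    and process_results."""
    def __init__(self):
        super().__init__(stop_words=["\n\n"], requires_execution=False)

# Key: the name used on the command line; value: the task class.
TASK_REGISTRY = {"my-new-task": MyNewTask}

task = TASK_REGISTRY["my-new-task"]()
print(type(task).__name__)  # MyNewTask
```

Keeping the registry a plain dict means the CLI can resolve `--tasks my-new-task` with a single lookup.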

templates/new_task.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -37,7 +37,7 @@ def get_dataset(self):
         return []
 
     def fewshot_examples(self):
-        # TODO: load few-shot examples (from lm_eval/tasks/fewshot_examples) if they exist
+        # TODO: load few-shot examples (from bigcode_eval/tasks/fewshot_examples) if they exist
         """Loads and returns the few-shot examples for the task if they exist."""
         pass
```
4343
