docs/README.md
@@ -110,7 +110,7 @@ accelerate launch main.py \
-There is also a version to run the OpenAI API on HumanEvalPack at `lm_eval/tasks/humanevalpack_openai.py`. It requires the `openai` package that can be installed via `pip install openai`. You will need to set the environment variables `OPENAI_ORGANIZATION` and `OPENAI_API_KEY`. Then you may want to modify the global variables defined in the script, such as `LANGUAGE`. Finally, you can run it with `python lm_eval/tasks/humanevalpack_openai.py`.
+There is also a version to run the OpenAI API on HumanEvalPack at `bigcode_eval/tasks/humanevalpack_openai.py`. It requires the `openai` package that can be installed via `pip install openai`. You will need to set the environment variables `OPENAI_ORGANIZATION` and `OPENAI_API_KEY`. Then you may want to modify the global variables defined in the script, such as `LANGUAGE`. Finally, you can run it with `python bigcode_eval/tasks/humanevalpack_openai.py`.
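A minimal sketch of that setup, written in Python with placeholder credential values you would replace with your own (exporting the variables in your shell and running the script directly works just as well):

```python
import os
import subprocess

# Placeholder credentials; the script reads them from the environment.
os.environ["OPENAI_ORGANIZATION"] = "org-..."
os.environ["OPENAI_API_KEY"] = "sk-..."

# Globals such as LANGUAGE are edited directly inside the script before running it.
subprocess.run(
    ["python", "bigcode_eval/tasks/humanevalpack_openai.py"], check=True
)
```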
-If the prompt involves few-shot examples, you first need to save them in a json file `<task_name>_few_shot_prompts.json` in `lm_eval/tasks/few_shot_example` and then load them in the `fewshot_examples` method like this:
+If the prompt involves few-shot examples, you first need to save them in a json file `<task_name>_few_shot_prompts.json` in `bigcode_eval/tasks/few_shot_example` and then load them in the `fewshot_examples` method like this:
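The body of that method is not shown in this diff; a minimal sketch, assuming the file name and directory convention above (with `<task_name>` replaced by your task's name), might be:

```python
import json


def fewshot_examples(self):
    """Illustrative sketch: load the few-shot prompts saved for this task."""
    with open(
        "bigcode_eval/tasks/few_shot_example/<task_name>_few_shot_prompts.json", "r"
    ) as file:
        return json.load(file)
```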
-You need to load your metric and run it. Check the Hugging Face `evaluate` [library](https://huggingface.co/docs/evaluate/index) for the available metrics. For example, [code_eval](https://huggingface.co/spaces/evaluate-metric/code_eval) for pass@k, [BLEU](https://huggingface.co/spaces/evaluate-metric/bleu) for BLEU score, and [apps_metric](https://huggingface.co/spaces/codeparrot/apps_metric) are implemented. If you cannot find your desired metric, you can either add it to the `evaluate` library or implement it in the `lm_eval/tasks/custom_metrics` folder and import it from there.
+You need to load your metric and run it. Check the Hugging Face `evaluate` [library](https://huggingface.co/docs/evaluate/index) for the available metrics. For example, [code_eval](https://huggingface.co/spaces/evaluate-metric/code_eval) for pass@k, [BLEU](https://huggingface.co/spaces/evaluate-metric/bleu) for BLEU score, and [apps_metric](https://huggingface.co/spaces/codeparrot/apps_metric) are implemented. If you cannot find your desired metric, you can either add it to the `evaluate` library or implement it in the `bigcode_eval/tasks/custom_metrics` folder and import it from there.
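As a hedged example of the `evaluate` route (not the harness's exact code), computing pass@k with `code_eval` on a toy problem looks roughly like this:

```python
import os

from evaluate import load

# code_eval executes model-generated code, so it must be explicitly enabled.
os.environ["HF_ALLOW_CODE_EVAL"] = "1"

code_eval = load("code_eval")

# Toy inputs: one problem, two candidate solutions, one test string.
candidates = [["def add(a, b): return a - b", "def add(a, b): return a + b"]]
tests = ["assert add(2, 3) == 5"]

pass_at_k, results = code_eval.compute(
    references=tests, predictions=candidates, k=[1, 2]
)
print(pass_at_k)  # e.g. {'pass@1': 0.5, 'pass@2': 1.0}
```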
### Registering Your Task
-Now's a good time to register your task to expose it for usage. All you'll need to do is import your task module in `lm_eval/tasks/__init__.py` and provide an entry in the `TASK_REGISTRY` dictionary with the key as the name of your benchmark task (in the form it'll be referred to in the command line) and the value as the task class. See how it's done for other tasks in the [file](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/lm_eval/tasks/__init__.py).
+Now's a good time to register your task to expose it for usage. All you'll need to do is import your task module in `bigcode_eval/tasks/__init__.py` and provide an entry in the `TASK_REGISTRY` dictionary with the key as the name of your benchmark task (in the form it'll be referred to in the command line) and the value as the task class. See how it's done for other tasks in the [file](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/bigcode_eval/tasks/__init__.py).
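As a sketch (the module and class names below are placeholders, not tasks that actually exist in the harness), the registration amounts to:

```python
# bigcode_eval/tasks/__init__.py (excerpt, illustrative only)
from . import my_new_task  # hypothetical module containing your Task subclass

TASK_REGISTRY = {
    # ... existing tasks ...
    "my-new-task": my_new_task.MyNewTask,  # key = name used on the command line
}
```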
## Task submission
@@ -136,7 +136,7 @@ Few-shot tasks are easier to conduct, but if you need to add the finetuning scri
## Code formatting
You can format your changes and run the standard `black` checks with:
```sh
-black lm_eval/tasks/<task-name>.py
+black bigcode_eval/tasks/<task-name>.py
```
## Task documentation
Please document your task with the advised parameters for execution from the literature in the [docs](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/docs/README.md), as is done for the other benchmarks.