
Commit c84f95b

fix: update the legacy behaviour
1 parent 02283b0 commit c84f95b

File tree

2 files changed (+4, -4 lines)


README.md

Lines changed: 1 addition & 1 deletion
@@ -347,7 +347,7 @@ We share pre-generated code samples from LLMs we have [evaluated](https://huggin
 
 ## 🐞 Known Issues
 
-- [ ] Due to [the Hugging Face tokenizer update](https://github.com/huggingface/transformers/pull/31305), some tokenizer may be broken and will degrade the performance of the evaluation. Please try `--tokenizer_legacy` during the generation.
+- [ ] Due to [the Hugging Face tokenizer update](https://github.com/huggingface/transformers/pull/31305), some tokenizers may be broken, which degrades evaluation performance. We therefore initialize with `legacy=False`; if you notice unexpected changes, please try `--tokenizer_legacy` during generation.
 
 - [ ] Due to the flakes in the evaluation, the execution results may vary slightly (~0.2%) between runs. We are working on improving the evaluation stability.

bigcodebench/model.py

Lines changed: 3 additions & 3 deletions
@@ -135,7 +135,7 @@ def __init__(self, name: str, dataset: str, tp: int, **kwargs) -> None:
         if self.tokenizer_name is None:
             self.tokenizer_name = self.name
 
-        self.tokenizer = AutoTokenizer.from_pretrained(self.tokenizer_name, **kwargs, legacy=not self.tokenizer_legacy)
+        self.tokenizer = AutoTokenizer.from_pretrained(self.tokenizer_name, **kwargs, legacy=self.tokenizer_legacy)
         if self.tokenizer.chat_template is None:
             self.eos += extra_eos_for_direct_completion(dataset)
         self.llm = LLM(model=name, max_model_len=2048, **kwargs)
@@ -195,7 +195,7 @@ def __init__(self, name: str, dataset: str, **kwargs):
         if self.tokenizer_name is None:
             self.tokenizer_name = self.name
 
-        self.tokenizer = AutoTokenizer.from_pretrained(self.tokenizer_name, **kwargs, legacy=not self.tokenizer_legacy)
+        self.tokenizer = AutoTokenizer.from_pretrained(self.tokenizer_name, **kwargs, legacy=self.tokenizer_legacy)
 
         if self.tokenizer.chat_template is None:
             self.eos += extra_eos_for_direct_completion(dataset)
@@ -252,7 +252,7 @@ def __init__(self, name: str, **kwargs):
         self.eos += ["\n```\n"]
         print(f"EOS strings: {self.eos}")
         self.tokenizer = AutoTokenizer.from_pretrained(self.tokenizer_name if self.tokenizer_name else self.name,
-                                                       **kwargs, legacy=not self.tokenizer_legacy)
+                                                       **kwargs, legacy=self.tokenizer_legacy)
 
     def codegen(
         self, prompt: str, do_sample: bool = True, num_samples: int = 200
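The change above fixes an inverted boolean: the `--tokenizer_legacy` CLI flag was previously negated before being passed as the `legacy` kwarg, so a default run (flag off) actually requested `legacy=True`, and passing the flag turned legacy mode *off*. A minimal sketch of the corrected mapping (the helper name `tokenizer_kwargs` is hypothetical, not part of the repo):

```python
# Hypothetical helper illustrating the commit's fix; `tokenizer_legacy`
# mirrors the --tokenizer_legacy CLI flag (False by default).

def tokenizer_kwargs(tokenizer_legacy: bool = False, **kwargs) -> dict:
    # Before the fix: {"legacy": not tokenizer_legacy} -- the flag was
    # inverted, so a default run still initialized with legacy=True.
    # After the fix, the flag is forwarded unchanged:
    return {**kwargs, "legacy": tokenizer_legacy}

assert tokenizer_kwargs()["legacy"] is False      # default: new behaviour
assert tokenizer_kwargs(True)["legacy"] is True   # --tokenizer_legacy opts in
```

With this mapping, the default initialization matches the README note (`legacy=False`), and the flag restores the pre-update tokenizer behaviour.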
