
Commit 71cc7ba

Update README.md
1 parent 0391b55 commit 71cc7ba

File tree

1 file changed: +13 −13 lines changed


README.md

Lines changed: 13 additions & 13 deletions
@@ -31,17 +31,17 @@ print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
 ## Introduction
 This repo provides the code for reproducing the experiments in [CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation](https://arxiv.org/pdf/2109.00859.pdf).
-CodeT5 is a new pre-trained encoder-decoder model for programming languages, which is pre-trained on 8.35M functions in 8 programming languages (Python, Java, JavaScript, PHP, Ruby, Go, C, and C#).
-In total, it achieves state-of-the-art results on 14 sub-tasks in a code intelligence benchmark - [CodeXGLUE](https://github.com/microsoft/CodeXGLUE).
+CodeT5 is a new pre-trained encoder-decoder model for programming languages, which is pre-trained on **8.35M** functions in 8 programming languages (Python, Java, JavaScript, PHP, Ruby, Go, C, and C#).
+In total, it achieves state-of-the-art results on **14 sub-tasks** in a code intelligence benchmark - [CodeXGLUE](https://github.com/microsoft/CodeXGLUE).
 
 Paper link: https://arxiv.org/abs/2109.00859
 
 Blog link: https://blog.einstein.ai/codet5/
 
-The code currently include two pre-trained checkpoints ([CodeT5-small](https://huggingface.co/Salesforce/codet5-small) and [CodeT5-base](https://huggingface.co/Salesforce/codet5-base)) and scripts to fine-tine them on 4 generation tasks (code summarization, code generation, translation, and refinement) plus 2 understanding tasks (code defect detection and clone detection) in CodeXGLUE.
+The code currently includes two pre-trained checkpoints ([CodeT5-small](https://huggingface.co/Salesforce/codet5-small) and [CodeT5-base](https://huggingface.co/Salesforce/codet5-base)) and scripts to fine-tine them on 4 generation tasks (code summarization, code generation, translation, and refinement) plus 2 understanding tasks (code defect detection and clone detection) in CodeXGLUE.
 
 In practice, CodeT5 can be deployed as an AI-powered coding assistant to boost the productivity of software developers.
-At Salesforce, we build an [AI coding assistant demo](https://github.com/salesforce/CodeT5/raw/main/codet5.gif) using CodeT5 to provide three capabilities for Apex developers as a VS Code plugin:
+At Salesforce, we build an [AI coding assistant demo](https://github.com/salesforce/CodeT5/raw/main/codet5.gif) using CodeT5 as a VS Code plugin to provide three capabilities for Apex developers:
 
 - **Text-to-code generation**: generate code based on the natural language description.
 - **Code autocompletion**: complete the whole function of code given the target function name.
@@ -59,11 +59,12 @@ At Salesforce, we build an [AI coding assistant demo](https://github.com/salesfo
 ## Citation
 If you find this code to be useful for your research, please consider citing.
 ```
-@article{CodeT5,
-title={CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation},
-author={Yue Wang, Weishi Wang, Shafiq Joty, Steven C.H. Hoi},
-year={2021},
-journal={arXiv preprint arXiv:2109.00859},
+@inproceedings{
+wang2021codet5,
+title={CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation},
+author={Yue Wang, Weishi Wang, Shafiq Joty, Steven C.H. Hoi},
+booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021},
+year={2021},
 }
 ```

@@ -91,7 +92,7 @@ We encourage users of this software to tell us about the applications in which t
 ## Download
 * [Pre-trained checkpoints & Fine-tuning data](https://console.cloud.google.com/storage/browser/sfr-codet5-data-research)
 
-Instructions for download:
+Instructions to download:
 ```
 pip install gsutil
9798
@@ -101,7 +102,7 @@ gsutil -m cp -r \
 .
 ```
 
-The repository structure is shown in the following after download:
+The repository structure will look like the following after the download:
 ```
 ├── CODE_OF_CONDUCT.md
 ├── README.md
├── README.md
@@ -168,8 +169,7 @@ summary_dir: where to save the training curves
 data_num: how many data instances to use, the default -1 is for using the full data
 gpu: the index of the GPU to use in the cluster
 ```
-You can also directly revise the suggested arguments in the [get_args_by_task_model](https://github.com/salesforce/CodeT5/blob/4f8818aea1bf170f019381671087e4c4f9608005/sh/run_exp.py#L14) function.
-Please refer to the argument flags in `configs.py` for the full available options.
+You can also revise the suggested arguments [here](https://github.com/salesforce/CodeT5/blob/4f8818aea1bf170f019381671087e4c4f9608005/sh/run_exp.py#L14) and refer to the argument flags in [configs.py](https://github.com/salesforce/CodeT5/blob/main/configs.py) for the full available options.
 The saved training curves in `summary_dir` can be visualized using [tensorboard](https://pypi.org/project/tensorboard/).
 
 ## Get Involved
