This repo provides the code for reproducing the experiments in [CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation](https://arxiv.org/pdf/2109.00859.pdf).
CodeT5 is a new pre-trained encoder-decoder model for programming languages, trained on **8.35M** functions in 8 programming languages (Python, Java, JavaScript, PHP, Ruby, Go, C, and C#). In total, it achieves state-of-the-art results on **14 sub-tasks** in the code intelligence benchmark [CodeXGLUE](https://github.com/microsoft/CodeXGLUE).

Paper link: https://arxiv.org/abs/2109.00859
Paper link: https://arxiv.org/abs/2109.00859
Blog link: https://blog.einstein.ai/codet5/

The code currently includes two pre-trained checkpoints ([CodeT5-small](https://huggingface.co/Salesforce/codet5-small) and [CodeT5-base](https://huggingface.co/Salesforce/codet5-base)) and scripts to fine-tune them on 4 generation tasks (code summarization, code generation, translation, and refinement) plus 2 understanding tasks (code defect detection and clone detection) in CodeXGLUE.
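Both checkpoints can be loaded through HuggingFace Transformers. The snippet below is a minimal sketch (it requires `transformers` and PyTorch; the masked-span input and the generation settings are only illustrative) that asks CodeT5-base to fill in a masked span:

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

# CodeT5 uses a code-specific BPE vocabulary served through RobertaTokenizer,
# not the default T5 tokenizer.
tokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-base')
model = T5ForConditionalGeneration.from_pretrained('Salesforce/codet5-base')

# Mask a span with the T5 sentinel token <extra_id_0> and let the model fill it in.
text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_length=10)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```
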
In practice, CodeT5 can be deployed as an AI-powered coding assistant to boost the productivity of software developers.
At Salesforce, we built an [AI coding assistant demo](https://github.com/salesforce/CodeT5/raw/main/codet5.gif) using CodeT5 as a VS Code plugin to provide three capabilities for Apex developers:

- **Text-to-code generation**: generate code based on the natural language description.
- **Code autocompletion**: complete the whole function of code given the target function name.
- **Code summarization**: generate the summary of a function in natural language description.
## Citation

If you find this code useful for your research, please consider citing:

```
@inproceedings{wang2021codet5,
    title={CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation},
    author={Yue Wang and Weishi Wang and Shafiq Joty and Steven C.H. Hoi},
    booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021},
    year={2021},
}
```

The repository structure will look like the following after the download:
```
├── CODE_OF_CONDUCT.md
├── README.md
...
```

The fine-tuning launcher `sh/run_exp.py` accepts a number of arguments, including:

```
summary_dir: where to save the training curves
data_num: how many data instances to use, the default -1 is for using the full data
gpu: the index of the GPU to use in the cluster
```

You can also revise the suggested arguments in the [get_args_by_task_model](https://github.com/salesforce/CodeT5/blob/4f8818aea1bf170f019381671087e4c4f9608005/sh/run_exp.py#L14) function and refer to the argument flags in [configs.py](https://github.com/salesforce/CodeT5/blob/main/configs.py) for the full available options.
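For example, a typical launch for fine-tuning CodeT5-base on Python code summarization might look like the sketch below (the `--model_tag`, `--task`, and `--sub_task` values are illustrative; check `run_exp.py` and `configs.py` for the exact identifiers):

```
python run_exp.py --model_tag codet5_base --task summarize --sub_task python
```
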
The saved training curves in `summary_dir` can be visualized using [tensorboard](https://pypi.org/project/tensorboard/).
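For instance, assuming `<summary_dir>` is the path you set for `summary_dir`:

```
tensorboard --logdir <summary_dir>
```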