We release fine-tuned checkpoints for all the downstream tasks covered in the paper.

**Oct 25, 2021**

We release a CodeT5-base fine-tuned checkpoint ([Salesforce/codet5-base-multi-sum](https://huggingface.co/Salesforce/codet5-base-multi-sum)) for multilingual code summarization. Below is how to use this model:

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

if __name__ == '__main__':
    tokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-base-multi-sum')
    model = T5ForConditionalGeneration.from_pretrained('Salesforce/codet5-base-multi-sum')

    text = """def svg_to_image(string, size=None):
    if isinstance(string, unicode):
        string = string.encode('utf-8')
        renderer = QtSvg.QSvgRenderer(QtCore.QByteArray(string))
    if not renderer.isValid():
        raise ValueError('Invalid SVG data.')
    if size is None:
        size = renderer.defaultSize()
        image = QtGui.QImage(size.width(), size.height(), QtGui.QImage.Format_ARGB32)
        painter = QtGui.QPainter(image)
        renderer.render(painter)
    return image"""

    input_ids = tokenizer(text, return_tensors="pt").input_ids
    generated_ids = model.generate(input_ids, max_length=20)
    print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
    # this prints: "Convert a SVG string to a QImage."
```

It significantly outperforms previous methods on code summarization in the [CodeXGLUE benchmark](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Text/code-to-text):

| Model | Ruby | Javascript | Go | Python | Java | PHP | Overall |
| ----- | ---- | ---------- | -- | ------ | ---- | --- | ------- |

We add a [model card](https://github.com/salesforce/CodeT5/blob/main/CodeT5_model_card.pdf) for CodeT5! Please reach out if you have any questions about it.

**Sep 24, 2021**

CodeT5 is now on [Hugging Face](https://huggingface.co/)!

You can simply load the model ([CodeT5-small](https://huggingface.co/Salesforce/codet5-small) and [CodeT5-base](https://huggingface.co/Salesforce/codet5-base)) and do the inference:

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-base')
model = T5ForConditionalGeneration.from_pretrained('Salesforce/codet5-base')

text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# simply generate a single sequence
generated_ids = model.generate(input_ids, max_length=8)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
# this prints "{user.username}"
```

This repo provides the code for reproducing the experiments in [CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation](https://arxiv.org/pdf/2109.00859.pdf). CodeT5 is a new pre-trained encoder-decoder model for programming languages, pre-trained on **8.35M** functions in 8 programming languages (Python, Java, JavaScript, PHP, Ruby, Go, C, and C#). In total, it achieves state-of-the-art results on **14 sub-tasks** in the [CodeXGLUE](https://github.com/microsoft/CodeXGLUE) code intelligence benchmark.
Paper link: https://arxiv.org/abs/2109.00859
Blog link: https://blog.einstein.ai/codet5/

The code currently includes two pre-trained checkpoints ([CodeT5-small](https://huggingface.co/Salesforce/codet5-small) and [CodeT5-base](https://huggingface.co/Salesforce/codet5-base)) and scripts to fine-tune them on 4 generation tasks (code summarization, code generation, translation, and refinement) plus 2 understanding tasks (code defect detection and clone detection) in CodeXGLUE. We also provide their fine-tuned checkpoints to facilitate easy replication of our paper.

In practice, CodeT5 can be deployed as an AI-powered coding assistant to boost the productivity of software developers. At Salesforce, we built an [AI coding assistant demo](https://github.com/salesforce/CodeT5/raw/main/codet5.gif) using CodeT5 as a VS Code plugin to provide three capabilities for Apex developers:

- **Text-to-code generation**: generate code based on the natural language description.
- **Code autocompletion**: complete the whole function of code given the target function name.
- **Code summarization**: generate the summary of a function in natural language description.

## Table of Contents

6. [Get Involved](#get-involved)

## Citation

If you find this code to be useful for your research, please consider citing:

```
@inproceedings{wang2021codet5,
    title={CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation},
    author={Yue Wang and Weishi Wang and Shafiq Joty and Steven C.H. Hoi},
    booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    year={2021},
}
```

## License

The code is released under the BSD-3 License (see `LICENSE.txt` for details), but we also ask that users respect the following:

This software should not be used to promote or profit from:

violence, hate, and division,

environmental destruction,

abuse of human rights, or

the destruction of people's physical and mental health.

We encourage users of this software to tell us about the applications in which they are putting it to use by emailing codeT5@salesforce.com, and to use [appropriate](https://arxiv.org/abs/1810.03993) [documentation](https://www.partnershiponai.org/about-ml/) when developing high-stakes applications of this model.

Go to the `sh` folder and set `WORKDIR` in `exp_with_args.sh` to your cloned CodeT5 repository path.

You can use `run_exp.py` to run a broad set of experiments by simply passing the `model_tag`, `task`, and `sub_task` arguments. In total, we support five models (i.e., ['roberta', 'codebert', 'bart_base', 'codet5_small', 'codet5_base']) and six tasks (i.e., ['summarize', 'concode', 'translate', 'refine', 'defect', 'clone']). For each task, we use the `sub_task` to specify which specific datasets to fine-tune on. Below is the full list:
data_num: how many data instances to use; the default -1 uses the full data
gpu: the index of the GPU to use in the cluster
```
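
For example, a typical invocation to fine-tune CodeT5-base on code summarization for Python looks like the following (a sketch; the flag spellings mirror the `model_tag`, `task`, and `sub_task` argument names above):

```shell
# fine-tune CodeT5-base on Python code summarization;
# other tasks/sub_tasks are selected the same way
python run_exp.py --model_tag codet5_base --task summarize --sub_task python
```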

You can also revise the suggested arguments [here](https://github.com/salesforce/CodeT5/blob/4f8818aea1bf170f019381671087e4c4f9608005/sh/run_exp.py#L14) or directly customize the [exp_with_args.sh](https://github.com/salesforce/CodeT5/blob/main/sh/exp_with_args.sh) bash file. Please refer to the argument flags in [configs.py](https://github.com/salesforce/CodeT5/blob/main/configs.py) for the full available options. The saved training curves in `summary_dir` can be visualized using [tensorboard](https://pypi.org/project/tensorboard/).
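
A minimal way to inspect those curves (the `summary` path here is an assumption; point `--logdir` at whatever `summary_dir` your run used):

```shell
# launch tensorboard against the saved training summaries
tensorboard --logdir summary
```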

## Get Involved

Please create a GitHub issue if you have any questions, suggestions, requests or bug reports. We welcome PRs!