
Commit d9352b3

ICLR2023 instructions
1 parent dce56da commit d9352b3

File tree: 2 files changed, +396, -0 lines

ICLR2023/README.md

Lines changed: 147 additions & 0 deletions
This is the code and data for the paper: Language Models can teach themselves to code better
https://arxiv.org/abs/2207.14502

LICENSE
MIT License - as already specified in the ../LICENSE file of the PythonProgrammingPuzzles repo

GPU USAGE
GPU usage was large, especially for the 2.7B model, which is ~20X the size of the 125M.
Data generation takes the most GPU time and took about 2500 GPU hours for 2.7B (on V100).
Finetuning on the 1M generated samples took about 40 GPU hours per epoch for 2.7B (on V100) - 10 epochs = 400 GPU hours.
Solving the 228-problem testset with 100 attempts using the finetuned 2.7B model took about 4 hours (on V100).
We mostly used V100s, but we used whatever was available, so sometimes T4s and A100s if they were free.
We tried everything at 125M first - debug there and make it work perfectly - then roll out the 1.3B and 2.7B jobs.

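Summing those numbers gives a rough budget for one full 2.7B run; a back-of-the-envelope sketch using only the figures above:

# Rough GPU-hour budget for one full 2.7B pass, using only the numbers quoted above (V100 hours)
generation = 2500        # data generation
finetuning = 40 * 10     # ~40 GPU hours per epoch x 10 epochs
evaluation = 4           # solving test_228 with 100 attempts
print("total ~", generation + finetuning + evaluation, "GPU hours")  # ~2904
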
DATASETS
In the data directory are the datasets used. We feel the most interesting dataset is data/Codex_PAPER_1M_iter_0.txt,
which is generated by Codex and gave the best results when finetuned on. All the datasets are part of our public release.

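To take a quick look at a released dataset, a minimal sketch (assumes you run it from the ICLR2023 directory; it makes no assumption about the internal record format, it just previews the raw text):

# Preview a dataset file - illustration only
from pathlib import Path
path = Path("data/Codex_PAPER_1M_iter_0.txt")
text = path.read_text()
print(path, "-", len(text.splitlines()), "lines,", len(text), "characters")
print(text[:500])  # peek at the start to see how samples are laid out
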
SETUP
src/requirements.txt is what we install on our cluster machines - the cluster comes with NVIDIA drivers and a matching PyTorch.
./requirements.txt is what I personally have installed on my local machine and have tested this runs with - but it has lots of stuff you don't need.
So try src/requirements.txt first - and if that doesn't work, ./requirements.txt has the versions of everything installed on my machine.
Getting a DeepSpeed 0.6.1 install matching a PyTorch matching an NVIDIA driver was tricky for me on some machines; torch 1.10 and 1.11 both work.

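A quick sanity check that the pieces line up (a sketch; the exact versions that work for you may differ from the pins in these files):

# Check that torch, CUDA, and deepspeed can all be seen from one Python environment
import torch
import deepspeed
print("torch", torch.__version__, "| CUDA build", torch.version.cuda, "| CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU 0:", torch.cuda.get_device_name(0))
print("deepspeed", deepspeed.__version__)
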
GENERATING/FINETUNING -> run "cd src, ./babysit.sh GPU_INDEX_TO_USE" -> GPU_INDEX_TO_USE=0 typically
src/babysit.sh is the script that generates data and finetunes on that data in a loop, finetuning the GPT-Neo 125M/1.3B/2.7B models.
In src/babysit.sh, TEST_LOCAL=1 runs locally on the machine's GPUs, which is great for fast testing; TEST_LOCAL=0 launches on the cluster, which is slow but has lots of GPUs.
Realistically you have to train on a cluster - data generation takes a long time, so having lots of machines all generating data is the feasible approach.
But given enough time this will run locally on 1 GPU: about 1 year for 2.7B, or 2 weeks for 125M.
We found that generating 75k samples after deduping worked for iteration 0 (see the dedup sketch after this section) - finetune on that data.
Then, using that finetuned model in iter_1, generating data happens more quickly - the finetuned model solves many more problems.
Repeating that process works well.
On 125M we compared training on only the 125M-generated data from iter_0 versus iter_1 versus iter_2 - generating 600K samples for each iteration.
Finetuning on iter_2 data was best on the testset: 26.9/228 solved vs iter_1 = 26.1/228 vs iter_0 = 22.2/228.
With 1M samples of 125M-generated data sampled across all the iterations 0, 1, 2 we got 26.75/228.
We understand why it's faster to generate iter_2 data with a finetuned model - it solves more problems.
But why are the generated puzzles & solutions better for training the model on?
We will explore that more in the future - and try iterating a lot farther than 3 iterations - although our preliminary experiments on 125M show it tops out at 3 iterations.

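As referenced above, a minimal exact-match dedup over generated samples - an illustration only, not the repo's actual dedup code (the one-string-per-sample format is an assumption):

# Illustration only: keep the first occurrence of each generated puzzle+solution string
def dedupe(samples):
    seen, unique = set(), []
    for s in samples:
        key = s.strip()
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique

samples = ["def sat(x): return x > 0", "def sat(x): return x > 0", "def sat(s): return 'a' in s"]
print(len(dedupe(samples)), "unique out of", len(samples))  # 2 unique out of 3
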
FINETUNING ONLY -> run "cd src, ./fine_tune1.sh GPU_INDEX_TO_USE" -> GPU_INDEX_TO_USE=0 typically
# ./fine_tune1.sh GPU MODEL_TO_TRAIN EXPERIMENT_NAME_DIRECTORY TRAIN_DATA EPOCHS
This allows repeated finetuning on a specific dataset.
Use this to do a temperature grid search, or to try different variations of parameters on a specific dataset.

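For the temperature grid search, one possible sketch is to sweep -fixed_temp with solve.py (the flags mirror the accuracy-measurement example below; the -out directories are placeholder names, and pointing -model_path at a finetuned checkpoint instead of the OpenAI model is an assumption):

# Sweep sampling temperature with solve.py - a sketch, not a script from the repo
import subprocess
for temp in [0.2, 0.4, 0.6, 0.8, 1.0]:
    subprocess.run(
        ["python", "solve.py", "-prefix=../data/train_prefix.txt", "-attempts=1",
         "-model_path=openai/code-cushman-001", "-gpu=0", f"-fixed_temp={temp}",
         f"-out=../data/temp_grid/{temp}", "-puzzles=../data/test_228.json",
         "-seed=2022", "-batch_size=64"],
        check=True,
    )
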
Detailed instructions for reproducing experiments:
# Generating Codex data
python gen.py -n=32 -max_tokens=4096 -model_path=openai/code-davinci-002 -model_path_solve=openai/code-cushman-001 -out=../data/codex/iter_0 -seed=2022

# Measuring Codex accuracy via API calls
./solve2.sh
python solve.py -prefix=../data/train_prefix.txt -attempts=1 -model_path=openai/code-cushman-001 -gpu=0 -fixed_temp=0.8 -out=../data/codex -puzzles=../data/test_228.json -seed=2022 -batch_size=64

# Producing verified Codex_PAPER_1M_iter_0.txt from the puzzle/solution old style data generated by Codex
python preprocess.py -path=../data/codex/old_verified -f_name=Codex_PAPER_1M_iter_0.txt -max_sols_per_puzzle=8 -old_style_json=True -max_examples=1000000 -include_failures=False -seed=2022
cp ../data/codex/old_verified/Codex_PAPER_1M_iter_0.txt ../data/Codex_PAPER_1M_iter_0.txt

# Producing unverified Codex_unverified_PAPER_1M_iter_0.txt from the puzzle/solution old style data generated by Codex
python preprocess.py -path=../data/codex/old_unverified -f_name=Codex_unverified_PAPER_1M_iter_0.txt -max_sols_per_puzzle=8 -old_style_json=True -max_examples=1000000 -include_failures=True -seed=2022
cp ../data/codex/old_unverified/Codex_unverified_PAPER_1M_iter_0.txt ../data/Codex_unverified_PAPER_1M_iter_0.txt

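"Verified" here means a sample is kept only if its generated solution actually satisfies its puzzle (-include_failures=False), while the unverified set keeps everything. A bare-bones sketch of that check in the Python Programming Puzzles style - an illustration, not the repo's verifier:

# Keep a generated sample only if sol() actually satisfies sat() - illustration only (no sandboxing or timeouts)
def verify(puzzle_and_solution_src):
    env = {}
    try:
        exec(puzzle_and_solution_src, env)   # defines sat() and sol()
        return env["sat"](env["sol"]()) is True
    except Exception:
        return False

sample = "def sat(s: str):\n    return s.count('a') == 3\n\ndef sol():\n    return 'aaa'\n"
print(verify(sample))  # True -> kept in the verified dataset
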
# Producing 125M_PAPER_25K_iter_0.txt from the puzzle/solution new style data
python preprocess.py ../data/125M_PAPER/iter_0 125M_PAPER_25K_iter_0.txt 8 False 25000 False -seed=2022
cp ../data/125M_PAPER/iter_0/125M_PAPER_25K_iter_0.txt ../data/125M_PAPER_25K_iter_0.txt

# Producing 125M_PAPER_1M_iter_1.txt from the puzzle/solution new style data
python preprocess.py ../data/125M_PAPER/iter_1 125M_PAPER_1M_iter_1.txt 8 False 1000000 False -seed=2022
cp ../data/125M_PAPER/iter_1/125M_PAPER_1M_iter_1.txt ../data/125M_PAPER_1M_iter_1.txt

# Producing 125M_PAPER_1M_iter_2.txt from the puzzle/solution new style data
python preprocess.py ../data/125M_PAPER/iter_2 125M_PAPER_1M_iter_2.txt 8 False 1000000 False -seed=2022
cp ../data/125M_PAPER/iter_2/125M_PAPER_1M_iter_2.txt ../data/125M_PAPER_1M_iter_2.txt

# Producing 13B_PAPER_25K_iter_0.txt from the puzzle/solution new style data
python preprocess.py ../data/13B_PAPER/iter_0 13B_PAPER_25K_iter_0.txt 8 False 25000 False -seed=2022
cp ../data/13B_PAPER/iter_0/13B_PAPER_25K_iter_0.txt ../data/13B_PAPER_25K_iter_0.txt

# Producing 13B_PAPER_1M_iter_1.txt from the puzzle/solution new style data
python preprocess.py ../data/13B_PAPER/iter_1 13B_PAPER_1M_iter_1.txt 8 False 1000000 False -seed=2022
cp ../data/13B_PAPER/iter_1/13B_PAPER_1M_iter_1.txt ../data/13B_PAPER_1M_iter_1.txt

# Producing 13B_PAPER_1M_iter_2.txt from the puzzle/solution new style data
python preprocess.py ../data/13B_PAPER/iter_2 13B_PAPER_1M_iter_2.txt 8 False 1000000 False -seed=2022
cp ../data/13B_PAPER/iter_2/13B_PAPER_1M_iter_2.txt ../data/13B_PAPER_1M_iter_2.txt

# Producing 27B_PAPER_25K_iter_0.txt from the puzzle/solution new style data
python preprocess.py ../data/27B_PAPER/iter_0 27B_PAPER_25K_iter_0.txt 8 False 25000 False -seed=2022
cp ../data/27B_PAPER/iter_0/27B_PAPER_25K_iter_0.txt ../data/27B_PAPER_25K_iter_0.txt

# Producing 27B_PAPER_1M_iter_1.txt from the puzzle/solution new style data
python preprocess.py ../data/27B_PAPER/iter_1 27B_PAPER_1M_iter_1.txt 8 False 1000000 False -seed=2022
cp ../data/27B_PAPER/iter_1/27B_PAPER_1M_iter_1.txt ../data/27B_PAPER_1M_iter_1.txt

# Producing 27B_PAPER_1M_iter_2.txt from the puzzle/solution new style data
python preprocess.py ../data/27B_PAPER/iter_2 27B_PAPER_1M_iter_2.txt 8 False 1000000 False -seed=2022
cp ../data/27B_PAPER/iter_2/27B_PAPER_1M_iter_2.txt ../data/27B_PAPER_1M_iter_2.txt

# Data files produced by babysit.sh - generating data from gpt-neo-* and Codex
# At the time the experiments were run, Codex wasn't finetunable, so only iteration 0 data was available
Codex_PAPER_1M_iter_0.txt
125M_PAPER_25K_iter_0.txt
13B_PAPER_25K_iter_0.txt
27B_PAPER_25K_iter_0.txt
125M_PAPER_1M_iter_1.txt
13B_PAPER_1M_iter_1.txt
27B_PAPER_1M_iter_1.txt
125M_PAPER_1M_iter_2.txt
13B_PAPER_1M_iter_2.txt
27B_PAPER_1M_iter_2.txt

# Figure 5 - 3 diagrams - showing the 3 GPT models trained on verified Codex data vs unverified Codex data vs baseline
# 5a GPT-Neo 125M
./fine_tune1.sh 0 125M ft1_Codex_PAPER_1M_iter_0 Codex_PAPER_1M_iter_0.txt
./fine_tune1.sh 0 125M ft1_Codex_unverified_PAPER_1M_iter_0 Codex_unverified_PAPER_1M_iter_0.txt
./solve1.sh 0 125M 10 228
# 5b GPT-Neo 1.3B (13B in script arguments and file names)
./fine_tune1.sh 0 13B ft1_Codex_PAPER_1M_iter_0 Codex_PAPER_1M_iter_0.txt
./fine_tune1.sh 0 13B ft1_Codex_unverified_PAPER_1M_iter_0 Codex_unverified_PAPER_1M_iter_0.txt
./solve1.sh 0 13B 10 228 5
# 5c GPT-Neo 2.7B (27B in script arguments and file names)
./fine_tune1.sh 0 27B ft1_Codex_PAPER_1M_iter_0 Codex_PAPER_1M_iter_0.txt
./fine_tune1.sh 0 27B ft1_Codex_unverified_PAPER_1M_iter_0 Codex_unverified_PAPER_1M_iter_0.txt
./solve1.sh 0 27B 10 228 5

# Figure 6 - 3 diagrams - showing test228 Pass@k for the 3 GPT models trained on data from 4 generators (Codex and the 3 GPT-Neo models) and baseline
# 6a - GPT-Neo 125M trained on 4 different datasets and baseline
# ./fine_tune1.sh 0 125M ft1_Codex_PAPER_1M_iter_0 Codex_PAPER_1M_iter_0.txt (dupe of 5a)
./fine_tune1.sh 0 125M ft1_125M_PAPER_1M_iter_2 125M_PAPER_1M_iter_2.txt
./fine_tune1.sh 0 125M ft1_13B_PAPER_1M_iter_2 13B_PAPER_1M_iter_2.txt
./fine_tune1.sh 0 125M ft1_27B_PAPER_1M_iter_2 27B_PAPER_1M_iter_2.txt

# 6b - GPT-Neo 1.3B (13B) trained on 4 different datasets and baseline
# ./fine_tune1.sh 0 13B ft1_Codex_PAPER_1M_iter_0 Codex_PAPER_1M_iter_0.txt (dupe of 5b)
./fine_tune1.sh 0 13B ft1_125M_PAPER_1M_iter_2 125M_PAPER_1M_iter_2.txt
./fine_tune1.sh 0 13B ft1_13B_PAPER_1M_iter_2 13B_PAPER_1M_iter_2.txt
./fine_tune1.sh 0 13B ft1_27B_PAPER_1M_iter_2 27B_PAPER_1M_iter_2.txt

# 6c - GPT-Neo 2.7B (27B) trained on 4 different datasets and baseline
# ./fine_tune1.sh 0 27B ft1_Codex_PAPER_1M_iter_0 Codex_PAPER_1M_iter_0.txt (dupe of 5c)
./fine_tune1.sh 0 27B ft1_125M_PAPER_1M_iter_2 125M_PAPER_1M_iter_2.txt
./fine_tune1.sh 0 27B ft1_13B_PAPER_1M_iter_2 13B_PAPER_1M_iter_2.txt
./fine_tune1.sh 0 27B ft1_27B_PAPER_1M_iter_2 27B_PAPER_1M_iter_2.txt

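For reference, Pass@k numbers like these are typically computed with the standard unbiased estimator from the Codex paper; a small sketch (whether the repo's plotting code uses exactly this form is not shown here):

# pass@k = 1 - C(n-c, k) / C(n, k), with n attempts per puzzle of which c succeeded
from math import comb

def pass_at_k(n, c, k):
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(100, 7, 10))  # e.g. 100 attempts, 7 solved, reporting pass@10
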
# Launch on torch2020 - edit solve.yaml for the correct model and epoch parameters
./tst_human_eval_base.sh 0 125M 1024
./tst_human_eval_ft1.sh 0 125M 1024
./tst_human_eval_ft5.sh 0 125M 1024
./tst_human_eval_ft10.sh 0 125M 1024

ICLR2023/requirements.txt

Lines changed: 249 additions & 0 deletions
adal==1.2.7
aiohttp==3.8.1
aiosignal==1.2.0
amlt==8.0.9
applicationinsights==0.11.10
asn1crypto==0.24.0
astor==0.8.1
async-timeout==4.0.1
attrs==17.4.0
Automat==0.6.0
azure-common==1.1.27
azure-core==1.17.0
azure-data-tables==12.0.0b6
azure-graphrbac==0.61.1
azure-identity==1.4.1
azure-mgmt-authorization==0.61.0
azure-mgmt-containerregistry==2.8.0
azure-mgmt-keyvault==2.2.0
azure-mgmt-resource==13.0.0
azure-mgmt-storage==11.2.0
azure-storage-blob==2.1.0
azure-storage-common==2.1.0
azure-storage-file==2.1.0
azureml-automl-core==1.26.0
azureml-contrib-k8s==0.1.16
azureml-contrib-pipeline-steps==1.26.0
azureml-core==1.26.0
azureml-dataprep==2.13.2
azureml-dataprep-native==32.0.0
azureml-dataprep-rslex==1.11.2
azureml-dataset-runtime==1.26.0
azureml-k8s-mt==1.0.4
azureml-pipeline-core==1.26.0
azureml-pipeline-steps==1.26.0
azureml-telemetry==1.26.0
azureml-train-automl-client==1.26.0
azureml-train-core==1.26.0
azureml-train-restclients-hyperdrive==1.26.0
backcall==0.2.0
backports.tempfile==1.0
backports.weakref==1.0.post1
beautifulsoup4==4.9.3
bitstring==3.1.9
black==21.8b0
blinker==1.4
blis==0.7.4
blobxfer==1.10.0
cachetools==4.2.2
catalogue==2.0.6
certifi==2018.1.18
cffi==1.14.6
chardet==3.0.4
charset-normalizer==2.0.7
click==7.1.2
click-completion @ git+https://github.com/temporaer/click-completion.git@41b21868cac0781d25b37da624bae2fd1f36be88
click-option-group==0.5.3
click-plugins==1.1.1
cloud-init==20.2
cloudpickle==1.6.0
colorama==0.3.7
colorlog==6.4.1
command-not-found==0.3
configobj==5.0.6
configparser==5.0.2
constantly==15.1.0
contextlib2==21.6.0
cryptography==3.4.8
cycler==0.10.0
cymem==2.0.5
datasets==1.15.1
debugpy==1.4.3
decorator==5.0.9
deepspeed==0.5.1
dill==0.3.4
distro==1.6.0
distro-info===0.18ubuntu0.18.04.1
docker==5.0.1
docker-pycreds==0.4.0
dotnetcore2==2.1.21
ecdsa==0.17.0
entrypoints==0.3
et-xmlfile==1.1.0
fail2ban==0.10.2
fastai==2.5.2
fastcore==1.3.26
fastdownload==0.0.5
fastprogress==1.0.0
filelock==3.0.12
Flask==2.0.1
Flask-Cors==3.0.10
Flask-Executor==0.9.4
Flask-FontAwesome==0.1.5
frozenlist==1.2.0
fsspec==2021.11.0
gitdb==4.0.7
GitPython==3.1.18
httplib2==0.9.2
huggingface-hub==0.1.2
humanize==3.11.0
hyperlink==17.3.1
idna==2.6
incremental==16.10.1
ipdb==0.13.9
ipykernel==6.4.1
ipython==7.27.0
ipython-genutils==0.2.0
isodate==0.6.0
itsdangerous==2.0.1
jedi==0.18.0
Jinja2==3.0.1
jmespath==0.10.0
joblib==1.0.1
jsonpatch==1.16
jsonpickle==2.0.0
jsonpointer==1.10
jsonschema==2.6.0
jupyter-client==7.0.5
jupyter-core==4.8.1
keyring==10.6.0
keyrings.alt==3.0
kiwisolver==1.3.2
language-selector==0.1
libtmux==0.10.1
Mako==1.1.5
MarkupSafe==2.0.1
marshmallow==3.10.0
matplotlib==3.4.3
matplotlib-inline==0.1.3
mlb-core==0.0.4
msal==1.14.0
msal-extensions==0.2.2
msrest==0.6.19
msrestazure==0.6.4
multidict==5.2.0
multiprocess==0.70.12.2
murmurhash==1.0.5
mypy-extensions==0.4.3
ndg-httpsclient==0.5.1
nest-asyncio==1.5.1
netifaces==0.10.4
ninja==1.10.2
ntlm-auth==1.5.0
numpy==1.21.2
oauthlib==3.1.1
openai==0.13.0
openpyxl==3.0.9
orderedset==2.0.3
packaging==21.0
PAM==0.4.2
pandas==1.3.2
pandas-stubs==1.2.0.45
parso==0.8.2
passpy==1.0.2
pathspec==0.9.0
pathtools==0.1.2
pathy==0.6.0
Pebble==4.6.3
petname==2.6
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.3.2
platformdirs==2.3.0
portalocker==1.7.1
preshed==3.0.5
promise==2.3
prompt-toolkit==3.0.20
protobuf==3.17.3
psb2==1.0.0
psutil==5.8.0
ptyprocess==0.7.0
pyarrow==1.0.1
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycparser==2.20
pycrypto==2.6.1
pydantic==1.8.2
Pygments==2.10.0
PyGObject==3.26.1
PyJWT==1.5.3
pyOpenSSL==17.5.0
pyparsing==2.4.7
pyperclip==1.8.2
pyserial==3.4
python-apt==1.6.5+ubuntu0.3
python-dateutil==2.8.2
python-debian==0.1.32
python-gnupg==0.4.7
pytz==2021.1
pyxdg==0.25
PyYAML==5.4.1
pyzmq==22.3.0
regex==2021.8.28
requests==2.25.1
requests-ntlm==1.1.0
requests-oauthlib==1.3.0
requests-unixsocket==0.1.5
ruamel.yaml==0.17.16
ruamel.yaml.clib==0.2.6
sacremoses==0.0.45
scikit-learn==0.24.2
scipy==1.7.1
SecretStorage==2.3.1
sentry-sdk==1.3.1
service-identity==16.0.0
shellingham==1.4.0
shortuuid==1.0.1
six==1.16.0
sklearn==0.0
smart-open==5.2.1
smmap==4.0.0
soupsieve==2.2.1
spacy==3.1.2
spacy-legacy==3.0.8
srsly==2.4.1
ssh-import-id==5.7
sshpubkeys==3.3.1
strictfire==0.4.1
subprocess32==3.5.4
systemd-python==234
tabulate==0.8.9
tensorboardX==1.8
termcolor==1.1.0
thinc==8.0.10
threadpoolctl==2.2.0
tokenizers==0.10.3
toml==0.10.2
tomli==1.2.1
torch==1.9.0
torchvision==0.10.0
tornado==6.1
tqdm==4.62.2
traitlets==5.1.0
transformers==4.10.0
Twisted==17.9.0
typer==0.3.2
typing-extensions==3.10.0.2
ufw==0.36
unattended-upgrades==0.1
urllib3==1.26.6
virtualenv==15.1.0
WALinuxAgent==2.2.45
wasabi==0.8.2
wcwidth==0.2.5
websocket-client==1.2.1
Werkzeug==2.0.1
xdg==5.1.1
xxhash==2.0.2
yarl==1.7.2
zope.interface==4.3.2
