@@ -12,73 +12,50 @@ welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CO

 [Tensor2Tensor](https://github.com/tensorflow/tensor2tensor), or
 [T2T](https://github.com/tensorflow/tensor2tensor) for short, is a library
-of deep learning models and datasets. It has binaries to train the models and
-to download and prepare the data for you. T2T is modular and extensible and can
-be used in [notebooks](https://goo.gl/wkHexj) for prototyping your own models
-or running existing ones on your data. It is actively used and maintained by
-researchers and engineers within
-the [Google Brain team](https://research.google.com/teams/brain/) and was used
-to develop state-of-the-art models for translation (see
-[Attention Is All You Need](https://arxiv.org/abs/1706.03762)), summarization,
-image generation and other tasks. You can read
-more about T2T in the [Google Research Blog post introducing
-it](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html).
-
-We're eager to collaborate with you on extending T2T, so please feel
-free to [open an issue on
-GitHub](https://github.com/tensorflow/tensor2tensor/issues) or
-send along a pull request to add your dataset or model.
-See [our contribution
-doc](CONTRIBUTING.md) for details and our [open
-issues](https://github.com/tensorflow/tensor2tensor/issues).
-You can chat with us and other users on
-[Gitter](https://gitter.im/tensor2tensor/Lobby) and please join our
-[Google Group](https://groups.google.com/forum/#!forum/tensor2tensor) to keep up
-with T2T announcements.
+of deep learning models and datasets designed to [accelerate deep learning
+research](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html) and make it more accessible.
+
+T2T is actively used and maintained by researchers and engineers within the
+[Google Brain team](https://research.google.com/teams/brain/) and a community
+of users. We're eager to collaborate with you too, so feel free to
+[open an issue on GitHub](https://github.com/tensorflow/tensor2tensor/issues)
+or send along a pull request (see [our contribution doc](CONTRIBUTING.md)).
+You can chat with us on
+[Gitter](https://gitter.im/tensor2tensor/Lobby) and join the
+[T2T Google Group](https://groups.google.com/forum/#!forum/tensor2tensor).

 ### Quick Start

 [This iPython notebook](https://goo.gl/wkHexj) explains T2T and runs in your
 browser using a free VM from Google, no installation needed.
-
-Alternatively, here is a one-command version that installs T2T, downloads data,
-trains an English-German translation model, and evaluates it:
+Alternatively, here is a one-command version that installs T2T, downloads MNIST,
+trains a model and evaluates it:

 ```
 pip install tensor2tensor && t2t-trainer \
   --generate_data \
   --data_dir=~/t2t_data \
-  --problems=translate_ende_wmt32k \
-  --model=transformer \
-  --hparams_set=transformer_base_single_gpu \
-  --output_dir=~/t2t_train/base
-```
-
-You can decode from the model interactively:
-
-```
-t2t-decoder \
-  --data_dir=~/t2t_data \
-  --problems=translate_ende_wmt32k \
-  --model=transformer \
-  --hparams_set=transformer_base_single_gpu \
-  --output_dir=~/t2t_train/base \
-  --decode_interactive
+  --output_dir=~/t2t_train/mnist \
+  --problems=image_mnist \
+  --model=shake_shake \
+  --hparams_set=shake_shake_quick \
+  --train_steps=1000 \
+  --eval_steps=100
 ```

-See the [Walkthrough](#walkthrough) below for more details on each step
-and [Suggested Models](#suggested-models) for well performing models
-on common tasks.
-
 ### Contents

-* [Walkthrough](#walkthrough)
-* [Suggested Models](#suggested-models)
-  * [Translation](#translation)
-  * [Summarization](#summarization)
+* [Suggested Datasets and Models](#suggested-datasets-and-models)
   * [Image Classification](#image-classification)
-* [Installation](#installation)
-* [Features](#features)
+  * [Language Modeling](#language-modeling)
+  * [Sentiment Analysis](#sentiment-analysis)
+  * [Speech Recognition](#speech-recognition)
+  * [Summarization](#summarization)
+  * [Translation](#translation)
+* [Basics](#basics)
+  * [Walkthrough](#walkthrough)
+  * [Installation](#installation)
+  * [Features](#features)
 * [T2T Overview](#t2t-overview)
   * [Datasets](#datasets)
   * [Problems and Modalities](#problems-and-modalities)
@@ -87,10 +64,102 @@ on common tasks.
   * [Trainer](#trainer)
 * [Adding your own components](#adding-your-own-components)
 * [Adding a dataset](#adding-a-dataset)
+* [Papers](#papers)
+
+## Suggested Datasets and Models

----
+Below we list a number of tasks that can be solved with T2T when
+you train the appropriate model on the appropriate problem.
+We give the problem and model below and we suggest a setting of
+hyperparameters that we know works well in our setup. We usually
+run either on Cloud TPUs or on 8-GPU machines; you might need
+to modify the hyperparameters if you run on a different setup.

-## Walkthrough
+### Image Classification
+
+For image classification, we have a number of standard data-sets:
+* ImageNet (a large data-set): `--problems=image_imagenet`, or one
+  of the re-scaled versions (`image_imagenet224`, `image_imagenet64`,
+  `image_imagenet32`)
+* CIFAR-10: `--problems=image_cifar10` (or
+  `--problems=image_cifar10_plain` to turn off data augmentation)
+* CIFAR-100: `--problems=image_cifar100`
+* MNIST: `--problems=image_mnist`
+
+For ImageNet, we suggest using ResNet or Xception, i.e.,
+`--model=resnet --hparams_set=resnet_50` or
+`--model=xception --hparams_set=xception_base`.
+ResNet should reach above 76% top-1 accuracy on ImageNet.
+
+For CIFAR and MNIST, we suggest trying the shake-shake model:
+`--model=shake_shake --hparams_set=shakeshake_big`.
+Trained for `--train_steps=700000`, this setting should yield
+close to 97% accuracy on CIFAR-10.
+
+### Language Modeling
+
+For language modeling, we have these data-sets in T2T:
+* PTB (a small data-set): `--problems=languagemodel_ptb10k` for
+  word-level modeling and `--problems=languagemodel_ptb_characters`
+  for character-level modeling.
+* LM1B (a billion-word corpus): `--problems=languagemodel_lm1b32k` for
+  subword-level modeling and `--problems=languagemodel_lm1b_characters`
+  for character-level modeling.
+
+We suggest starting with `--model=transformer` on this task, using
+`--hparams_set=transformer_small` for PTB and
+`--hparams_set=transformer_base` for LM1B.
+
+### Sentiment Analysis
+
+For the task of recognizing the sentiment of a sentence, use
+* the IMDB data-set: `--problems=sentiment_imdb`
+
+We suggest using `--model=transformer_encoder` here. Since it is
+a small data-set, try `--hparams_set=transformer_tiny` and train for
+a few steps (e.g., `--train_steps=2000`).
+
+### Speech Recognition
+
+For speech-to-text, we have these data-sets in T2T:
+* Librispeech (English speech to text): `--problems=librispeech` for
+  the whole set and `--problems=librispeech_clean` for a smaller
+  but nicely filtered part.
+
+### Summarization
+
+For summarizing longer text into shorter text, we have these data-sets:
+* CNN/DailyMail articles summarized into a few sentences:
+  `--problems=summarize_cnn_dailymail32k`
+
+We suggest using `--model=transformer` and
+`--hparams_set=transformer_prepend` for this task.
+This yields good ROUGE scores.
+
+### Translation
+
+There are a number of translation data-sets in T2T:
+* English-German: `--problems=translate_ende_wmt32k`
+* English-French: `--problems=translate_enfr_wmt32k`
+* English-Czech: `--problems=translate_encs_wmt32k`
+* English-Chinese: `--problems=translate_enzh_wmt32k`
+
+You can get translations in the other direction by appending `_rev` to
+the problem name, e.g., for German-English use
+`--problems=translate_ende_wmt32k_rev`.
+
+For all translation problems, we suggest trying the Transformer model:
+`--model=transformer`. At first it is best to try the base setting,
+`--hparams_set=transformer_base`. When trained on 8 GPUs for 300K steps
+this should reach a BLEU score of about 28 on the English-German data-set,
+which is close to state-of-the-art. If training on a single GPU, try the
+`--hparams_set=transformer_base_single_gpu` setting. For very good results
+or larger data-sets (e.g., for English-French), try the big model
+with `--hparams_set=transformer_big`.
+
+## Basics
+
+### Walkthrough

 Here's a walkthrough training a good English-to-German translation
 model using the Transformer model from [*Attention Is All You
@@ -156,36 +225,8 @@ cat translation.en
 t2t-bleu --translation=translation.en --reference=ref-translation.de
 ```

----
-
-## Suggested Models
-
-Here are some combinations of models, hparams and problems that we found
-work well, so we suggest to use them if you're interested in that problem.
-
-### Translation
-
-For translation, esp. English-German and English-French, we suggest to use
-the Transformer model in base or big configurations, i.e.
-for `--problems=translate_ende_wmt32k` use `--model=transformer` and
-`--hparams_set=transformer_base`. When trained on 8 GPUs for 300K steps
-this should reach a BLEU score of about 28.
+### Installation

-### Summarization
-
-For summarization suggest to use the Transformer model in prepend mode, i.e.
-for `--problems=summarize_cnn_dailymail32k` use `--model=transformer` and
-`--hparams_set=transformer_prepend`.
-
-### Image Classification
-
-For image classification suggest to use the ResNet or Xception, i.e.
-for `--problems=image_imagenet` use `--model=resnet50` and
-`--hparams_set=resnet_base` or `--model=xception` and
-`--hparams_set=xception_base`.
-
-
-## Installation

 ```
 # Assumes tensorflow or tensorflow-gpu installed
@@ -214,9 +255,7 @@ Library usage:
 python -c "from tensor2tensor.models.transformer import Transformer"
 ```

----
-
-## Features
+### Features

 * Many state of the art and baseline models are built-in and new models can be
   added easily (open an issue or pull request!).
@@ -229,11 +268,10 @@ python -c "from tensor2tensor.models.transformer import Transformer"
   specification.
 * Support for multi-GPU machines and synchronous (1 master, many workers) and
   asynchronous (independent workers synchronizing through a parameter server)
-  [distributed training](https://github.com/tensorflow/tensor2tensor/tree/master/docs/distributed_training.md).
+  [distributed training](https://tensorflow.github.io/tensor2tensor/distributed_training.html).
 * Easily swap amongst datasets and models by command-line flag with the data
   generation script `t2t-datagen` and the training script `t2t-trainer`.
-
----
+* Train on [Google Cloud ML](https://tensorflow.github.io/tensor2tensor/cloud_mlengine.html) and [Cloud TPUs](https://tensorflow.github.io/tensor2tensor/cloud_tpu.html).

 ## T2T overview

@@ -289,9 +327,7 @@ inference. Users can easily switch between problems, models, and hyperparameter
 sets by using the `--model`, `--problems`, and `--hparams_set` flags. Specific
 hyperparameters can be overridden with the `--hparams` flag. `--schedule` and
 related flags control local and distributed training/evaluation
-([distributed training documentation](https://github.com/tensorflow/tensor2tensor/tree/master/docs/distributed_training.md)).
-
----
+([distributed training documentation](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/g3doc/distributed_training.md)).

 ## Adding your own components

@@ -317,6 +353,21 @@ for an example.
 Also see the [data generators
 README](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/README.md).

----
+## Papers
+
+Tensor2Tensor was used to develop a number of state-of-the-art models
+and deep learning methods. Here we list some papers that were based on T2T
+from the start and benefited from its features and architecture in ways
+described in the [Google Research Blog post introducing
+T2T](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html).
+
+* [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
+* [Depthwise Separable Convolutions for Neural Machine
+  Translation](https://arxiv.org/abs/1706.03059)
+* [One Model To Learn Them All](https://arxiv.org/abs/1706.05137)
+* [Discrete Autoencoders for Sequence Models](https://arxiv.org/abs/1801.09797)
+* [Generating Wikipedia by Summarizing Long
+  Sequences](https://arxiv.org/abs/1801.10198)
+* [Image Transformer](https://openreview.net/forum?id=r16Vyf-0-)

 *Note: This is not an official Google product.*
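The problem, model and hparams combinations recommended in the new "Suggested Datasets and Models" section can be collected into one lookup, which makes it harder to pair a problem with the wrong hparams set. The sketch below is a hypothetical bash helper, not part of T2T: the function name `t2t_suggested_flags` and its task keys are invented for illustration, and it only emits flag values quoted in that section. Data and output directories, step counts, and single-GPU variants such as `transformer_base_single_gpu` are left to the caller.

```shell
# Hypothetical bash helper (not part of T2T): print the --problems, --model and
# --hparams_set flags suggested for a task. Task keys are illustrative names.
t2t_suggested_flags() {
  local problem model hparams
  case "$1" in
    imagenet) problem=image_imagenet;             model=resnet;              hparams=resnet_50 ;;
    cifar10)  problem=image_cifar10;              model=shake_shake;         hparams=shakeshake_big ;;
    mnist)    problem=image_mnist;                model=shake_shake;         hparams=shakeshake_big ;;
    ptb)      problem=languagemodel_ptb10k;       model=transformer;         hparams=transformer_small ;;
    lm1b)     problem=languagemodel_lm1b32k;      model=transformer;         hparams=transformer_base ;;
    imdb)     problem=sentiment_imdb;             model=transformer_encoder; hparams=transformer_tiny ;;
    cnndm)    problem=summarize_cnn_dailymail32k; model=transformer;         hparams=transformer_prepend ;;
    ende)     problem=translate_ende_wmt32k;      model=transformer;         hparams=transformer_base ;;
    # German-English: the _rev suffix reverses the translation direction.
    deen)     problem=translate_ende_wmt32k_rev;  model=transformer;         hparams=transformer_base ;;
    *)        echo "unknown task: $1" >&2; return 1 ;;
  esac
  # A '%s' format keeps printf from parsing the leading "--" as an option.
  printf '%s\n' "--problems=$problem --model=$model --hparams_set=$hparams"
}
```

For instance, `t2t-trainer --generate_data --data_dir=~/t2t_data --output_dir=~/t2t_train/ende $(t2t_suggested_flags ende)` would expand to the English-German Transformer setting. The substitution is deliberately left unquoted so that bash splits it into three arguments, which is safe here only because none of the flag values contain spaces.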