
Commit 9c9755a

AugReg release
1 parent 381b279 commit 9c9755a

3 files changed: +35 -7 lines changed

README.md

Lines changed: 19 additions & 0 deletions
@@ -23,6 +23,25 @@ I'm fortunate to be able to dedicate significant time and money of my own suppor

## What's New

+### June 20, 2021
+* Release Vision Transformer 'AugReg' weights from [How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers](https://arxiv.org/abs/2106.10270)
+  * .npz weight loading support added, can load any of the 50K+ weights from the [AugReg series](https://console.cloud.google.com/storage/browser/vit_models/augreg) (loading sketch below)
+  * See [example notebook](https://colab.research.google.com/github/google-research/vision_transformer/blob/master/vit_jax_augreg.ipynb) from the official impl for navigating the AugReg weights
+  * Replaced all default weights w/ best AugReg variant (if possible). All AugReg 21k classifiers work.
+    * Highlights: `vit_large_patch16_384` (87.1 top-1), `vit_large_r50_s32_384` (86.2 top-1), `vit_base_patch16_384` (86.0 top-1)
+  * `vit_deit_*` renamed to just `deit_*`
+  * Remove my old small model, replace with DeiT-compatible small w/ AugReg weights
+* Add 1st training of my `gmixer_24_224` MLP w/ GLU, 78.1 top-1 w/ 25M params.
+* Add weights from official ResMLP release (https://github.com/facebookresearch/deit)
+* Add `eca_nfnet_l2` weights from my 'lightweight' series. 84.7 top-1 at 384x384.
+* Add distilled BiT 50x1 student and 152x2 teacher weights from [Knowledge distillation: A good teacher is patient and consistent](https://arxiv.org/abs/2106.05237)
+* NFNets and ResNetV2-BiT models work w/ PyTorch XLA now
+  * weight standardization uses F.batch_norm instead of std_mean (std_mean wasn't lowered by XLA; sketch below)
+  * eps values adjusted, will be slight differences but should be quite close
+* Improve test coverage and classifier interface of non-conv (vision transformer and MLP) models (sketch further below)
+* Cleanup a few classifier / flatten details for models w/ conv classifiers or early global pool
+* Please report any regressions, this PR touched quite a few models.

### June 8, 2021
* Add first ResMLP weights, trained in PyTorch XLA on TPU-VM w/ my XLA branch. 24 block variant, 79.2 top-1.
* Add ResNet51-Q model w/ pretrained weights at 82.36 top-1.
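
The .npz loading and AugReg default-weight bullets above can be exercised roughly as follows. This is an illustrative sketch, not code from this commit: the model names and the 87.1 top-1 figure come from the changelog, the checkpoint filename is a placeholder, and it assumes `create_model`'s `checkpoint_path` argument hands `.npz` files to the newly added ViT loader.

```python
import timm

# New defaults: vit_large_patch16_384 now resolves to its best AugReg weights
# (87.1 top-1 per the changelog entry above).
model = timm.create_model('vit_large_patch16_384', pretrained=True)

# Loading one of the 50K+ AugReg .npz checkpoints from the GCS bucket.
# Placeholder path; assumes checkpoint_path routes .npz files to the new ViT npz loader.
model_21k = timm.create_model(
    'vit_base_patch16_224',   # architecture must match the checkpoint
    pretrained=False,
    num_classes=21843,        # in21k head size used by timm ViT weights
    checkpoint_path='path/to/augreg_checkpoint.npz',  # placeholder
)
```
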

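
The PyTorch XLA bullet above notes that weight standardization now goes through F.batch_norm because torch.std_mean was not lowered. A minimal sketch of that trick in isolation (illustrative; not the library's exact StdConv2d / ScaledStdConv2d code):

```python
import torch
import torch.nn.functional as F

def standardize_weight(weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Zero-mean / unit-variance per output filter (weight standardization).

    Uses F.batch_norm over the flattened weight instead of torch.std_mean so
    the op also lowers under PyTorch XLA, per the changelog note above.
    """
    out_ch = weight.shape[0]
    flat = weight.reshape(1, out_ch, -1)  # treat each output filter as a channel
    normed = F.batch_norm(
        flat, None, None, weight=None, bias=None,
        training=True, momentum=0.0, eps=eps)  # batch stats only, nothing tracked
    return normed.reshape_as(weight)

# Per-filter mean of the standardized weight is ~0 and std ~1.
w = torch.randn(64, 32, 3, 3)
print(standardize_weight(w).mean(dim=(1, 2, 3)).abs().max())
```
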
timm/models/vision_transformer.py

Lines changed: 8 additions & 3 deletions
@@ -1,7 +1,12 @@
""" Vision Transformer (ViT) in PyTorch

-A PyTorch implement of Vision Transformers as described in
-'An Image Is Worth 16 x 16 Words: Transformers for Image Recognition at Scale' - https://arxiv.org/abs/2010.11929
+A PyTorch implement of Vision Transformers as described in:
+
+'An Image Is Worth 16 x 16 Words: Transformers for Image Recognition at Scale'
+    - https://arxiv.org/abs/2010.11929
+
+`How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers`
+    - https://arxiv.org/abs/2106.TODO

The official jax code is released and available at https://github.com/google-research/vision_transformer

@@ -15,7 +20,7 @@
* Simple transformer style inspired by Andrej Karpathy's https://github.com/karpathy/minGPT
* Bert reference code checks against Huggingface Transformers and Tensorflow Bert

-Hacked together by / Copyright 2020 Ross Wightman
+Hacked together by / Copyright 2021 Ross Wightman
"""
import math
import logging

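
The README bullet about the classifier interface of non-conv (vision transformer and MLP) models refers to head handling along these lines. This is ordinary timm usage shown for context, not code from this diff; it assumes the standard `create_model` / `reset_classifier` interface.

```python
import torch
import timm

# Headless model: num_classes=0 replaces the classifier with an identity,
# so the forward pass returns the pooled (class token) embedding.
model = timm.create_model('vit_base_patch16_224', pretrained=False, num_classes=0)
x = torch.randn(1, 3, 224, 224)
print(model(x).shape)   # torch.Size([1, 768]) for this config

# Attach a fresh classifier head for a 10-class task.
model.reset_classifier(10)
print(model(x).shape)   # torch.Size([1, 10])
```
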
timm/models/vision_transformer_hybrid.py

Lines changed: 8 additions & 4 deletions
@@ -1,13 +1,17 @@
""" Hybrid Vision Transformer (ViT) in PyTorch

-A PyTorch implement of the Hybrid Vision Transformers as described in
+A PyTorch implement of the Hybrid Vision Transformers as described in:
+
'An Image Is Worth 16 x 16 Words: Transformers for Image Recognition at Scale'
    - https://arxiv.org/abs/2010.11929

-NOTE This relies on code in vision_transformer.py. The hybrid model definitions were moved here to
-keep file sizes sane.
+`How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers`
+    - https://arxiv.org/abs/2106.TODO
+
+NOTE These hybrid model definitions depend on code in vision_transformer.py.
+They were moved here to keep file sizes sane.

-Hacked together by / Copyright 2020 Ross Wightman
+Hacked together by / Copyright 2021 Ross Wightman
"""
from copy import deepcopy
from functools import partial

0 commit comments