Commit 6cd28bc

Merge branch 'huggingface:main' into master
2 parents: 4aa166d + f2fdd97

76 files changed: 11,039 additions, 6,297 deletions (not all file diffs are shown below).

.github/workflows/build_documentation.yml

Lines changed: 1 addition & 2 deletions
```diff
@@ -17,5 +17,4 @@ jobs:
       path_to_docs: pytorch-image-models/hfdocs/source
       version_tag_suffix: ""
     secrets:
-      token: ${{ secrets.HUGGINGFACE_PUSH }}
-      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
+      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
```

.github/workflows/delete_doc_comment.yml

Lines changed: 0 additions & 13 deletions
This file was deleted.

.github/workflows/delete_doc_comment_trigger.yml

Lines changed: 0 additions & 12 deletions
This file was deleted.

.github/workflows/tests.yml

Lines changed: 8 additions & 6 deletions
```diff
@@ -16,10 +16,12 @@ jobs:
     strategy:
       matrix:
         os: [ubuntu-latest]
-        python: ['3.10']
-        torch: ['1.13.0']
-        torchvision: ['0.14.0']
+        python: ['3.10', '3.11']
+        torch: [{base: '1.13.0', vision: '0.14.0'}, {base: '2.1.0', vision: '0.16.0'}]
         testmarker: ['-k "not test_models"', '-m base', '-m cfg', '-m torchscript', '-m features', '-m fxforward', '-m fxbackward']
+        exclude:
+          - python: '3.11'
+            torch: {base: '1.13.0', vision: '0.14.0'}
     runs-on: ${{ matrix.os }}
 
     steps:
@@ -34,17 +36,17 @@ jobs:
           pip install -r requirements-dev.txt
       - name: Install torch on mac
         if: startsWith(matrix.os, 'macOS')
-        run: pip install --no-cache-dir torch==${{ matrix.torch }} torchvision==${{ matrix.torchvision }}
+        run: pip install --no-cache-dir torch==${{ matrix.torch.base }} torchvision==${{ matrix.torch.vision }}
       - name: Install torch on Windows
         if: startsWith(matrix.os, 'windows')
-        run: pip install --no-cache-dir torch==${{ matrix.torch }} torchvision==${{ matrix.torchvision }}
+        run: pip install --no-cache-dir torch==${{ matrix.torch.base }} torchvision==${{ matrix.torch.vision }}
       - name: Install torch on ubuntu
         if: startsWith(matrix.os, 'ubuntu')
         run: |
           sudo sed -i 's/azure\.//' /etc/apt/sources.list
           sudo apt update
           sudo apt install -y google-perftools
-          pip install --no-cache-dir torch==${{ matrix.torch }}+cpu torchvision==${{ matrix.torchvision }}+cpu -f https://download.pytorch.org/whl/torch_stable.html
+          pip install --no-cache-dir torch==${{ matrix.torch.base }}+cpu torchvision==${{ matrix.torch.vision }}+cpu -f https://download.pytorch.org/whl/torch_stable.html
       - name: Install requirements
         run: |
           pip install -r requirements.txt
```
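
The net effect of the new matrix plus its `exclude` entry is three python/torch pairings per test marker. A minimal Python sketch of that resolution (illustrative only, not part of the commit; the reason for the exclusion is an assumption, since torch 1.13.0 predates official Python 3.11 wheels):

```py
from itertools import product

pythons = ['3.10', '3.11']
torches = [
    {'base': '1.13.0', 'vision': '0.14.0'},
    {'base': '2.1.0', 'vision': '0.16.0'},
]
# mirrors the workflow's `exclude` entry
excluded = {('3.11', '1.13.0')}

combos = [
    (py, t['base'], t['vision'])
    for py, t in product(pythons, torches)
    if (py, t['base']) not in excluded
]
print(combos)
# [('3.10', '1.13.0', '0.14.0'), ('3.10', '2.1.0', '0.16.0'), ('3.11', '2.1.0', '0.16.0')]
```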

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -1,6 +1,6 @@
 *This guideline is very much a work-in-progress.*
 
-Contriubtions to `timm` for code, documentation, tests are more than welcome!
+Contributions to `timm` for code, documentation, tests are more than welcome!
 
 There haven't been any formal guidelines to date so please bear with me, and feel free to add to this guide.
 
@@ -49,7 +49,7 @@ This is YES:
 }
 ```
 
-When there is descrepancy in a given source file (there are many origins for various bits of code and not all have been updated to what I consider current goal), please follow the style in a given file.
+When there is discrepancy in a given source file (there are many origins for various bits of code and not all have been updated to what I consider current goal), please follow the style in a given file.
 
 In general, if you add new code, formatting it with black using the following options should result in a style that is compatible with the rest of the code base:
````

README.md

Lines changed: 30 additions & 0 deletions
```diff
@@ -26,6 +26,36 @@
 * The Hugging Face Hub (https://huggingface.co/timm) is now the primary source for `timm` weights. Model cards include link to papers, original source, license.
 * Previous 0.6.x can be cloned from [0.6.x](https://github.com/rwightman/pytorch-image-models/tree/0.6.x) branch or installed via pip with version.
 
+### Nov 23, 2023
+* Added EfficientViT-Large models, thanks [SeeFun](https://github.com/seefun)
+* Fix Python 3.7 compat, will be dropping support for it soon
+* Other misc fixes
+* Release 0.9.12
+
+### Nov 20, 2023
+* Added significant flexibility for Hugging Face Hub based timm models via `model_args` config entry. `model_args` will be passed as kwargs through to models on creation.
+  * See example at https://huggingface.co/gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k/blob/main/config.json
+  * Usage: https://github.com/huggingface/pytorch-image-models/discussions/2035
+* Updated imagenet eval and test set csv files with latest models
+* `vision_transformer.py` typing and doc cleanup by [Laureηt](https://github.com/Laurent2916)
+* 0.9.11 release
+
+### Nov 3, 2023
+* [DFN (Data Filtering Networks)](https://huggingface.co/papers/2309.17425) and [MetaCLIP](https://huggingface.co/papers/2309.16671) ViT weights added
+* DINOv2 'register' ViT model weights added (https://huggingface.co/papers/2309.16588, https://huggingface.co/papers/2304.07193)
+* Add `quickgelu` ViT variants for OpenAI, DFN, MetaCLIP weights that use it (less efficient)
+* Improved typing added to ResNet, MobileNet-v3 thanks to [Aryan](https://github.com/a-r-r-o-w)
+* ImageNet-12k fine-tuned (from LAION-2B CLIP) `convnext_xxlarge`
+* 0.9.9 release
+
+### Oct 20, 2023
+* [SigLIP](https://huggingface.co/papers/2303.15343) image tower weights supported in `vision_transformer.py`.
+  * Great potential for fine-tune and downstream feature use.
+* Experimental 'register' support in vit models as per [Vision Transformers Need Registers](https://huggingface.co/papers/2309.16588)
+* Updated RepViT with new weight release. Thanks [wangao](https://github.com/jameslahm)
+* Add patch resizing support (on pretrained weight load) to Swin models
+* 0.9.8 release pending
+
 ### Sep 1, 2023
 * TinyViT added by [SeeFun](https://github.com/seefun)
 * Fix EfficientViT (MIT) to use torch.autocast so it works back to PT 1.10
```
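
To make the Nov 20 `model_args` entry above concrete: any `model_args` dict in a Hub checkpoint's `config.json` is forwarded as constructor kwargs at creation time. A minimal usage sketch (the `hf-hub:` prefix is standard timm Hub loading; the model shown is the one linked in the entry):

```py
>>> import timm
>>> # `model_args` from this checkpoint's config.json (see link above) are
>>> # passed through as kwargs to the model constructor on creation
>>> model = timm.create_model(
...     'hf-hub:gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k',
...     pretrained=True,
... )
```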

bulk_runner.py

Lines changed: 35 additions & 9 deletions
```diff
@@ -21,7 +21,7 @@
 from typing import Callable, List, Tuple, Union
 
 
-from timm.models import is_model, list_models
+from timm.models import is_model, list_models, get_pretrained_cfg
 
 
 parser = argparse.ArgumentParser(description='Per-model process launcher')
@@ -98,16 +98,33 @@ def main():
     cmd, cmd_args = cmd_from_args(args)
 
     model_cfgs = []
-    model_names = []
     if args.model_list == 'all':
-        # NOTE should make this config, for validation / benchmark runs the focus is 1k models,
-        # so we filter out 21/22k and some other unusable heads. This will change in the future...
-        exclude_model_filters = ['*in21k', '*in22k', '*dino', '*_22k']
         model_names = list_models(
             pretrained=args.pretrained,  # only include models w/ pretrained checkpoints if set
-            exclude_filters=exclude_model_filters
         )
         model_cfgs = [(n, None) for n in model_names]
+    elif args.model_list == 'all_in1k':
+        model_names = list_models(pretrained=True)
+        model_cfgs = []
+        for n in model_names:
+            pt_cfg = get_pretrained_cfg(n)
+            if getattr(pt_cfg, 'num_classes', 0) == 1000:
+                print(n, pt_cfg.num_classes)
+                model_cfgs.append((n, None))
+    elif args.model_list == 'all_res':
+        model_names = list_models()
+        model_names += list_models(pretrained=True)
+        model_cfgs = set()
+        for n in model_names:
+            pt_cfg = get_pretrained_cfg(n)
+            if pt_cfg is None:
+                print(f'Model {n} is missing pretrained cfg, skipping.')
+                continue
+            n = n.split('.')[0]
+            model_cfgs.add((n, pt_cfg.input_size[-1]))
+            if pt_cfg.test_input_size is not None:
+                model_cfgs.add((n, pt_cfg.test_input_size[-1]))
+        model_cfgs = [(n, {'img-size': r}) for n, r in sorted(model_cfgs)]
     elif not is_model(args.model_list):
         # model name doesn't exist, try as wildcard filter
         model_names = list_models(args.model_list)
@@ -122,7 +139,8 @@ def main():
     results_file = args.results_file or './results.csv'
     results = []
     errors = []
-    print('Running script on these models: {}'.format(', '.join(model_names)))
+    model_strings = '\n'.join([f'{x[0]}, {x[1]}' for x in model_cfgs])
+    print(f"Running script on these models:\n {model_strings}")
     if not args.sort_key:
         if 'benchmark' in args.script:
             if any(['train' in a for a in args.script_args]):
@@ -136,10 +154,14 @@ def main():
     print(f'Script: {args.script}, Args: {args.script_args}, Sort key: {sort_key}')
 
     try:
-        for m, _ in model_cfgs:
+        for m, ax in model_cfgs:
             if not m:
                 continue
             args_str = (cmd, *[str(e) for e in cmd_args], '--model', m)
+            if ax is not None:
+                extra_args = [(f'--{k}', str(v)) for k, v in ax.items()]
+                extra_args = [i for t in extra_args for i in t]
+                args_str += tuple(extra_args)
             try:
                 o = subprocess.check_output(args=args_str).decode('utf-8').split('--result')[-1]
                 r = json.loads(o)
@@ -157,7 +179,11 @@ def main():
     if errors:
         print(f'{len(errors)} models had errors during run.')
         for e in errors:
-            print(f"\t {e['model']} ({e.get('error', 'Unknown')})")
+            if 'model' in e:
+                print(f"\t {e['model']} ({e.get('error', 'Unknown')})")
+            else:
+                print(e)
+
     results = list(filter(lambda x: 'error' not in x, results))
 
     no_sortkey = list(filter(lambda x: sort_key not in x, results))
```
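
The new `all_in1k` and `all_res` branches rely on `get_pretrained_cfg`, which returns a model's pretrained config (`num_classes`, `input_size`, `test_input_size`, ...). A quick interactive sketch (`resnet50` is just an example; exact values depend on the installed timm version's default pretrained tag):

```py
>>> from timm.models import get_pretrained_cfg
>>> cfg = get_pretrained_cfg('resnet50')
>>> cfg.num_classes      # e.g. 1000 for an ImageNet-1k head
>>> cfg.input_size       # e.g. (3, 224, 224)
>>> cfg.test_input_size  # e.g. (3, 288, 288), or None if the weights define none
```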

hfdocs/source/models/efficientnet-pruned.mdx

Lines changed: 5 additions & 5 deletions
````diff
@@ -1,6 +1,6 @@
 # EfficientNet (Knapsack Pruned)
 
-**EfficientNet** is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a *compound coefficient*. Unlike conventional practice that arbitrary scales these factors, the EfficientNet scaling method uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients. For example, if we want to use $2^N$ times more computational resources, then we can simply increase the network depth by $\alpha ^ N$, width by $\beta ^ N$, and image size by $\gamma ^ N$, where $\alpha, \beta, \gamma$ are constant coefficients determined by a small grid search on the original small model. EfficientNet uses a compound coefficient $\phi$ to uniformly scales network width, depth, and resolution in a principled way.
+**EfficientNet** is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a *compound coefficient*. Unlike conventional practice that arbitrary scales these factors, the EfficientNet scaling method uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients. For example, if we want to use \\( 2^N \\) times more computational resources, then we can simply increase the network depth by \\( \alpha ^ N \\), width by \\( \beta ^ N \\), and image size by \\( \gamma ^ N \\), where \\( \alpha, \beta, \gamma \\) are constant coefficients determined by a small grid search on the original small model. EfficientNet uses a compound coefficient \\( \phi \\) to uniformly scales network width, depth, and resolution in a principled way.
 
 The compound scaling method is justified by the intuition that if the input image is bigger, then the network needs more layers to increase the receptive field and more channels to capture more fine-grained patterns on the bigger image.
 
@@ -20,7 +20,7 @@ To load a pretrained model:
 
 To load and preprocess the image:
 
-```py 
+```py
 >>> import urllib
 >>> from PIL import Image
 >>> from timm.data import resolve_data_config
@@ -51,7 +51,7 @@ To get the top-5 predictions class names:
 ```py
 >>> # Get imagenet class mappings
 >>> url, filename = ("https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt", "imagenet_classes.txt")
->>> urllib.request.urlretrieve(url, filename) 
+>>> urllib.request.urlretrieve(url, filename)
 >>> with open("imagenet_classes.txt", "r") as f:
 ...     categories = [s.strip() for s in f.readlines()]
 
@@ -85,7 +85,7 @@ You can follow the [timm recipe scripts](../scripts) for training a new model af
 
 ```BibTeX
 @misc{tan2020efficientnet,
-      title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks}, 
+      title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
       author={Mingxing Tan and Quoc V. Le},
       year={2020},
       eprint={1905.11946},
@@ -209,4 +209,4 @@ Models:
     Metrics:
       Top 1 Accuracy: 80.86%
       Top 5 Accuracy: 95.24%
--->
+-->
````
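
For readers following the math-delimiter change above, the compound scaling rule being typeset is, per the EfficientNet paper (Tan & Le, 2019):

```latex
% one coefficient \phi drives depth, width, and resolution together
d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi},
\quad \text{s.t. } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,
\quad \alpha \ge 1, \; \beta \ge 1, \; \gamma \ge 1
```

so total FLOPs grow roughly as \\( (\alpha \cdot \beta^2 \cdot \gamma^2)^{\phi} \approx 2^{\phi} \\).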

hfdocs/source/models/efficientnet.mdx

Lines changed: 5 additions & 5 deletions
````diff
@@ -1,6 +1,6 @@
 # EfficientNet
 
-**EfficientNet** is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a *compound coefficient*. Unlike conventional practice that arbitrary scales these factors, the EfficientNet scaling method uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients. For example, if we want to use $2^N$ times more computational resources, then we can simply increase the network depth by $\alpha ^ N$, width by $\beta ^ N$, and image size by $\gamma ^ N$, where $\alpha, \beta, \gamma$ are constant coefficients determined by a small grid search on the original small model. EfficientNet uses a compound coefficient $\phi$ to uniformly scales network width, depth, and resolution in a principled way.
+**EfficientNet** is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a *compound coefficient*. Unlike conventional practice that arbitrary scales these factors, the EfficientNet scaling method uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients. For example, if we want to use \\( 2^N \\) times more computational resources, then we can simply increase the network depth by \\( \alpha ^ N \\), width by \\( \beta ^ N \\), and image size by \\( \gamma ^ N \\), where \\( \alpha, \beta, \gamma \\) are constant coefficients determined by a small grid search on the original small model. EfficientNet uses a compound coefficient \\( \phi \\) to uniformly scales network width, depth, and resolution in a principled way.
 
 The compound scaling method is justified by the intuition that if the input image is bigger, then the network needs more layers to increase the receptive field and more channels to capture more fine-grained patterns on the bigger image.
 
@@ -18,7 +18,7 @@ To load a pretrained model:
 
 To load and preprocess the image:
 
-```py 
+```py
 >>> import urllib
 >>> from PIL import Image
 >>> from timm.data import resolve_data_config
@@ -49,7 +49,7 @@ To get the top-5 predictions class names:
 ```py
 >>> # Get imagenet class mappings
 >>> url, filename = ("https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt", "imagenet_classes.txt")
->>> urllib.request.urlretrieve(url, filename) 
+>>> urllib.request.urlretrieve(url, filename)
 >>> with open("imagenet_classes.txt", "r") as f:
 ...     categories = [s.strip() for s in f.readlines()]
 
@@ -83,7 +83,7 @@ You can follow the [timm recipe scripts](../scripts) for training a new model af
 
 ```BibTeX
 @misc{tan2020efficientnet,
-      title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks}, 
+      title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
       author={Mingxing Tan and Quoc V. Le},
       year={2020},
       eprint={1905.11946},
@@ -389,4 +389,4 @@ Models:
     Metrics:
      Top 1 Accuracy: 75.5%
      Top 5 Accuracy: 92.51%
--->
+-->
````
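
Since the diff only shows fragments of the doc page's usage snippets, here is the full flow those snippets describe, stitched into one runnable sketch (standard timm API; `efficientnet_b0` and the dog image URL are illustrative choices, not part of this commit):

```py
>>> import urllib
>>> import timm
>>> import torch
>>> from PIL import Image
>>> from timm.data import resolve_data_config
>>> from timm.data.transforms_factory import create_transform

>>> model = timm.create_model('efficientnet_b0', pretrained=True).eval()

>>> # build the eval transform this model's pretrained cfg calls for
>>> config = resolve_data_config({}, model=model)
>>> transform = create_transform(**config)

>>> url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
>>> urllib.request.urlretrieve(url, filename)
>>> img = Image.open(filename).convert('RGB')
>>> tensor = transform(img).unsqueeze(0)  # add batch dimension

>>> with torch.no_grad():
...     out = model(tensor)
>>> probabilities = torch.nn.functional.softmax(out[0], dim=0)
>>> top5_prob, top5_catid = torch.topk(probabilities, 5)
```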

hfdocs/source/models/gloun-resnext.mdx

Lines changed: 4 additions & 4 deletions
````diff
@@ -1,6 +1,6 @@
 # (Gluon) ResNeXt
 
-A **ResNeXt** repeats a [building block](https://paperswithcode.com/method/resnext-block) that aggregates a set of transformations with the same topology. Compared to a [ResNet](https://paperswithcode.com/method/resnet), it exposes a new dimension, *cardinality* (the size of the set of transformations) $C$, as an essential factor in addition to the dimensions of depth and width.
+A **ResNeXt** repeats a [building block](https://paperswithcode.com/method/resnext-block) that aggregates a set of transformations with the same topology. Compared to a [ResNet](https://paperswithcode.com/method/resnet), it exposes a new dimension, *cardinality* (the size of the set of transformations) \\( C \\), as an essential factor in addition to the dimensions of depth and width.
 
 The weights from this model were ported from [Gluon](https://cv.gluon.ai/model_zoo/classification.html).
 
@@ -16,7 +16,7 @@ To load a pretrained model:
 
 To load and preprocess the image:
 
-```py 
+```py
 >>> import urllib
 >>> from PIL import Image
 >>> from timm.data import resolve_data_config
@@ -47,7 +47,7 @@ To get the top-5 predictions class names:
 ```py
 >>> # Get imagenet class mappings
 >>> url, filename = ("https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt", "imagenet_classes.txt")
->>> urllib.request.urlretrieve(url, filename) 
+>>> urllib.request.urlretrieve(url, filename)
 >>> with open("imagenet_classes.txt", "r") as f:
 ...     categories = [s.strip() for s in f.readlines()]
 
@@ -206,4 +206,4 @@ Models:
     Metrics:
      Top 1 Accuracy: 79.35%
      Top 5 Accuracy: 94.42%
--->
+-->
````
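
One pointer on the cardinality \\( C \\) mentioned above: timm encodes it directly in these model names, e.g. `gluon_resnext50_32x4d` is \\( C = 32 \\) groups with bottleneck width 4d. A usage sketch (registry names as of this commit; the listing output may differ by timm version):

```py
>>> import timm
>>> timm.list_models('gluon_resnext*')  # cardinality x width is in the name
['gluon_resnext101_32x4d', 'gluon_resnext101_64x4d', 'gluon_resnext50_32x4d']
>>> model = timm.create_model('gluon_resnext50_32x4d', pretrained=True)
```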
