Commit 6cd28bc

Merge branch 'huggingface:main' into master
2 parents: 4aa166d + f2fdd97

76 files changed: 11,039 additions, 6,297 deletions (not all file diffs are shown below).

.github/workflows/build_documentation.yml

Lines changed: 1 addition & 2 deletions
```diff
@@ -17,5 +17,4 @@ jobs:
       path_to_docs: pytorch-image-models/hfdocs/source
       version_tag_suffix: ""
     secrets:
-      token: ${{ secrets.HUGGINGFACE_PUSH }}
-      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
+      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
```

.github/workflows/delete_doc_comment.yml

Lines changed: 0 additions & 13 deletions
This file was deleted.

.github/workflows/delete_doc_comment_trigger.yml

Lines changed: 0 additions & 12 deletions
This file was deleted.

.github/workflows/tests.yml

Lines changed: 8 additions & 6 deletions
```diff
@@ -16,10 +16,12 @@ jobs:
     strategy:
       matrix:
         os: [ubuntu-latest]
-        python: ['3.10']
-        torch: ['1.13.0']
-        torchvision: ['0.14.0']
+        python: ['3.10', '3.11']
+        torch: [{base: '1.13.0', vision: '0.14.0'}, {base: '2.1.0', vision: '0.16.0'}]
         testmarker: ['-k "not test_models"', '-m base', '-m cfg', '-m torchscript', '-m features', '-m fxforward', '-m fxbackward']
+        exclude:
+          - python: '3.11'
+            torch: {base: '1.13.0', vision: '0.14.0'}
     runs-on: ${{ matrix.os }}
 
     steps:
@@ -34,17 +36,17 @@ jobs:
           pip install -r requirements-dev.txt
       - name: Install torch on mac
         if: startsWith(matrix.os, 'macOS')
-        run: pip install --no-cache-dir torch==${{ matrix.torch }} torchvision==${{ matrix.torchvision }}
+        run: pip install --no-cache-dir torch==${{ matrix.torch.base }} torchvision==${{ matrix.torch.vision }}
       - name: Install torch on Windows
         if: startsWith(matrix.os, 'windows')
-        run: pip install --no-cache-dir torch==${{ matrix.torch }} torchvision==${{ matrix.torchvision }}
+        run: pip install --no-cache-dir torch==${{ matrix.torch.base }} torchvision==${{ matrix.torch.vision }}
       - name: Install torch on ubuntu
         if: startsWith(matrix.os, 'ubuntu')
         run: |
           sudo sed -i 's/azure\.//' /etc/apt/sources.list
           sudo apt update
           sudo apt install -y google-perftools
-          pip install --no-cache-dir torch==${{ matrix.torch }}+cpu torchvision==${{ matrix.torchvision }}+cpu -f https://download.pytorch.org/whl/torch_stable.html
+          pip install --no-cache-dir torch==${{ matrix.torch.base }}+cpu torchvision==${{ matrix.torch.vision }}+cpu -f https://download.pytorch.org/whl/torch_stable.html
       - name: Install requirements
         run: |
           pip install -r requirements.txt
```
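
The net effect of the new matrix plus its `exclude` entry is three python/torch pairings per test marker. A minimal Python sketch of that resolution (illustrative only, not part of the commit; the reason for the exclusion is an assumption, since torch 1.13.0 predates official Python 3.11 wheels):

```py
from itertools import product

pythons = ['3.10', '3.11']
torches = [
    {'base': '1.13.0', 'vision': '0.14.0'},
    {'base': '2.1.0', 'vision': '0.16.0'},
]
# mirrors the workflow's `exclude` entry
excluded = {('3.11', '1.13.0')}

combos = [
    (py, t['base'], t['vision'])
    for py, t in product(pythons, torches)
    if (py, t['base']) not in excluded
]
print(combos)
# [('3.10', '1.13.0', '0.14.0'), ('3.10', '2.1.0', '0.16.0'), ('3.11', '2.1.0', '0.16.0')]
```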

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -1,6 +1,6 @@
 *This guideline is very much a work-in-progress.*
 
-Contriubtions to `timm` for code, documentation, tests are more than welcome!
+Contributions to `timm` for code, documentation, tests are more than welcome!
 
 There haven't been any formal guidelines to date so please bear with me, and feel free to add to this guide.
 
@@ -49,7 +49,7 @@ This is YES:
 }
 ```
 
-When there is descrepancy in a given source file (there are many origins for various bits of code and not all have been updated to what I consider current goal), please follow the style in a given file.
+When there is discrepancy in a given source file (there are many origins for various bits of code and not all have been updated to what I consider current goal), please follow the style in a given file.
 
 In general, if you add new code, formatting it with black using the following options should result in a style that is compatible with the rest of the code base:
````

README.md

Lines changed: 30 additions & 0 deletions
```diff
@@ -26,6 +26,36 @@
 * The Hugging Face Hub (https://huggingface.co/timm) is now the primary source for `timm` weights. Model cards include link to papers, original source, license.
 * Previous 0.6.x can be cloned from [0.6.x](https://github.com/rwightman/pytorch-image-models/tree/0.6.x) branch or installed via pip with version.
 
+### Nov 23, 2023
+* Added EfficientViT-Large models, thanks [SeeFun](https://github.com/seefun)
+* Fix Python 3.7 compat, will be dropping support for it soon
+* Other misc fixes
+* Release 0.9.12
+
+### Nov 20, 2023
+* Added significant flexibility for Hugging Face Hub based timm models via `model_args` config entry. `model_args` will be passed as kwargs through to models on creation.
+  * See example at https://huggingface.co/gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k/blob/main/config.json
+  * Usage: https://github.com/huggingface/pytorch-image-models/discussions/2035
+* Updated imagenet eval and test set csv files with latest models
+* `vision_transformer.py` typing and doc cleanup by [Laureηt](https://github.com/Laurent2916)
+* 0.9.11 release
+
+### Nov 3, 2023
+* [DFN (Data Filtering Networks)](https://huggingface.co/papers/2309.17425) and [MetaCLIP](https://huggingface.co/papers/2309.16671) ViT weights added
+* DINOv2 'register' ViT model weights added (https://huggingface.co/papers/2309.16588, https://huggingface.co/papers/2304.07193)
+* Add `quickgelu` ViT variants for OpenAI, DFN, MetaCLIP weights that use it (less efficient)
+* Improved typing added to ResNet, MobileNet-v3 thanks to [Aryan](https://github.com/a-r-r-o-w)
+* ImageNet-12k fine-tuned (from LAION-2B CLIP) `convnext_xxlarge`
+* 0.9.9 release
+
+### Oct 20, 2023
+* [SigLIP](https://huggingface.co/papers/2303.15343) image tower weights supported in `vision_transformer.py`.
+  * Great potential for fine-tune and downstream feature use.
+* Experimental 'register' support in vit models as per [Vision Transformers Need Registers](https://huggingface.co/papers/2309.16588)
+* Updated RepViT with new weight release. Thanks [wangao](https://github.com/jameslahm)
+* Add patch resizing support (on pretrained weight load) to Swin models
+* 0.9.8 release pending
+
 ### Sep 1, 2023
 * TinyViT added by [SeeFun](https://github.com/seefun)
 * Fix EfficientViT (MIT) to use torch.autocast so it works back to PT 1.10
```
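
To make the Nov 20 `model_args` entry above concrete: any `model_args` dict in a Hub checkpoint's `config.json` is forwarded as constructor kwargs at creation time. A minimal usage sketch (the `hf-hub:` prefix is standard timm Hub loading; the model shown is the one linked in the entry):

```py
>>> import timm
>>> # `model_args` from this checkpoint's config.json (see link above) are
>>> # passed through as kwargs to the model constructor on creation
>>> model = timm.create_model(
...     'hf-hub:gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k',
...     pretrained=True,
... )
```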

bulk_runner.py

Lines changed: 35 additions & 9 deletions
```diff
@@ -21,7 +21,7 @@
 from typing import Callable, List, Tuple, Union
 
 
-from timm.models import is_model, list_models
+from timm.models import is_model, list_models, get_pretrained_cfg
 
 
 parser = argparse.ArgumentParser(description='Per-model process launcher')
@@ -98,16 +98,33 @@ def main():
     cmd, cmd_args = cmd_from_args(args)
 
     model_cfgs = []
-    model_names = []
     if args.model_list == 'all':
-        # NOTE should make this config, for validation / benchmark runs the focus is 1k models,
-        # so we filter out 21/22k and some other unusable heads. This will change in the future...
-        exclude_model_filters = ['*in21k', '*in22k', '*dino', '*_22k']
         model_names = list_models(
             pretrained=args.pretrained,  # only include models w/ pretrained checkpoints if set
-            exclude_filters=exclude_model_filters
         )
         model_cfgs = [(n, None) for n in model_names]
+    elif args.model_list == 'all_in1k':
+        model_names = list_models(pretrained=True)
+        model_cfgs = []
+        for n in model_names:
+            pt_cfg = get_pretrained_cfg(n)
+            if getattr(pt_cfg, 'num_classes', 0) == 1000:
+                print(n, pt_cfg.num_classes)
+                model_cfgs.append((n, None))
+    elif args.model_list == 'all_res':
+        model_names = list_models()
+        model_names += list_models(pretrained=True)
+        model_cfgs = set()
+        for n in model_names:
+            pt_cfg = get_pretrained_cfg(n)
+            if pt_cfg is None:
+                print(f'Model {n} is missing pretrained cfg, skipping.')
+                continue
+            n = n.split('.')[0]
+            model_cfgs.add((n, pt_cfg.input_size[-1]))
+            if pt_cfg.test_input_size is not None:
+                model_cfgs.add((n, pt_cfg.test_input_size[-1]))
+        model_cfgs = [(n, {'img-size': r}) for n, r in sorted(model_cfgs)]
     elif not is_model(args.model_list):
         # model name doesn't exist, try as wildcard filter
         model_names = list_models(args.model_list)
@@ -122,7 +139,8 @@ def main():
     results_file = args.results_file or './results.csv'
     results = []
     errors = []
-    print('Running script on these models: {}'.format(', '.join(model_names)))
+    model_strings = '\n'.join([f'{x[0]}, {x[1]}' for x in model_cfgs])
+    print(f"Running script on these models:\n {model_strings}")
     if not args.sort_key:
         if 'benchmark' in args.script:
             if any(['train' in a for a in args.script_args]):
@@ -136,10 +154,14 @@ def main():
     print(f'Script: {args.script}, Args: {args.script_args}, Sort key: {sort_key}')
 
     try:
-        for m, _ in model_cfgs:
+        for m, ax in model_cfgs:
             if not m:
                 continue
             args_str = (cmd, *[str(e) for e in cmd_args], '--model', m)
+            if ax is not None:
+                extra_args = [(f'--{k}', str(v)) for k, v in ax.items()]
+                extra_args = [i for t in extra_args for i in t]
+                args_str += tuple(extra_args)
             try:
                 o = subprocess.check_output(args=args_str).decode('utf-8').split('--result')[-1]
                 r = json.loads(o)
@@ -157,7 +179,11 @@ def main():
     if errors:
         print(f'{len(errors)} models had errors during run.')
         for e in errors:
-            print(f"\t {e['model']} ({e.get('error', 'Unknown')})")
+            if 'model' in e:
+                print(f"\t {e['model']} ({e.get('error', 'Unknown')})")
+            else:
+                print(e)
+
     results = list(filter(lambda x: 'error' not in x, results))
 
     no_sortkey = list(filter(lambda x: sort_key not in x, results))
```
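
The new `all_in1k` and `all_res` branches rely on `get_pretrained_cfg`, which returns a model's pretrained config (`num_classes`, `input_size`, `test_input_size`, ...). A quick interactive sketch (`resnet50` is just an example; exact values depend on the installed timm version's default pretrained tag):

```py
>>> from timm.models import get_pretrained_cfg
>>> cfg = get_pretrained_cfg('resnet50')
>>> cfg.num_classes      # e.g. 1000 for an ImageNet-1k head
>>> cfg.input_size       # e.g. (3, 224, 224)
>>> cfg.test_input_size  # e.g. (3, 288, 288), or None if the weights define none
```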

hfdocs/source/models/efficientnet-pruned.mdx

Lines changed: 5 additions & 5 deletions
````diff
@@ -1,6 +1,6 @@
 # EfficientNet (Knapsack Pruned)
 
-**EfficientNet** is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a *compound coefficient*. Unlike conventional practice that arbitrary scales these factors, the EfficientNet scaling method uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients. For example, if we want to use $2^N$ times more computational resources, then we can simply increase the network depth by $\alpha ^ N$, width by $\beta ^ N$, and image size by $\gamma ^ N$, where $\alpha, \beta, \gamma$ are constant coefficients determined by a small grid search on the original small model. EfficientNet uses a compound coefficient $\phi$ to uniformly scales network width, depth, and resolution in a principled way.
+**EfficientNet** is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a *compound coefficient*. Unlike conventional practice that arbitrary scales these factors, the EfficientNet scaling method uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients. For example, if we want to use \\( 2^N \\) times more computational resources, then we can simply increase the network depth by \\( \alpha ^ N \\), width by \\( \beta ^ N \\), and image size by \\( \gamma ^ N \\), where \\( \alpha, \beta, \gamma \\) are constant coefficients determined by a small grid search on the original small model. EfficientNet uses a compound coefficient \\( \phi \\) to uniformly scales network width, depth, and resolution in a principled way.
 
 The compound scaling method is justified by the intuition that if the input image is bigger, then the network needs more layers to increase the receptive field and more channels to capture more fine-grained patterns on the bigger image.
 
@@ -20,7 +20,7 @@ To load a pretrained model:
 
 To load and preprocess the image:
 
-```py 
+```py
 >>> import urllib
 >>> from PIL import Image
 >>> from timm.data import resolve_data_config
@@ -51,7 +51,7 @@ To get the top-5 predictions class names:
 ```py
 >>> # Get imagenet class mappings
 >>> url, filename = ("https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt", "imagenet_classes.txt")
->>> urllib.request.urlretrieve(url, filename) 
+>>> urllib.request.urlretrieve(url, filename)
 >>> with open("imagenet_classes.txt", "r") as f:
 ...     categories = [s.strip() for s in f.readlines()]
 
@@ -85,7 +85,7 @@ You can follow the [timm recipe scripts](../scripts) for training a new model af
 
 ```BibTeX
 @misc{tan2020efficientnet,
-      title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks}, 
+      title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
       author={Mingxing Tan and Quoc V. Le},
       year={2020},
       eprint={1905.11946},
@@ -209,4 +209,4 @@ Models:
     Metrics:
       Top 1 Accuracy: 80.86%
       Top 5 Accuracy: 95.24%
--->
+-->
````
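
For readers following the math-delimiter change above, the compound scaling rule being typeset is, per the EfficientNet paper (Tan & Le, 2019):

```latex
% one coefficient \phi drives depth, width, and resolution together
d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi},
\quad \text{s.t. } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,
\quad \alpha \ge 1, \; \beta \ge 1, \; \gamma \ge 1
```

so total FLOPs grow roughly as \\( (\alpha \cdot \beta^2 \cdot \gamma^2)^{\phi} \approx 2^{\phi} \\).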

hfdocs/source/models/efficientnet.mdx

Lines changed: 5 additions & 5 deletions
````diff
@@ -1,6 +1,6 @@
 # EfficientNet
 
-**EfficientNet** is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a *compound coefficient*. Unlike conventional practice that arbitrary scales these factors, the EfficientNet scaling method uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients. For example, if we want to use $2^N$ times more computational resources, then we can simply increase the network depth by $\alpha ^ N$, width by $\beta ^ N$, and image size by $\gamma ^ N$, where $\alpha, \beta, \gamma$ are constant coefficients determined by a small grid search on the original small model. EfficientNet uses a compound coefficient $\phi$ to uniformly scales network width, depth, and resolution in a principled way.
+**EfficientNet** is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a *compound coefficient*. Unlike conventional practice that arbitrary scales these factors, the EfficientNet scaling method uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients. For example, if we want to use \\( 2^N \\) times more computational resources, then we can simply increase the network depth by \\( \alpha ^ N \\), width by \\( \beta ^ N \\), and image size by \\( \gamma ^ N \\), where \\( \alpha, \beta, \gamma \\) are constant coefficients determined by a small grid search on the original small model. EfficientNet uses a compound coefficient \\( \phi \\) to uniformly scales network width, depth, and resolution in a principled way.
 
 The compound scaling method is justified by the intuition that if the input image is bigger, then the network needs more layers to increase the receptive field and more channels to capture more fine-grained patterns on the bigger image.
 
@@ -18,7 +18,7 @@ To load a pretrained model:
 
 To load and preprocess the image:
 
-```py 
+```py
 >>> import urllib
 >>> from PIL import Image
 >>> from timm.data import resolve_data_config
@@ -49,7 +49,7 @@ To get the top-5 predictions class names:
 ```py
 >>> # Get imagenet class mappings
 >>> url, filename = ("https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt", "imagenet_classes.txt")
->>> urllib.request.urlretrieve(url, filename) 
+>>> urllib.request.urlretrieve(url, filename)
 >>> with open("imagenet_classes.txt", "r") as f:
 ...     categories = [s.strip() for s in f.readlines()]
 
@@ -83,7 +83,7 @@ You can follow the [timm recipe scripts](../scripts) for training a new model af
 
 ```BibTeX
 @misc{tan2020efficientnet,
-      title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks}, 
+      title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
       author={Mingxing Tan and Quoc V. Le},
       year={2020},
       eprint={1905.11946},
@@ -389,4 +389,4 @@ Models:
     Metrics:
      Top 1 Accuracy: 75.5%
      Top 5 Accuracy: 92.51%
--->
+-->
````
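
Since the diff only shows fragments of the doc page's usage snippets, here is the full flow those snippets describe, stitched into one runnable sketch (standard timm API; `efficientnet_b0` and the dog image URL are illustrative choices, not part of this commit):

```py
>>> import urllib
>>> import timm
>>> import torch
>>> from PIL import Image
>>> from timm.data import resolve_data_config
>>> from timm.data.transforms_factory import create_transform

>>> model = timm.create_model('efficientnet_b0', pretrained=True).eval()

>>> # build the eval transform this model's pretrained cfg calls for
>>> config = resolve_data_config({}, model=model)
>>> transform = create_transform(**config)

>>> url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
>>> urllib.request.urlretrieve(url, filename)
>>> img = Image.open(filename).convert('RGB')
>>> tensor = transform(img).unsqueeze(0)  # add batch dimension

>>> with torch.no_grad():
...     out = model(tensor)
>>> probabilities = torch.nn.functional.softmax(out[0], dim=0)
>>> top5_prob, top5_catid = torch.topk(probabilities, 5)
```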

hfdocs/source/models/gloun-resnext.mdx

Lines changed: 4 additions & 4 deletions
````diff
@@ -1,6 +1,6 @@
 # (Gluon) ResNeXt
 
-A **ResNeXt** repeats a [building block](https://paperswithcode.com/method/resnext-block) that aggregates a set of transformations with the same topology. Compared to a [ResNet](https://paperswithcode.com/method/resnet), it exposes a new dimension, *cardinality* (the size of the set of transformations) $C$, as an essential factor in addition to the dimensions of depth and width.
+A **ResNeXt** repeats a [building block](https://paperswithcode.com/method/resnext-block) that aggregates a set of transformations with the same topology. Compared to a [ResNet](https://paperswithcode.com/method/resnet), it exposes a new dimension, *cardinality* (the size of the set of transformations) \\( C \\), as an essential factor in addition to the dimensions of depth and width.
 
 The weights from this model were ported from [Gluon](https://cv.gluon.ai/model_zoo/classification.html).
 
@@ -16,7 +16,7 @@ To load a pretrained model:
 
 To load and preprocess the image:
 
-```py 
+```py
 >>> import urllib
 >>> from PIL import Image
 >>> from timm.data import resolve_data_config
@@ -47,7 +47,7 @@ To get the top-5 predictions class names:
 ```py
 >>> # Get imagenet class mappings
 >>> url, filename = ("https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt", "imagenet_classes.txt")
->>> urllib.request.urlretrieve(url, filename) 
+>>> urllib.request.urlretrieve(url, filename)
 >>> with open("imagenet_classes.txt", "r") as f:
 ...     categories = [s.strip() for s in f.readlines()]
 
@@ -206,4 +206,4 @@ Models:
     Metrics:
      Top 1 Accuracy: 79.35%
      Top 5 Accuracy: 94.42%
--->
+-->
````
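
One pointer on the cardinality \\( C \\) mentioned above: timm encodes it directly in these model names, e.g. `gluon_resnext50_32x4d` is \\( C = 32 \\) groups with bottleneck width 4d. A usage sketch (registry names as of this commit; the listing output may differ by timm version):

```py
>>> import timm
>>> timm.list_models('gluon_resnext*')  # cardinality x width is in the name
['gluon_resnext101_32x4d', 'gluon_resnext101_64x4d', 'gluon_resnext50_32x4d']
>>> model = timm.create_model('gluon_resnext50_32x4d', pretrained=True)
```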
