@@ -408,6 +408,8 @@ All model architecture families include variants with pretrained weights. There

* Aggregating Nested Transformers - https://arxiv.org/abs/2105.12723
* BEiT - https://arxiv.org/abs/2106.08254
+ * BEiT-V2 - https://arxiv.org/abs/2208.06366
+ * BEiT3 - https://arxiv.org/abs/2208.10442
* Big Transfer ResNetV2 (BiT) - https://arxiv.org/abs/1912.11370
* Bottleneck Transformers - https://arxiv.org/abs/2101.11605
* CaiT (Class-Attention in Image Transformers) - https://arxiv.org/abs/2103.17239
@@ -424,6 +426,7 @@ All model architecture families include variants with pretrained weights. There
* DPN (Dual-Path Network) - https://arxiv.org/abs/1707.01629
* EdgeNeXt - https://arxiv.org/abs/2206.10589
* EfficientFormer - https://arxiv.org/abs/2206.01191
+ * EfficientFormer-V2 - https://arxiv.org/abs/2212.08059
* EfficientNet (MBConvNet Family)
  * EfficientNet NoisyStudent (B0-B7, L2) - https://arxiv.org/abs/1911.04252
  * EfficientNet AdvProp (B0-B8) - https://arxiv.org/abs/1911.09665
@@ -440,12 +443,14 @@ All model architecture families include variants with pretrained weights. There
* EfficientViT (MSRA) - https://arxiv.org/abs/2305.07027
* EVA - https://arxiv.org/abs/2211.07636
* EVA-02 - https://arxiv.org/abs/2303.11331
+ * FasterNet - https://arxiv.org/abs/2303.03667
* FastViT - https://arxiv.org/abs/2303.14189
* FlexiViT - https://arxiv.org/abs/2212.08013
* FocalNet (Focal Modulation Networks) - https://arxiv.org/abs/2203.11926
* GCViT (Global Context Vision Transformer) - https://arxiv.org/abs/2206.09959
* GhostNet - https://arxiv.org/abs/1911.11907
* GhostNet-V2 - https://arxiv.org/abs/2211.12905
+ * GhostNet-V3 - https://arxiv.org/abs/2404.11202
* gMLP - https://arxiv.org/abs/2105.08050
* GPU-Efficient Networks - https://arxiv.org/abs/2006.14090
* Halo Nets - https://arxiv.org/abs/2103.12731
@@ -501,14 +506,19 @@ All model architecture families include variants with pretrained weights. There
* SelecSLS - https://arxiv.org/abs/1907.00837
* Selective Kernel Networks - https://arxiv.org/abs/1903.06586
* Sequencer2D - https://arxiv.org/abs/2205.01972
+ * SHViT - https://arxiv.org/abs/2401.16456
* SigLIP (image encoder) - https://arxiv.org/abs/2303.15343
* SigLIP 2 (image encoder) - https://arxiv.org/abs/2502.14786
+ * StarNet - https://arxiv.org/abs/2403.19967
+ * SwiftFormer - https://arxiv.org/pdf/2303.15446
* Swin S3 (AutoFormerV2) - https://arxiv.org/abs/2111.14725
* Swin Transformer - https://arxiv.org/abs/2103.14030
* Swin Transformer V2 - https://arxiv.org/abs/2111.09883
+ * TinyViT - https://arxiv.org/abs/2207.10666
* Transformer-iN-Transformer (TNT) - https://arxiv.org/abs/2103.00112
* TResNet - https://arxiv.org/abs/2003.13630
* Twins (Spatial Attention in Vision Transformers) - https://arxiv.org/pdf/2104.13840.pdf
+ * VGG - https://arxiv.org/abs/1409.1556
* Visformer - https://arxiv.org/abs/2104.12533
* Vision Transformer - https://arxiv.org/abs/2010.11929
* ViTamin - https://arxiv.org/abs/2404.02132
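
Since every family in the list above includes variants with pretrained weights, any of the newly added entries can be exercised through timm's usual `list_models`/`create_model` calls. A minimal sketch using BEiT-V2 as an example; the wildcard pattern and the exact model name `beitv2_base_patch16_224` are assumptions based on timm's naming conventions, and the available names depend on the installed timm release:

```python
import timm
import torch

# List pretrained variants of a newly added family. The 'beitv2*'
# wildcard is an assumed pattern; check the printed names for your
# installed timm version.
print(timm.list_models('beitv2*', pretrained=True))

# Instantiate one variant with its pretrained weights. The model name
# below is an assumption and may differ across releases.
model = timm.create_model('beitv2_base_patch16_224', pretrained=True)
model.eval()

# Dummy forward pass to confirm the classifier output shape.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # e.g. torch.Size([1, 1000]) for an ImageNet-1k head
```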