Commit dc0630f

Merge pull request #30 from rwightman/mixnet_xl
MixNet-XL
2 parents: 9816ca3 + 73fbd97

File tree

2 files changed: +88 -9 lines changed


README.md

Lines changed: 8 additions & 0 deletions
@@ -33,6 +33,7 @@ I've included a few of my favourite models, but this is not an exhaustive collec
 * DPN-68, DPN-68b, DPN-92, DPN-98, DPN-131, DPN-107
 * Generic EfficientNet (from my standalone [GenMobileNet](https://github.com/rwightman/genmobilenet-pytorch)) - A generic model that implements many of the efficient models that utilize similar DepthwiseSeparable and InvertedResidual blocks
 * EfficientNet (B0-B7) (https://arxiv.org/abs/1905.11946) -- validated, compat with TF weights
+* EfficientNet-EdgeTPU (S, M, L) (https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html) -- validated w/ TF weights
 * MixNet (https://arxiv.org/abs/1907.09595) -- validated, compat with TF weights
 * MNASNet B1, A1 (Squeeze-Excite), and Small (https://arxiv.org/abs/1807.11626)
 * MobileNet-V1 (https://arxiv.org/abs/1704.04861)
@@ -71,6 +72,7 @@ I've leveraged the training scripts in this repository to train a few of the mod
 
 |Model | Prec@1 (Err) | Prec@5 (Err) | Param # | Image Scaling | Image Size |
 |---|---|---|---|---|---|
+| mixnet_xl | 80.120 (19.880) | 95.022 (4.978) | 11.90M | bicubic | 224 |
 | efficientnet_b2 | 79.760 (20.240) | 94.714 (5.286) | 9.11M | bicubic | 260 |
 | resnext50d_32x4d | 79.674 (20.326) | 94.868 (5.132) | 25.1M | bicubic | 224 |
 | mixnet_l | 78.976 (21.024) | 94.184 (5.816) | 7.33M | bicubic | 224 |
@@ -111,6 +113,8 @@ I've leveraged the training scripts in this repository to train a few of the mod
 | gluon_seresnext101_32x4d | 80.902 (19.098) | 95.294 (4.706) | 48.96 | bicubic | 224 | |
 | gluon_seresnext101_64x4d | 80.890 (19.110) | 95.304 (4.696) | 88.23 | bicubic | 224 | |
 | gluon_resnext101_64x4d | 80.602 (19.398) | 94.994 (5.006) | 83.46 | bicubic | 224 | |
+| tf_efficientnet_el | 80.534 (19.466) | 95.190 (4.810) | 10.59 | bicubic | 300 | [Google](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/edgetpu) |
+| tf_efficientnet_el *tfp | 80.476 (19.524) | 95.200 (4.800) | 10.59 | bicubic | 300 | [Google](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/edgetpu) |
 | gluon_resnet152_v1d | 80.470 (19.530) | 95.206 (4.794) | 60.21 | bicubic | 224 | |
 | gluon_resnet101_v1d | 80.424 (19.576) | 95.020 (4.980) | 44.57 | bicubic | 224 | |
 | gluon_resnext101_32x4d | 80.334 (19.666) | 94.926 (5.074) | 44.18 | bicubic | 224 | |
@@ -126,15 +130,19 @@ I've leveraged the training scripts in this repository to train a few of the mod
 | gluon_resnet101_v1b | 79.304 (20.696) | 94.524 (5.476) | 44.55 | bicubic | 224 | |
 | tf_efficientnet_b1 *tfp | 79.172 (20.828) | 94.450 (5.550) | 7.79 | bicubic | 240 | [Google](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet) |
 | gluon_resnet50_v1d | 79.074 (20.926) | 94.476 (5.524) | 25.58 | bicubic | 224 | |
+| tf_efficientnet_em *tfp | 78.958 (21.042) | 94.458 (5.542) | 6.90 | bicubic | 240 | [Google](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/edgetpu) |
 | tf_mixnet_l *tfp | 78.846 (21.154) | 94.212 (5.788) | 7.33 | bilinear | 224 | [Google](https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet/mixnet) |
 | tf_efficientnet_b1 | 78.826 (21.174) | 94.198 (5.802) | 7.79 | bicubic | 240 | [Google](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet) |
 | gluon_inception_v3 | 78.804 (21.196) | 94.380 (5.620) | 27.16M | bicubic | 299 | [MxNet Gluon](https://gluon-cv.mxnet.io/model_zoo/classification.html) |
 | tf_mixnet_l | 78.770 (21.230) | 94.004 (5.996) | 7.33 | bicubic | 224 | [Google](https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet/mixnet) |
+| tf_efficientnet_em | 78.742 (21.258) | 94.332 (5.668) | 6.90 | bicubic | 240 | [Google](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/edgetpu) |
 | gluon_resnet50_v1s | 78.712 (21.288) | 94.242 (5.758) | 25.68 | bicubic | 224 | |
 | gluon_resnet50_v1c | 78.010 (21.990) | 93.988 (6.012) | 25.58 | bicubic | 224 | |
 | tf_inception_v3 | 77.856 (22.144) | 93.644 (6.356) | 27.16M | bicubic | 299 | [Tensorflow Slim](https://github.com/tensorflow/models/tree/master/research/slim) |
+| tf_efficientnet_es *tfp | 77.616 (22.384) | 93.750 (6.250) | 5.44 | bicubic | 224 | [Google](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/edgetpu) |
 | gluon_resnet50_v1b | 77.578 (22.422) | 93.718 (6.282) | 25.56 | bicubic | 224 | |
 | adv_inception_v3 | 77.576 (22.424) | 93.724 (6.276) | 27.16M | bicubic | 299 | [Tensorflow Adv models](https://github.com/tensorflow/models/tree/master/research/adv_imagenet_models) |
+| tf_efficientnet_es | 77.264 (22.736) | 93.600 (6.400) | 5.44 | bicubic | 224 | [Google](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/edgetpu) |
 | tf_efficientnet_b0 *tfp | 77.258 (22.742) | 93.478 (6.522) | 5.29 | bicubic | 224 | [Google](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet) |
 | tf_mixnet_m *tfp | 77.072 (22.928) | 93.368 (6.632) | 5.01 | bilinear | 224 | [Google](https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet/mixnet) |
 | tf_mixnet_m | 76.950 (23.050) | 93.156 (6.844) | 5.01 | bicubic | 224 | [Google](https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet/mixnet) |
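
As a quick sanity check on the new mixnet_xl row above, the parameter count is easy to verify in a few lines. A minimal sketch, assuming this repo is importable as the timm package and its create_model factory picks up the registered models:

import timm  # assumes this repo is installed/importable as `timm`

# Build the new MixNet-XL and compare its parameter count against the table.
model = timm.create_model('mixnet_xl', pretrained=True)
n_params = sum(p.numel() for p in model.parameters())
print('mixnet_xl params: %.2fM' % (n_params / 1e6))  # expect ~11.90M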

timm/models/gen_efficientnet.py

Lines changed: 80 additions & 9 deletions
@@ -138,6 +138,9 @@ def _cfg(url='', **kwargs):
         url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/mixnet_m-4647fc68.pth'),
     'mixnet_l': _cfg(
         url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/mixnet_l-5a9a2ed8.pth'),
+    'mixnet_xl': _cfg(
+        url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/mixnet_xl-ac5fbe8d.pth'),
+    'mixnet_xxl': _cfg(),
     'tf_mixnet_s': _cfg(
         url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_mixnet_s-89d3354b.pth'),
     'tf_mixnet_m': _cfg(
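
Note that the new 'mixnet_xxl' entry is a bare _cfg() call, so no pretrained checkpoint is registered for it; only 'mixnet_xl' ships weights. A small hedged check (assuming _cfg stores its url argument under a 'url' key, per its signature in the hunk header above):

from timm.models.gen_efficientnet import default_cfgs

assert default_cfgs['mixnet_xl']['url']       # released mixnet_xl checkpoint
assert not default_cfgs['mixnet_xxl']['url']  # empty url: no mixnet_xxl weights yet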
@@ -312,21 +315,59 @@ def _decode_block_str(block_str, depth_multiplier=1.0):
     else:
         assert False, 'Unknown block type (%s)' % block_type
 
-    # return a list of block args expanded by num_repeat and
-    # scaled by depth_multiplier
-    num_repeat = int(math.ceil(num_repeat * depth_multiplier))
-    return [deepcopy(block_args) for _ in range(num_repeat)]
+    return block_args, num_repeat
 
 
-def _decode_arch_def(arch_def, depth_multiplier=1.0):
+def _scale_stage_depth(stack_args, repeats, depth_multiplier=1.0, depth_trunc='ceil'):
+    """ Per-stage depth scaling
+    Scales the block repeats in each stage. This depth scaling impl maintains
+    compatibility with the EfficientNet scaling method, while allowing sensible
+    scaling for other models that may have multiple block arg definitions in each stage.
+    """
+
+    # We scale the total repeat count for each stage, there may be multiple
+    # block arg defs per stage so we need to sum.
+    num_repeat = sum(repeats)
+    if depth_trunc == 'round':
+        # Truncating to int by rounding allows stages with few repeats to remain
+        # proportionally smaller for longer. This is a good choice when stage definitions
+        # include single repeat stages that we'd prefer to keep that way as long as possible
+        num_repeat_scaled = max(1, round(num_repeat * depth_multiplier))
+    else:
+        # The default for EfficientNet truncates repeats to int via 'ceil'.
+        # Any multiplier > 1.0 will result in an increased depth for every stage.
+        num_repeat_scaled = int(math.ceil(num_repeat * depth_multiplier))
+
+    # Proportionally distribute repeat count scaling to each block definition in the stage.
+    # Allocation is done in reverse as it results in the first block being less likely to be scaled.
+    # The first block makes less sense to repeat in most of the arch definitions.
+    repeats_scaled = []
+    for r in repeats[::-1]:
+        rs = max(1, round((r / num_repeat * num_repeat_scaled)))
+        repeats_scaled.append(rs)
+        num_repeat -= r
+        num_repeat_scaled -= rs
+    repeats_scaled = repeats_scaled[::-1]
+
+    # Apply the calculated scaling to each block arg in the stage
+    sa_scaled = []
+    for ba, rep in zip(stack_args, repeats_scaled):
+        sa_scaled.extend([deepcopy(ba) for _ in range(rep)])
+    return sa_scaled
+
+
+def _decode_arch_def(arch_def, depth_multiplier=1.0, depth_trunc='ceil'):
     arch_args = []
     for stack_idx, block_strings in enumerate(arch_def):
         assert isinstance(block_strings, list)
         stack_args = []
+        repeats = []
         for block_str in block_strings:
             assert isinstance(block_str, str)
-            stack_args.extend(_decode_block_str(block_str, depth_multiplier))
-        arch_args.append(stack_args)
+            ba, rep = _decode_block_str(block_str)
+            stack_args.append(ba)
+            repeats.append(rep)
+        arch_args.append(_scale_stage_depth(stack_args, repeats, depth_multiplier, depth_trunc))
     return arch_args
 
 
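To make the 'ceil' vs 'round' truncation behaviour concrete, here is a self-contained sketch of just the repeat-scaling arithmetic from _scale_stage_depth, reduced to plain repeat counts (the helper name scale_repeats is mine, not part of the patch):

import math

def scale_repeats(repeats, depth_multiplier=1.0, depth_trunc='ceil'):
    # Mirrors the repeat-count math in _scale_stage_depth above.
    num_repeat = sum(repeats)
    if depth_trunc == 'round':
        num_repeat_scaled = max(1, round(num_repeat * depth_multiplier))
    else:
        num_repeat_scaled = int(math.ceil(num_repeat * depth_multiplier))
    repeats_scaled = []
    for r in repeats[::-1]:  # reverse: the first block in a stage is least likely to grow
        rs = max(1, round(r / num_repeat * num_repeat_scaled))
        repeats_scaled.append(rs)
        num_repeat -= r
        num_repeat_scaled -= rs
    return repeats_scaled[::-1]

print(scale_repeats([1], 1.2, 'ceil'))      # [2]  'ceil' deepens even single-repeat stages
print(scale_repeats([1], 1.2, 'round'))     # [1]  'round' keeps them shallow longer
print(scale_repeats([1, 3], 1.2, 'round'))  # [1, 4]  extra depth lands on the later block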
@@ -1261,7 +1302,7 @@ def _gen_mixnet_s(channel_multiplier=1.0, num_classes=1000, **kwargs):
     return model
 
 
-def _gen_mixnet_m(channel_multiplier=1.0, num_classes=1000, **kwargs):
+def _gen_mixnet_m(channel_multiplier=1.0, depth_multiplier=1.0, num_classes=1000, **kwargs):
     """Creates a MixNet Medium-Large model.
 
     Ref impl: https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet/mixnet
@@ -1283,7 +1324,7 @@ def _gen_mixnet_m(channel_multiplier=1.0, num_classes=1000, **kwargs):
         # 7x7
     ]
     model = GenEfficientNet(
-        _decode_arch_def(arch_def),
+        _decode_arch_def(arch_def, depth_multiplier=depth_multiplier, depth_trunc='round'),
         num_classes=num_classes,
         stem_size=24,
         num_features=1536,
@@ -1876,6 +1917,36 @@ def mixnet_l(pretrained=False, num_classes=1000, in_chans=3, **kwargs):
     return model
 
 
+@register_model
+def mixnet_xl(pretrained=False, num_classes=1000, in_chans=3, **kwargs):
+    """Creates a MixNet Extra-Large model.
+    Not a paper spec, experimental def by RW w/ depth scaling.
+    """
+    default_cfg = default_cfgs['mixnet_xl']
+    # kwargs['drop_connect_rate'] = 0.2
+    model = _gen_mixnet_m(
+        channel_multiplier=1.6, depth_multiplier=1.2, num_classes=num_classes, in_chans=in_chans, **kwargs)
+    model.default_cfg = default_cfg
+    if pretrained:
+        load_pretrained(model, default_cfg, num_classes, in_chans)
+    return model
+
+
+@register_model
+def mixnet_xxl(pretrained=False, num_classes=1000, in_chans=3, **kwargs):
+    """Creates a MixNet Double Extra Large model.
+    Not a paper spec, experimental def by RW w/ depth scaling.
+    """
+    default_cfg = default_cfgs['mixnet_xxl']
+    # kwargs['drop_connect_rate'] = 0.2
+    model = _gen_mixnet_m(
+        channel_multiplier=2.4, depth_multiplier=1.3, num_classes=num_classes, in_chans=in_chans, **kwargs)
+    model.default_cfg = default_cfg
+    if pretrained:
+        load_pretrained(model, default_cfg, num_classes, in_chans)
+    return model
+
+
 @register_model
 def tf_mixnet_s(pretrained=False, num_classes=1000, in_chans=3, **kwargs):
     """Creates a MixNet Small model. Tensorflow compatible variant
