Unify lower level API for EfficientNet and MobileNet model families #200
Conversation
1. Expose `pretrain` option for all models
2. Make it easier to initialise models with config options at the mid level by providing an additional dispatch
3. Some cleanup + documentation
1. Unify MobileNet and EfficientNet APIs into a single lower level builder function
2. Further unification and consolidation of the mid level API
3. Some cleanup
```julia
conv_norm((1, 1), inplanes, squeeze_planes, activation;
          norm_layer)...,
conv_norm((1, 1), squeeze_planes, inplanes,
          gate_activation; norm_layer)...), .*)
```
TIL that (.*) returns a Base.Broadcast.BroadcastFunction. Maybe we should create an alias for it in NNlib...
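The behaviour being referred to can be checked at the REPL (a quick illustration, not code from the PR itself):

```julia
# Wrapping a dotted operator in parentheses yields a callable
# Base.Broadcast.BroadcastFunction rather than an anonymous closure.
bf = (.*)
@assert bf isa Base.Broadcast.BroadcastFunction
@assert bf([1, 2, 3], [4, 5, 6]) == [4, 10, 18]  # elementwise multiply
```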
ToucheSir left a comment:
Fewer comments than I thought for a first pass :)
1. Rename `DropPath` to `StochasticDepth` to better reflect the paper
2. `_prob` is more accurate than `_rate`
3. Add stochastic depth to the `mbconv` builders
This PR now also adds an additional
```julia
# TODO look into "row" mode for stochastic depth
"""
    DropPath(p; [rng = rng_from_array(x)])
```
To make this non/less breaking, adding a deprecation for DropPath would work.
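A deprecation along these lines could be as simple as the following sketch (assuming `StochasticDepth` keeps the same positional signature as `DropPath`):

```julia
# Hypothetical deprecation shim: forward the old constructor to the
# new name while emitting a deprecation warning via Base's @deprecate.
@deprecate DropPath(p) StochasticDepth(p)
```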
But the Layers API is unexported, so does this still count as a breaking change?
It really should be exported in 0.8, but for now it isn't so I suppose it might work out in our favour 😄
I assumed it was why you added the label, but that is true. Built docs list it as "public", but I think that's a false positive and JuliaHub + GH search turn up no results.
Oh no, the breaking change is that EfficientNet literally returns a different model 😅
This PR is complete, I think - any further docs improvements would be best handled in a separate PR after landing this one and then #199
darsnack left a comment:
Just a few final naming things. Also, we can tackle more thorough docs including docstrings separately.
src/convnets/builders/mbconv.jl (Outdated)
```julia
stochastic_depth_prob = nothing, norm_layer = BatchNorm,
divisor::Integer = 8, kwargs...)
```
Do we need to have unused keywords? As far as I can tell, right now they are silently ignored, whereas not including extraneous keywords would throw a MethodError (as it should).
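The difference can be seen with a toy function (hypothetical names, purely to illustrate the keyword-splatting behaviour under discussion):

```julia
# With a trailing `kwargs...`, unknown keywords are silently collected:
f_loose(x; norm_layer = identity, kwargs...) = norm_layer(x)

# Without it, unknown keywords are rejected:
f_strict(x; norm_layer = identity) = norm_layer(x)

f_loose(1; stochastic_depth_prob = 0.2)   # works; keyword silently ignored
f_strict(1; stochastic_depth_prob = 0.2)  # MethodError: unsupported keyword argument
```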
These are being used, though, to pass things like `se_round_fn` in for MobileNetv3.
I specifically mean stuff like `stochastic_depth_prob`.
> I specifically mean stuff like `stochastic_depth_prob`
This particular one needs to be there because it will cause a MethodError for dwsep_conv_norm if passed through (and the default model builder passes it through). See also #200 (comment)
Bump?
```julia
stochastic_depth_prob = nothing, norm_layer = BatchNorm,
divisor::Integer = 8, kwargs...)
width_mult, depth_mult = scalings
block_repeats = [ceil(Int, block_configs[idx][end - 1] * depth_mult)
```
For MobileNet the config list is

```julia
const MOBILENETV2_CONFIGS = [
    (mbconv, 3, 16, 1, 1, 1, nothing, relu6),
    (mbconv, 3, 24, 6, 2, 2, nothing, relu6),
    (mbconv, 3, 32, 6, 2, 3, nothing, relu6),
    (mbconv, 3, 64, 6, 2, 4, nothing, relu6),
    (mbconv, 3, 96, 6, 1, 3, nothing, relu6),
    (mbconv, 3, 160, 6, 2, 3, nothing, relu6),
    (mbconv, 3, 320, 6, 1, 1, nothing, relu6),
]
```
so won't `block_repeats` just be a list of `nothing`s? And, as the name suggests, it tells how many times a particular block of layers is repeated?
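For configs where the second-to-last entry is an integer repeat count, the `ceil`-based depth scaling in the builder works like this (illustrative values, not the actual model configuration):

```julia
# Depth scaling: multiply each per-stage repeat count by the depth
# multiplier and round up, as in the `block_repeats` comprehension above.
depth_mult = 1.2
repeats = [1, 2, 2, 3]
scaled = [ceil(Int, r * depth_mult) for r in repeats]
@assert scaled == [2, 3, 3, 4]
```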
This PR unifies the lower level API for the EfficientNet and MobileNet model families into a single `irmodelbuilder` function. This followed as a natural consequence of #198 - a considerable amount of the calculations, model structure and configurations have now been abstracted out enough into pieces that compose, meaning that most of the models in these two families can be almost completely identified by their configuration dict and a couple of other arguments. Amongst other things, what this meant was:

Documentation TODO
Miscellaneous Notes
In a step closer to a uniform API, the higher level model APIs now all accept the `pretrain` keyword argument.