Important
DRAFT: This issue is provided for visibility; the recommendations below will evolve.
This issue serves as a tracker to standardise the usage of weight_norm throughout the library for our audio models and establish good practices.

Different approaches:
- conversion time: remove weight norm once when converting the weights
- inference time: remove weight norm at init, meaning the loaded state_dict is the one that still contains the weight norm parameters
- do not remove weight norm, meaning the correct weight is recomputed on every forward pass
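
To make the three options concrete, here is a minimal, framework-free sketch. It assumes the usual weight norm reparameterization `w = g * v / ||v||`. The class `WeightNormLinear`, the helper names, and the checkpoint keys are hypothetical and chosen only for illustration; they are not the library's implementation. In PyTorch the corresponding utilities would be `torch.nn.utils.weight_norm` and `torch.nn.utils.remove_weight_norm`.

```python
import math


def normalized_weight(g, v):
    """Weight norm reparameterization for one output row: w = g * v / ||v||."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


class WeightNormLinear:
    """Hypothetical single-output linear layer that stores (g, v) instead of w."""

    def __init__(self, g, v):
        self.g, self.v = g, v
        self.weight = None  # only set once weight norm has been removed

    def remove_weight_norm(self):
        # Fold g and v into a plain weight; g and v are dropped afterwards.
        self.weight = normalized_weight(self.g, self.v)
        self.g = self.v = None

    def forward(self, x):
        if self.weight is None:
            # Weight norm kept: w is recomputed on every call.
            return dot(normalized_weight(self.g, self.v), x)
        return dot(self.weight, x)


# Checkpoint as it would be stored with weight norm applied (illustrative keys).
checkpoint = {"weight_g": 2.0, "weight_v": [3.0, 4.0]}
x = [1.0, 1.0]

# 1) Conversion time: fold once offline and ship only the plain weight.
converted = {
    "weight": normalized_weight(checkpoint["weight_g"], checkpoint["weight_v"])
}
out_conversion = dot(converted["weight"], x)

# 2) Inference time: load g and v, then remove weight norm at init.
layer_init = WeightNormLinear(checkpoint["weight_g"], checkpoint["weight_v"])
layer_init.remove_weight_norm()
out_init = layer_init.forward(x)

# 3) Not removed: keep g and v and recompute w in each forward call.
layer_kept = WeightNormLinear(checkpoint["weight_g"], checkpoint["weight_v"])
out_kept = layer_kept.forward(x)

# All three strategies produce the same output; they differ in what the
# checkpoint contains and in how much work is done per forward call.
assert out_conversion == out_init == out_kept
```

The practical difference is where the cost and the checkpoint format land: option 1 changes the stored keys, option 2 keeps the original keys but pays a one-time cost at load, and option 3 keeps the original keys and pays a small cost on every forward pass.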
A summary of how it is done currently throughout the lib:
| Model | In 🤗 Transformers | Original Codebase / Source Project |
|---|---|---|
| dac | conversion time | inference time (not removed) |
| encodec | inference time (not removed) | inference time (not removed) |
| fastspeech2_conformer | NA (copied from) | NA (copied from) |
| hubert | inference time (not removed) | inference time (not removed) |
| mimi | likely weights have been converted | likely weights have been converted for the inference version |
| seamless_m4t | NA (copied from) | NA (copied from) |
| seamless_m4t_v2 | NA (copied from) | NA (copied from) |
| sew | inference time (not removed) | |
| sew_d | inference time (not removed) | |
| speecht5 | inference time (not removed) | |
| unispeech | inference time (not removed) | |
| unispeech_sat | inference time (not removed) | |
| univnet | conversion time | |
| vits | inference time (not removed) | |
| wav2vec2 | inference time (not removed) | |
| wav2vec2_conformer | inference time (not removed) | |
| wavlm | inference time (not removed) | |