| 1 | +# Framewise Auto-Regressive GAN (FARGAN) |
| 2 | + |
| 3 | +Implementation of FARGAN, a low-complexity neural vocoder. |
| 4 | + |
| 5 | +## Data preparation |
| 6 | + |
| 7 | +For data preparation, build Opus as detailed in the top-level README, |
| 8 | +passing the --enable-deep-plc option to configure. |
| 9 | +The build will produce an executable named "dump_data". |
| 10 | +To prepare the training data, run: |
| 11 | + |
| 12 | +./dump_data -train in_speech.pcm out_features.f32 out_speech.pcm |
| 13 | + |
| 14 | +Here, in_speech.pcm is a raw 16-bit PCM speech file sampled at 16 kHz. |
| 15 | +The speech data used for training the model can be found at: |
| 16 | +https://media.xiph.org/lpcnet/speech/tts_speech_negative_16k.sw |
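If your speech is stored as WAV rather than raw PCM, the header has to be stripped before running dump_data. The following is a minimal sketch of that conversion using only the Python standard library. The function name `wav_to_raw_pcm` is hypothetical and not part of this repository, and the mono-channel check and the little-endian sample order are assumptions; the text above only specifies 16-bit samples at 16 kHz.

```python
import wave


def wav_to_raw_pcm(wav_path, pcm_path, expected_rate=16000):
    """Copy the samples of a WAV file into a headerless PCM file.

    Hypothetical helper: assumes mono input and a little-endian host,
    since WAV stores 16-bit samples in little-endian order.
    Returns the number of samples written.
    """
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz audio")
        frames = wav.readframes(wav.getnframes())

    # The raw file is just the sample bytes with no header.
    with open(pcm_path, "wb") as pcm:
        pcm.write(frames)
    return len(frames) // 2
```

For example, `wav_to_raw_pcm("speech.wav", "in_speech.pcm")` would produce a file that can be passed to dump_data, provided the assumptions above hold for your data. The function raises instead of resampling, so a mismatched file is caught before training rather than silently producing bad features.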
| 17 | + |
| 18 | +## Training |
| 19 | + |
| 20 | +To perform pre-training, run the following command: |
| 21 | +``` |
| 22 | +python ./train_fargan.py out_features.f32 out_speech.pcm output_dir --epochs 400 --batch-size 4096 --lr 0.002 --cuda-visible-devices 0 |
| 23 | +``` |
| 24 | +Once pre-training is complete, run adversarial training using: |
| 25 | +``` |
| 26 | +python adv_train_fargan.py out_features.f32 out_speech.pcm output_dir --lr 0.000002 --reg-weight 5 --batch-size 160 --cuda-visible-devices 0 --initial-checkpoint output_dir/checkpoints/fargan_400.pth |
| 27 | +``` |
| 28 | +The final model will be in output_dir/checkpoints/fargan_adv_50.pth. |
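If you change the number of epochs, the checkpoint file names will differ from the ones shown above. As a minimal sketch, the helper below picks the highest-numbered checkpoint in a directory. The name `latest_checkpoint` is hypothetical, and the `<prefix>_<epoch>.pth` naming scheme is an assumption inferred from the two file names mentioned in this README (fargan_400.pth and fargan_adv_50.pth), not something the training scripts are confirmed to guarantee.

```python
import re
from pathlib import Path


def latest_checkpoint(checkpoint_dir, prefix="fargan"):
    """Return the path of the highest-epoch '<prefix>_<epoch>.pth' file.

    Hypothetical helper: assumes checkpoints are named with an integer
    epoch suffix. Returns None if no matching file is found.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)\.pth$")
    best = None
    for path in Path(checkpoint_dir).iterdir():
        match = pattern.match(path.name)
        if match is None:
            continue
        epoch = int(match.group(1))
        # Compare epochs numerically so that 400 sorts after 50.
        if best is None or epoch > best[0]:
            best = (epoch, path)
    return None if best is None else best[1]
```

Under that assumption, `latest_checkpoint("output_dir/checkpoints")` would select the last pre-training checkpoint, and `latest_checkpoint("output_dir/checkpoints", prefix="fargan_adv")` the last adversarial one.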
| 29 | + |
| 30 | +The model can optionally be converted to C using: |
| 31 | +``` |
| 32 | +python dump_fargan_weights.py output_dir/checkpoints/fargan_adv_50.pth fargan_c_dir |
| 33 | +``` |
| 34 | +This creates fargan_data.c and fargan_data.h in the fargan_c_dir directory. |
| 35 | +Copy these files to the opus/dnn/ directory (replacing the existing ones) and recompile Opus. |
| 36 | + |
| 37 | +## Inference |
| 38 | + |
| 39 | +To run inference, first generate the features from the audio using: |
| 40 | +``` |
| 41 | +./fargan_demo -features test_speech.pcm test_features.f32 |
| 42 | +``` |
| 43 | +Synthesis can be performed using either the PyTorch code or the C code. |
| 44 | +To synthesize from PyTorch, run: |
| 45 | +``` |
| 46 | +python test_fargan.py output_dir/checkpoints/fargan_adv_50.pth test_features.f32 output_speech.pcm |
| 47 | +``` |
| 48 | +To synthesize from the C code, run: |
| 49 | +``` |
| 50 | +./fargan_demo -fargan-synthesis test_features.f32 output_speech.pcm |
| 51 | +``` |
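The synthesized file has no header, so most audio players will not open it directly. The sketch below wraps it in a WAV container for listening. The function name `raw_pcm_to_wav` is hypothetical, and it assumes the output uses the same format as the input (mono, 16-bit, 16 kHz), which this README states only for the input file.

```python
import wave


def raw_pcm_to_wav(pcm_path, wav_path, sample_rate=16000):
    """Wrap headerless 16-bit mono PCM in a WAV container.

    Hypothetical helper: the format of the synthesized file is assumed
    to match the 16-bit, 16 kHz input. Returns the duration in seconds.
    """
    with open(pcm_path, "rb") as pcm:
        data = pcm.read()

    # Drop a trailing odd byte, if any, so only whole samples are written.
    data = data[: len(data) - (len(data) % 2)]

    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(data)
    return len(data) / 2 / sample_rate
```

For example, `raw_pcm_to_wav("output_speech.pcm", "output_speech.wav")` returns the clip length in seconds, which gives a quick sanity check that the output is about as long as the input.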