Jorge MF edited this page May 14, 2020 · 6 revisions

Jukebox: A Generative Model for Music [code] (May 2020)
A hierarchical VQ-VAE compresses raw audio into a discrete code space; autoregressive Sparse Transformers are then trained on the compressed codes to generate music.
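The quantization step at the heart of a VQ-VAE can be sketched as a nearest-neighbour lookup in a codebook. This is a minimal pure-Python illustration, not Jukebox's implementation: the function names, the toy 2-D codebook, and the latent values are all assumptions made for the example.

```python
def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook vector."""
    codes = []
    for z in latents:
        # Squared Euclidean distance from z to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook]
        codes.append(dists.index(min(dists)))
    return codes


def dequantize(codes, codebook):
    """Replace each discrete code with its codebook vector."""
    return [codebook[k] for k in codes]


# Toy values (hypothetical): 3 codebook entries, 3 encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]

codes = quantize(latents, codebook)
print(codes)  # discrete sequence an autoregressive model could be trained on
```

The integer sequence in `codes` is the kind of compressed representation a prior such as a Transformer would model; `dequantize` recovers the vectors a decoder would turn back into audio.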

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (Jan 2019)
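Three models trained separately: a speaker encoder, trained for speaker verification with the GE2E loss, that produces a fixed-size speaker embedding; a synthesizer that takes the phoneme sequence and the speaker embedding as input and predicts a mel spectrogram; and a WaveNet vocoder that converts the spectrogram into a waveform.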
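One way to feed a speaker embedding into a synthesizer is to concatenate it onto every frame of the text encoder's output. The sketch below shows only that conditioning step, under the assumption that frames and embedding are plain lists of floats; `condition_on_speaker` and the toy values are hypothetical names and data, not the paper's code.

```python
def condition_on_speaker(encoder_frames, speaker_embedding):
    """Append the same speaker embedding to every encoder frame."""
    return [list(frame) + list(speaker_embedding) for frame in encoder_frames]


# Toy values (hypothetical): 3 encoder frames of size 2, embedding of size 3.
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
d_vector = [1.0, 0.0, -1.0]

conditioned = condition_on_speaker(frames, d_vector)
print(len(conditioned), len(conditioned[0]))
```

Because the embedding is computed by a separately trained encoder, swapping `d_vector` for another speaker's embedding changes the target voice without retraining the synthesizer.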

Interpretable Convolutional Filters with SincNet [code] (Nov 2018)
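The first convolutional layer operates on raw audio with parametrized sinc functions that implement band-pass filters, so only each filter's low and high cutoff frequencies are learned instead of every tap of a standard convolution kernel.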
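A sinc-based band-pass kernel can be built as the difference of two low-pass sinc filters, smoothed with a window. The sketch below is a minimal pure-Python illustration with fixed cutoffs rather than learned ones; the function names, the Hamming window choice, the 101-tap length, and the 16 kHz sample rate are assumptions for the example, not values taken from the SincNet code.

```python
import math


def sinc(x):
    """Unnormalized sinc: sin(x) / x, with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x


def sinc_bandpass(f_low, f_high, kernel_size, sample_rate):
    """Band-pass FIR kernel defined only by its two cutoff frequencies (Hz)."""
    f1 = f_low / sample_rate   # cutoffs in cycles per sample
    f2 = f_high / sample_rate
    half = (kernel_size - 1) // 2  # kernel_size is assumed to be odd
    taps = []
    for i in range(kernel_size):
        n = i - half
        # Difference of two ideal low-pass filters gives a band-pass filter.
        ideal = (2 * f2 * sinc(2 * math.pi * f2 * n)
                 - 2 * f1 * sinc(2 * math.pi * f1 * n))
        # Hamming window reduces ripple caused by truncating the sinc.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        taps.append(ideal * window)
    return taps


def gain(taps, freq, sample_rate):
    """Magnitude of the kernel's frequency response at `freq` Hz."""
    w = 2 * math.pi * freq / sample_rate
    re = sum(t * math.cos(w * k) for k, t in enumerate(taps))
    im = sum(t * math.sin(w * k) for k, t in enumerate(taps))
    return math.hypot(re, im)


taps = sinc_bandpass(1000.0, 2000.0, 101, 16000)
print(gain(taps, 1500.0, 16000), gain(taps, 4000.0, 16000))
```

In a trainable layer, `f_low` and `f_high` would be the parameters updated by gradient descent, so each filter has two parameters regardless of kernel length.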
