
Electromagnetic Field 2018

WaveNet, what's behind Google's voice

Norman Casagrande

As a member of the team that brought WaveNet into production, I would like to give an insight into the technical challenges and the technology powering Google's voice, and how this type of algorithm could be used to solve other interesting problems.

WaveNet is a machine learning algorithm developed at (Google) DeepMind which was the first to achieve human-like voice quality by modeling raw waveforms directly; a rough sketch of the core idea follows the links below. You can read more about it here:

https://deepmind.com/blog/wavenet-generative-model-raw-audio/
https://en.wikipedia.org/wiki/WaveNet
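
To make that idea concrete, here is a toy sketch (not DeepMind's code) of WaveNet's core ingredient: a stack of dilated causal convolutions that predicts a categorical distribution over the next quantised audio sample given only past samples. The 8-bit quantisation, layer count and channel sizes are illustrative assumptions, not the real configuration.

import torch
import torch.nn as nn

class TinyWaveNet(nn.Module):
    def __init__(self, channels=32, layers=8, classes=256):
        super().__init__()
        self.embed = nn.Conv1d(classes, channels, kernel_size=1)
        self.dilated = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=2, dilation=2 ** i)
            for i in range(layers)  # dilations 1, 2, 4, ... double the receptive field
        ])
        self.out = nn.Conv1d(channels, classes, kernel_size=1)

    def forward(self, x_onehot):
        # x_onehot: (batch, classes, time) one-hot encoding of past samples
        h = self.embed(x_onehot)
        for conv in self.dilated:
            pad = conv.dilation[0]  # left-pad so each convolution stays causal
            h = h + torch.relu(conv(nn.functional.pad(h, (pad, 0))))
        return self.out(h)  # logits over the next sample at every time step

Doubling the dilation at each layer is what lets a shallow stack see thousands of past samples, which is the key to modelling raw audio at 16 kHz and above.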

Since the original paper was published, there have been several developments that substantially speed up audio generation (by several orders of magnitude) in order to put it into production, and it now powers the Google Assistant voice in several languages. See also:

https://deepmind.com/blog/wavenet-launches-google-assistant/
https://deepmind.com/blog/high-fidelity-speech-synthesis-wavenet/
https://arxiv.org/abs/1802.08435

Note that the technology is not limited to audio and can work with pretty much any signal, but this is so new that we are only scratching the surface of the potential applications.

In my talk I will cover the basic technology, what it took to get it into production, and the more recent developments. I will also give a few suggestions and hints on how to reproduce these results: the latest version (WaveRNN) is rather simple to implement and yet very powerful.
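
As a hint of how simple, here is a minimal sketch of a WaveRNN-style sampling loop: a single recurrent cell that emits one quantised audio sample at a time. This is an illustrative assumption of the structure, not the paper's exact architecture: it omits the coarse/fine dual softmax and any conditioning on text or linguistic features, and the sizes are made up.

import torch
import torch.nn as nn

class TinyWaveRNN(nn.Module):
    def __init__(self, hidden=256, classes=256):
        super().__init__()
        self.embed = nn.Embedding(classes, hidden)
        self.rnn = nn.GRUCell(hidden, hidden)
        self.out = nn.Linear(hidden, classes)

    @torch.no_grad()
    def generate(self, n_samples):
        h = torch.zeros(1, self.rnn.hidden_size)
        sample = torch.zeros(1, dtype=torch.long)  # start from silence
        audio = []
        for _ in range(n_samples):
            h = self.rnn(self.embed(sample), h)              # one recurrent step per sample
            probs = torch.softmax(self.out(h), dim=-1)
            sample = torch.multinomial(probs, 1).squeeze(1)  # draw the next quantised sample
            audio.append(sample.item())
        return audio  # quantised sample indices

samples = TinyWaveRNN().generate(16000)  # ~1 second at 16 kHz (untrained, so just noise)

The appeal of this formulation is that each step is a handful of matrix-vector products, which is what makes real-time generation on ordinary hardware plausible once the model is trained.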