This companion page to paper [1] presents some randomly selected audio examples from our listening test. The stimuli illustrate the effects of conventional versus robust DNN-based duration prediction from found audiobook data (“Emma” by Jane Austen). Please read the paper for more information, including descriptions of the different systems and their properties.
Note: If you have trouble hearing the audio, please wait a while for the audio data to load (3.8 MiB). If playback still does not work, please try another web browser.
Audio examples
Prompt ID / System | VOC | FRC | BOT | MSE | MLE1 | MLE3 | B75 | B50 |
---|---|---|---|---|---|---|---|---|
184 | | | | | | | | |
192 | | | | | | | | |
198 | | | | | | | | |
207 | | | | | | | | |
References
[1] G. E. Henter, S. Ronanki, O. Watts, M. Wester, Z. Wu, and S. King, “Robust TTS duration modelling using DNNs,” Proc. ICASSP, 2016, pp. 5130–5134.