This companion page to paper [1] presents some randomly selected audio examples from our listening test. The stimuli illustrate the effects of gradually stepping between two speech-synthesis paradigms – namely from DNN-based statistical parametric speech synthesis (Merlin) to sequence-to-sequence neural TTS (Ophelia) – in text-to-speech systems trained on the LJ Speech Dataset. Please read the paper for more information, including descriptions of the different systems and their properties.
Note: If you have trouble hearing the audio, please wait a while for the audio data to load (6.8 MiB). If playback still does not work, please try another web browser.
Audio examples
| Prompt ID | M | MM | W2 | W2T | W2H | G2 | G1 | G1H | G1TH | G1HA | G1THA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 430 | | | | | | | | | | | |
| 490 | | | | | | | | | | | |
| 502 | | | | | | | | | | | |
| 503 | | | | | | | | | | | |
| 508 | | | | | | | | | | | |
References
[1] O. Watts, G. E. Henter, J. Fong, and C. Valentini-Botinhao, “Where do the improvements come from in sequence-to-sequence neural TTS?,” Proc. SSW, 2020, pp. 217–222.