This companion page to paper [1] presents some randomly selected audio examples from our listening test. The stimuli illustrate the effects of gradually stepping between two speech-synthesis paradigms – namely from DNN-based statistical parametric speech synthesis (Merlin) to sequence-to-sequence neural TTS (Ophelia) – in text-to-speech systems trained on the LJ Speech Dataset. Please read the paper for more information, including descriptions of the different systems and their properties.

Note: If you have problems hearing the audio, please wait a while for the audio data to load (6.8 MiB). If playback still does not work, please try another web browser.


Audio examples



System: M, MM, W2, W2T, W2H, G2, G1, G1H, G1TH, G1HA, G1THA

Prompt ID: 430, 490, 502, 503, 508

References

  1. O. Watts, G. E. Henter, J. Fong, and C. Valentini-Botinhao, “Where do the improvements come from in sequence-to-sequence neural TTS?,” Proc. SSW, 2019, pp. 217–222.
