Demo: Where do the improvements come from in sequence-to-sequence neural TTS?

This companion page to the paper [1] presents randomly selected audio examples from our listening test. The stimuli illustrate the effects of stepping gradually between two speech-synthesis paradigms, namely from DNN-based statistical parametric speech synthesis (Merlin) to sequence-to-sequence neural TTS (Ophelia), in text-to-speech systems trained on the LJ Speech Dataset. Please read the paper for more information, including descriptions of the different systems and their properties.


Audio examples

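The examples are arranged in a grid with one row per prompt and one column per system.

Systems: M, MM, W2, W2T, W2H, G2, G1, G1H, G1TH, G1HA, G1THA
Prompt IDs: 430, 490, 502, 503, 508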

References

[1] O. Watts, G. E. Henter, J. Fong, and C. Valentini-Botinhao, “Where do the improvements come from in sequence-to-sequence neural TTS?,” in Proc. SSW, 2020, pp. 217–222.
