Data generation – of which speech synthesis is a prominent example – is an area of burgeoning interest. In this talk, I argue that our priorities in data generation differ from those of conventional statistical estimation techniques, and show that this difference in priorities leads to inappropriate output when we are confronted with bad data or incorrect assumptions. Practitioners need to be aware of this issue when working on data-generation tasks. I present a theoretical argument showing that standard maximum-likelihood estimation prioritises models that fit best in low-density regions of the data. This clashes with established speech-synthesis output generation, which uses only the peak of the fitted distribution as output. Thus, in effect, the tails (outliers) of the data wag the synthesised speech around! This insight moreover suggests a natural way to improve our models, based on ideas from the statistical field of robust estimation. An application shows that speech synthesis based on robust techniques better described the typical case in the data and was preferred over non-robust baselines by human listeners. To round off, I draw parallels to recent insights in generative adversarial networks and outline a path forward that shadows historical advances on classification tasks.
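The core tension above can be illustrated with a toy sketch (my own example, not taken from the talk): for a Gaussian model, the maximum-likelihood fit uses the sample mean, whose value is dragged by a handful of outliers, and the peak (mode) of that fitted Gaussian coincides with the mean, so the outliers directly shift the generated output. A simple robust alternative such as the sample median stays near the typical case.

```python
# Toy illustration (assumed example, not from the talk): a few outliers
# pull the maximum-likelihood Gaussian fit, while a robust estimate
# (the median) keeps tracking the typical case.
import random
import statistics

random.seed(0)

# "Typical" data: 200 samples near 0, standing in for one speech parameter.
data = [random.gauss(0.0, 1.0) for _ in range(200)]
# A handful of gross outliers, e.g. from bad recordings or alignment errors.
data += [25.0, 30.0, 28.0, 27.0]

# Maximum-likelihood Gaussian fit: the mean. Since the mode of the fitted
# Gaussian equals its mean, this is also the conventional synthesis output,
# so the outliers (the tails) pull the output directly.
ml_mode = statistics.mean(data)

# Robust alternative: the sample median, which is barely affected by the
# four outliers and stays near the bulk of the data.
robust_mode = statistics.median(data)

print(f"ML mode (mean):       {ml_mode:+.3f}")
print(f"Robust mode (median): {robust_mode:+.3f}")

# The robust estimate sits closer to the typical value (0) than the ML one.
assert abs(robust_mode) < abs(ml_mode)
```

With 4 outliers of size ~25–30 among 204 points, the mean shifts by roughly 110/204 ≈ 0.5, while the median barely moves, mirroring the talk's point that the tails of the data should not dictate the synthesised output.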