Overview

This page lists my publications in reverse chronological order, grouped by type:

You can also download a .bib file with all my publications.

Data releases and slides from my talks are listed on the main page.

Books and book chapters

  1. Gustav Eje Henter, Simon King, Thomas Merritt, and Gilles Degottex (2016)
    Analysing shortcomings of statistical parametric speech synthesis
    Draft book chapter for Natural Speech Technology: Statistical and Machine Learning Approaches to Speech Recognition and Synthesis, edited by S. Renals, M. J. F. Gales, T. Hain, P. C. Woodland, S. King, and S. Cunningham
    First posted online Jul. 2018 (34 pages)
    [ preprint | arXiv | abstract | .bib ]
  2. Arne Leijon and Gustav Eje Henter
    Pattern Recognition: Fundamental Theory and Exercise Problems
    Course book for the M.Sc. course EQ2341 Pattern Recognition and Machine Learning at the School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
    Last revised in 2015 (283 pages)
    [ book webpage | course syllabus | course web | .bib ]

Journal papers (peer reviewed)

  1. Simon Alexanderson, Gustav Eje Henter, Taras Kucherenko, and Jonas Beskow (2020)
    Style‐controllable speech‐driven gesture synthesis using normalising flows
    Comput. Graph. Forum (vol. 39, no. 2, pp. 487–496)
    Honourable Mention at EUROGRAPHICS 2020 (Best Paper Award nominee, top 4 of 141 submissions)
    [ postprint | publisher (open access, with supplement) | abstract | erratum | .bib | code | teaser video | video presentation ]
  2. Jaime Lorenzo-Trueba, Gustav Eje Henter, Shinji Takaki, Junichi Yamagishi, Yosuke Morino, and Yuta Ochiai (2018)
    Investigating different representations for modeling and controlling multiple emotions in DNN-based speech synthesis
    Speech Commun. (vol. 99, pp. 135–143)
    [ postprint | publisher | abstract | .bib ]
  3. Gustav Eje Henter and W. Bastiaan Kleijn (2016)
    Minimum entropy rate simplification of stochastic processes
    IEEE T. Pattern Anal. (vol. 38, no. 12, pp. 2487–2500)
    [ postprint | publisher | appendices | abstract | .bib ]
  4. Arne Leijon, Gustav Eje Henter, and Martin Dahlquist (2016)
    Bayesian analysis of phoneme confusion matrices
    IEEE/ACM T. Audio Speech (vol. 24, no. 3, pp. 469–482)
    [ postprint | publisher | abstract | .bib ]
  5. Petko N. Petkov, Gustav Eje Henter, and W. Bastiaan Kleijn (2013)
    Maximizing phoneme recognition accuracy for enhanced speech intelligibility in noise
    IEEE T. Audio Speech (vol. 21, no. 5, pp. 1035–1045)
    [ postprint | publisher | abstract | .bib ]
  6. Gustav Eje Henter and W. Bastiaan Kleijn (2013)
    Picking up the pieces: Causal states in noisy data, and how to recover them
    Pattern Recogn. Lett. (vol. 34, no. 5, pp. 587–594)
    [ postprint | publisher | supplement | abstract | .bib ]

Conference papers (peer reviewed)

  1. Simon Alexanderson, Éva Székely, Gustav Eje Henter, Taras Kucherenko, and Jonas Beskow (2020)
    Generating coherent spontaneous speech and gesture from text
    Accepted for publication at ACM International Conference on Intelligent Virtual Agents (IVA 2020)
    [ contact me for preprint ]
  2. Anubhab Ghosh, Antoine Honoré, Dong Liu, Gustav Eje Henter, and Saikat Chatterjee (2020)
    Robust classification using hidden Markov models and mixtures of normalizing flows
    Accepted for publication at IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2020)
    [ contact me for postprint ]
  3. Simon Alexanderson and Gustav Eje Henter (2020)
    Robust model training and generalisation with Studentising flows
    Proc. INNF+ 2020 (2nd workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models) (pp. 25:1–25:9)
    [ postprint | publisher (open access) | arXiv | abstract | .bib | video presentation | slides ]
  4. Éva Székely, Gustav Eje Henter, Jonas Beskow, and Joakim Gustafson (2020)
    Breathing and speech planning in spontaneous speech synthesis
    Proc. ICASSP 2020 (pp. 7649–7653)
    [ postprint | publisher | abstract | .bib | listening examples ]
  5. Oliver Watts, Gustav Eje Henter, Jason Fong, and Cassia Valentini-Botinhao (2019)
    Where do the improvements come from in sequence-to-sequence neural TTS?
    Proc. SSW 2019 (pp. 217–222)
    [ postprint | publisher (open access) | abstract | .bib | code | listening examples | slides by Dr. Watts ]
  6. Éva Székely, Gustav Eje Henter, Jonas Beskow, and Joakim Gustafson (2019)
    How to train your fillers: uh and um in spontaneous speech synthesis
    Proc. SSW 2019 (pp. 245–250)
    [ postprint | publisher (open access) | abstract | .bib | listening examples | poster by Dr. Székely ]
  7. Petra Wagner, Jonas Beskow, Simon Betz, Jens Edlund, Joakim Gustafson, Gustav Eje Henter, Sébastien Le Maguer, Zofia Malisz, Éva Székely, Christina Tånnander, and Jana Voße (2019)
    Speech synthesis evaluation – State-of-the-art assessment and suggestion for a novel research program
    Proc. SSW 2019 (pp. 105–110)
    [ postprint | publisher (open access) | abstract | .bib | slides by Prof. Wagner ]
  8. Éva Székely, Gustav Eje Henter, Jonas Beskow, and Joakim Gustafson (2019)
    Spontaneous conversational speech synthesis from found data
    Proc. Interspeech 2019 (pp. 4435–4439)
    [ postprint | publisher (open access) | abstract | .bib | listening examples | poster by Dr. Székely ]
  9. Éva Székely, Gustav Eje Henter, Jonas Beskow, and Joakim Gustafson (2019)
    Off the cuff: Exploring extemporaneous speech delivery with TTS
    Proc. Interspeech 2019 (pp. 3687–3688)
    Show & Tell Demo Award (best of 37 demos)
    [ postprint | publisher (open access) | abstract | .bib | Show & Tell demo video | poster by Dr. Székely ]
  10. Zofia Malisz, Gustav Eje Henter, Cassia Valentini-Botinhao, Oliver Watts, Jonas Beskow, and Joakim Gustafson (2019)
    Modern speech synthesis for phonetic sciences: a discussion and an evaluation
    Proc. ICPhS 2019 (pp. 487–491)
    [ postprint | publisher (open access) | abstract | erratum | .bib | test materials | slides w/ audio (2.6 MiB) | slides w/o audio ]
  11. Taras Kucherenko, Dai Hasegawa, Gustav Eje Henter, Naoshi Kaneko, and Hedvig Kjellström (2019)
    Analyzing input and output representations for speech-driven gesture generation
    Proc. IVA 2019 (pp. 97–104)
    [ postprint | publisher | arXiv | abstract | .bib | main page | code (for Japanese) | code (for English) | slides by Mr. Kucherenko | video presentation | English-language example ]
  12. Éva Székely, Gustav Eje Henter, and Joakim Gustafson (2019)
    Casting to corpus: Segmenting and selecting spontaneous dialogue for TTS with a CNN-LSTM speaker-dependent breath detector
    Proc. ICASSP 2019 (pp. 6925–6929)
    [ postprint | publisher | abstract | .bib | poster by Dr. Székely ]
  13. Taras Kucherenko, Dai Hasegawa, Naoshi Kaneko, Gustav Eje Henter, and Hedvig Kjellström (2019)
    On the importance of representations for speech-driven gesture generation
    Proc. AAMAS 2019 extended abstracts (pp. 2072–2074)
    [ postprint | publisher | abstract | .bib | poster by Mr. Kucherenko ]
  14. Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Mariko Kondo, and Junichi Yamagishi (2018)
    Cyborg speech: Deep multilingual speech synthesis for generating segmental foreign accent with natural prosody
    Proc. ICASSP 2018 (pp. 4799–4803)
    [ postprint | publisher | abstract | .bib | slides w/o audio ]
  15. Jaime Lorenzo-Trueba, Gustav Eje Henter, Shinji Takaki, Junichi Yamagishi, Yosuke Morino, and Yuta Ochiai (2017)
    Investigating different representations for modeling multiple emotions in DNN-based speech synthesis
    Proc. 3rd International Workshop on Affective Social Multimedia Computing (ASMMC) 2017
    [ postprint | abstract | .bib ]
  16. Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, and Junichi Yamagishi (2017)
    Principles for learning controllable TTS from annotated and latent variation
    Proc. Interspeech 2017 (pp. 4799–4803)
    [ corrected postprint | publisher (open access) | abstract | erratum | .bib | poster ]
  17. Jaime Lorenzo-Trueba, Cassia Valentini-Botinhao, Gustav Eje Henter, and Junichi Yamagishi (2017)
    Misperceptions of the emotional content of natural and vocoded speech in a car
    Proc. Interspeech 2017 (pp. 606–610)
    [ postprint | publisher (open access) | abstract | .bib ]
  18. Hieu-Thi Luong, Shinji Takaki, Gustav Eje Henter, and Junichi Yamagishi (2017)
    Adapting and controlling DNN-based speech synthesis using input codes
    Proc. ICASSP 2017 (pp. 4905–4909)
    [ postprint | publisher | abstract | .bib | listening examples by Mr. Luong ]
  19. Srikanth Ronanki, Oliver Watts, Simon King, and Gustav Eje Henter (2016)
    Median-based generation of synthetic speech durations using a non-parametric approach
    Proc. SLT 2016 (pp. 686–692)
    [ postprint | publisher | arXiv | abstract | .bib | poster by Mr. Ronanki ]
  20. Takenori Yoshimura, Gustav Eje Henter, Oliver Watts, Mirjam Wester, Junichi Yamagishi, and Keiichi Tokuda (2016)
    A hierarchical predictor of synthetic speech naturalness using neural networks
    Proc. Interspeech 2016 (pp. 342–346)
    [ postprint | publisher (open access) | abstract | .bib | poster by Mr. Yoshimura ]
  21. Srikanth Ronanki, Gustav Eje Henter, Zhizheng Wu, and Simon King (2016)
    A template-based approach for speech synthesis intonation generation using LSTMs
    Proc. Interspeech 2016 (pp. 2463–2467)
    [ postprint | publisher (open access) | abstract | .bib | code | test materials | poster by Mr. Ronanki | slides by Mr. Ronanki ]
  22. Mirjam Wester, Oliver Watts, and Gustav Eje Henter (2016)
    Evaluating comprehension of natural and synthetic conversational speech
    Proc. Speech Prosody 2016 (pp. 736–740)
    [ postprint | publisher (open access) | abstract | .bib | test materials | poster by Dr. Wester ]
  23. Gustav Eje Henter, Srikanth Ronanki, Oliver Watts, Mirjam Wester, Zhizheng Wu, and Simon King (2016)
    Robust TTS duration modelling using DNNs
    Proc. ICASSP 2016 (pp. 5130–5134)
    [ postprint | publisher | abstract | .bib | listening examples | test materials | slides w/ audio (2.0 MiB) | slides w/o audio ]
  24. Oliver Watts, Gustav Eje Henter, Thomas Merritt, Zhizheng Wu, and Simon King (2016)
    From HMMs to DNNs: where do the improvements come from?
    Proc. ICASSP 2016 (pp. 5505–5509)
    [ postprint | publisher | abstract | .bib | test materials | poster by Dr. Watts ]
  25. Rasmus Dall, Sandrine Brognaux, Korin Richmond, Cassia Valentini-Botinhao, Gustav Eje Henter, Julia Hirschberg, Junichi Yamagishi, and Simon King (2016)
    Testing the consistency assumption: Pronunciation variant forced alignment in read and spontaneous speech synthesis
    Proc. ICASSP 2016 (pp. 5155–5159)
    [ postprint | publisher | abstract | .bib | test materials | slides by Mr. Dall ]
  26. Mirjam Wester, Cassia Valentini-Botinhao, and Gustav Eje Henter (2015)
    Are we using enough listeners? No! An empirically-supported critique of Interspeech 2014 TTS evaluations
    Proc. Interspeech 2015 (pp. 3476–3480)
    [ postprint | publisher (open access) | abstract | .bib | slides by Dr. Wester ]
  27. Gustav Eje Henter, Thomas Merritt, Matt Shannon, Catherine Mayo, and Simon King (2014)
    Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech
    Proc. Interspeech 2014 (pp. 1504–1508)
    [ postprint | publisher (open access) | abstract | .bib | database | listening examples | poster by Mr. Shannon | slides w/ audio (2.9 MiB) | slides w/o audio ]
  28. Matthew P. Aylett, Rasmus Dall, Arnab Ghoshal, Gustav Eje Henter, and Thomas Merritt (2014)
    A flexible front-end for HTS
    Proc. Interspeech 2014 (pp. 1283–1287)
    [ postprint | publisher (open access) | abstract | .bib | .odp slides by Mr. Dall ]
  29. Petko N. Petkov, W. Bastiaan Kleijn, and Gustav Eje Henter (2012)
    Enhancing subjective speech intelligibility using a statistical model of speech
    Proc. Interspeech 2012 (pp. 166–169)
    [ postprint | publisher (open access) | abstract | .bib | poster ]
  30. Petko N. Petkov, W. Bastiaan Kleijn, and Gustav Eje Henter (2012)
    Speech intelligibility enhancement using a statistical model of clean speech
    Proc. The Listening Talker Workshop (p. 77)
    [ postprint | .bib ]
  31. Gustav Eje Henter, Marcus R. Frean, and W. Bastiaan Kleijn (2012)
    Gaussian process dynamical models for nonparametric speech representation and synthesis
    Proc. ICASSP 2012 (pp. 4505–4508)
    [ postprint | publisher | abstract | .bib | poster ]
  32. Gustav Eje Henter and W. Bastiaan Kleijn (2011)
    Intermediate-state HMMs to capture continuously-changing signal features
    Proc. Interspeech 2011 (pp. 1817–1820)
    [ postprint | publisher (open access) | abstract | .bib | poster ]
  33. Gustav Eje Henter and W. Bastiaan Kleijn (2010)
    Simplified probability models for generative tasks: a rate-distortion approach
    Proc. EUSIPCO 2010 (pp. 1159–1163)
    [ postprint | publisher (open access) | abstract | .bib | poster ]

Conference contributions (not peer reviewed)

  1. Zofia Malisz, Gustav Eje Henter, Cassia Valentini-Botinhao, Oliver Watts, Jonas Beskow, and Joakim Gustafson (2019)
    Modern speech synthesis and its implications for speech sciences
    Proc. UK Speech 2019 (pp. 11–12)
    [ abstract | publisher (open access) | .bib | slides w/ audio (2.6 MiB) | slides w/o audio ]
  2. Éva Székely, Gustav Eje Henter, Jonas Beskow, and Joakim Gustafson (2019)
    Spontaneous conversational TTS from found data
    Proc. UK Speech 2019 (p. 31)
    [ abstract | publisher (open access) | .bib | poster ]
  3. Oliver Watts, Gustav Eje Henter, Jason Fong, and Cassia Valentini-Botinhao (2019)
    Sequence-to-sequence neural TTS: an assessment of the contribution of various ingredients
    Proc. UK Speech 2019 (p. 64)
    [ abstract | publisher (open access) | .bib | poster by Dr. Watts ]
  4. Zofia Malisz, Gustav Eje Henter, Cassia Valentini-Botinhao, Oliver Watts, Jonas Beskow, and Joakim Gustafson (2019)
    The speech synthesis phoneticians need is both realistic and controllable
    Proc. FONETIK 2019 (pp. 103–107)
    [ postprint | publisher (open access) | abstract | .bib | slides by Dr. Malisz ]
  5. Éva Székely, Gustav Eje Henter, Jonas Beskow, and Joakim Gustafson (2019)
    Spontaneous conversational speech synthesis: The making of a podcast voice – breathing, uhs & ums and some ponderings about appropriateness
    Show & tell at FONETIK 2019 (without paper)
    [ slides by Dr. Székely | .bib ]
  6. Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Mariko Kondo, and Junichi Yamagishi (2018)
    Generating segment-level foreign-accented synthetic speech with natural speech prosody
    IPSJ SIG Technical Reports (vol. 2018-SLP-120, no. 8, pp. 1–3)
    [ postprint | publisher | abstract | .bib | slides ]
  7. Jaime Lorenzo-Trueba, Gustav Eje Henter, Shinji Takaki, and Junichi Yamagishi (2017)
    Analyzing the impact of including listener perception annotations in RNN-based emotional speech synthesis
    IPSJ SIG Technical Reports (vol. 2017-SLP-119, no. 8, pp. 1–2)
    [ publisher | abstract | .bib ]
  8. Gustav Eje Henter, Srikanth Ronanki, Oliver Watts, and Simon King (2017)
    Non-parametric duration modelling for speech synthesis with a joint model of acoustics and duration
    IEICE Technical Report (vol. 116, no. 414, pp. 11–16)
    [ postprint | publisher | abstract | .bib | slides ]
  9. Gustav Eje Henter, Srikanth Ronanki, Oliver Watts, Mirjam Wester, Zhizheng Wu, and Simon King (2016)
    Robust text-to-speech duration modelling with a deep neural network
    Proc. ASA/ASJ 2016 (vol. 140, no. 4, p. 2961)
    [ abstract | publisher | .bib | poster ]
  10. Gustav Eje Henter, Srikanth Ronanki, Oliver Watts, Mirjam Wester, Zhizheng Wu, and Simon King (2016)
    Robust text-to-speech duration modelling using DNNs
    Proc. UK Speech 2016 (p. 44)
    [ postprint | publisher (open access) | abstract | .bib | poster ]
  11. Srikanth Ronanki, Gustav Eje Henter, Zhizheng Wu, and Simon King (2016)
    A template-based approach for intonation generation using LSTMs
    Proc. UK Speech 2016 (p. 22)
    [ postprint | publisher (open access) | abstract | .bib | poster by Mr. Ronanki ]
  12. Gustav Eje Henter (2014)
    Measuring the perceptual effects of speech synthesis modelling assumptions
    Proc. UK Speech 2014 (p. 37)
    [ abstract | publisher (open access) | .bib | poster by Mr. Shannon ]

Preprints

  1. Taras Kucherenko, Dai Hasegawa, Naoshi Kaneko, Gustav Eje Henter, and Hedvig Kjellström (2020)
    Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation
    Online preprint first posted July 2020 (24 pages)
    [ preprint | arXiv | abstract | .bib | code (for Japanese) | code (for English) ]
  2. Patrik Jonell, Taras Kucherenko, Gustav Eje Henter, and Jonas Beskow (2020)
    Let's face it: Probabilistic multi-modal interlocutor-aware generation of facial gestures in dyadic settings
    Online preprint first posted June 2020 (10 pages)
    [ preprint | arXiv | abstract | .bib ]
  3. Taras Kucherenko, Patrik Jonell, Sanne van Waveren, Gustav Eje Henter, Simon Alexanderson, Iolanda Leite, and Hedvig Kjellström (2020)
    Gesticulator: A framework for semantically-aware speech-driven gesture generation
    Online preprint first posted Jan. 2020 (9 pages)
    [ preprint | arXiv | abstract | .bib | code ]
  4. Seyyed Saeed Sarfjoo, Xin Wang, Gustav Eje Henter, Jaime Lorenzo-Trueba, Shinji Takaki, and Junichi Yamagishi (2019)
    Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model
    Online preprint first posted Nov. 2019 (5 pages)
    [ preprint | arXiv | abstract | .bib ]
  5. Gustav Eje Henter, Simon Alexanderson, and Jonas Beskow (2019)
    MoGlow: Probabilistic and controllable motion synthesis using normalising flows
    Online preprint first posted May 2019 (12 pages)
    [ preprint | arXiv | abstract | .bib | code | video presentation ]
  6. Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, and Junichi Yamagishi (2018)
    Deep encoder-decoder models for unsupervised learning of controllable speech synthesis
    Online preprint first posted July 2018 (17 pages)
    [ preprint | arXiv | abstract | .bib ]
  7. Sang Phan, Gustav Eje Henter, Yusuke Miyao, and Shin'ichi Satoh (2017)
    Consensus-based sequence training for video captioning
    Online preprint first posted Dec. 2017 (11 pages)
    [ arXiv | abstract | .bib | code ]
  8. Gustav Eje Henter, Arne Leijon, and W. Bastiaan Kleijn (2016)
    Kernel density estimation-based Markov models with hidden state
    Online preprint first posted Jul. 2018 (14 pages)
    [ preprint | arXiv | abstract | .bib | slides ]

Theses

  1. Gustav Eje Henter, supervised by W. Bastiaan Kleijn and Arne Leijon (2013)
    Probabilistic sequence models with speech and language applications
    Ph.D. thesis, School of Electrical Engineering, KTH Royal Institute of Technology
    [ official record | abstract | introduction | .bib ]
  2. Gustav Eje Henter, supervised by Jani Ekonoja and examined by Jan Grandell (2007)
    Mathematical techniques for applied demand estimation and profit maximization
    M.Sc. thesis, Department of Mathematical Statistics, KTH Royal Institute of Technology

Note: Some audio files have been removed from the slides and presentations on this page due to strict Japanese regulations regarding personal data. Audio playback from pdf is known to work in Adobe Acrobat Reader (from trusted documents), but not in Preview on macOS.

This page was last updated 2020-07-26.

[ back to the top of the page | return to main page | contact the author ]