Most human speech occurs in spontaneous conversation, making it an important goal to replicate such speech with text-to-speech (TTS). Using spontaneous conversational speech data in synthesis is, however, a challenge due to disfluencies, syntactic differences from written language, and generally high variability. Moreover, building synthesisers from genuine spontaneous conversations found in the wild (as opposed to conversations elicited and recorded in the lab) brings further complications such as overlapping speech, lack of transcriptions, and no control over recording conditions. Taken together, these challenges mean that synthesis of conversational spontaneous speech from found data has seldom, if ever, been attempted before. We have previously proposed to address some of the above issues by using deep learning to automatically identify and extract single-speaker breath groups (segments of speech bookended by breaths). In this study, we build several Tacotron 2 voices on a corpus of 9 hours of clean single-speaker US English breath groups extracted from a conversational podcast and transcribed using off-the-shelf ASR. Our findings from listening tests on these voices include: 1) Phonetic rather than graphemic input improved pronunciation accuracy, as did transfer learning from a larger read-speech corpus. 2) If filler tokens are left untranscribed, the stochastic synthesis will spontaneously insert filled pauses (FPs) into the output with an FP distribution broadly similar to that in the training corpus. With filler tokens transcribed, FPs are only synthesised when requested. Thus control over output FPs is possible but optional. 3) The presence of filled pauses improved perceived speaker authenticity when synthesising a sequence of extemporaneous prompts. 4) More fluent conversational TTS can be achieved by omitting disfluent utterances from the training corpus.
5) When speaking spontaneous prompts (from public speeches as well as casual conversation), our new voices were preferred over both read-speech synthesis from found data and spontaneous-speech synthesis from a small, carefully transcribed, lab-recorded corpus of spontaneous conversational speech.