DocumentCode :
1937722
Title :
Spoken language synthesis: experiments in synthesis of spontaneous monologues
Author :
Sundaram, Shiva ; Narayanan, Shrikanth
Author_Institution :
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
fYear :
2002
fDate :
11-13 Sept. 2002
Firstpage :
203
Lastpage :
206
Abstract :
While TTS technology has come a long way, there is an ongoing need for bringing improved "naturalness" to synthesized speech. One predominant aspect of natural, spontaneous speech is its variability along several dimensions: vocabulary, prosodic features, paralinguistic elements and discourse markers. Such variability is typically carefully avoided or minimized in conventional text-to-speech for the sake of high intelligibility. However, in applications requiring immersive anthropomorphic human-machine interfaces, including those with computer-generated avatars, there is a great desire for human-like synthesized speech output. In this paper we investigate methods for, and the usefulness of, incorporating certain features characterizing fluent natural speech to increase "naturalness" in synthesized speech. We propose a data-driven approach for modeling both speaker-independent and speaker-dependent spontaneous speech features at the lexical and acoustic levels (so-called VoiceFonts). This method has the potential to create unique, custom speaking styles of a target speaker. A simple limited domain synthesizer was built based on this idea using data from a classroom lecture and was used to synthesize 28 target utterances. Results from preliminary listening experiments by 19 volunteers showed that such an approach indeed improves naturalness, without significant loss in intelligibility, beyond the limitations of the underlying waveform synthesis. For example, subjects could correctly identify natural speech with a probability of 0.6 and confused the clips synthesized in this work with natural speech with a probability of 0.27 in a 4-way choice listening test.
Keywords :
speech processing; speech synthesis; speech-based user interfaces; TTS technology; VoiceFonts; computer-generated avatars; fluent natural speech; human-like synthesized speech; immersive anthropomorphic human-machine interfaces; limited domain synthesizer; speech naturalness; spoken language synthesis; spontaneous monologues; Anthropomorphism; Application software; Avatars; Computer interfaces; Loudspeakers; Man machine systems; Natural languages; Speech synthesis; Synthesizers; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on
Print_ISBN :
0-7803-7395-2
Type :
conf
DOI :
10.1109/WSS.2002.1224409
Filename :
1224409