Regional Information Center for Science and Technology (RICeST) - Spoken language synthesis: experiments in synthesis of spontaneous monologues

DocumentCode :

1937722

Title :

Spoken language synthesis: experiments in synthesis of spontaneous monologues

Author :

Sundaram, Shiva ; Narayanan, Shrikanth

Author_Institution :

Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA

fYear :

2002

fDate :

11-13 Sept. 2002

Firstpage :

203

Lastpage :

206

Abstract :

While TTS technology has come a long way, there is an ongoing need for bringing improved "naturalness" to synthesized speech. One predominant aspect of natural, spontaneous speech is its variability along several dimensions: vocabulary, prosodic features, paralinguistic elements and discourse markers. Such variability is typically carefully avoided or minimized in conventional text-to-speech for the sake of high intelligibility. However, in applications requiring immersive anthropomorphic human-machine interfaces, including those with computer-generated avatars, there is a great desire for human-like synthesized speech output. In this paper we investigate methods for, and the usefulness of, incorporating certain features characterizing fluent natural speech to increase "naturalness" in synthesized speech. We propose a data-driven approach for modeling both speaker-independent and speaker-dependent spontaneous speech features at the lexical and acoustic levels (so-called VoiceFonts). This method has the potential to create unique, custom speaking styles of a target speaker. A simple limited-domain synthesizer was built based on this idea using data from a classroom lecture and was used to synthesize 28 target utterances. Results from preliminary listening experiments by 19 volunteers showed that such an approach indeed improves naturalness, without significant loss in intelligibility, beyond the limitations of the underlying waveform synthesis. For example, subjects could correctly identify natural speech with a probability of 0.6 and confused the clips synthesized in this work with natural speech with a probability of 0.27 in a 4-way choice listening test.

Keywords :

speech processing; speech synthesis; speech-based user interfaces; TTS technology; VoiceFonts; computer-generated avatars; fluent natural speech; human-like synthesized speech; immersive anthropomorphic human-machine interfaces; limited domain synthesizer; speech naturalness; spoken language synthesis; spontaneous monologues; Anthropomorphism; Application software; Avatars; Computer interfaces; Loudspeakers; Man machine systems; Natural languages; Speech synthesis; Synthesizers; Vocabulary;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on

Print_ISBN :

0-7803-7395-2

Type :

conf

DOI :

10.1109/WSS.2002.1224409

Filename :

1224409

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1937722