Regional Information Center for Science and Technology (RICeST) - Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System

DocumentCode :

868600

Title :

Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System

Author :

Wu, Zhiyong ; Meng, Helen M. ; Yang, Hongwu ; Cai, Lianhong

Author_Institution :

Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong (CUHK), Hong Kong, China

Volume :

Issue :

fYear :

2009

Firstpage :

1567

Lastpage :

1576

Abstract :

This work focuses on the development of expressive text-to-speech synthesis techniques for a Chinese spoken dialog system, where the expressivity is driven by the message content. We adapt the three-dimensional pleasure-displeasure, arousal-nonarousal and dominance-submissiveness (PAD) model for describing expressivity in input text semantics. The context of our study is based on response messages generated by a spoken dialog system in the tourist information domain. We use the P (pleasure) and A (arousal) dimensions to describe expressivity at the prosodic word level based on lexical semantics. The D (dominance) dimension is used to describe expressivity at the utterance level based on dialog acts. We analyze contrastive (neutral versus expressive) speech recordings to develop a nonlinear perturbation model that incorporates the PAD values of a response message to transform neutral speech into expressive speech. Two levels of perturbation are implemented: local perturbation at the prosodic word level and global perturbation at the utterance level. Perceptual experiments involving 14 subjects indicate that the proposed approach can significantly enhance expressivity in response generation for a spoken dialog system.

Keywords :

interactive systems; natural languages; speech synthesis; Chinese text-to-speech synthesis; PAD model; arousal-nonarousal model; dominance-submissiveness model; lexical semantics; nonlinear perturbation model; perceptual experiment; pleasure-displeasure model; prosodic word level; speech recording; spoken dialog system (SDS); tourist information domain; expressive text-to-speech (TTS) synthesis; response generation

fLanguage :

English

Journal_Title :

IEEE Transactions on Audio, Speech, and Language Processing

Publisher :

IEEE

ISSN :

1558-7916

Type :

journal article

DOI :

10.1109/TASL.2009.2023161

Filename :

4926212

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=868600