Regional Information Center for Science and Technology - Emotional talking agent: System and evaluation

DocumentCode :

527761

Title :

Emotional talking agent: System and evaluation

Author :

Zhang, Shen ; Jia, Jia ; Xu, Yingjin ; Cai, Lianhong

Author_Institution :

Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China

Volume :

fYear :

2010

fDate :

10-12 Aug. 2010

Firstpage :

3573

Lastpage :

3577

Abstract :

In this paper, we introduce a system that synthesizes emotional audio-visual speech for a 3-D talking agent by adopting the PAD (Pleasure-Arousal-Dominance) emotional model. A GMM-based method is introduced to predict the variation of acoustic features of emotional speech from PAD values, and a parametric framework for PAD-driven emotional facial expression synthesis is built. As the focus of this paper, we performed a series of perceptual evaluations to understand the reinforcement effect of vocal and facial expressions of emotion, and investigated the usefulness and effectiveness of the emotional talking agent in human-computer speech communication. Three questions are addressed: 1) To what extent do different interfaces affect humans' comprehension of emotion? 2) How accurately is emotional information conveyed by the talking agent? 3) Is the multimodal (audio-visual) interface helpful to humans' comprehension of emotion? An evaluation involving 19 participants was conducted to compare the effect of different interfaces (speech, mute agent, and talking agent) on humans' comprehension of emotion. The experimental results reveal a significant mutually reinforcing relationship between the audio and video modalities in emotion perception, and show that users strongly prefer the multimodal interface for better comprehension of emotion. The results also demonstrate the effectiveness of our PAD-based emotional talking agent synthesis system.

Keywords :

Gaussian processes; emotion recognition; face recognition; human computer interaction; speech recognition; speech synthesis; 3D talking agent; GMM-based method; PAD-driven emotional facial expression synthesis; acoustic features; audio modality; audio-visual interface; emotion perception; emotional audio-visual speech; emotional information; emotional speech; emotional talking agent; human computer speech communications; human emotion comprehension; multimodal interface; parametric framework; perceptual evaluations; pleasure-arousal-dominance emotional model; reinforcement effect; video modality; vocal expression; Acoustics; Computers; Feature extraction; Humans; Speech; Speech synthesis; Visualization; PAD; emotion perception; multimodal reinforcement; talking agent;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Natural Computation (ICNC), 2010 Sixth International Conference on

Conference_Location :

Yantai, Shandong

Print_ISBN :

978-1-4244-5958-2

Type :

conf

DOI :

10.1109/ICNC.2010.5584128

Filename :

5584128

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=527761