DocumentCode :
527761
Title :
Emotional talking agent: System and evaluation
Author :
Zhang, Shen ; Jia, Jia ; Xu, Yingjin ; Cai, Lianhong
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Volume :
7
fYear :
2010
fDate :
10-12 Aug. 2010
Firstpage :
3573
Lastpage :
3577
Abstract :
In this paper, we introduce a system that synthesizes emotional audio-visual speech for a 3-D talking agent by adopting the PAD (Pleasure-Arousal-Dominance) emotional model. A GMM-based method is introduced to predict, from PAD values, the variation of acoustic features in emotional speech, and a parametric framework for PAD-driven emotional facial expression synthesis is built. As the focus of this paper, we performed a series of perceptual evaluations to understand the reinforcement effect between vocal and facial expressions of emotion, and investigated the usefulness and effectiveness of the emotional talking agent in human-computer speech communication. Three questions are addressed: 1) To what extent do different interfaces affect human comprehension of emotion? 2) How accurately does the talking agent convey emotional information? 3) Is the multimodal (audio-visual) interface helpful to human comprehension of emotion? An evaluation involving 19 participants was conducted to compare the effect of different interfaces (speech, mute agent, and talking agent) on improving human comprehension of emotion. The experimental results reveal a significant mutually reinforcing relationship between the audio and video modalities in emotion perception, and show that users strongly prefer the multimodal interface for better comprehension of emotion. The results also demonstrate the effectiveness of our PAD-based emotional talking agent synthesis system.
Keywords :
Gaussian processes; emotion recognition; face recognition; human computer interaction; speech recognition; speech synthesis; 3D talking agent; GMM-based method; PAD-driven emotional facial expression synthesis; acoustic features; audio modality; audio-visual interface; emotion perception; emotional audio-visual speech; emotional information; emotional speech; emotional talking agent; human computer speech communications; human emotion comprehension; multimodal interface; parametric framework; perceptual evaluations; pleasure-arousal-dominance emotional model; reinforcement effect; video modality; vocal expression; Acoustics; Computers; Feature extraction; Humans; Speech; Speech synthesis; Visualization; PAD; emotion perception; multimodal reinforcement; talking agent;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Computation (ICNC), 2010 Sixth International Conference on
Conference_Location :
Yantai, Shandong
Print_ISBN :
978-1-4244-5958-2
Type :
conf
DOI :
10.1109/ICNC.2010.5584128
Filename :
5584128