DocumentCode
3703340
Title
From simulated speech to natural speech, what are the robust features for emotion recognition?
Author
Ya Li;Linlin Chao;Yazhu Liu;Wei Bao;Jianhua Tao
Author_Institution
National Laboratory of Pattern Recognition, (NLPR), Institute of Automation, CAS, Beijing, China
fYear
2015
Firstpage
368
Lastpage
373
Abstract
The earliest research on emotion recognition started with simulated/acted stereotypical emotional corpora and then extended to elicited corpora. Recently, the demands of real applications have shifted research toward natural and spontaneous corpora. Previous research shows that emotion recognition accuracy gradually declines from simulated speech to elicited speech to totally natural speech. This paper investigates the effects of commonly used spectral, prosody, and voice quality features on emotion recognition across the three types of corpora, and identifies the robust features for emotion recognition with natural speech. Emotion recognition with several common machine learning methods is carried out and thoroughly compared. Three feature selection methods are applied to find the robust features. The results on six commonly used corpora confirm that recognition accuracy decreases as the corpus changes from simulated to natural. In addition, prosody and voice quality features are robust for emotion recognition on simulated corpora, while spectral features are robust on elicited and natural corpora.
Keywords
"Emotion recognition","Speech","Robustness","Databases","Speech recognition","Feature extraction","Support vector machines"
Publisher
ieee
Conference_Titel
Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on
Electronic_ISBN
2156-8111
Type
conf
DOI
10.1109/ACII.2015.7344597
Filename
7344597