From simulated speech to natural speech, what are the robust features for emotion recognition?

Author

Ya Li;Linlin Chao;Yazhu Liu;Wei Bao;Jianhua Tao

Author_Institution

National Laboratory of Pattern Recognition (NLPR), Institute of Automation, CAS, Beijing, China

fYear

2015

Firstpage

368

Lastpage

373

Abstract

The earliest research on emotion recognition started with simulated (acted) stereotypical emotional corpora and then extended to elicited corpora. Recently, the demands of real applications have pushed research toward natural, spontaneous corpora. Previous research shows that emotion recognition accuracy declines gradually from simulated speech to elicited speech and then to fully natural speech. This paper investigates the effects of commonly used spectral, prosodic, and voice quality features on emotion recognition across these three types of corpora, and identifies the features that are robust for emotion recognition on natural speech. Emotion recognition with several common machine learning methods is carried out, and the methods are thoroughly compared. Three feature selection methods are applied to find the robust features. The results on six commonly used corpora confirm that recognition accuracy decreases as the corpus changes from simulated to natural. In addition, prosodic and voice quality features are robust for emotion recognition on simulated corpora, while spectral features are robust on elicited and natural corpora.

Keywords

"Emotion recognition","Speech","Robustness","Databases","Speech recognition","Feature extraction","Support vector machines"

Publisher

IEEE

Conference_Titel

Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on

Electronic_ISBN

2156-8111

Type

conf

DOI

10.1109/ACII.2015.7344597

Filename

7344597