• DocumentCode
    3703340
  • Title

    From simulated speech to natural speech, what are the robust features for emotion recognition?

  • Author

    Ya Li;Linlin Chao;Yazhu Liu;Wei Bao;Jianhua Tao

  • Author_Institution
    National Laboratory of Pattern Recognition (NLPR), Institute of Automation, CAS, Beijing, China
  • fYear
    2015
  • Firstpage
    368
  • Lastpage
    373
  • Abstract
    The earliest research on emotion recognition started with simulated/acted stereotypical emotional corpora and then extended to elicited corpora. Recently, the demand for real applications has forced research to shift to natural and spontaneous corpora. Previous research shows that emotion recognition accuracy declines gradually from simulated speech to elicited speech and then to fully natural speech. This paper investigates the effects of the commonly used spectral, prosodic and voice quality features on emotion recognition with the three types of corpus, and identifies the robust features for emotion recognition with natural speech. Emotion recognition is carried out with several common machine learning methods, and the results are thoroughly compared. Three feature selection methods are applied to find the robust features. The results on six commonly used corpora confirm that recognition accuracy decreases as the corpus changes from simulated to natural. In addition, prosodic and voice quality features are robust for emotion recognition on simulated corpora, while spectral features are robust on elicited and natural corpora.
  • Keywords
    "Emotion recognition","Speech","Robustness","Databases","Speech recognition","Feature extraction","Support vector machines"
  • Publisher
    IEEE
  • Conference_Titel
    Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on
  • Electronic_ISBN
    2156-8111
  • Type

    conf

  • DOI
    10.1109/ACII.2015.7344597
  • Filename
    7344597