• DocumentCode
    81107
  • Title
    CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset
  • Author
    Cao, Houwei ; Cooper, David G. ; Keutmann, Michael K. ; Gur, Ruben C. ; Nenkova, Ani ; Verma, Rajesh
  • Author_Institution
    Dept. of Radiol., Univ. of Pennsylvania, Philadelphia, PA, USA
  • Volume
    5
  • Issue
    4
  • fYear
    2014
  • fDate
    Oct.-Dec. 1 2014
  • Firstpage
    377
  • Lastpage
    390
  • Abstract
    People convey their emotional state in their face and voice. We present an audio-visual dataset uniquely suited for the study of multi-modal emotion expression and perception. The dataset consists of facial and vocal emotional expressions in sentences spoken in a range of basic emotional states (happy, sad, anger, fear, disgust, and neutral). 7,442 clips of 91 actors with diverse ethnic backgrounds were rated by multiple raters in three modalities: audio, visual, and audio-visual. Categorical emotion labels and real-value intensity values for the perceived emotion were collected using crowd-sourcing from 2,443 raters. The human recognition rates of the intended emotion for the audio-only, visual-only, and audio-visual data are 40.9, 58.2, and 63.6 percent, respectively. Recognition rates are highest for neutral, followed by happy, anger, disgust, fear, and sad. Average intensity levels of emotion are rated highest for visual-only perception. The accurate recognition of disgust and fear requires simultaneous audio-visual cues, while anger and happiness can be well recognized based on evidence from a single modality. The large dataset we introduce can be used to probe other questions concerning the audio-visual perception of emotion.
  • Keywords
    emotion recognition; face recognition; speech recognition; CREMA-D; audio-visual dataset; crowd-sourced emotional multimodal actor dataset; emotional state; ethnic backgrounds; human emotion recognition; multimodal emotion expression; multimodal emotion perception; Audio-visual systems; Crowdsourcing; Databases; Emotion recognition; Emotional corpora; facial expression; multi-modal recognition; voice expression
  • fLanguage
    English
  • Journal_Title
    Affective Computing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1949-3045
  • Type
    jour
  • DOI
    10.1109/TAFFC.2014.2336244
  • Filename
    6849440