Title :
Audio-Visual Emotion Recognition Using Gaussian Mixture Models for Face and Voice
Author :
Metallinou, Angeliki ; Lee, Sungbok ; Narayanan, Shrikanth
Author_Institution :
Sch. of Electr. Eng., Univ. of Southern California, Los Angeles, CA
Abstract :
Emotion expression associated with human communication is known to be a multimodal process. In this work, we investigate how emotional information is conveyed by the facial and vocal modalities, and how these modalities can be effectively combined to improve emotion recognition accuracy. In particular, the behaviors of different facial regions are studied in detail. We analyze an emotion database recorded from ten speakers (five female, five male), which contains speech and facial marker data. Each individual modality is modeled by Gaussian mixture models (GMMs). Multiple modalities are combined using two different methods: a Bayesian classifier weighting scheme and support vector machines that use post-classification accuracies as features. Individual modality recognition performances indicate that anger and sadness have comparable accuracies for the facial and vocal modalities, while happiness seems to be more accurately transmitted by facial expressions than by voice. The neutral state has the lowest performance, possibly due to the vague definition of neutrality. Cheek regions achieve better emotion recognition accuracy than other facial regions. Moreover, classifier combination leads to significantly higher performance, which confirms that training detailed single-modality classifiers and combining them at a later stage is an effective approach.
Keywords :
Bayes methods; Gaussian processes; audio-visual systems; emotion recognition; face recognition; speech recognition; support vector machines; Bayesian classifier weighting scheme; Gaussian mixture models; audio-visual emotion recognition; facial modalities; human communication; vocal modalities; Bayesian methods; Data analysis; Humans; Spatial databases; Speech analysis; Support vector machine classification
Conference_Title :
Multimedia, 2008. ISM 2008. Tenth IEEE International Symposium on
Conference_Location :
Berkeley, CA
Print_ISBN :
978-0-7695-3454-1
Electronic_ISBN :
978-0-7695-3454-1
DOI :
10.1109/ISM.2008.40