Title :
Anchor Models for Emotion Recognition from Speech
Author :
Attabi, Yazid ; Dumouchel, P.
Author_Institution :
Centre de Rech. Inf. de Montreal (CRIM), Ecole de Technol. Super. (ETS), Montreal, QC, Canada
Abstract :
In this paper, we study the effectiveness of anchor models applied to the multiclass problem of emotion recognition from speech. In the anchor model system, an emotion class is characterized by its measure of similarity relative to other emotion classes. Generative models such as Gaussian mixture models (GMMs) are often used as front-end systems to generate feature vectors, which are then used to train complex back-end systems such as support vector machines (SVMs) or a multilayer perceptron (MLP) to improve classification performance. We show that, in the context of highly unbalanced data classes, these back-end systems can improve on the performance achieved by GMMs provided that an appropriate sampling or importance weighting technique is applied. Furthermore, we show that anchor models based on the Euclidean or cosine distance offer a better way to improve performance, because they need neither of these techniques to overcome the problem of skewed data. Experiments conducted on the FAU AIBO Emotion Corpus, a database of spontaneous children's speech, show that anchor models significantly improve on the performance of GMMs, by 6.2 percent relative. We also show that introducing within-class covariance normalization (WCCN) improves the performance of the anchor models for both distances, and to a greater extent for the Euclidean distance, whose results become competitive with those of the cosine distance.
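The anchor model idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify these details. The sketch assumes that each utterance has already been mapped to a vector of scores against the emotion GMMs (the anchors), that each class is represented by the mean of its training vectors, and that a test vector is assigned to the nearest class representative. The function names, the toy score values, and the `ridge` parameter are all hypothetical.

```python
import numpy as np


def class_centroids(vectors, labels):
    """Return the class ids and the mean anchor-space vector of each class."""
    classes = np.unique(labels)
    centroids = np.stack([vectors[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def wccn_projection(vectors, labels, ridge=1e-6):
    """Return B with B @ B.T = inv(W), W being the mean within-class covariance.

    The ridge term is an assumption added here to keep W invertible.
    """
    classes = np.unique(labels)
    dim = vectors.shape[1]
    within = np.zeros((dim, dim))
    for c in classes:
        centered = vectors[labels == c] - vectors[labels == c].mean(axis=0)
        within += centered.T @ centered / len(centered)
    within = within / len(classes) + ridge * np.eye(dim)
    # Cholesky factor of inv(W); row vectors are projected as x @ B.
    return np.linalg.cholesky(np.linalg.inv(within))


def classify(x, classes, centroids, metric="cosine"):
    """Assign x to the class whose centroid is nearest under the given metric."""
    if metric == "euclidean":
        dist = np.linalg.norm(centroids - x, axis=1)
    elif metric == "cosine":
        norms = np.linalg.norm(centroids, axis=1) * np.linalg.norm(x)
        dist = 1.0 - (centroids @ x) / norms
    else:
        raise ValueError(f"unknown metric: {metric}")
    return classes[np.argmin(dist)]


# Toy anchor-space vectors: one score per anchor (made-up values).
scores = np.array([
    [0.0, -5.0, -5.0], [-1.0, -5.0, -6.0],   # class 0
    [-5.0, 0.0, -5.0], [-6.0, -1.0, -5.0],   # class 1
    [-5.0, -5.0, 0.0], [-5.0, -6.0, -1.0],   # class 2
])
labels = np.array([0, 0, 1, 1, 2, 2])
x = np.array([-0.2, -5.0, -5.0])

# Cosine distance in the raw anchor space.
classes, centroids = class_centroids(scores, labels)
print(classify(x, classes, centroids, metric="cosine"))

# Euclidean distance after WCCN projection.
B = wccn_projection(scores, labels)
classes_w, centroids_w = class_centroids(scores @ B, labels)
print(classify(x @ B, classes_w, centroids_w, metric="euclidean"))
```

In this sketch the classifier stores only one representative vector per class and trains no discriminative back end. WCCN is applied as a linear projection of the anchor-space vectors before the distance is computed.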
Keywords :
Gaussian processes; emotion recognition; multilayer perceptrons; speech recognition; support vector machines; Euclidean distances; FAU AIBO emotion corpus; GMMs; Gaussian mixture models; MLP; SVMs; WCCN; anchor model system; complex back-end systems; cosine distances; emotion recognition; feature vector generation; front-end systems; generative models; importance weighting technique; multilayer perceptron; speech recognition; support vector machines; unbalanced data classes; within-class covariance normalization; Computational modeling; Emotion recognition; Hidden Markov models; Measurement; Speech; Speech recognition; Vectors; Anchor models; GMM model; WCCN; children's speech; emotion recognition; skewed distribution;
Journal_Title :
Affective Computing, IEEE Transactions on
DOI :
10.1109/T-AFFC.2013.17