Speech Emotion Recognition using Gaussian Mixture Vector Autoregressive Models

Author

El Ayadi, Moataz M. H. ; Kamel, Mohamed S. ; Karray, Fakhri

Author_Institution

Lab. of Pattern Anal. & Machine Intelligence, Waterloo Univ., Ont., Canada

Volume

4

fYear

2007

fDate

15-20 April 2007

Abstract

It is believed that modeling the temporal structure of speech data may be useful for speech emotion recognition (T. Nwe et al., 2003). In this paper, a Gaussian mixture vector autoregressive model is proposed as a statistical classifier for this task. The main motivation for using such a model is its ability to capture both the dependency among extracted speech feature vectors and the multi-modality of their distribution. When applied to the Berlin emotional speech database, the proposed technique achieves a classification accuracy of 76%, versus 71% for the hidden Markov model (HMM), 67% for k-nearest neighbors, and 55% for feed-forward neural networks. The model also discriminates better between high-arousal, low-arousal, and neutral emotions than the HMM.

Keywords

Gaussian processes; autoregressive processes; emotion recognition; speech processing; speech recognition; statistical analysis; Berlin emotional speech database; Gaussian mixture vector autoregressive models; extracted speech feature vectors; speech emotion recognition; statistical classifier; Emotion recognition; Feature extraction; Hidden Markov models; Machine intelligence; Neural networks; Pattern analysis; Reactive power; Spatial databases; Speech analysis; Speech synthesis; Gaussian mixture models; expectation maximization algorithm; maximum likelihood estimation; vector autoregressive models

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on

Conference_Location

Honolulu, HI

ISSN

1520-6149

Print_ISBN

1-4244-0727-3

Type

conf

DOI

10.1109/ICASSP.2007.367230

Filename

4218261