Speech emotion classification with the combination of statistic features and temporal features

Author

Jiang, Dan-Ning ; Cai, Lian-Hong

Author_Institution

Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China

Volume

3

fYear

2004

fDate

27-30 June 2004

Firstpage

1967

Abstract

Most previous speech emotion classification systems used either statistical features or temporal features exclusively. However, these two feature representations appear to capture different aspects of emotion and should be combined for the task. This work proposes a classification scheme that combines both. In the scheme, Gaussian mixture models (GMMs) and hidden Markov models (HMMs) are first used to model the statistical features and the temporal features, respectively. The GMM likelihoods and HMM likelihoods are then used as features in a further stage. Finally, a weighted Bayesian classifier and a multilayer perceptron (MLP) are applied to perform the classification. Experiments on a Chinese speech corpus demonstrated that the scheme can greatly improve classification accuracy. More detailed analysis indicated that the two feature representations complement each other effectively in the classification.
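
The second stage described in the abstract, where per-class GMM and HMM likelihoods become the inputs to a final decision, can be illustrated with a minimal sketch. The abstract does not specify how the weighted Bayesian classifier is defined, so the linear weighting of log-likelihoods below is an assumption rather than the authors' formulation. The function names (`fuse_log_likelihoods`, `classify`), the `weight` parameter, and the numeric values are all hypothetical, and the MLP variant is not shown.

```python
def fuse_log_likelihoods(gmm_ll, hmm_ll, weight=0.5, log_priors=None):
    """Combine per-class log-likelihoods from two models into one score.

    gmm_ll, hmm_ll: dicts mapping an emotion label to the log-likelihood of
        the utterance under that class's GMM / HMM (assumed precomputed).
    weight: hypothetical mixing weight in [0, 1]; 1.0 trusts only the GMM
        (statistical features), 0.0 trusts only the HMM (temporal features).
    log_priors: optional dict of log class priors; uniform if omitted.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if set(gmm_ll) != set(hmm_ll):
        raise ValueError("both models must score the same set of labels")

    scores = {}
    for label in gmm_ll:
        prior = 0.0 if log_priors is None else log_priors[label]
        # Assumed fusion rule: weighted sum of log-likelihoods plus log prior.
        scores[label] = (weight * gmm_ll[label]
                         + (1.0 - weight) * hmm_ll[label]
                         + prior)
    return scores


def classify(gmm_ll, hmm_ll, weight=0.5, log_priors=None):
    """Return the label with the highest fused score."""
    scores = fuse_log_likelihoods(gmm_ll, hmm_ll, weight, log_priors)
    return max(scores, key=scores.get)


# Illustrative, made-up log-likelihoods for a single utterance.
example_gmm_ll = {"anger": -10.0, "sadness": -12.0}
example_hmm_ll = {"anger": -15.0, "sadness": -11.0}

# The two models disagree here; the weight decides which one dominates.
predicted = classify(example_gmm_ll, example_hmm_ll, weight=0.5)
```

In this sketch the two models disagree on the example utterance, and the fused decision depends on the chosen weight. In practice such a weight would have to be tuned on held-out data.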

Keywords

Bayes methods; Gaussian distribution; emotion recognition; feature extraction; hidden Markov models; multilayer perceptrons; pattern classification; GMM likelihood; HMM likelihood; MLP; classification accuracy; feature representations; multiple-layer perceptron; speech emotion classification; statistical features; temporal features; weighted Bayesian classifier; Bayesian methods; Computer science; Coordinate measuring machines; Emotion recognition; Feature extraction; Hidden Markov models; Speech; Statistics; Support vector machine classification; Support vector machines;

fLanguage

English

Publisher

IEEE

Conference_Titel

Multimedia and Expo, 2004. ICME '04. 2004 IEEE International Conference on

Print_ISBN

0-7803-8603-5

Type

conf

DOI

10.1109/ICME.2004.1394647

Filename

1394647