Title :
On Acoustic Emotion Recognition: Compensating for Covariate Shift
Author :
Hassan, Asif ; Damper, R. ; Niranjan, Mahesan
Author_Institution :
Electronics & Computer Science, University of Southampton, Southampton, UK
Abstract :
In pattern recognition tasks, training data are often not fully representative of test data. This problem is well recognized in speech recognition, where methods like cepstral mean normalization (CMN), vocal tract length normalization (VTLN) and maximum likelihood linear regression (MLLR) are used to compensate for channel and speaker differences. Speech emotion recognition (SER) is an important emerging field in human-computer interaction and faces the same data shift problems, a fact that has been generally overlooked in this domain. In this paper, we show that compensating for channel and speaker differences can give significant improvements in SER by modelling these differences as a covariate shift. We employ three algorithms from the domain of transfer learning that apply importance weights (IWs) within a support vector machine classifier to reduce the effects of covariate shift. We test these methods on the FAU Aibo Emotion Corpus, which was used in the Interspeech 2009 Emotion Challenge. The corpus consists of two separate parts recorded independently at different schools; hence the two parts exhibit covariate shift. Results show that the IW methods outperform combined CMN and VTLN and significantly improve on the baseline performance of the Challenge. The best of the three methods also improves significantly on the winning contribution to the Challenge.
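For readers unfamiliar with the setup, the covariate shift assumption and the way importance weights can enter a soft-margin support vector machine may be written as below. This is the standard textbook formulation, offered as a sketch only: the abstract does not state the paper's exact objective or name the three weight-estimation algorithms, so the symbols $p_{\mathrm{tr}}$, $p_{\mathrm{te}}$, $\beta_i$, $C$, $\xi_i$ and $\phi$ are assumptions introduced here for illustration.

```latex
% Assumed notation (not taken from the abstract):
%   p_tr, p_te : training and test input densities
%   beta_i     : importance weight of training sample x_i
%   phi        : feature map of the SVM kernel, C : regularisation constant
\begin{align}
  &\text{Covariate shift:}
    && p_{\mathrm{tr}}(x) \neq p_{\mathrm{te}}(x),
       \qquad p_{\mathrm{tr}}(y \mid x) = p_{\mathrm{te}}(y \mid x) \\
  &\text{Importance weight:}
    && \beta_i = \frac{p_{\mathrm{te}}(x_i)}{p_{\mathrm{tr}}(x_i)},
       \qquad i = 1, \dots, n \\
  &\text{Weighted SVM:}
    && \min_{w,\, b,\, \xi}\;
       \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \beta_i\, \xi_i \\
  &  && \text{s.t.}\quad y_i \bigl( w^{\top} \phi(x_i) + b \bigr) \ge 1 - \xi_i,
       \qquad \xi_i \ge 0
\end{align}
```

Under this formulation, training samples that fall where test data are dense receive $\beta_i > 1$ and so incur a larger penalty when misclassified, while samples from regions rarely seen at test time are down-weighted. Setting every $\beta_i = 1$ recovers the ordinary soft-margin SVM.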
Keywords :
emotion recognition; human-computer interaction; learning (artificial intelligence); maximum likelihood estimation; regression analysis; speech recognition; support vector machines; CMN; FAU Aibo Emotion Corpus; IW; MLLR; SER; VTLN; acoustic emotion recognition; cepstral mean normalization; covariate shift compensation; data shift problems; importance weights; maximum likelihood linear regression; pattern recognition; speech emotion recognition; support vector machine classifier; test data representation; transfer learning; vocal tract length normalization; covariate shift; speaker and environment differences
Journal_Title :
IEEE Transactions on Audio, Speech, and Language Processing
DOI :
10.1109/TASL.2013.2255278