Title :
Combining speech recognition and acoustic word emotion models for robust text-independent emotion recognition
Author :
Schuller, Björn ; Vlasenko, Bogdan ; Arsic, Dejan ; Rigoll, Gerhard ; Wendemuth, Andreas
Author_Institution :
Inst. for Human-Machine Commun., Tech. Univ. München, Munich
fDate :
June 23 2008-June 26 2008
Abstract :
Recognition of emotion in speech usually uses acoustic models that ignore the spoken content. Likewise, one general model per emotion is trained independently of the phonetic structure. Given sufficient data, this approach seemingly works well enough. This paper asks whether acoustic emotion recognition depends strongly on phonetic content, and whether models tailored to the spoken unit can yield higher accuracy. We therefore investigate phoneme and word models using a large prosodic, spectral, and voice-quality feature space and Support Vector Machines (SVM). The experiments also account for the need for automatic speech recognition (ASR) to select the appropriate unit models. Test runs on the well-known EMO-DB database under speaker-independent conditions demonstrate the superiority of word emotion models over today's common general models, provided the words occur sufficiently often in the training corpus.
Keywords :
emotion recognition; speech recognition; EMO-DB database facing speaker-independence; acoustic word emotion models; phonetic structure; robust text-independent emotion recognition; speech recognition; support vector machines; Acoustic testing; Automatic speech recognition; Cepstral analysis; Emotion recognition; Hidden Markov models; Man machine systems; Robustness; Spatial databases; Speech recognition; Support vector machines; Acoustic Modeling; Affective Speech; Emotion Recognition; Word Models;
Conference_Titel :
Multimedia and Expo, 2008 IEEE International Conference on
Conference_Location :
Hannover
Print_ISBN :
978-1-4244-2570-9
Electronic_ISBN :
978-1-4244-2571-6
DOI :
10.1109/ICME.2008.4607689