DocumentCode
566367
Title
Comparison between decision-level and feature-level fusion of acoustic and linguistic features for spontaneous emotion recognition
Author
Planet, Santiago ; Iriondo, Ignasi
Author_Institution
GTM - Grup de Recerca en Tecnologies Media, La Salle - Univ. Ramon Llull, Barcelona, Spain
fYear
2012
fDate
20-23 June 2012
Firstpage
1
Lastpage
6
Abstract
Detecting affective states in speech could improve the way users interact with electronic devices. However, analysing speech at the acoustic level alone may not be enough to determine the emotion of a user speaking in a realistic scenario. In this paper we analysed the spontaneous speech recordings of the FAU Aibo Corpus at the acoustic and linguistic levels, extracting one set of acoustic features and one set of linguistic features. The acoustic set was reduced by a greedy procedure that selected the most relevant features in order to optimize the learning stage. We experimented with three classification approaches (Naïve-Bayes, a support vector machine and a logistic model tree) and two fusion schemes: decision-level fusion, which merges the hard decisions of the acoustic and linguistic classifiers by means of a decision tree; and feature-level fusion, which concatenates both sets of features before the learning stage. Although the linguistic data performed poorly on its own, combining it with the acoustic information yielded a dramatic improvement over the results of the acoustic modality alone. The classifiers trained on the features merged at feature level outperformed the decision-level fusion scheme, despite the simplicity of feature-level fusion.
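The two fusion schemes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the emotion labels and the toy data are all hypothetical, and a majority-vote lookup table over pairs of hard decisions stands in for the decision tree the abstract mentions.

```python
from collections import Counter, defaultdict


def fuse_features(acoustic, linguistic):
    """Feature-level fusion: concatenate both feature vectors per utterance."""
    return [list(a) + list(l) for a, l in zip(acoustic, linguistic)]


def fit_decision_fusion(acoustic_labels, linguistic_labels, true_labels):
    """Decision-level fusion: learn a final label for each pair of hard decisions.

    Assumption: a majority-vote table replaces the decision tree used in the paper.
    """
    votes = defaultdict(Counter)
    for a, l, y in zip(acoustic_labels, linguistic_labels, true_labels):
        votes[(a, l)][y] += 1
    return {pair: counts.most_common(1)[0][0] for pair, counts in votes.items()}


def predict_decision_fusion(table, acoustic_label, linguistic_label):
    """Return the fused label; fall back to the acoustic decision for unseen pairs."""
    return table.get((acoustic_label, linguistic_label), acoustic_label)


# Toy, made-up data (not taken from the FAU Aibo Corpus).
acoustic_feats = [[0.1, 0.9], [0.8, 0.2]]
linguistic_feats = [[1.0], [0.0]]
fused = fuse_features(acoustic_feats, linguistic_feats)

table = fit_decision_fusion(
    acoustic_labels=["neutral", "neutral", "negative", "neutral"],
    linguistic_labels=["negative", "negative", "negative", "neutral"],
    true_labels=["negative", "negative", "negative", "neutral"],
)
fused_label = predict_decision_fusion(table, "neutral", "negative")
```

In the feature-level case a single classifier would then be trained on `fused`; in the decision-level case two separate classifiers produce the hard decisions that the table (or a decision tree) combines.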
Keywords
Bayes methods; decision trees; emotion recognition; feature extraction; greedy algorithms; human computer interaction; learning (artificial intelligence); sensor fusion; signal classification; speech recognition; support vector machines; FAU Aibo Corpus; Naïve-Bayes classification; acoustic feature extraction; affective state detection; decision tree; decision-level fusion; electronic devices; feature-level fusion; greedy procedure; human-computer interaction; learning stage optimization; linguistic feature extraction; logistic model tree classification; spontaneous emotion recognition; spontaneous speech recordings; support vector machine classification; Acoustics; Classification algorithms; Logistics; Pragmatics; Speech; Support vector machines; Training; Emotion recognition; acoustic features; decision-level fusion; feature-level fusion; linguistic features; spontaneous speech;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Systems and Technologies (CISTI), 2012 7th Iberian Conference on
Conference_Location
Madrid
ISSN
2166-0727
Print_ISBN
978-1-4673-2843-2
Type
conf
Filename
6263129