DocumentCode :
3462683
Title :
Emotional speech characterization based on multi-features fusion for face-to-face interaction
Author :
Mahdhaoui, Ammar ; Ringeval, Fabien ; Chetouani, Mohamed
Author_Institution :
Inst. des Syst. Intelligents et de Robot., Univ. Pierre et Marie Curie, Paris, France
fYear :
2009
fDate :
6-8 Nov. 2009
Firstpage :
1
Lastpage :
6
Abstract :
Speech contains nonverbal elements known as paralanguage, including voice quality, emotion and speaking style, as well as prosodic features such as rhythm, intonation and stress. The study of nonverbal communication has focused on face-to-face interaction, since the behaviors of communicators play a major role during social interaction and carry information between speakers. In this paper, we describe a computational framework for combining different features for emotional speech detection. The statistical fusion is based on the estimation of local a posteriori class probabilities, and the overall decision employs weighting factors directly related to the duration of the individual speech segments. This strategy is applied to a real-life application: the detection of motherese in authentic, longitudinal parent-infant interactions recorded at home. The results suggest that short- and long-term information together provide a robust and efficient time-scale analysis. A similar fusion methodology is also investigated through a phonetic-specific characterization process, motivated by the fact that emotional states vary at the phoneme level. A time-scale based on both vowels and consonants is proposed; it provides a relevant discriminant feature space for acted emotion recognition.
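A minimal sketch of the duration-weighted fusion of local a posteriori class probabilities described in the abstract, assuming weights simply proportional to segment duration (the function name, weighting rule and example numbers are illustrative assumptions, not taken from the paper):

import numpy as np

def fuse_segment_posteriors(posteriors, durations):
    # posteriors: (n_segments, n_classes) local a posteriori class probabilities
    # durations:  (n_segments,) segment durations in seconds
    posteriors = np.asarray(posteriors, dtype=float)
    weights = np.asarray(durations, dtype=float)
    weights /= weights.sum()            # duration-proportional weighting factors
    fused = weights @ posteriors        # weighted combination of local posteriors
    return fused / fused.sum()          # renormalize to a probability distribution

# Hypothetical example: three segments, two classes (motherese vs. other speech)
segment_posteriors = [[0.8, 0.2], [0.4, 0.6], [0.7, 0.3]]
segment_durations = [1.2, 0.3, 0.9]
print(fuse_segment_posteriors(segment_posteriors, segment_durations))

Here longer segments dominate the overall decision, which matches the abstract's statement that the weighting factors relate directly to the duration of the individual speech segments.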
Keywords :
emotion recognition; face recognition; speech recognition; acted emotion recognition; computational framework; discriminant feature space; emotional speech detection; face-to-face interaction; multifeatures fusion; nonverbal communication; phoneme level; phonetic-specific characterization process; social interaction; speech segments; statistical fusion; time-scale analysis; Automatic speech recognition; Biology computing; Emotion recognition; Feature extraction; Humans; Intelligent robots; Mel frequency cepstral coefficient; Speech analysis; Speech processing; Speech synthesis; data-driven approach; emotional speech; face-to-face interaction; feature extraction; statistical fusion; timescales analysis;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Signals, Circuits and Systems (SCS), 2009 3rd International Conference on
Conference_Location :
Medenine
Print_ISBN :
978-1-4244-4397-0
Electronic_ISBN :
978-1-4244-4398-7
Type :
conf
DOI :
10.1109/ICSCS.2009.5412691
Filename :
5412691