DocumentCode :
118041
Title :
Automatic Emotion Variation Detection in continuous speech
Author :
Yuchao Fan ; Mingxing Xu ; Zhiyong Wu ; Lianhong Cai
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fYear :
2014
fDate :
9-12 Dec. 2014
Firstpage :
1
Lastpage :
5
Abstract :
Although emotional speech recognition has attracted growing interest in the field of Human-Computer Interaction, it remains a challenge to automatically determine the emotion type and the boundaries of each emotionally salient segment in continuous speech. This task is called Automatic Emotion Variation Detection (AEVD). In AEVD, the input utterances are not pre-segmented and may contain emotion variations. This paper proposes a Multi-timescaled Sliding Window based AEVD method (MSW-AEVD). First, a fixed-length sliding window is used to segment continuous speech so that classic emotion recognition can be applied to each window. An emotion type is then assigned to each window-shift according to the recognition results of all the sliding windows that contain it. Next, this basic procedure is extended to multi-timescaled sliding windows, in which several different features are used for different scales. Finally, a post-processing step refines the outputs. In this work, we focus on the anger-neutral and happiness-neutral cases, which dominate recent AEVD studies. Performance is evaluated on two databases, the German database EMO-DB and the Chinese database TH1309-DB. Experimental results show that the proposed method significantly outperforms the HMM-based baseline.
Keywords :
audio databases; emotion recognition; hidden Markov models; human computer interaction; signal detection; speech recognition; Chinese database TH1309-DB; German database EMO-DB; HMM-based baseline; MSW-AEVD; anger-neutral case; automatic emotion variation detection; continuous speech segmentation; emotion speech recognition; emotion state type; emotionally salient segment; happiness-neutral case; human computer interaction; multitimescaled sliding window based AEVD; window-shift; Databases; Emotion recognition; Feature extraction; Hidden Markov models; Mel frequency cepstral coefficient; Speech; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
Conference_Location :
Siem Reap
Type :
conf
DOI :
10.1109/APSIPA.2014.7041592
Filename :
7041592
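The single-scale labelling step described in the abstract (each window-shift receives an emotion type based on all sliding windows that contain it) can be illustrated with a minimal sketch. The abstract does not state how the window results are combined, so the majority vote used here is an assumption, as are the function names `label_shifts` and `to_segments` and the one-shift hop between windows. The multi-timescale extension and the post-processing step are not covered.

```python
from collections import Counter
from itertools import groupby


def label_shifts(window_labels, shifts_per_window):
    """Assign one label to each window-shift.

    window_labels[i] is the recognised emotion for the window that covers
    shifts i .. i + shifts_per_window - 1 (windows advance by one shift).
    Combining the overlapping windows by majority vote is an assumption;
    the abstract does not specify the combination rule.
    """
    if not window_labels:
        return []
    n_shifts = len(window_labels) + shifts_per_window - 1
    votes = [Counter() for _ in range(n_shifts)]
    for start, label in enumerate(window_labels):
        for shift in range(start, start + shifts_per_window):
            votes[shift][label] += 1
    # Every shift is covered by at least one window, so each Counter is non-empty.
    return [v.most_common(1)[0][0] for v in votes]


def to_segments(shift_labels):
    """Collapse consecutive equal labels into (label, start_shift, end_shift) runs.

    end_shift is exclusive.
    """
    segments = []
    pos = 0
    for label, run in groupby(shift_labels):
        length = len(list(run))
        segments.append((label, pos, pos + length))
        pos += length
    return segments


if __name__ == "__main__":
    # Hypothetical per-window results: "N" = neutral, "A" = anger.
    windows = ["N", "N", "N", "A", "A", "A"]
    shifts = label_shifts(windows, shifts_per_window=3)
    print(shifts)
    print(to_segments(shifts))
```

In this sketch the segment boundaries fall wherever the voted label changes between adjacent window-shifts, which is one simple way to recover both the emotion type and the boundary positions from per-window decisions.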