Regional Information Center for Science and Technology (RICEST) - Automatic Emotion Variation Detection in continuous speech

DocumentCode :

118041

Title :

Automatic Emotion Variation Detection in continuous speech

Author :

Yuchao Fan ; Mingxing Xu ; Zhiyong Wu ; Lianhong Cai

Author_Institution :

Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China

fYear :

2014

fDate :

9-12 Dec. 2014

Firstpage :

Lastpage :

Abstract :

Although emotion speech recognition has attracted increasing interest in the field of Human Computer Interaction, it remains a challenge to automatically determine the emotion state type and the boundaries of each emotionally salient segment in continuous speech, a task known as Automatic Emotion Variation Detection (AEVD). In this task, the input utterances are not pre-segmented and may contain emotion variations. This paper proposes a Multi-timescaled Sliding Window based AEVD method (MSW-AEVD). First, a fixed-length sliding window is used to segment continuous speech for classic emotion recognition. An emotion type is assigned to each window-shift according to the recognition results of all the sliding windows containing that window-shift. This basic procedure is then extended to multi-timescaled sliding windows, in which different features are used at different scales. Finally, a post-processing step refines the final outputs. This work focuses on the anger-neutral and happiness-neutral cases, which dominate recent studies of AEVD. Performance is evaluated on two databases, the German database EMO-DB and the Chinese database TH1309-DB. Experimental results show that the proposed method significantly outperforms the HMM-based baseline.
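The basic single-scale procedure described in the abstract (classify each fixed-length window, then label each window-shift from all windows that contain it) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the per-window results are combined, so a simple majority vote is assumed here, and the names `label_window_shifts`, `to_segments`, and the `classify` callback are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Tuple


def label_window_shifts(
    num_shifts: int,
    shifts_per_window: int,
    classify: Callable[[int, int], str],
) -> List[str]:
    """Assign one emotion label to each window-shift.

    The signal is viewed as `num_shifts` consecutive window-shifts. Each
    sliding window spans `shifts_per_window` of them and advances by one
    shift. `classify(start, end)` is a hypothetical stand-in for any
    per-window emotion recognizer; it returns a label for shifts
    [start, end).
    """
    if shifts_per_window < 1 or num_shifts < shifts_per_window:
        raise ValueError("need 1 <= shifts_per_window <= num_shifts")

    votes = [Counter() for _ in range(num_shifts)]
    for start in range(num_shifts - shifts_per_window + 1):
        end = start + shifts_per_window
        label = classify(start, end)
        # Every window-shift covered by this window receives its vote.
        for shift in range(start, end):
            votes[shift][label] += 1

    # ASSUMPTION: majority vote. Ties go to the label that was seen first.
    return [v.most_common(1)[0][0] for v in votes]


def to_segments(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Merge runs of equal labels into (start_shift, end_shift, label)."""
    segments: List[Tuple[int, int, str]] = []
    for i, label in enumerate(labels):
        if segments and segments[-1][2] == label:
            start, _, _ = segments[-1]
            segments[-1] = (start, i + 1, label)
        else:
            segments.append((i, i + 1, label))
    return segments
```

The segment boundaries returned by `to_segments` are expressed in window-shift units; multiplying by the shift duration would give times. The multi-timescale extension and the post-processing step mentioned in the abstract are not covered by this sketch.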

Keywords :

audio databases; emotion recognition; hidden Markov models; human computer interaction; signal detection; speech recognition; Chinese database TH1309-DB; German database EMO-DB; HMM-based baseline; MSW-AEVD; anger-neutral case; automatic emotion variation detection; continuous speech segmentation; emotion speech recognition; emotion state type; emotionally salient segment; happiness-neutral case; human computer interaction; multitimescaled sliding window based AEVD; window-shift; Databases; Emotion recognition; Feature extraction; Hidden Markov models; Mel frequency cepstral coefficient; Speech; Speech recognition;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)

Conference_Location :

Siem Reap

Type :

conf

DOI :

10.1109/APSIPA.2014.7041592

Filename :

7041592

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=118041