DocumentCode :
730746
Title :
Emotion recognition using synthetic speech as neutral reference
Author :
Lotfian, Reza ; Busso, Carlos
Author_Institution :
Dept. of Electr. Eng., Univ. of Texas at Dallas, Richardson, TX, USA
fYear :
2015
fDate :
19-24 April 2015
Firstpage :
4759
Lastpage :
4763
Abstract :
A common approach to recognizing emotion from speech is to estimate multiple acoustic features at the sentence or turn level. These features are derived independently of the underlying lexical content. Studies have demonstrated that lexical-dependent models improve emotion recognition accuracy. However, current practical approaches can only model small lexical units such as phonemes, syllables, or a few keywords, which limits these systems. We believe that building longer lexical models (i.e., sentence-level models) is feasible by leveraging advances in speech synthesis. Assuming that the transcript of the target speech is available, we synthesize speech conveying the same lexical information. The synthetic speech is used as a neutral reference model against which different acoustic features are contrasted, unveiling local emotional changes. This paper introduces this novel framework and provides insights on how to compare the target and synthetic speech signals. Our evaluations demonstrate the benefits of synthetic speech as a neutral reference for incorporating lexical dependencies into emotion recognition. The experimental results show that adding features derived from contrasting expressive speech with the proposed synthetic speech reference increases accuracy by 2.1% and 2.8% (absolute) in classifying low versus high levels of arousal and valence, respectively.
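Illustrative sketch (not the authors' implementation): one way to realize the contrast idea described above is to align a target recording with a synthesized neutral rendition of the same transcript and summarize frame-wise prosodic differences. The feature choices (MFCCs, log-energy, F0), the DTW alignment, and the summary statistics below are assumptions made for illustration only.

    import numpy as np
    import librosa

    def frame_features(wav_path, sr=16000):
        """Frame-level MFCCs plus prosodic features (log-energy, F0) for one utterance."""
        y, sr = librosa.load(wav_path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # (13, T) spectral envelope
        rms = librosa.feature.rms(y=y)                            # (1, T) frame energy
        f0, _, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)      # (T,) F0, NaN when unvoiced
        T = min(mfcc.shape[1], rms.shape[1], len(f0))
        f0 = np.nan_to_num(f0[:T], nan=0.0)
        prosody = np.vstack([np.log(rms[:, :T] + 1e-8), f0[None, :]])
        return mfcc[:, :T], prosody

    def contrast_features(target_wav, synthetic_wav):
        """Align target and synthetic speech with DTW on MFCCs, then summarize
        frame-wise prosodic differences as a fixed-length contrast vector."""
        mfcc_t, pros_t = frame_features(target_wav)
        mfcc_s, pros_s = frame_features(synthetic_wav)
        # DTW on the spectral envelope; both signals share the same lexical content.
        _, wp = librosa.sequence.dtw(X=mfcc_t, Y=mfcc_s)
        wp = wp[::-1]                                             # warping path in time order
        diff = pros_t[:, wp[:, 0]] - pros_s[:, wp[:, 1]]          # energy/F0 contrast per frame
        # Sentence-level statistics of the contrast: mean, std, and range per dimension.
        return np.concatenate([diff.mean(axis=1), diff.std(axis=1),
                               diff.max(axis=1) - diff.min(axis=1)])

Such contrast vectors could then be appended to standard sentence-level acoustic features and passed to a binary classifier (e.g., an SVM) for low-versus-high arousal or valence, in the spirit of the evaluation summarized in the abstract.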
Keywords :
emotion recognition; speech synthesis; acoustic feature estimation; emotion recognition accuracy; lexical content; lexical dependent model; neutral reference model; synthetic speech signals; Acoustics; Emotion recognition; Feature extraction; Hidden Markov models; Speech; Speech recognition; Speech synthesis; emotion detection; speech alignment; speech rate; synthetic speech
fLanguage :
English
Publisher :
ieee
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
Type :
conf
DOI :
10.1109/ICASSP.2015.7178874
Filename :
7178874