Regional Information Center for Science and Technology - A method for emotional speech synthesis based on the position of emotional state in Valence-Activation space

DocumentCode :

118311

Title :

A method for emotional speech synthesis based on the position of emotional state in Valence-Activation space

Author :

Hamada, Yasuhiro ; Elbarougy, Reda ; Akagi, Masato

Author_Institution :

Acoust. Inf. Sci. Lab., Japan Adv. Inst. of Sci. & Technol., Nomi, Japan

fYear :

2014

fDate :

9-12 Dec. 2014

Firstpage :

Lastpage :

Abstract :

Speech-to-speech translation (S2ST) systems perform the important task of converting a spoken utterance in one language into spoken output in another language. So far, S2ST techniques have relied mainly on linguistic information and have left out para- and non-linguistic information (emotion, individuality, gender, etc.). These systems are therefore limited to neutral speech and cannot synthesize affective speech such as emotional speech. Dealing with affective speech requires a system that can recognize and synthesize emotional speech. Although most studies have treated emotions as categories, emotional styles are not categorical but spread continuously across an emotion space spanned by two dimensions (Valence and Activation). This paper proposes a method for synthesizing emotional speech based on its position in Valence-Activation (V-A) space. To model the relationships between acoustic features and V-A space, Fuzzy Inference Systems (FISs) were constructed, and twenty-one acoustic features were morphed using the FISs. Listening tests were carried out to verify whether the synthesized speech is perceived at the intended position in V-A space. The results indicate that the synthesized speech can give the same impression in V-A space as the intended speech does.

Keywords :

fuzzy reasoning; speech processing; speech recognition; speech synthesis; FIS; S2ST system; V-A space; emotional speech synthesis; fuzzy inference system; linguistic information; nonlinguistic information; paralinguistic information; speech to speech translation system; spoken utterance; valence-activation space; Acoustics; Databases; Equations; Feature extraction; Frequency measurement; Mathematical model; Speech

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)

Conference_Location :

Siem Reap

Type :

conf

DOI :

10.1109/APSIPA.2014.7041729

Filename :

7041729

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=118311