DocumentCode :
118311
Title :
A method for emotional speech synthesis based on the position of emotional state in Valence-Activation space
Author :
Hamada, Yasuhiro ; Elbarougy, Reda ; Akagi, Masato
Author_Institution :
Acoust. Inf. Sci. Lab., Japan Adv. Inst. of Sci. & Technol., Nomi, Japan
fYear :
2014
fDate :
9-12 Dec. 2014
Firstpage :
1
Lastpage :
7
Abstract :
Speech-to-speech translation (S2ST) systems are important for the process by which a spoken utterance in one language is used to produce spoken output in another language. S2ST techniques have so far relied mainly on linguistic information, without para- and non-linguistic information (emotion, individuality, gender, etc.). These systems are therefore limited in synthesizing affective speech, such as emotional speech, rather than neutral speech. To deal with affective speech, a system that can recognize and synthesize emotional speech is required. Although most studies have treated emotions categorically, emotional styles are not categorical but spread continuously in an emotion space spanned by two dimensions (Valence and Activation). This paper proposes a method for synthesizing emotional speech based on positions in Valence-Activation (V-A) space. To model the relationships between acoustic features and V-A space, Fuzzy Inference Systems (FISs) were constructed. Twenty-one acoustic features were morphed using the FISs. To verify whether the synthesized speech is perceived at the intended position in V-A space, listening tests were carried out. The results indicate that the synthesized speech gives the same impression in V-A space as the intended speech does.
Keywords :
fuzzy reasoning; speech processing; speech recognition; speech synthesis; FIS; S2ST system; V-A space; emotional speech synthesis; fuzzy inference system; linguistic information; nonlinguistic information; paralinguistic information; speech to speech translation system; spoken utterance; valence-activation space; Acoustics; Databases; Equations; Feature extraction; Frequency measurement; Mathematical model; Speech
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
Conference_Location :
Siem Reap
Type :
conf
DOI :
10.1109/APSIPA.2014.7041729
Filename :
7041729