Recognition for synthesis: Automatic parameter selection for resynthesis of emotional speech from neutral speech

Author

Bulut, Murtaza ; Lee, Sungbok ; Narayanan, Shrikanth

Author_Institution

Dept. of Electrical Engineering, University of Southern California, Los Angeles, CA

fYear

2008

fDate

March 31-April 4, 2008

Firstpage

4629

Lastpage

4632

Abstract

One of the biggest challenges in emotional speech resynthesis is selecting modification parameters that will make humans perceive a targeted emotion. The best way to select them is with human raters; for large evaluation sets, however, this process can be very costly. In this paper, we describe a recognition for synthesis (RFS) system that automatically selects a set of possible parameter values that can be used to resynthesize emotional speech. The system, developed with supervised training, consists of synthesis (TD-PSOLA), recognition (neural network), and parameter selection modules. The experimental results show evidence that the parameter sets selected by the RFS system can be successfully used to resynthesize the input neutral speech as angry speech, demonstrating that the RFS system can assist in the human evaluation of emotional speech.

Keywords

emotion recognition; neural nets; speech recognition; speech synthesis; TD-PSOLA; automatic parameter selection; emotional speech synthesis; neural network; neutral speech; supervised training; Automatic speech recognition; Costs; Emotion recognition; Humans; Network synthesis; Neural networks; Performance evaluation; Speech analysis; Speech synthesis; Testing; automatic evaluation; emotion resynthesis; recognition for synthesis

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on

Conference_Location

Las Vegas, NV

ISSN

1520-6149

Print_ISBN

978-1-4244-1483-3

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2008.4518688

Filename

4518688