DocumentCode :
1937655
Title :
Perceptual evaluation of cost for segment selection in concatenative speech synthesis
Author :
Toda, Tomoki ; Kawai, Hisashi ; Tsuzaki, Minoru ; Shikano, Kiyohiro
Author_Institution :
ATR Spoken Language Translation Res. Labs., Kyoto, Japan
fYear :
2002
fDate :
11-13 Sept. 2002
Firstpage :
183
Lastpage :
186
Abstract :
In segment selection for concatenative text-to-speech (TTS) synthesis, it is important to use a cost that corresponds to perceptual characteristics. We first clarify how well the cost corresponds to perceptual scores, and then evaluate various functions for integrating the costs. The perceptual scores are determined from the results of perceptual experiments on the naturalness of synthetic speech. The results show that the average cost, which reflects naturalness degradation over the entire synthetic speech, corresponds better to the perceptual scores than the maximum cost, which reflects local naturalness degradation. Furthermore, the RMS (root mean square) cost, which is affected by both the average cost and the maximum cost, has the best correspondence.
Keywords :
speech processing; speech synthesis; RMS cost; TTS synthesis; average cost; concatenative speech synthesis; maximum cost; naturalness degradation; perceptual cost evaluation; root mean square cost; segment selection; text-to-speech synthesis; Acoustic measurements; Cost function; Degradation; Information science; Laboratories; Natural languages; Process design; Robustness; Root mean square; Speech synthesis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on
Print_ISBN :
0-7803-7395-2
Type :
conf
DOI :
10.1109/WSS.2002.1224404
Filename :
1224404