Title :
Perceptual evaluation of cost for segment selection in concatenative speech synthesis
Author :
Toda, Tomoki ; Kawai, Hisashi ; Tsuzaki, Minoru ; Shikano, Kiyohiro
Author_Institution :
ATR Spoken Language Translation Res. Labs., Kyoto, Japan
Abstract :
In segment selection for concatenative text-to-speech (TTS) synthesis, it is important to use a cost that corresponds to perceptual characteristics. We first clarify how well the cost corresponds to perceptual scores, and then evaluate various functions for integrating the costs. The perceptual scores are determined from the results of perceptual experiments on the naturalness of synthetic speech. The results show that the average cost, which reflects naturalness degradation over the entire synthetic speech, corresponds better to the perceptual scores than the maximum cost, which reflects local naturalness degradation. Furthermore, the RMS (root mean square) cost, which is affected by both the average cost and the maximum cost, shows the best correspondence.
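The three integration functions named in the abstract can be illustrated with a minimal sketch. It assumes that each selected segment has a single non-negative local cost; the function names and the example cost values are hypothetical and are not taken from the paper, whose exact cost definition and weighting are not given in this record.

```python
import math


def average_cost(local_costs):
    """Mean of the local costs: degradation spread over the whole utterance."""
    return sum(local_costs) / len(local_costs)


def maximum_cost(local_costs):
    """Largest local cost: the single worst local degradation."""
    return max(local_costs)


def rms_cost(local_costs):
    """Root mean square of the local costs.

    Squaring weights large local costs more heavily than the plain mean,
    so the result is pulled toward the maximum while still depending on
    every segment.
    """
    return math.sqrt(sum(c * c for c in local_costs) / len(local_costs))


# Hypothetical local costs for two utterances with the same average cost.
smooth = [0.5, 0.5, 0.5, 0.5]   # uniform, mild degradation
spiky = [0.1, 0.1, 0.1, 1.7]    # mostly clean, one bad segment

for name, costs in (("smooth", smooth), ("spiky", spiky)):
    print(name, average_cost(costs), maximum_cost(costs), rms_cost(costs))
```

In this toy case the average cost cannot distinguish the two utterances, the maximum cost looks only at the single worst segment, and the RMS cost falls between the two, which is the sense in which it is affected by both.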
Keywords :
speech processing; speech synthesis; RMS cost; TTS synthesis; average cost; concatenative speech synthesis; maximum cost; naturalness degradation; perceptual cost evaluation; root mean square cost; segment selection; text-to-speech synthesis; Acoustic measurements; Cost function; Degradation; Information science; Laboratories; Natural languages; Process design; Robustness; Root mean square; Speech synthesis
Conference_Title :
Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on
Print_ISBN :
0-7803-7395-2
DOI :
10.1109/WSS.2002.1224404