DocumentCode
1937655
Title
Perceptual evaluation of cost for segment selection in concatenative speech synthesis
Author
Toda, Tomoki ; Kawai, Hisashi ; Tsuzaki, Minoru ; Shikano, Kiyohiro
Author_Institution
ATR Spoken Language Translation Res. Labs., Kyoto, Japan
fYear
2002
fDate
11-13 Sept. 2002
Firstpage
183
Lastpage
186
Abstract
In segment selection for concatenative text-to-speech (TTS) synthesis, it is important to use a cost that corresponds to perceptual characteristics. We clarify how well the cost corresponds to perceptual scores, and then evaluate various functions for integrating the costs. The perceptual scores are determined from the results of perceptual experiments on the naturalness of synthetic speech. The results show that the average cost, which reflects naturalness degradation over the entire synthetic speech, corresponds better to the perceptual scores than the maximum cost, which reflects local naturalness degradation. Furthermore, the RMS (root mean square) cost, which is affected by both the average cost and the maximum cost, has the best correspondence.
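The three integration functions named in the abstract can be illustrated with a minimal sketch. This record does not give the paper's exact cost definitions, so the per-segment cost values and the function names below are hypothetical and serve only to show why an RMS cost is sensitive to both the overall level and a local peak.

```python
import math

# Hypothetical per-segment costs for two synthetic utterances.
# Both have the same average cost, but "spiky" has one bad local join.
smooth = [0.5, 0.5, 0.5, 0.5]
spiky = [0.2, 0.2, 0.2, 1.4]


def average_cost(costs):
    """Mean of the segment costs: degradation over the whole utterance."""
    return sum(costs) / len(costs)


def maximum_cost(costs):
    """Largest segment cost: the worst local degradation."""
    return max(costs)


def rms_cost(costs):
    """Root mean square of the segment costs.

    Squaring weights large local costs more heavily than the plain
    average does, so the result lies between the average and the maximum.
    """
    return math.sqrt(sum(c * c for c in costs) / len(costs))


for name, costs in (("smooth", smooth), ("spiky", spiky)):
    print(name, average_cost(costs), maximum_cost(costs), rms_cost(costs))
```

With these made-up values the average cost cannot tell the two utterances apart, the maximum cost looks only at the single worst segment, and the RMS cost separates them while still reflecting the overall level.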
Keywords
speech processing; speech synthesis; RMS cost; TTS synthesis; average cost; concatenative speech synthesis; maximum cost; naturalness degradation; perceptual cost evaluation; root mean square cost; segment selection; text-to-speech synthesis; Acoustic measurements; Cost function; Degradation; Information science; Laboratories; Natural languages; Process design; Robustness; Root mean square; Speech synthesis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on
Print_ISBN
0-7803-7395-2
Type
conf
DOI
10.1109/WSS.2002.1224404
Filename
1224404