  • DocumentCode
    1937655
  • Title
    Perceptual evaluation of cost for segment selection in concatenative speech synthesis
  • Author
    Toda, Tomoki; Kawai, Hisashi; Tsuzaki, Minoru; Shikano, Kiyohiro
  • Author_Institution
    ATR Spoken Language Translation Res. Labs., Kyoto, Japan
  • fYear
    2002
  • fDate
    11-13 Sept. 2002
  • Firstpage
    183
  • Lastpage
    186
  • Abstract
    In segment selection for concatenative text-to-speech (TTS) synthesis, it is important to use a cost that corresponds to perceptual characteristics. We first clarify how well the cost corresponds to perceptual scores, and then evaluate various functions for integrating the costs. The perceptual scores are determined from perceptual experiments on the naturalness of synthetic speech. The results show that the average cost, which reflects naturalness degradation over the entire synthetic utterance, corresponds better to the perceptual scores than the maximum cost, which reflects local naturalness degradation. Furthermore, the RMS (root mean square) cost, which is affected by both the average cost and the maximum cost, shows the best correspondence.
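    The three cost-integration functions compared in the abstract can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: it assumes each synthetic utterance has already been assigned a sequence of non-negative local costs, one per selected segment, and the function names are hypothetical. How those local costs are computed is outside this record and is not modeled here.

```python
import math
from typing import Sequence


def _check(local_costs: Sequence[float]) -> None:
    # Hypothetical guard: an utterance must contain at least one segment.
    if len(local_costs) == 0:
        raise ValueError("need at least one local cost")


def average_cost(local_costs: Sequence[float]) -> float:
    """Mean of the local costs: degradation spread over the whole utterance."""
    _check(local_costs)
    return sum(local_costs) / len(local_costs)


def maximum_cost(local_costs: Sequence[float]) -> float:
    """Largest local cost: the single worst (local) degradation."""
    _check(local_costs)
    return max(local_costs)


def rms_cost(local_costs: Sequence[float]) -> float:
    """Root mean square of the local costs.

    Since mean(c^2) = mean(c)^2 + var(c), the RMS grows with the average
    and is pulled upward by unusually large local costs.
    """
    _check(local_costs)
    return math.sqrt(sum(c * c for c in local_costs) / len(local_costs))


if __name__ == "__main__":
    # Hypothetical local costs for five segments; one poor segment at index 3.
    costs = [1.0, 1.0, 1.0, 5.0, 1.0]
    print(average_cost(costs))  # 1.8
    print(maximum_cost(costs))  # 5.0
    print(rms_cost(costs))      # approx. 2.408
```

    For non-negative local costs, the average never exceeds the RMS, and the RMS never exceeds the maximum. Because mean(c^2) = mean(c)^2 + var(c), the RMS cost equals the average when all local costs are equal and rises toward the maximum as the local costs become uneven, which is one way to read the abstract's remark that the RMS cost is affected by both the average and the maximum cost.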
  • Keywords
    speech processing; speech synthesis; RMS cost; TTS synthesis; average cost; concatenative speech synthesis; maximum cost; naturalness degradation; perceptual cost evaluation; root mean square cost; segment selection; text-to-speech synthesis; Acoustic measurements; Cost function; Degradation; Information science; Laboratories; Natural languages; Process design; Robustness; Root mean square; Speech synthesis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the 2002 IEEE Workshop on Speech Synthesis
  • Print_ISBN
    0-7803-7395-2
  • Type
    conf
  • DOI
    10.1109/WSS.2002.1224404
  • Filename
    1224404