• DocumentCode
    1936967
  • Title
    Speech unit selection based on target values driven by speech data in concatenative speech synthesis
  • Author
    Hirai, Toshio ; Tenpaku, Seiichi ; Shikano, Kiyohiro
  • Author_Institution
    Arcadia Inc., Japan
  • fYear
    2002
  • fDate
    11-13 Sept. 2002
  • Firstpage
    43
  • Lastpage
    46
  • Abstract
    In concatenative speech synthesis systems, speech features (prosody, etc.) are estimated from the input text as "target values" at the start of processing. The target values are used to evaluate each speech unit in the system's speech database in order to find the "nearest" unit, that is, the unit with the lowest cost relative to the target value. In conventional systems, the target value is treated as an absolute value when calculating the cost of each unit. In this paper, we propose a new method in which the target value is driven by the features of the speech units in the synthesis system, within the range in which naturalness and clarity are maintained. The method was applied to fundamental frequency control, one of the most important factors in maintaining the naturalness of synthesized speech, using a superpositional model in a synthesis test of 10 Japanese place names, and the cost was found to decrease by an average of 22.9%.
  • Keywords
    feature extraction; speech synthesis; Japanese place names; clarity; concatenative speech synthesis; fundamental frequency control; naturalness; prosody; speech data; speech features; speech unit selection; superpositional model; target values; Control system synthesis; Costs; Frequency control; Frequency estimation; Signal processing; Signal synthesis; Spatial databases; Speech analysis; Speech processing; Speech synthesis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on
  • Print_ISBN
    0-7803-7395-2
  • Type
    conf
  • DOI
    10.1109/WSS.2002.1224369
  • Filename
    1224369