Title :
Speech-rate-variable HMM-based Japanese TTS system
Author :
Iwano, K. ; Yamada, Makoto ; Togawa, T. ; Furui, S.
Author_Institution :
Tokyo Institute of Technology
Abstract :
This paper proposes a new method for controlling phoneme duration according to an arbitrary target speech rate in text-to-speech (TTS) synthesis systems. The proposed method first constructs three fundamental duration models at "fast", "normal", and "slow" speech rates, using Hayashi's quantification theory (type 1) on real speech databases, and then creates a duration model for a target speech rate by interpolating the fundamental models. Our TTS system uses an HMM-based synthesizer, which can achieve flexible prosody control. Speech synthesized by the proposed method is evaluated in subjective experiments at four speech rates, using paired comparison tests between the proposed method and a rule-based method. The results show that the proposed method achieves higher naturalness in synthesized speech than the rule-based method.
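The interpolation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the interpolation formula, so this assumes piecewise-linear interpolation between the durations predicted by the three fundamental models. The function name `interpolate_duration`, the rate unit (morae per second), and all numeric values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def interpolate_duration(target_rate, anchors):
    """Estimate a phoneme duration at target_rate.

    anchors: (speech_rate, duration_ms) pairs, one per fundamental model
    (e.g. slow, normal, fast). Piecewise-linear interpolation is an
    assumption here, not the formula from the paper.
    """
    anchors = sorted(anchors)
    rates = [rate for rate, _ in anchors]
    # Pick the segment that contains target_rate; clamping the index means
    # rates outside the anchor range are extrapolated from the nearest segment.
    i = bisect_right(rates, target_rate) - 1
    i = max(0, min(i, len(anchors) - 2))
    (r0, d0), (r1, d1) = anchors[i], anchors[i + 1]
    weight = (target_rate - r0) / (r1 - r0)
    return d0 + weight * (d1 - d0)


# Hypothetical durations (ms) of one phoneme predicted by the three
# fundamental models, at hypothetical rates in morae per second.
models = [(5.0, 120.0), (8.0, 80.0), (11.0, 60.0)]  # slow, normal, fast
duration_between_slow_and_normal = interpolate_duration(6.5, models)
```

In a full system, the anchor durations would come from the three quantification-theory models evaluated on the same phoneme context, and the interpolated value would be passed to the synthesizer as the target duration.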
Keywords :
hidden Markov models; natural languages; speech processing; speech synthesis; HMM-based synthesizer; Hayashi quantification theory; Japanese language; TTS; arbitrary target speech rate; duration models; flexible prosody control; interpolation; naturalness; phoneme duration control; real speech databases; speech synthesis; text-to-speech systems; Aging; Computer science; Control system synthesis; Databases; Hidden Markov models; Speech analysis; Speech synthesis; Synthesizers; Testing; Vocoders;
Conference_Title :
Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on
Print_ISBN :
0-7803-7395-2
DOI :
10.1109/WSS.2002.1224413