Title :
A novel hybrid Mandarin speech synthesis system using different base units for model training and concatenation
Author :
Ran Zhang ; Jianhua Tao ; Ya Li ; Zhengqi Wen
Author_Institution :
Nat. Lab. of Pattern Recognition, Inst. of Autom., Beijing, China
Abstract :
The hybrid speech synthesis system, which uses an acoustic model trained under the Maximum Likelihood criterion to select suitable candidates from the corpus, has recently become an active research topic. The performance of such a system is affected by the size of the base training unit and of the base candidate unit. Most existing hybrid systems use the same kind of base unit, such as the syllable or the phone, for both model training and concatenation. In Mandarin, initials and finals form the fundamental elements of pronunciation and are always chosen as the base training unit for statistical parametric TTS systems. In this paper, a new hybrid Mandarin TTS system is proposed that uses initials/finals for model training and syllables for concatenation. Objective and subjective evaluations are conducted, and the comparison results show that the proposed hybrid system outperforms traditional systems that use the same base unit for both processes, on corpora of 4000 and 6000 sentences.
Keywords :
hidden Markov models; maximum likelihood estimation; speech synthesis; acoustic model; concatenation; hybrid speech synthesis system; maximum likelihood; model training; Acoustics; Conferences; Speech; Training; HMM; Mandarin speech synthesis; syllable
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6853605