DocumentCode :
2009432
Title :
Combining HMM spectrum models and ANN prosody models for speech synthesis of syllable prominent languages
Author :
Gu, Hung-Yan ; Lai, Ming-Yen ; Tsai, Sung-Feng
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ. of Sci. & Technol., Taipei, Taiwan
fYear :
2010
fDate :
Nov. 29 2010-Dec. 3 2010
Firstpage :
451
Lastpage :
454
Abstract :
In this paper, an approach that combines HMM spectrum models and ANN prosody models is proposed for constructing a speech synthesis system. Currently, a Mandarin corpus is used to show the feasibility of this approach. We hope that this approach can also be applied to other syllable prominent languages such as Min-Nan and Hakka. In the training phase, DCC (discrete cepstrum coefficients) are computed for each frame of the training corpus and used as spectral parameters. Multiple utterances of a syllable are first grouped into a few clusters according to their DTW paths. Then, each cluster's syllable utterances are used to train an HMM. In the synthesis phase, for each syllable of a sentence, an HMM of the syllable is first selected according to the syllable's contextual data. Then, a duration ANN and the duration means of the HMM states are used to determine how many frames each HMM state should be assigned. To achieve real-time synthesis, we propose an interpolation method to generate the DCC for each frame. Next, the speech signal is synthesized by using the DCC and the pitch contour generated by another ANN to control an HNM (harmonic plus noise model) based signal synthesizer. The results of perception tests show that our interpolation method yields slightly more natural synthetic speech than the MLE method. Also, the duration ANN yields more natural synthetic speech than the duration means of the HMM states.
Keywords :
hidden Markov models; interpolation; neural nets; spectral analysis; speech synthesis; ANN prosody model; DCC; HMM spectrum model; MLE method; Mandarin corpus; Speech Synthesis; artificial neural network; discrete cepstrum coefficient; harmonic plus noised model; hidden Markov model; interpolation method; natural synthetic speech; perception test; pitch contour; real time synthesis; signal synthesizer; spectral parameter; syllable prominent language; Artificial neural networks; Biological system modeling; Hidden Markov models; Silicon; Speech; Speech synthesis; Training; ANN; HMM; HNM; discrete cepstrum; prosody model; spectrum model; speech synthesis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
Conference_Location :
Tainan
Print_ISBN :
978-1-4244-6244-5
Type :
conf
DOI :
10.1109/ISCSLP.2010.5684485
Filename :
5684485