DocumentCode :
1692960
Title :
Personalized natural speech synthesis based on retrieval of pitch patterns using hierarchical Fujisaki model
Author :
Yi-Chin Huang ; Chung-Hsien Wu ; Shih-Lun Lin
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng-Kung Univ., Tainan, Taiwan
fYear :
2013
Firstpage :
7844
Lastpage :
7848
Abstract :
In recent years, speech synthesis based on the hidden Markov model (HMM) has been developed to synthesize stable and intelligible speech with flexibility and a small footprint. However, the synthesized prosodic features still fail to convey personalization and naturalness. Previous prosody models, mainly constructed from clustered prosodic features, cannot characterize personalized prosodic information because the linguistic cues of the input sentence are indistinguishable across speakers. This paper proposes an approach that retrieves personalized pitch patterns from the real speech corpus of the target speaker and incorporates them into the HMM-based speech synthesizer to generate a personalized, natural pitch contour. A modified Fujisaki model is adopted to describe the hierarchical pitch patterns, modeling both the local pitch contour variation and the global intonation of utterances in the corpus. Codeword sequences of the utterances in the training and synthesized corpora are constructed and used to obtain the relationship between the pitch patterns of real and synthesized speech. Finally, a language model of pitch patterns is constructed to obtain the optimal pitch pattern sequence for the input sentence. Subjective and objective evaluations show that the proposed approach can substantially outperform conventional statistical synthesis methods in terms of naturalness and speaker similarity.
Keywords :
hidden Markov models; linguistics; speech intelligibility; speech synthesis; statistical analysis; HMM-based speech synthesizer; codeword sequence; global intonation; hidden Markov model; hierarchical Fujisaki model; hierarchical pitch pattern retrieval; linguistic cues; local pitch contour variation; personalized natural speech synthesis; pitch pattern language model; prosodic feature synthesis; statistical synthesis method; synthesized corpora; Acoustics; Hidden Markov models; Speech; Speech synthesis; Splines (mathematics); Training; Fujisaki Model; Hierarchical Prosodic Structure; Pattern Retrieval; Personalized Speech Synthesis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639191
Filename :
6639191