Regional Information Center for Science and Technology - Personalized natural speech synthesis based on retrieval of pitch patterns using hierarchical Fujisaki model

DocumentCode :

1692960

Title :

Personalized natural speech synthesis based on retrieval of pitch patterns using hierarchical Fujisaki model

Author :

Yi-Chin Huang ; Chung-Hsien Wu ; Shih-Lun Lin

Author_Institution :

Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng-Kung Univ., Tainan, Taiwan

fYear :

2013

Firstpage :

7844

Lastpage :

7848

Abstract :

In recent years, speech synthesis based on the hidden Markov model (HMM) has been developed; it can synthesize stable and intelligible speech with flexibility and a small footprint. However, the synthesized prosodic features are still incapable of conveying personalization and naturalness. Previous prosody models, constructed mainly from clustered prosodic features, cannot characterize personalized prosodic information, because the linguistic cues of an input sentence are identical for all speakers. This paper proposes an approach that retrieves personalized pitch patterns from the real speech corpus of the target speaker and incorporates them into an HMM-based speech synthesizer to generate a personalized, natural pitch contour. A modified Fujisaki model is adopted to depict the hierarchical pitch patterns, modeling both the local pitch contour variation and the global intonation of the utterances in the corpus. Codeword sequences are constructed for the utterances in the training corpus and in the synthesized corpus, and are used to obtain the relationship between the pitch patterns of real and synthesized speech. Finally, a language model of pitch patterns is constructed to obtain the optimal pitch pattern sequence for the input sentence. Subjective and objective evaluations demonstrated that the proposed approach can substantially outperform conventional statistical synthesis methods in terms of naturalness and speaker similarity.
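The abstract builds on the Fujisaki model without stating it. As a reading aid, the following minimal Python sketch implements the standard Fujisaki formulation, not the authors' modified hierarchical variant, whose details this record does not give. In the standard model, log F0 is the sum of a base frequency Fb, phrase components (responses to impulse-like phrase commands, capturing global intonation) and accent components (responses to step-like accent commands, capturing local pitch variation). The time constants alpha = 3.0 s^-1 and beta = 20.0 s^-1 and the ceiling gamma = 0.9 are commonly used defaults assumed here; the names PhraseCommand, AccentCommand, phrase_response, accent_response and fujisaki_f0 are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PhraseCommand:
    onset: float      # T0i: time of the phrase command (s)
    amplitude: float  # Ap_i: phrase command magnitude


@dataclass
class AccentCommand:
    onset: float      # T1j: accent command start (s)
    offset: float     # T2j: accent command end (s)
    amplitude: float  # Aa_j: accent command amplitude


def phrase_response(t: float, alpha: float) -> float:
    """Phrase control impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t: float, beta: float, gamma: float) -> float:
    """Accent control step response, clipped at the ceiling gamma."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(
    t: float,
    fb: float,
    phrases: Sequence[PhraseCommand],
    accents: Sequence[AccentCommand],
    alpha: float = 3.0,   # assumed phrase time constant (1/s)
    beta: float = 20.0,   # assumed accent time constant (1/s)
    gamma: float = 0.9,   # assumed accent ceiling
) -> float:
    """Return F0 (Hz) at time t: base frequency fb with phrase (global)
    and accent (local) components superposed in the log-F0 domain."""
    log_f0 = math.log(fb)
    for p in phrases:
        log_f0 += p.amplitude * phrase_response(t - p.onset, alpha)
    for a in accents:
        # An accent command is a step from onset to offset, so its effect is
        # the difference of two step responses.
        log_f0 += a.amplitude * (
            accent_response(t - a.onset, beta, gamma)
            - accent_response(t - a.offset, beta, gamma)
        )
    return math.exp(log_f0)


if __name__ == "__main__":
    # Hypothetical commands for a 2-second utterance sampled every 10 ms.
    phrases = [PhraseCommand(onset=-0.2, amplitude=0.5)]
    accents = [AccentCommand(onset=0.3, offset=0.6, amplitude=0.4),
               AccentCommand(onset=1.1, offset=1.4, amplitude=0.3)]
    contour = [fujisaki_f0(k * 0.01, 110.0, phrases, accents) for k in range(200)]
    print(f"min {min(contour):.1f} Hz, max {max(contour):.1f} Hz")
```

Because phrase and accent commands enter as separate terms, the global (phrase) and local (accent) layers of a contour can be inspected or replaced independently, which is the kind of separation a hierarchical pitch-pattern representation relies on.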

Keywords :

hidden Markov models; linguistics; speech intelligibility; speech synthesis; statistical analysis; HMM-based speech synthesizer; codeword sequence; global intonation; hidden Markov model; hierarchical Fujisaki model; hierarchical pitch pattern retrieval; linguistic cues; local pitch contour variation; personalized natural speech synthesis; pitch pattern language model; prosodic feature synthesis; statistical synthesis method; synthesized corpora; Acoustics; Hidden Markov models; Speech; Speech synthesis; Splines (mathematics); Training; Fujisaki Model; Hierarchical Prosodic Structure; Pattern Retrieval; Personalized Speech Synthesis;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location :

Vancouver, BC

ISSN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2013.6639191

Filename :

6639191

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1692960