DocumentCode :
118095
Title :
Tuning intonation with pitch accent decomposition for HMM-based expressive speech synthesis
Author :
Ni, Jinfu ; Shiga, Yoshinori ; Hori, Chiori
Author_Institution :
Spoken Language Commun. Lab., Universal Commun. Res. Inst., Kyoto, Japan
fYear :
2014
fDate :
9-12 Dec. 2014
Firstpage :
1
Lastpage :
10
Abstract :
Expressive intonation uses focal prominence to give emphasis that highlights the focus of speech. This paper describes a method for improving the expressiveness of HMM-based voices, particularly for placing focal prominence on a word. Unlike previous methods, our method exploits the speech corpus already available for model training, without needing to record additional emphasized speech. The method employs a functional F0 model to decompose the pitch accents of utterances into lexical accent and pitch register components. The two components are anchored by a limited number of target points to establish the topological relations between the prosodic and linguistic features of the utterances. The F0 model is then used to adjust the gradient prominence levels of pitch accents to produce focal prominence under the constraint of these topological relations. In this way, the need to record emphasized speech samples is significantly reduced. Moreover, the emphases with focal prominence can be contextually labeled for training context-dependent models. Experiments are conducted on a neutral Japanese speech corpus, focusing on expanding the local pitch range of the nuclear pitch accents (the most prominent accents) of utterances. The results demonstrate that the proposed method gracefully places focal prominence on specific words while maintaining a high degree of naturalness in the synthetic speech.
Keywords :
hidden Markov models; speech synthesis; HMM-based expressive speech synthesis; HMM-based voices; Japanese; gradient prominence levels; lexical accent; neutral speech corpus; pitch accent decomposition; pitch register; synthetic speech; training context-dependent models; tuning intonation; Context modeling; Hidden Markov models; Registers; Smoothing methods; Speech; Training; Tuning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
Conference_Location :
Siem Reap
Type :
conf
DOI :
10.1109/APSIPA.2014.7041616
Filename :
7041616