DocumentCode :
1843785
Title :
Review of F0 modelling and generation in HMM based speech synthesis
Author :
Kai Yu
Author_Institution :
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
Volume :
1
fYear :
2012
fDate :
21-25 Oct. 2012
Firstpage :
599
Lastpage :
604
Abstract :
Fundamental frequency, or F0, is a critical factor in synthesising speech that is both natural and expressive. In HMM based speech synthesis, the modelling and generation of F0 is one of the key difficulties that differentiate synthesis from recognition. Firstly, this is because F0 values are normally considered as a discontinuous function of time, whose domain is partly continuous and partly discrete. This results in two issues to be addressed in F0 modelling and generation: the voiced/unvoiced decision and the F0 trajectory. Secondly, F0 is supra-segmental, which means it should be modelled beyond the traditional phoneme level. Thirdly, the purpose of F0 modelling is not only general high-quality synthetic speech, but also effective delivery of expressiveness. This requires explicitly linking F0 modelling to (para/non-)linguistic information so that the control of F0 is easy and feasible. This paper reviews the state-of-the-art frameworks that address these issues. Possible future research directions are also discussed.
Keywords :
hidden Markov models; speech recognition; speech synthesis; F0 generation; F0 modelling; F0 trajectory; HMM; discontinuous function; fundamental frequency; linguistic information; phoneme level; voiced-unvoiced decision; HMM based synthesis; statistical speech synthesis
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal Processing (ICSP), 2012 IEEE 11th International Conference on
Conference_Location :
Beijing
ISSN :
2164-5221
Print_ISBN :
978-1-4673-2196-9
Type :
conf
DOI :
10.1109/ICoSP.2012.6491559
Filename :
6491559