DocumentCode :
3246516
Title :
Fundamental frequency modeling for corpus-based speech synthesis based on a statistical learning technique
Author :
Sakai, Shinsuke ; Glass, James
Author_Institution :
Comput. Sci. & Artificial Intelligence Lab., MIT, Cambridge, MA, USA
fYear :
2003
fDate :
30 Nov.-3 Dec. 2003
Firstpage :
712
Lastpage :
717
Abstract :
The paper proposes a novel two-layer approach to fundamental frequency modeling for concatenative speech synthesis, based on a statistical learning technique called additive models. We define an additive F0 contour model consisting of a long-term (intonational phrase-level) component and a short-term (accentual phrase-level) component, along with a least-squares error criterion that includes a regularization term. A backfitting algorithm derived from this error criterion estimates both components simultaneously by iteratively applying cubic spline smoothers. When this method is applied to a 7,000-utterance Japanese speech corpus, it achieves F0 RMS errors of 28.9 Hz and 29.8 Hz on the training and test data, respectively, with corresponding correlation coefficients of 0.81 and 0.77. The automatically determined intonational and accentual phrase components behave smoothly, systematically, and intuitively under a variety of prosodic conditions.
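The backfitting idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a two-term additive model y = alpha + f1(x1) + f2(x2), where x1 and x2 are hypothetical predictors (for example, normalized position within the intonational phrase and within the accentual phrase), and it swaps the cubic spline smoothers for a simple running-mean smoother so the example needs only the standard library. The regularization term of the paper's error criterion is not modeled, and the synthetic data are invented for illustration.

```python
import math
import random


def running_mean_smoother(x, r, window=21):
    """Smooth values r against predictor x with a running mean.

    Stand-in for a cubic spline smoother (an assumption of this sketch):
    each fitted value is the mean of r over up to `window` neighbours in
    x-sorted rank order, with the window truncated at both ends.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = window // 2
    fitted = [0.0] * len(x)
    for rank, i in enumerate(order):
        neighbours = order[max(0, rank - half):rank + half + 1]
        fitted[i] = sum(r[j] for j in neighbours) / len(neighbours)
    return fitted


def center(values):
    """Subtract the mean so the component is identifiable next to alpha."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def backfit(x1, x2, y, n_iter=20, window=21):
    """Estimate alpha, f1 and f2 by alternately smoothing partial residuals."""
    n = len(y)
    alpha = sum(y) / n
    f1 = [0.0] * n  # long-term component (hypothetical predictor x1)
    f2 = [0.0] * n  # short-term component (hypothetical predictor x2)
    for _ in range(n_iter):
        # Fit f1 to what is left after removing alpha and the current f2.
        r1 = [y[i] - alpha - f2[i] for i in range(n)]
        f1 = center(running_mean_smoother(x1, r1, window))
        # Fit f2 to what is left after removing alpha and the updated f1.
        r2 = [y[i] - alpha - f1[i] for i in range(n)]
        f2 = center(running_mean_smoother(x2, r2, window))
    return alpha, f1, f2


def rmse(y, y_hat):
    """Root-mean-square error between observed and fitted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


if __name__ == "__main__":
    # Synthetic data only: a declining long-term trend plus a short-term bump.
    random.seed(0)
    n = 400
    x1 = [random.random() for _ in range(n)]
    x2 = [random.random() for _ in range(n)]
    y = [150.0 - 40.0 * a + 20.0 * math.sin(math.pi * b) + random.gauss(0.0, 5.0)
         for a, b in zip(x1, x2)]
    alpha, f1, f2 = backfit(x1, x2, y)
    fitted = [alpha + u + v for u, v in zip(f1, f2)]
    print("constant-only RMSE:", rmse(y, [alpha] * n))
    print("additive-model RMSE:", rmse(y, fitted))
```

Each pass refits one component to the residual left by the other, which is what lets both components be estimated jointly rather than one after the other. Replacing `running_mean_smoother` with a penalized cubic spline fit would bring the sketch closer to the method the abstract describes.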
Keywords :
error statistics; iterative methods; learning (artificial intelligence); least squares approximations; modelling; parameter estimation; smoothing methods; speech synthesis; splines (mathematics); statistical analysis; Japanese speech corpus; accentual phrase-level component; additive F0 contour model; additive models; backfitting algorithm; concatenative speech synthesis; corpus-based speech synthesis; cubic spline smoothers; fundamental frequency modeling; intonational phrase-level component; least-squares error criterion; prosodic conditions; statistical learning; Artificial intelligence; Computer science; Costs; Frequency synthesizers; Glass; Iterative algorithms; Laboratories; Regression tree analysis; Speech synthesis; Statistical learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on
Print_ISBN :
0-7803-7980-2
Type :
conf
DOI :
10.1109/ASRU.2003.1318527
Filename :
1318527