DocumentCode
1117932
Title
Globally Optimal Training of Unit Boundaries in Unit Selection Text-to-Speech Synthesis
Author
Bellegarda, Jerome R.
Author_Institution
Speech & Language Technol., Apple Comput. Inc., Cupertino, CA
Volume
15
Issue
3
fYear
2007
fDate
3/1/2007 12:00:00 AM
Firstpage
957
Lastpage
965
Abstract
The level of quality that can be achieved by modern concatenative text-to-speech synthesis heavily depends on a judicious composition of the unit inventory used in the unit selection process. Unit boundary optimization, in particular, can make a huge difference in the users' perception of the concatenated acoustic waveform. This paper considers the iterative refinement of unit boundaries based on a data-driven feature extraction framework separately optimized for each boundary region. This guarantees a globally optimal cut point between any two matching units in the underlying inventory. The associated boundary training procedure is objectively characterized, first in terms of convergence behavior, and then by comparing the distributions in inter-unit discontinuity obtained before and after training. Experimental results underscore the viability of this approach for unit boundary optimization. Listening evidence also qualitatively exemplifies a noticeable reduction in the perception of discontinuity between concatenated acoustic units.
Keywords
iterative methods; optimisation; speech synthesis; concatenated acoustic waveform; concatenative text-to-speech synthesis; data-driven feature extraction; globally optimal cut point; globally optimal training; inter-unit discontinuity; iterative refinement; unit boundary optimization; unit inventory; unit selection text-to-speech synthesis; Acoustic waves; Assembly; Concatenated codes; Convergence; Feature extraction; Hidden Markov models; Loudspeakers; Natural languages; Signal processing algorithms; Speech synthesis; boundary optimization; discontinuity perception; segment concatenation; text-to-speech (TTS) synthesis; unit selection;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
ieee
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2006.881675
Filename
4100664