LSM-Based Boundary Training for Concatenative Speech Synthesis

Author

Bellegarda, Jerome R.

Author_Institution

Speech & Language Technol., Apple Comput., Inc.

Volume

1

fYear

2006

fDate

14-19 May 2006

Abstract

The level of quality that can be achieved in concatenative text-to-speech synthesis depends, among other things, on a judicious chiseling of the inventory used in unit selection. Unit boundary optimization, in particular, can make a huge difference in the users' perception of the concatenated acoustic waveform. This paper considers the iterative refinement of unit boundaries based on a data-driven feature extraction framework separately optimized for each boundary region. Such unsupervised boundary training guarantees a globally optimal cut point between any two matching units in the inventory. This optimization is objectively characterized, first in terms of convergence behavior, and then by comparing the average inter-unit discontinuity obtained before and after training. Experimental results and listening evidence both underscore the viability of this approach for unit boundary optimization.

Keywords

feature extraction; iterative methods; speech synthesis; LSM-based boundary training; average inter-unit discontinuity; concatenative speech synthesis; data-driven feature extraction; iterative refinement; text-to-speech synthesis; unit boundary optimization; Acoustic waves; Concatenated codes; Convergence; Cost function; Feature extraction; Hidden Markov models; Natural languages; Signal synthesis; Speech analysis; Speech synthesis

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on

Conference_Location

Toulouse

ISSN

1520-6149

Print_ISBN

1-4244-0469-X

Type

conf

DOI

10.1109/ICASSP.2006.1660122

Filename

1660122