Title :
Comparison of two different text-to-speech alignment systems: Speech synthesis based vs. hybrid HMM/ANN
Author :
Deroo, O. ; Malfrere, F. ; Dutoit, T.
Author_Institution :
Dept. of Circuits Theor. & Signal Process., Faculté Polytech. de Mons, Mons, Belgium
Abstract :
In this paper we compare two methods for phonetically labeling a speech database. The first approach aligns the speech signal with a high-quality synthetic speech pattern, and the second uses a hybrid HMM/ANN system. Both systems were evaluated on manually segmented French read utterances from a speaker never seen during the training of the HMM/ANN system. The study outlines the advantages and drawbacks of both methods. The synthesis-based system has the great advantage that no training stage is needed, while the classical HMM/ANN system easily allows multiple phonetic transcriptions. From this comparison we derive a method for automatically building phonetically labeled speech databases, in which the synthetic speech segmentation tool bootstraps the training process of our hybrid HMM/ANN system. Such segmentation tools will be key to the development of improved speech synthesis and recognition systems.
Keywords :
hidden Markov models; neural nets; speech processing; speech recognition; speech synthesis; French read utterances; hybrid HMM-ANN system; multiple phonetic transcriptions; phonetic labeling; segmentation tools; speech database; speech recognition systems; synthetic speech pattern; synthetic speech segmentation tool; text-to-speech alignment systems; training process bootstrap; Artificial neural networks; Cepstral analysis; Speech; Training
Conference_Title :
Signal Processing Conference (EUSIPCO 1998), 9th European
Conference_Location :
Rhodes
Print_ISBN :
978-960-7620-06-4