Title :
Factorized context modelling for Text-to-Speech synthesis
Author :
Heng Lu; Simon King
Author_Institution :
Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK
Abstract :
Because speech units are highly context-dependent, HMM-based Text-to-Speech (TTS) synthesis systems generally use a large number of linguistic context features, via context-dependent models. Since it is impossible to train a separate model for every context, decision trees are used to discover the most important combinations of features that should be modelled. The task of the decision tree is very hard: it must generalize from a very small observed part of the context feature space to the rest. Decision trees also have a major weakness: because they subdivide the model space based on one feature at a time, they cannot directly take advantage of factorial properties. We propose a Dynamic Bayesian Network (DBN) based Mixed Memory Markov Model (MMMM) to provide factorization of the context space. The results of a listening test are provided as evidence that the model successfully learns the factorial nature of this space.
Keywords :
belief networks; decision trees; hidden Markov models; linguistics; speech synthesis; DBN; HMM; MMMM; TTS system; context dependent model; context features space; decision tree; dynamic Bayesian network; factorized context modelling; linguistic context feature; mixed memory Markov model; text-to-speech synthesis; Bayes methods; Context; Context modeling; Hidden Markov models; Markov processes; Speech; Speech synthesis; Dynamic Bayesian Network; Mixed Memory Markov Model; Text-To-Speech synthesis; factorized model; maximum likelihood parameter generation
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639192