Title of article :
Spectral Modeling Based on Gaussian Conditional Random Field for Statistical Parametric Speech Synthesis
Author/Authors :
Khorram, Soheil Sharif University of Technology - Department of Computer Engineering, Iran , Sameti, Hossein Sharif University of Technology - Department of Computer Engineering, Iran , Bahmaninezhad, Fahimeh Sharif University of Technology - Department of Computer Engineering, Iran
Abstract :
This paper proposes an innovative spectral modeling approach based on Gaussian conditional random field (GCRF) theory. The proposed method is also incorporated in a statistical parametric speech synthesis (SPSS) framework. Conventionally, SPSS systems exploit a hidden Markov model (HMM)-based spectral modeling technique, which suffers from a problem known as the state independence assumption. This shortcoming refers to the fact that the distributions of adjacent frames are modeled independently in HMM, whilst they are highly dependent and correlated. The proposed model assumes that spectral trajectories form a left-to-right linear-chain conditional random field (CRF) with Gaussian potential functions. Therefore, instead of the inaccurate independence assumption, a Markov assumption is established for adjacent frames in a latent state. In order to train the proposed GCRF model, a Viterbi algorithm along with a maximum likelihood (ML)-based parameter estimation procedure has been applied. The estimation algorithm leads to an optimization problem which is solved numerically through the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. In the synthesis phase, an efficient parameter generation algorithm optimizing the output probability measure has been derived. The designed parameter generation algorithm has the ability to exploit dynamic features as well as static features. Two sets of experiments are reported to demonstrate the effectiveness of the proposed GCRF. In the first set, GCRF with some heuristic context clusters and ML-based parameter estimation is evaluated against the predominant HMM-based method. The results of objective and subjective tests confirm that the proposed system using heuristic contextual clusters outperformed the standard HMM on small training databases (i.e. 50, 100 and 200 sentences), but on large datasets HMM performs better. This is mainly due to the inability of the proposed system to adjust the number of model parameters to the size of the training database.
In the second set of experiments, the performance of GCRF using decision tree-based clusters is investigated. This latter model has the ability to change the model complexity according to the size of the training database. All evaluation results of this experiment confirm a significant improvement of the proposed system over the conventional HMM.
Keywords :
Gaussian Conditional Random Field , GCRF , Hidden Markov Model , HMM , HMM-Based Speech Synthesis , Spectral Modeling , State Independence Assumption , Statistical Parametric Speech Synthesis
Journal title :
The CSI Journal on Computer Science and Engineering (JCSE)