Regional Information Center for Science and Technology - Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization

DocumentCode :

3851973

Title :

Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization

Author :

Heiga Zen;Norbert Braunschweiler;Sabine Buchholz;Mark J. F. Gales;Kate Knill;Sacha Krstulovic;Javier Latorre

Author_Institution :

Formerly with Toshiba Research Europe, Cambridge, UK; now with Google, London

Volume :

Issue :

fYear :

2012

Firstpage :

1713

Lastpage :

1724

Abstract :

An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-/language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to: train a synthesis system; synthesize speech in a language using speaker characteristics estimated in a different language; and adapt to a new language.
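The two transform types named in the abstract can be illustrated with a minimal sketch. It assumes that a language-dependent mean is a weighted sum of cluster mean vectors and that a speaker-dependent CMLLR transform is an affine map applied to an observation vector. All names and numeric values below (`interpolate_mean`, `cmllr_transform`, the language and speaker keys) are hypothetical toy choices for illustration, not taken from the paper, and the sketch omits decision trees, covariances, and parameter estimation.

```python
def interpolate_mean(cluster_means, weights):
    """Weighted sum of cluster mean vectors (language-dependent mean)."""
    dim = len(cluster_means[0])
    return [
        sum(w * mean[d] for w, mean in zip(weights, cluster_means))
        for d in range(dim)
    ]


def cmllr_transform(A, b, observation):
    """Affine feature transform A * o + b (speaker-dependent)."""
    return [
        sum(a * x for a, x in zip(row, observation)) + bias
        for row, bias in zip(A, b)
    ]


# Hypothetical toy parameters: two clusters, two-dimensional features.
cluster_means = [[1.0, 0.0], [0.0, 2.0]]

# One interpolation weight vector per language (assumed values).
language_weights = {"lang_a": [1.0, 0.5], "lang_b": [0.5, 1.0]}

# One (A, b) pair per speaker (assumed values).
speaker_transforms = {"spk1": ([[2.0, 0.0], [0.0, 1.0]], [0.5, -1.0])}

# Language factor: the same clusters give different means per language.
mu_a = interpolate_mean(cluster_means, language_weights["lang_a"])
mu_b = interpolate_mean(cluster_means, language_weights["lang_b"])

# Speaker factor: the same transform applies regardless of language.
A, b = speaker_transforms["spk1"]
o_transformed = cmllr_transform(A, b, [1.0, 3.0])

print(mu_a, mu_b, o_transformed)
```

Because the language weights and the speaker transform are held in separate structures, a speaker transform estimated on one language can be paired with the weights of another, which is the kind of cross-language reuse the abstract describes.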

Keywords :

"Hidden Markov models","Transforms","Decision trees","Speech","Speech synthesis","Adaptation models","Vectors"

Journal_Title :

IEEE Transactions on Audio, Speech, and Language Processing

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2012.2187195

Filename :

6148263

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3851973