Title :
Cross-Lingual Speaker Adaptation for HMM-Based Speech Synthesis
Author :
Wu, Yi-Jian ; King, Simon ; Tokuda, Keiichi
Author_Institution :
Centre for Speech Technol. Res., Univ. of Edinburgh, Edinburgh, UK
Abstract :
This paper explores a cross-lingual speaker adaptation technique for HMM-based speech synthesis, in which a source voice model for English is transformed into a target speaker model using Mandarin Chinese speech data from the target speaker. A phone mapping-based method is adopted to map Chinese Initials/Finals onto English phonemes, and two types of mapping rules, one-to-one and one-to-sequence, are compared. To avoid having to map prosodic features between languages, the adaptation procedure uses regression classes and transforms that are constructed for triphone models and then used to adapt the phonetic-and-prosodic-context-dependent models. The experimental results show that a one-to-sequence phone mapping is better than a one-to-one mapping, and that the similarity between the adapted English speech and the target Chinese speaker is reasonable.
Keywords :
hidden Markov models; regression analysis; speech processing; English; HMM-based speech synthesis; Mandarin Chinese speech data; cross-lingual speaker adaptation; one-to-one mappings; one-to-sequence mappings; phone mapping-based method; phonetic-and-prosodic-context-dependent models; source voice model; target speaker model; triphone models; Adaptation model; Context modeling; Hidden Markov models; Loudspeakers; Maximum likelihood linear regression; Natural languages; Speech recognition; Speech synthesis; Stress; Target recognition;
Conference_Title :
Chinese Spoken Language Processing, 2008. ISCSLP '08. 6th International Symposium on
Conference_Location :
Kunming
Print_ISBN :
978-1-4244-2942-4
Electronic_ISBN :
978-1-4244-2943-1
DOI :
10.1109/CHINSL.2008.ECP.14