Regional Information Center for Science and Technology (RICEST) - Unsupervised Intralingual and Cross-Lingual Speaker Adaptation for HMM-Based Speech Synthesis Using Two-Pass Decision Tree Construction

DocumentCode :

1296065

Title :

Unsupervised Intralingual and Cross-Lingual Speaker Adaptation for HMM-Based Speech Synthesis Using Two-Pass Decision Tree Construction

Author :

Gibson, Matthew ; Byrne, William

Author_Institution :

Dept. of Eng., Cambridge Univ., Cambridge, UK

Volume :

Issue :

Year :

2011

Date :

5/1/2011

Firstpage :

895

Lastpage :

904

Abstract :

Hidden Markov model (HMM)-based speech synthesis systems possess several advantages over concatenative synthesis systems. One such advantage is the relative ease with which HMM-based systems are adapted to speakers not present in the training dataset. Speaker adaptation methods used in the field of HMM-based automatic speech recognition (ASR) are adopted for this task. In the case of unsupervised speaker adaptation, previous work has used a supplementary set of acoustic models to estimate the transcription of the adaptation data. This paper first presents an approach to the unsupervised speaker adaptation task for HMM-based speech synthesis models which avoids the need for such supplementary acoustic models. This is achieved by defining a mapping between HMM-based synthesis models and ASR-style models, via a two-pass decision tree construction process. Second, it is shown that this mapping also enables unsupervised adaptation of HMM-based speech synthesis models without the need to perform linguistic analysis of the estimated transcription of the adaptation data. Third, this paper demonstrates how this technique lends itself to the task of unsupervised cross-lingual adaptation of HMM-based speech synthesis models, and explains the advantages of such an approach. Finally, listener evaluations reveal that the proposed unsupervised adaptation methods deliver performance approaching that of supervised adaptation.

Keywords :

decision trees; hidden Markov models; speaker recognition; speech synthesis; HMM-based automatic speech recognition; hidden Markov model-based speech synthesis; cross-lingual speaker adaptation; linguistic analysis; supplementary acoustic model; two-pass decision tree construction process; unsupervised intralingual speaker adaptation; cross-lingual; hidden Markov model (HMM)-based speech synthesis; unsupervised speaker adaptation

Language :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

Journal article

DOI :

10.1109/TASL.2010.2066968

Filename :

5549865

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1296065