Regional Information Center for Science and Technology - A comparison of supervised and unsupervised cross-lingual speaker adaptation approaches for HMM-based speech synthesis

DocumentCode :

2798885

Title :

A comparison of supervised and unsupervised cross-lingual speaker adaptation approaches for HMM-based speech synthesis

Author :

Liang, Hui ; Dines, John ; Saheer, Lakshmi

Author_Institution :

Idiap Res. Inst., Martigny, Switzerland

fYear :

2010

fDate :

14-19 March 2010

Firstpage :

4598

Lastpage :

4601

Abstract :

The EMIME project aims to build a personalized speech-to-speech translator, such that a user's spoken input in one language is used to produce spoken output in another language that still sounds like the user's voice. This distinctiveness makes unsupervised cross-lingual speaker adaptation one key to the project's success. So far, research has been conducted into the unsupervised and cross-lingual cases separately, by means of decision tree marginalization and HMM state mapping respectively. In this paper we combine the two techniques to perform unsupervised cross-lingual speaker adaptation. The performance of eight speaker adaptation systems (supervised vs. unsupervised, intra-lingual vs. cross-lingual) is compared using objective and subjective evaluations. Experimental results show that the performance of unsupervised cross-lingual speaker adaptation is comparable to that of the supervised case in terms of spectrum adaptation in the EMIME scenario, even though the automatically obtained transcriptions have a very high phoneme error rate.

Keywords :

decision trees; hidden Markov models; language translation; linguistics; speaker recognition; speech synthesis; HMM state mapping; HMM-based speech synthesis; decision tree marginalization; intra-lingual speaker adaptation; personalized speech-to-speech translator; phoneme error rate; spectrum adaptation; unsupervised cross-lingual speaker adaptation; Context modeling; Decision trees; Error analysis; Hidden Markov models; Loudspeakers; Mobile communication; Natural languages; Speech recognition; Speech synthesis

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on

Conference_Location :

Dallas, TX

ISSN :

1520-6149

Print_ISBN :

978-1-4244-4295-9

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2010.5495559

Filename :

5495559

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2798885