DocumentCode :
2798885
Title :
A comparison of supervised and unsupervised cross-lingual speaker adaptation approaches for HMM-based speech synthesis
Author :
Liang, Hui ; Dines, John ; Saheer, Lakshmi
Author_Institution :
Idiap Res. Inst., Martigny, Switzerland
fYear :
2010
fDate :
14-19 March 2010
Firstpage :
4598
Lastpage :
4601
Abstract :
The EMIME project aims to build a personalized speech-to-speech translator, such that a user's spoken input in one language is used to produce spoken output in another language that still sounds like the user's voice. This distinctive feature makes unsupervised cross-lingual speaker adaptation key to the project's success. So far, the unsupervised and cross-lingual cases have been researched separately, by means of decision tree marginalization and HMM state mapping respectively. In this paper we combine the two techniques to perform unsupervised cross-lingual speaker adaptation. The performance of eight speaker adaptation systems (supervised vs. unsupervised, intra-lingual vs. cross-lingual) is compared using objective and subjective evaluations. Experimental results show that the performance of unsupervised cross-lingual speaker adaptation is comparable to that of the supervised case in terms of spectrum adaptation in the EMIME scenario, even though the automatically obtained transcriptions have a very high phoneme error rate.
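HMM state mapping, one of the two techniques combined above, links the states of the input-language and output-language models so that adaptation learned in one language can be carried over to the other. This record does not state the exact mapping criterion, so the following is only a minimal sketch of one common variant. It assumes each state has a single diagonal-covariance Gaussian output distribution and that each output-language state is mapped to the input-language state with the smallest symmetric Kullback-Leibler divergence (KLD). The state names in the usage example are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Assumption: each HMM state is a single diagonal-covariance Gaussian,
# stored as (mean vector, variance vector).
Gaussian = Tuple[List[float], List[float]]


def kld_diag(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) between two diagonal-covariance Gaussians."""
    mu_p, var_p = p
    mu_q, var_q = q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        # Per-dimension closed form, summed over independent dimensions.
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kld(p: Gaussian, q: Gaussian) -> float:
    """Symmetric KLD: KL(p || q) + KL(q || p)."""
    return kld_diag(p, q) + kld_diag(q, p)


def map_states(src: Dict[str, Gaussian], tgt: Dict[str, Gaussian]) -> Dict[str, str]:
    """Map every target (output-language) state to the closest source
    (input-language) state under the symmetric KLD (assumed criterion)."""
    mapping = {}
    for t_name, t_dist in tgt.items():
        mapping[t_name] = min(src, key=lambda s: symmetric_kld(src[s], t_dist))
    return mapping


if __name__ == "__main__":
    # Hypothetical two-dimensional states for an input and an output language.
    src = {"in_s1": ([0.0, 0.0], [1.0, 1.0]), "in_s2": ([3.0, 3.0], [1.0, 1.0])}
    tgt = {"out_s1": ([0.2, -0.1], [1.0, 1.0]), "out_s2": ([2.8, 3.1], [1.2, 0.9])}
    print(map_states(src, tgt))  # {'out_s1': 'in_s1', 'out_s2': 'in_s2'}
```

Under this sketch, speaker transforms estimated on the input-language model would then be applied to each output-language state through its mapped partner; whether the paper maps transforms or data in this way is not stated in the record.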
Keywords :
decision trees; hidden Markov models; language translation; linguistics; speaker recognition; speech synthesis; HMM state mapping; HMM-based speech synthesis; decision tree marginalization; intra-lingual speaker adaptation; personalized speech-to-speech translator; phoneme error rate; spectrum adaptation; unsupervised cross-lingual speaker adaptation; Context modeling; Decision trees; Error analysis; Hidden Markov models; Loudspeakers; Mobile communication; Natural languages; Speech recognition; Speech synthesis
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
ISSN :
1520-6149
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2010.5495559
Filename :
5495559