Title :
A frame mapping based HMM approach to cross-lingual voice transformation
Author :
Qian, Yao ; Xu, Ji ; Soong, Frank K.
Author_Institution :
Microsoft Res. Asia, Beijing, China
Abstract :
Cross-lingual voice transformation is challenging when the source language (L1) and the target language (L2) differ greatly in their phonetics and prosody. We propose a frame mapping based HMM approach to this problem. The source speaker's speech data is first warped in frequency toward the target speaker by mapping corresponding formants of selected vowels. The parameter trajectories of the warped data are then "tiled" with frames from the target speaker's L2 data. The new tiled trajectories form a simulated training set of the target speaker in L1, which is used to train an HMM-based TTS system. With a bilingual (Mandarin and English) source speaker and a monolingual (English) target speaker, the frame mapping based approach is capable of generating highly intelligible, good-quality speech in L1 (Mandarin) that sounds rather close to the target speaker. The good performance of the cross-lingual voice transformation is confirmed by subjective evaluations of speaker similarity, naturalness and intelligibility.
Keywords :
hidden Markov models; languages; speaker recognition; HMM TTS; bilingual source speaker; cross-lingual voice transformation; frame mapping based HMM approach; monolingual target speaker; parameter trajectory; phonetic; prosody; source language; speaker speech data source; speech data quality; target language; Adaptation models; Data models; Hidden Markov models; Speech; Training; Trajectory; Transforms; Cross-lingual; HMM-based TTS; VTLN;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
Print_ISBN :
978-1-4577-0538-0
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2011.5947509
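
Note :
The two steps named in the abstract (formant-based frequency warping, then "tiling" the warped trajectories with target-speaker frames) can be illustrated with a minimal sketch. The abstract does not state the form of the warping function or the distance used to pick frames, so the piecewise-linear warp and the Euclidean nearest-neighbour selection below are assumptions, and the function names `warp_frequencies` and `tile_frames` are hypothetical rather than taken from the paper.

```python
import numpy as np

def warp_frequencies(freqs_hz, src_formants_hz, tgt_formants_hz, nyquist_hz=8000.0):
    """Piecewise-linear warp sending each source formant to its target formant.

    Assumption: formants are given in ascending order and lie below Nyquist.
    """
    # Pin the warp at 0 Hz and at Nyquist so the band edges stay fixed.
    xs = np.concatenate(([0.0], np.asarray(src_formants_hz, dtype=float), [nyquist_hz]))
    ys = np.concatenate(([0.0], np.asarray(tgt_formants_hz, dtype=float), [nyquist_hz]))
    return np.interp(freqs_hz, xs, ys)

def tile_frames(warped_traj, target_frames):
    """Replace every warped source frame by its closest target-speaker frame.

    warped_traj:   (T, D) array, warped source parameter trajectory
    target_frames: (N, D) array, pool of target-speaker frames
    Assumption: closeness is plain Euclidean distance in parameter space.
    """
    # Squared distance between every warped frame and every target frame: (T, N).
    diff = warped_traj[:, None, :] - target_frames[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)
    idx = dist2.argmin(axis=1)
    return target_frames[idx], idx
```

In this sketch, the tiled output contains only frames that were actually produced by the target speaker, ordered to follow the warped source trajectory; that is the sense in which it can act as a simulated L1 training set for the target voice.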