• DocumentCode
    3425221
  • Title

    A cross-language state mapping approach to bilingual (Mandarin-English) TTS

  • Author

    Liang, Hui ; Qian, Yao ; Soong, Frank K. ; Liu, Gongshen

  • Author_Institution
    Microsoft Res. Asia, Beijing
  • fYear
    2008
  • fDate
    March 31 2008-April 4 2008
  • Firstpage
    4641
  • Lastpage
    4644
  • Abstract
    We propose a cross-language state mapping approach to HMM-based bilingual TTS. Two language-dependent decision trees are first built from a bilingual speech database recorded by a single speaker. A state mapping for every leaf node in the decision tree of a target language is created by finding the nearest leaf node in the tree of a source language. Kullback-Leibler divergence between two distributions is used to find the nearest leaf node. To synthesize target-language speech in the voice of a monolingual (source-language) speaker, we take the HMM parameters trained on that monolingual (source-language) speaker from the mapped leaf nodes. Similar mappings can be constructed by reversing the source and target languages. With these bi-directional cross-lingual mappings, we can synthesize bilingual or mixed-code speech with HMMs trained on any monolingual speaker. High voice (speaker) similarity is preserved in synthesized speech of the target language. Two perceptual tests on synthesized Mandarin speech confirm high intelligibility, with a Chinese character transcription accuracy of 92.1% and an MOS of 3.08.
  • Keywords
    decision trees; hidden Markov models; natural language processing; speech coding; speech synthesis; Chinese character transcription; Kullback-Leibler divergence; bi-directional cross-lingual mappings; bilingual Mandarin-English TTS; bilingual code speech synthesis; bilingual speech database; cross-language state mapping approach; hidden Markov model; language-dependent decision trees; mixed-code speech synthesis; Acoustic measurements; Asia; Databases; Decision trees; Frequency; Hidden Markov models; Information security; Natural languages; Signal synthesis; Speech synthesis; Bilingual; HMM-based TTS; new language synthesis; state mapping;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-1483-3
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2008.4518691
  • Filename
    4518691
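
The nearest-leaf mapping described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the form of the leaf distributions or the direction of the divergence, so the sketch assumes one diagonal-covariance Gaussian per leaf and a symmetrised Kullback-Leibler divergence. All names (`kl_diag_gauss`, `symmetric_kl`, `map_states`, the leaf labels) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Assumption: each leaf is summarised by (means, variances) of a
# diagonal-covariance Gaussian. The record does not specify this.
Gaussian = Tuple[List[float], List[float]]


def kl_diag_gauss(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) for two diagonal-covariance Gaussians."""
    mean_p, var_p = p
    mean_q, var_q = q
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """Symmetrised divergence (an assumption made for this sketch)."""
    return kl_diag_gauss(p, q) + kl_diag_gauss(q, p)


def map_states(
    target_leaves: Dict[str, Gaussian],
    source_leaves: Dict[str, Gaussian],
) -> Dict[str, str]:
    """Map every target-language leaf to its nearest source-language leaf."""
    mapping = {}
    for t_name, t_gauss in target_leaves.items():
        mapping[t_name] = min(
            source_leaves,
            key=lambda s_name: symmetric_kl(t_gauss, source_leaves[s_name]),
        )
    return mapping


# Toy one-dimensional leaves, invented purely for illustration.
mandarin_leaves = {"zh_a": ([0.0], [1.0]), "zh_b": ([5.0], [1.0])}
english_leaves = {"en_x": ([0.2], [1.0]), "en_y": ([4.5], [2.0])}

mapping = map_states(mandarin_leaves, english_leaves)
```

With such a mapping built on the bilingual speaker's trees, target-language speech in a monolingual speaker's voice would use that speaker's parameters at the mapped source leaf, for example `monolingual_params[mapping["zh_a"]]`, where `monolingual_params` is again a hypothetical lookup table. Swapping the two arguments of `map_states` gives the reverse mapping mentioned in the abstract.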