• DocumentCode
    3164090
  • Title
    Cross-lingual frame selection method for polyglot speech synthesis
  • Author
    Chen, Chia-Ping ; Huang, Yi-Chin ; Wu, Chung-Hsien ; Lee, Kuan-De
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Nat. Sun Yat-Sen Univ., Kaohsiung, Taiwan
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    4521
  • Lastpage
    4524
  • Abstract
    A novel approach is proposed for creating a polyglot speech synthesis system without the need to collect speech data from a bilingual (or multilingual) speaker, which is often expensive or even infeasible. Given a target speaker with data in the first language (Mandarin in this study), the basic idea is to construct artificial utterances in the second language (English) by selecting speech sample frames of the given speaker in the first language. As the speaker need not be polyglot, this method is generally applicable to any speaker and any languages. In the search for the optimal frame sequence, the candidate set is constrained by a decision tree for phone segments in the speech data of both languages, and the cost function depends on context-dependent articulatory and auditory features. Evaluation results show that good performance regarding similarity (speaker identity) and naturalness (speech quality) can be achieved with the proposed method.
  • Keywords
    decision trees; natural languages; speech synthesis; artificial utterances; auditory features; context dependent articulatory features; cost function; crosslingual frame selection method; decision tree; multilingual speaker; optimal frame sequence selection; phone segments; polyglot speech synthesis; speaker identity; speech data; speech quality; speech sample frames; target speaker; Adaptation models; Decision trees; Feature extraction; Hidden Markov models; Speech; Speech synthesis; Vectors; articulatory features; auditory features; frame selection; polyglot speech synthesis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6288923
  • Filename
    6288923