• DocumentCode
    454650
  • Title

    Identifying Language Origin of Person Names With N-Grams of Different Units

  • Author

    Chen, Yining ; You, Jiali ; Chu, Min ; Zhao, Yong ; Wang, JinLin

  • Author_Institution
    Microsoft Res. Asia, Beijing
  • Volume
    1
  • fYear
    2006
  • fDate
    14-19 May 2006
  • Abstract
    Identifying the language origin of a name written in English is important for generating its correct pronunciation. In this paper, N-grams of syllable-based letter clusters are proposed for the task. The performance of an N-gram model over a set of frequently used letter clusters (corresponding to syllables) is compared with that of a letter N-gram model on a four-language task: English, German, French, and Portuguese. On average, the letter-cluster N-gram, with a 26% error rate, is slightly better than the letter N-gram, with a 27.2% error rate. Furthermore, the error distributions of the two N-grams are found to differ considerably. Therefore, AdaBoost is used to combine the results from N-grams of different units. After the combination, the error rate is reduced to 22.5%, a relative error reduction of 17.5%.
  • Keywords
    natural languages; speech recognition; speech synthesis; English; French; German; N-gram model; Portuguese; error reduction; language origin identification; person names; syllable-based letter clusters; Acoustics; Asia; Engines; Error analysis; HTML; Vocabulary; Watches
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006) Proceedings
  • Conference_Location
    Toulouse
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0469-X
  • Type

    conf

  • DOI
    10.1109/ICASSP.2006.1660124
  • Filename
    1660124
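
The letter N-gram baseline mentioned in the abstract can be illustrated with a minimal sketch: train one letter N-gram model per language and assign a name to the language whose model gives it the highest likelihood. This is not the authors' implementation. The trigram order, the add-one smoothing, the class and function names, and the toy name lists are all assumptions made for illustration; the abstract does not specify them. The letter-cluster units and the AdaBoost combination are not shown.

```python
import math
from collections import Counter


def letter_ngrams(name, n=3):
    """Return the letter n-grams of a name, padded with boundary markers."""
    padded = "^" * (n - 1) + name.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class LetterNgramOriginModel:
    """One add-one-smoothed letter n-gram model per language (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = {}    # language -> Counter of n-grams
        self.context_counts = {}  # language -> Counter of (n-1)-letter contexts
        self.alphabet = set()     # every symbol seen as the last letter of an n-gram

    def fit(self, names_by_language):
        for language, names in names_by_language.items():
            ngrams, contexts = Counter(), Counter()
            for name in names:
                for gram in letter_ngrams(name, self.n):
                    ngrams[gram] += 1
                    contexts[gram[:-1]] += 1
                    self.alphabet.add(gram[-1])
            self.ngram_counts[language] = ngrams
            self.context_counts[language] = contexts
        return self

    def log_likelihood(self, name, language):
        """Sum of log P(letter | previous n-1 letters) under one language model."""
        ngrams = self.ngram_counts[language]
        contexts = self.context_counts[language]
        vocab_size = len(self.alphabet)
        total = 0.0
        for gram in letter_ngrams(name, self.n):
            # Add-one smoothing (an assumption, not taken from the paper).
            total += math.log((ngrams[gram] + 1) / (contexts[gram[:-1]] + vocab_size))
        return total

    def predict(self, name):
        """Return the language whose model scores the name highest."""
        return max(self.ngram_counts, key=lambda lang: self.log_likelihood(name, lang))


# Hypothetical toy training data; a real system would need far larger name lists.
TOY_NAMES = {
    "English": ["smith", "johnson", "williams", "brown"],
    "German": ["schmidt", "mueller", "schneider", "fischer"],
    "French": ["dubois", "lefebvre", "moreau", "girard"],
    "Portuguese": ["silva", "santos", "oliveira", "pereira"],
}

model = LetterNgramOriginModel(n=3).fit(TOY_NAMES)
print(model.predict("schneider"))
```

A second model of the same shape could in principle be trained over letter-cluster units instead of single letters, with the two sets of scores then passed to a combiner such as AdaBoost, which is the direction the abstract describes.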