Title :
Identifying Language Origin of Person Names With N-Grams of Different Units
Author :
Chen, Yining ; You, Jiali ; Chu, Min ; Zhao, Yong ; Wang, JinLin
Author_Institution :
Microsoft Res. Asia, Beijing
Abstract :
Identifying the language origin of a name in English is important for generating its correct pronunciation. This paper proposes N-grams of syllable-based letter clusters for the task. An N-gram model over a set of frequently used letter clusters (which correspond to syllables) is compared with a letter N-gram model on a four-language task: English, German, French, and Portuguese. On average, the letter-cluster N-gram, with a 26% error rate, is slightly better than the letter N-gram, with a 27.2% error rate. Furthermore, the error distributions of the two N-grams differ considerably. AdaBoost is therefore used to combine the results from N-grams of different units. After the combination, the error rate is reduced to 22.5%, a relative error reduction of 17.5%.
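The letter N-gram baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bigram order, the add-alpha smoothing, the boundary markers, the class name `LetterNgramOriginClassifier`, and the toy name lists are all assumptions made for illustration. It covers only the letter-unit model, not the letter-cluster model or the AdaBoost combination.

```python
import math
from collections import Counter, defaultdict


def letter_ngrams(name, n=2):
    """Return the letter n-grams of a name, padded with boundary markers."""
    padded = "^" * (n - 1) + name.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class LetterNgramOriginClassifier:
    """Hypothetical per-language letter n-gram model with add-alpha smoothing."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n
        self.alpha = alpha
        self.ngram_counts = defaultdict(Counter)    # language -> n-gram counts
        self.context_counts = defaultdict(Counter)  # language -> (n-1)-gram counts
        self.alphabet = set()                       # symbols seen as the predicted letter

    def fit(self, names_by_language):
        for lang, names in names_by_language.items():
            for name in names:
                for gram in letter_ngrams(name, self.n):
                    self.ngram_counts[lang][gram] += 1
                    self.context_counts[lang][gram[:-1]] += 1
                    self.alphabet.add(gram[-1])
        return self

    def log_prob(self, name, lang):
        """Smoothed log-probability of the name under one language's model."""
        vocab_size = len(self.alphabet)
        total = 0.0
        for gram in letter_ngrams(name, self.n):
            num = self.ngram_counts[lang][gram] + self.alpha
            den = self.context_counts[lang][gram[:-1]] + self.alpha * vocab_size
            total += math.log(num / den)
        return total

    def predict(self, name):
        """Pick the language whose model scores the name highest."""
        return max(self.ngram_counts, key=lambda lang: self.log_prob(name, lang))


# Toy training data, invented for illustration only (not from the paper).
toy_names = {
    "English": ["smith", "johnson", "williams", "brown"],
    "German": ["schmidt", "schneider", "fischer", "schwarz"],
}
model = LetterNgramOriginClassifier(n=2).fit(toy_names)
print(model.predict("schmitt"))
```

A letter-cluster variant would replace `letter_ngrams` with a segmenter that splits each name into syllable-like clusters. The abstract does not describe how that segmentation is done, so it is left out here.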
Keywords :
natural languages; speech recognition; speech synthesis; English; French; German; N-gram model; Portuguese; error reduction; language origin identification; person names; syllable-based letter clusters; Acoustics; Asia; Engines; Error analysis; HTML; Natural languages; Speech recognition; Speech synthesis; Vocabulary; Watches
Conference_Title :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Conference_Location :
Toulouse
Print_ISBN :
1-4244-0469-X
DOI :
10.1109/ICASSP.2006.1660124