Title of article

Learning regional transliteration variants

Author/Authors

Jin-Shea Kuo (author), Haizhou Li (author)

Issue Information

Bimonthly, with serial issue number, year 2012

Pages

16

From page

154

To page

169

Abstract

This paper investigates regional transliteration variants across Chinese-speaking regions. We begin by studying the social association of regional transliterations and then postulate a computational model for effective transliteration extraction from the Web. In the computational model, we first propose constraint-based exploration, which incorporates transliteration knowledge from transliteration modeling and predictive query suggestions from search engines into query formulation as constraints, so as to increase the chance that the desired transliterations are returned when learning regional transliteration variants. We then study a cross-training algorithm, which explores potentially helpful information about transliteration mappings across related regional corpora when learning transliteration models, to improve the overall extraction performance. The experimental results show that the proposed method not only effectively harvests a lexicon of regional transliteration variants but also reduces the need for manual data labeling in transliteration modeling. We also investigate the underlying characteristics of regional transliterations that motivate the cross-training algorithm.

Keywords

Transliteration variants, Constraint-based exploration, Predictive query suggestions, Regional social association, Cross-training algorithm, Transliteration variation

Journal title

Information Processing and Management

Serial Year

2012

Record number

1229204