Title :
A novel approach for proper name transliteration verification
Author :
Jan, Ea-Ee ; Ge, Niyu ; Lin, Shih-Hsiang ; Roukos, Salim ; Sorensen, Jeffrey
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Date :
Nov. 29 2010-Dec. 3 2010
Abstract :
Proper name transliteration, the pronunciation-based translation of a proper name, is important to many multilingual natural language processing tasks, such as Statistical Machine Translation (SMT) and Cross Lingual Information Retrieval (CLIR). The task is extremely challenging because of the pronunciation differences between the source and target languages: a given proper name can have many different transliterations. Previous research has reported error rates of 30-50% for transliteration when evaluated against a top-1 reference, and this error degrades the performance of many applications. In this paper, a novel approach is proposed to verify a given proper name transliteration pair using a discrete variant Hidden Markov Model (HMM) alignment. The state emission probabilities are derived from SMT phrase tables. The proposed method yields an Equal Error Rate (EER) of 3.73% on a test set of 300 matched and 1000 unmatched name pairs. By comparison, the commonly used SMT framework yields an EER of 6.5% under its best configuration, and the widely used edit distance approach has an EER of 22%. The new method achieves high accuracy with low complexity, and provides an alternative for name transliteration in CLIR and in other cross lingual natural language applications such as word alignment and machine translation.
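The abstract compares methods by Equal Error Rate, the operating point at which the false acceptance rate on unmatched pairs equals the false rejection rate on matched pairs. The record does not say how the authors computed it, so the following is only a minimal sketch of one common way to estimate EER from two lists of verification scores. The function name `equal_error_rate`, the convention that a higher score means a more likely match, and the threshold sweep over observed scores are all assumptions made for illustration, not details from the paper.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    matched: Sequence[float], unmatched: Sequence[float]
) -> Tuple[float, float]:
    """Estimate the EER and the threshold where it occurs.

    Assumption (not from the paper): a pair is accepted when its
    score is >= the threshold, and higher scores mean better matches.
    """
    if not matched or not unmatched:
        raise ValueError("both score lists must be non-empty")

    best_gap = float("inf")
    best_eer = 1.0
    best_threshold = 0.0

    # Sweep every observed score as a candidate threshold.
    for threshold in sorted(set(matched) | set(unmatched)):
        # False rejection: a matched pair scored below the threshold.
        frr = sum(s < threshold for s in matched) / len(matched)
        # False acceptance: an unmatched pair scored at or above it.
        far = sum(s >= threshold for s in unmatched) / len(unmatched)

        # Keep the threshold where the two error rates are closest,
        # and report their average as the EER estimate.
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
            best_threshold = threshold

    return best_eer, best_threshold


# Toy scores, invented purely for illustration.
toy_matched = [0.9, 0.8, 0.7, 0.3]
toy_unmatched = [0.6, 0.4, 0.2, 0.1]
toy_eer, toy_threshold = equal_error_rate(toy_matched, toy_unmatched)
```

With a test set shaped like the one in the abstract, `matched` would hold 300 scores and `unmatched` 1000, produced by whichever scorer is being evaluated (HMM alignment, SMT, or edit distance). Because the two rates are normalized separately, the class imbalance does not bias the estimate.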
Keywords :
hidden Markov models; information retrieval; language translation; natural language processing; probability; SMT phrase tables; cross lingual information retrieval; discrete variant hidden Markov model; equal error rate; multilingual natural language processing task; pronunciation based translation; proper name transliteration verification; state emission probabilities; Computational modeling; Decoding; Error analysis; Hidden Markov models; Kernel; Noise measurement; Training; component; cross lingual IR; machine translation; transliteration;
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
Conference_Location :
Tainan
Print_ISBN :
978-1-4244-6244-5
DOI :
10.1109/ISCSLP.2010.5684842