DocumentCode
2016751
Title
A novel approach for proper name transliteration verification
Author
Jan, Ea-Ee ; Ge, Niyu ; Lin, Shih-Hsiang ; Roukos, Salim ; Sorensen, Jeffrey
Author_Institution
IBM T.J Watson Res. Center, Yorktown Heights, NY, USA
fYear
2010
fDate
Nov. 29 2010-Dec. 3 2010
Firstpage
89
Lastpage
94
Abstract
Proper name transliteration, the pronunciation-based translation of a proper name, is important to many multilingual natural language processing tasks, such as Statistical Machine Translation (SMT) and Cross Lingual Information Retrieval (CLIR). This task is extremely challenging due to the pronunciation differences between the source and target languages. A given proper name can lead to many different transliterations. Past research efforts have demonstrated a 30-50% error rate when using the top-1 reference for transliteration. This error leads to performance degradation in many applications. In this paper, a novel approach to verify a given proper name transliteration pair using a discrete variant Hidden Markov Model (HMM) alignment is proposed. The state emission probabilities are derived from SMT phrase tables. The proposed method yields an Equal Error Rate (EER) of 3.73% on a test set of 300 matched and 1000 unmatched name pairs. By comparison, the commonly used SMT framework yields 6.5% EER under its best configuration. The widely used edit distance approach has an EER of 22%. The new method achieves high accuracy and low complexity, and provides an alternative for name transliteration in CLIR and other cross lingual natural language applications such as word alignment and machine translation.
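The Equal Error Rate quoted in the abstract is the operating point at which the false acceptance rate (unmatched pairs accepted) equals the false rejection rate (matched pairs rejected). The sketch below shows one common way to estimate it from two lists of verification scores. It is an illustration only: the function name `equal_error_rate`, the convention that a higher score means a more likely match, and the toy scores are assumptions, not the authors' scoring or evaluation code.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    matched: Sequence[float], unmatched: Sequence[float]
) -> Tuple[float, float]:
    """Estimate (eer, threshold) from verification scores.

    Assumption: a higher score means the pair is more likely a true
    transliteration pair. A pair is accepted when score >= threshold.
    """
    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    # Every observed score is a candidate decision threshold.
    for thr in sorted(set(matched) | set(unmatched)):
        # False rejection: a matched pair scored below the threshold.
        frr = sum(s < thr for s in matched) / len(matched)
        # False acceptance: an unmatched pair scored at or above it.
        far = sum(s >= thr for s in unmatched) / len(unmatched)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


# Toy scores (hypothetical, not data from the paper).
matched_scores = [0.9, 0.8, 0.7, 0.4]
unmatched_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(matched_scores, unmatched_scores)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With finite test sets the two error rates rarely cross exactly, so the sketch reports the average of the two at the threshold where they are closest.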
Keywords
hidden Markov models; information retrieval; language translation; natural language processing; probability; SMT phrase tables; cross lingual information retrieval; discrete variant hidden Markov model; equal error rate; multilingual natural language processing task; pronunciation based translation; proper name transliteration verification; state emission probabilities; Computational modeling; Decoding; Error analysis; Hidden Markov models; Kernel; Noise measurement; Training; component; cross lingual IR; machine translation; transliteration;
fLanguage
English
Publisher
IEEE
Conference_Titel
Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
Conference_Location
Tainan
Print_ISBN
978-1-4244-6244-5
Type
conf
DOI
10.1109/ISCSLP.2010.5684842
Filename
5684842