Title :
The Chinese-English Bilingual Sentence Alignment Based on Length
Author :
Ding, Huafu ; Quan, Lili ; Qi, Haoliang
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Univ. of Sci. & Technol., Harbin, China
Abstract :
Bilingual sentence pairs are a key resource for statistical machine translation. Currently, most sentence-aligned corpora cover English and French or English and German, and there are few dedicated sentence-aligned datasets for English and Chinese. Our aim is therefore to create large-scale, high-precision English-Chinese aligned sentences. A length-based method is used to align bilingual paragraphs extracted from CNKI (China National Knowledge Infrastructure). CNKI is one of the largest academic websites and contains a huge number of Chinese-English bilingual paragraphs. Our method adapts and combines several approaches, including word-based and hybrid ones. Finally, we choose the best alignment by dynamic programming. Experiments on the CNKI dataset show that the presented method achieves satisfactory recall and precision.
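To illustrate the general idea of length-based alignment with dynamic programming, here is a minimal sketch. It is not the authors' implementation: the bead penalties in `BEADS`, the cost function `length_cost`, and the default `ratio` of 3 English characters per Chinese character are all hypothetical values chosen for illustration only.

```python
from typing import List, Tuple

# Allowed bead shapes (number of source sentences, number of target sentences)
# with hypothetical fixed penalties; 1-1 beads are preferred.
BEADS = {(1, 1): 0.0, (2, 1): 1.0, (1, 2): 1.0, (1, 0): 4.0, (0, 1): 4.0}


def length_cost(src_len: int, tgt_len: int, ratio: float) -> float:
    """Relative mismatch between the target length and the length
    expected from the source (src_len * ratio). Assumed cost, in [0, 1]."""
    expected = src_len * ratio
    return abs(tgt_len - expected) / max(expected, tgt_len, 1.0)


def align(src: List[str], tgt: List[str],
          ratio: float = 3.0) -> List[Tuple[List[int], List[int]]]:
    """Return the lowest-cost sequence of beads as
    (source sentence indices, target sentence indices) pairs."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Forward dynamic programming over prefixes src[:i], tgt[:j].
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + penalty + length_cost(s_len, t_len, ratio)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Trace back from the end to recover the chosen beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Calling `align(chinese_sentences, english_sentences)` on the sentences of one paragraph pair returns a list of beads; a bead such as `([0, 1], [0])` means the first two source sentences were aligned to the first target sentence. In practice the length ratio and penalties would be estimated from data rather than fixed by hand.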
Keywords :
dynamic programming; language translation; natural language processing; CNKI academic Web site; China National Knowledge Infrastructure; Chinese-English bilingual sentence alignment; bilingual paragraph alignment; bilingual sentence pair; length based method; precision ratio; recall ratio; sentence alignment corpus; statistical machine translation; Computational linguistics; Data mining; Dictionaries; Educational institutions; Meetings; Pragmatics; bilingual sentence alignment; corpus of Chinese-English; sentence alignment based on length
Conference_Title :
Asian Language Processing (IALP), 2011 International Conference on
Conference_Location :
Penang
Print_ISBN :
978-1-4577-1733-8
DOI :
10.1109/IALP.2011.70