DocumentCode :
2044743
Title :
Building Bilingual Parallel Corpora Based on Wikipedia
Author :
Mohammadi, Mehdi ; GhasemAghaee, Nasser
Author_Institution :
Dept. of Comput. Eng., Sheikh Bahaie Univ., Isfahan, Iran
Volume :
2
fYear :
2010
fDate :
19-21 March 2010
Firstpage :
264
Lastpage :
268
Abstract :
Aligned parallel corpora are an important resource for a wide range of multilingual research, particularly corpus-based machine translation. In this paper we present a Persian-English sentence-aligned parallel corpus built by mining Wikipedia. We propose a method of extracting sentence-level alignments using an extended link-based bilingual lexicon method. Experimental results show that our method increases precision while reducing the total number of generated candidate pairs.
Keywords :
data mining; language translation; natural language processing; search engines; Persian-English sentence-aligned parallel corpus; Wikipedia mining; bilingual parallel corpora; corpus-based machine translation; extended link-based bilingual lexicon method; sentence-level alignment extraction; Application software; Biographies; Buildings; Computer applications; Concurrent computing; Dictionaries; Encyclopedias; Natural languages; Parallel processing; Wikipedia; Parallel corpora; Sentence alignment
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Engineering and Applications (ICCEA), 2010 Second International Conference on
Conference_Location :
Bali Island
Print_ISBN :
978-1-4244-6079-3
Electronic_ISBN :
978-1-4244-6080-9
Type :
conf
DOI :
10.1109/ICCEA.2010.203
Filename :
5445653