DocumentCode
2044743
Title
Building Bilingual Parallel Corpora Based on Wikipedia
Author
Mohammadi, Mehdi ; GhasemAghaee, Nasser
Author_Institution
Dept. of Comput. Eng., Sheikh Bahaie Univ., Isfahan, Iran
Volume
2
fYear
2010
fDate
19-21 March 2010
Firstpage
264
Lastpage
268
Abstract
Aligned parallel corpora are an important resource for a wide range of multilingual research, particularly corpus-based machine translation. In this paper we present a Persian-English sentence-aligned parallel corpus built by mining Wikipedia. We propose a method for extracting sentence-level alignments using an extended link-based bilingual lexicon approach. Experimental results show that our method increases precision while reducing the total number of generated candidate pairs.
Keywords
data mining; language translation; natural language processing; search engines; Persian-English sentence-aligned parallel corpus; Wikipedia mining; bilingual parallel corpora; corpus-based machine translation; extended link-based bilingual lexicon method; sentence-level alignment extraction; Application software; Biographies; Buildings; Computer applications; Concurrent computing; Dictionaries; Encyclopedias; Natural languages; Parallel processing; Wikipedia; Parallel corpora; Sentence alignment; Wikipedia;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Engineering and Applications (ICCEA), 2010 Second International Conference on
Conference_Location
Bali Island
Print_ISBN
978-1-4244-6079-3
Electronic_ISBN
978-1-4244-6080-9
Type
conf
DOI
10.1109/ICCEA.2010.203
Filename
5445653