Title :
English/Arabic bilingual dictionary construction using parallel texts from the Internet archive
Author :
Fattah, M.A. ; Ren, Fuji ; Shingo, Kuroiwa ; Atlam, Alsayed
Author_Institution :
Fac. of Eng., Tokushima Univ., Japan
Abstract :
Building a good machine translation system, or carrying out natural language processing research on cross-language information retrieval, requires a good parallel corpus. The Internet archive contains many parallel documents, but constructing a good parallel corpus from it requires a good bilingual dictionary. This paper describes an algorithm that automatically extracts an English/Arabic bilingual dictionary from parallel texts in the Internet archive. The system should preferably be useful for many different language pairs. Unlike most existing systems, ours can extract translation pairs from a very small parallel corpus: it can extract translations from as few as two sentences in one language and two sentences in the other, provided the system's requirements are met. Moreover, the system extracts not only word pairs that are translations of each other but also explanations of an Arabic or English word in the other language. The accuracy of the system is 59.1% when one English word is translated to one Arabic word, 23.9% when one English word is translated to more than one Arabic word (an Arabic phrase), and 14.6% when one Arabic word is translated to more than one English word (an English phrase).
Keywords :
Internet; computational linguistics; dictionaries; language translation; natural languages; Arabic-English translation; English-Arabic bilingual dictionary construction; English-Arabic translation; Internet archives; machine translation system; natural language processing; parallel documents; parallel texts; translation pair extraction; Automatic testing; Data mining; Information filtering; Information filters; Information retrieval; Web pages; English/Arabic translation; Multilingual dictionaries; Parallel corpora
Conference_Title :
Circuits and Systems, 2003 IEEE 46th Midwest Symposium on
Print_ISBN :
0-7803-8294-3
DOI :
10.1109/MWSCAS.2003.1562450