Title :
Creating a Wikipedia-based Persian-English word association dictionary
Author :
Rahimi, Zahra ; Shakery, Azadeh
Author_Institution :
Sch. of Electr. & Comput. Eng., Univ. of Tehran, Tehran, Iran
Abstract :
One of the most important issues in cross-language information retrieval is how to cross the language barrier between the query and the documents. Different translation resources have been studied for this purpose. In this research, we study the use of Wikipedia for query translation by constructing a Wikipedia-based bilingual association dictionary. We use English and Persian Wikipedia inter-language links to align related titles and then mine word-by-word associations between the two languages from the extracted alignments. We use the mined word association dictionary to translate queries in Persian-English cross-language information retrieval. Our experimental results on the Hamshahri corpus show that the proposed method is effective in extracting word associations and that Persian Wikipedia is a promising translation resource. Using the association dictionary, we can improve the pure dictionary-based method, where the only translation resource is a bilingual dictionary, by 33.6% and its recall by 26.2%.
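The title-alignment and association-mining steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state which association measure is used, so the Dice coefficient below is an assumption, and the function names (`mine_associations`, `translate`) and the toy title pairs are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import product


def mine_associations(aligned_titles):
    """Score (source_word, target_word) pairs over aligned title pairs.

    Assumption: the Dice coefficient is used as the association measure;
    the abstract does not specify the actual scoring function.
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src_title, tgt_title in aligned_titles:
        # Count each word at most once per title.
        src_words = set(src_title.lower().split())
        tgt_words = set(tgt_title.lower().split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        # Every source word co-occurs with every target word of the aligned title.
        pair_freq.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


def translate(word, scores, top_k=1):
    """Return the top_k target words most strongly associated with `word`."""
    candidates = [(t, sc) for (s, t), sc in scores.items() if s == word]
    # Sort by descending score, breaking ties alphabetically for determinism.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [t for t, _ in candidates[:top_k]]


# Hypothetical toy alignments standing in for inter-language title links.
ALIGNED = [
    ("دانشگاه تهران", "University of Tehran"),
    ("تهران", "Tehran"),
    ("دانشگاه شیراز", "Shiraz University"),
]
SCORES = mine_associations(ALIGNED)
```

In this sketch, a Persian query term is translated by looking up its highest-scoring English associations, for example `translate("تهران", SCORES)`. A real system would also need tokenization and normalization suited to Persian text, which this sketch omits.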
Keywords :
Internet; data mining; natural language processing; query processing; Hamshahri corpus; Persian Wikipedia; Persian-English word association dictionary; Wikipedia-based bilingual association dictionary; cross language information retrieval; query translation; word association dictionary mining; Dictionaries; Electronic publishing; Encyclopedias; Frequency measurement; Information retrieval; Wikipedia; Wikipedia Mining; association dictionary
Conference_Title :
Telecommunications (IST), 2010 5th International Symposium on
Conference_Location :
Tehran
Print_ISBN :
978-1-4244-8183-5
DOI :
10.1109/ISTEL.2010.5734088