Title :
Calculating Wikipedia Article Similarity Using Machine Translation Evaluation Metrics
Author :
Erdmann, Maike ; Finch, Andrew ; Nakayama, Kotaro ; Sumita, Eiichiro ; Hara, Takahiro ; Nishio, Shojiro
Author_Institution :
Dept. of Inf. Sci. & Technol., Osaka Univ., Osaka, Japan
Abstract :
Calculating the similarity of Wikipedia articles in different languages is helpful for bilingual dictionary construction and various other research areas. However, standard methods for document similarity calculation are usually very simple. Therefore, we describe an approach that translates one Wikipedia article into the language of the other article and then calculates article similarity using standard machine translation evaluation metrics. An experiment showed that our approach is effective for identifying Wikipedia articles in different languages that cover the same concept.
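The approach in the abstract can be illustrated with a minimal sketch. It assumes that one article has already been machine-translated into the language of the other and that both texts are tokenized. The abstract does not name the metrics used, so the BLEU-style score below (clipped n-gram precision with a brevity penalty) is an assumption chosen for illustration, and the function names `ngrams` and `bleu_like_similarity` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_like_similarity(candidate, reference, max_n=2):
    """Simplified BLEU-style score in [0, 1] (illustrative, not the paper's metric).

    candidate: tokens of the machine-translated article
    reference: tokens of the article written in the target language
    """
    if not candidate or not reference:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


translated = "the cat sat on the mat".split()
same_topic = "the cat is on the mat".split()
other_topic = "stock markets fell sharply today".split()

print(bleu_like_similarity(translated, same_topic))
print(bleu_like_similarity(translated, other_topic))
```

In this toy setting, an article pair covering the same concept scores higher than an unrelated pair. A real system would likely need smoothing and a threshold tuned on held-out data, neither of which is specified in the abstract.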
Keywords :
Web sites; language translation; natural language processing; Wikipedia article similarity; bilingual dictionary construction; machine translation evaluation metrics; Dictionaries; Electronic publishing; Encyclopedias; Internet; Measurement; Thesauri; Bilingual Dictionary Construction; Cross-language Document Similarity; Wikipedia Mining
Conference_Title :
Advanced Information Networking and Applications (WAINA), 2011 IEEE Workshops of International Conference on
Conference_Location :
Biopolis
Print_ISBN :
978-1-61284-829-7
Electronic_ISBN :
978-0-7695-4338-3
DOI :
10.1109/WAINA.2011.132