• DocumentCode
    3089785
  • Title
    Calculating Wikipedia Article Similarity Using Machine Translation Evaluation Metrics
  • Author
    Erdmann, Maike ; Finch, Andrew ; Nakayama, Kotaro ; Sumita, Eiichiro ; Hara, Takahiro ; Nishio, Shojiro
  • Author_Institution
    Dept. of Inf. Sci. & Technol., Osaka Univ., Osaka, Japan
  • fYear
    2011
  • fDate
    22-25 March 2011
  • Firstpage
    620
  • Lastpage
    625
  • Abstract
    Calculating the similarity of Wikipedia articles in different languages is helpful for bilingual dictionary construction and various other research areas. However, standard methods for document similarity calculation are usually very simple. Therefore, we describe an approach that translates one Wikipedia article into the language of the other article and then calculates article similarity with standard machine translation evaluation metrics. An experiment revealed that our approach is effective for identifying Wikipedia articles in different languages that cover the same concept.
  • Keywords
    Web sites; language translation; natural language processing; Wikipedia article similarity; bilingual dictionary construction; machine translation evaluation metrics; Dictionaries; Electronic publishing; Encyclopedias; Internet; Measurement; Thesauri; Bilingual Dictionary Construction; Cross-language Document Similarity; Wikipedia Mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Information Networking and Applications (WAINA), 2011 IEEE Workshops of International Conference on
  • Conference_Location
    Biopolis
  • Print_ISBN
    978-1-61284-829-7
  • Electronic_ISBN
    978-0-7695-4338-3
  • Type
    conf
  • DOI
    10.1109/WAINA.2011.132
  • Filename
    5763570
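
The approach summarized in the abstract (translate one article into the language of the other, then score the pair with a machine translation evaluation metric) can be illustrated with a minimal sketch. The abstract does not name the metrics used, so the BLEU-style score below is only a stand-in, and the function names `ngrams` and `bleu_like` are hypothetical. The translation step is assumed to have been done already; the sketch only compares two texts in the same language.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_like(candidate, reference, max_n=2):
    """Simplified BLEU-style similarity between two same-language texts.

    Assumption: `candidate` is the machine-translated article and
    `reference` is the article originally written in the target language.
    This is an illustrative stand-in, not the metric from the paper.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0

    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # Clipped overlap: a candidate n-gram is credited at most as
        # often as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if overlap == 0:
            return 0.0
        log_sum += math.log(overlap / total)

    # Brevity penalty discourages very short candidates.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(log_sum / max_n)


if __name__ == "__main__":
    translated = "the cat sat on the mat"
    same_concept = "the cat is on the mat"
    other_concept = "stock markets fell sharply today"
    print(bleu_like(translated, same_concept))
    print(bleu_like(translated, other_concept))
```

Under these assumptions, an article pair covering the same concept should score higher than an unrelated pair, so candidate articles could be ranked by this score.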