• DocumentCode
    2249929
  • Title
    Improving the Wikipedia Miner word sense disambiguation algorithm
  • Author
    Pohl, Aleksander
  • Author_Institution
    Jagiellonian Univ., Kraków, Poland
  • fYear
    2012
  • fDate
    9-12 Sept. 2012
  • Firstpage
    241
  • Lastpage
    248
  • Abstract
    This document describes improvements to the Wikipedia Miner word sense disambiguation algorithm. The original algorithm performs very well in detecting key terms in documents and disambiguating them against Wikipedia articles. By replacing the original measure inspired by Normalized Google Distance with a measure inspired by the Jaccard coefficient, and by taking additional features into account, the disambiguation algorithm was improved by 8 percentage points (F1-measure) without impeding its performance or introducing any additional preprocessing overhead. This document also presents some statistical data extracted from the Polish Wikipedia by Wikipedia Miner. An automatic evaluation of the disambiguation algorithm for Polish shows that it performs almost as well as for English, even though the Polish Wikipedia has only a quarter as many articles as the English Wikipedia.
  • Keywords
    Internet; Web sites; document handling; natural language processing; English; Jaccard coefficient; Normalized Google Distance; Polish; Wikipedia Miner word sense disambiguation algorithm; Wikipedia articles; Context; Electronic publishing; Encyclopedias; Machine learning algorithms; Semantics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Systems (FedCSIS), 2012 Federated Conference on
  • Conference_Location
    Wrocław
  • Print_ISBN
    978-1-4673-0708-6
  • Electronic_ISBN
    978-83-60810-51-4
  • Type
    conf
  • Filename
    6354382