• DocumentCode
    3628501
  • Title

    Comparing measures of semantic similarity

  • Author

    Nikola Ljubesic;Damir Boras;Nikola Bakaric;Jasmina Njavro

  • Author_Institution
    Department of Information Sciences, Faculty of Humanities and Social Sciences, Ivana Lučića 3, 10000 Zagreb, Croatia
  • fYear
    2008
  • fDate
    6/1/2008 12:00:00 AM
  • Firstpage
    675
  • Lastpage
    682
  • Abstract
    The aim of this paper is to compare different methods for automatic extraction of semantic similarity measures from corpora. Semantic similarity measures have proven very useful for many tasks in natural language processing, such as information retrieval, information extraction, and machine translation. Additionally, one of the main problems in natural language processing is data sparseness, since no language sample is large enough to capture all possible language combinations. In our research we experiment with four different measures of association with context and eight different measures of vector similarity. The results show that the Jensen-Shannon divergence and the L1 and L2 norms outperform other measures of vector similarity regardless of the measure of association with context used. Maximum likelihood estimate and t-test show better results than other measures of association with context.
  • Keywords
    "Frequency measurement","Vectors","Distance measurement","Buildings","Data mining","Computational efficiency","Information retrieval"
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology Interfaces, 2008. ITI 2008. 30th International Conference on
  • ISSN
    1330-1012
  • Print_ISBN
    978-953-7138-12-7
  • Type

    conf

  • DOI
    10.1109/ITI.2008.4588492
  • Filename
    4588492
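
The abstract names the Jensen-Shannon divergence and the L1 and L2 norms as the best-performing measures of vector similarity. As a rough illustration only, the sketch below computes these three measures for two context vectors. It assumes that each word is represented by raw co-occurrence counts over a shared list of context features, and that the maximum likelihood estimate amounts to normalizing those counts into relative frequencies. The function names and the toy counts are hypothetical and are not taken from the paper.

```python
import math


def normalize(counts):
    # Assumed reading of "maximum likelihood estimate": relative frequency
    # of each context feature, i.e. count / total count.
    total = sum(counts)
    return [c / total for c in counts]


def l1_distance(p, q):
    # L1 norm of the difference vector (Manhattan distance).
    return sum(abs(a - b) for a, b in zip(p, q))


def l2_distance(p, q):
    # L2 norm of the difference vector (Euclidean distance).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kl_divergence(p, q):
    # Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0.
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon(p, q):
    # Average KL divergence of p and q from their midpoint distribution m.
    # m_i > 0 wherever p_i > 0 or q_i > 0, so no division by zero occurs.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical co-occurrence counts for two words over four context features.
word_a = normalize([4, 3, 2, 1])
word_b = normalize([1, 2, 3, 4])

print(l1_distance(word_a, word_b))
print(l2_distance(word_a, word_b))
print(jensen_shannon(word_a, word_b))
```

All three are distances, so smaller values mean more similar context distributions. With base-2 logarithms the Jensen-Shannon divergence stays between 0 and 1, which makes it defined even when the two vectors have zeros in different positions, a common situation with sparse corpus counts.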