• DocumentCode
    3627435
  • Title

    Comparison of semantic and single term similarity measures for clustering Turkish documents

  • Author

    Bulent Yucesoy;Sule Gunduz Oguducu

  • Author_Institution
    Istanbul Tech. Univ., Istanbul
  • fYear
    2007
  • Firstpage
    393
  • Lastpage
    398
  • Abstract
    With the rapid growth of the World Wide Web (WWW), designing and organizing the vast amounts of online documents on the web according to their topic has become a critical issue. Even for search engines, it is very important to group similar documents in order to improve performance when a query is submitted to the system. Clustering is useful for taxonomy design and similarity search of documents in such a domain. Similarity is fundamental to many clustering applications on hypertext. In this paper, we study how measures of similarity are used to cluster a collection of documents on a web site. Most document clustering techniques rely on single-term analysis of text, such as the vector space model. To better group related documents, we propose a new semantic similarity measure. We compare our measure with Wu-Palmer similarity and cosine similarity. Experimental results show that cosine similarity performs better than the semantic similarities. We demonstrate our results on Turkish documents. This is the first study that considers the semantic similarities between Turkish documents.
  • Keywords
    "Taxonomy","Design engineering","Web sites","World Wide Web","Search engines","Functional analysis","Frequency","Thesauri","Machine learning","Application software"
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on
  • Print_ISBN
    978-0-7695-3069-7
  • Type

    conf

  • DOI
    10.1109/ICMLA.2007.52
  • Filename
    4457262
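
The abstract names two baseline measures, cosine similarity over a vector space model and Wu-Palmer similarity. As a rough illustration only, the sketch below shows the textbook form of each. It is not the authors' implementation and does not reproduce their proposed semantic measure. The function names, the use of raw term frequencies, the example tokens, and the convention that concept depths are supplied by the caller from some taxonomy are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the raw term-frequency vectors of two token lists."""
    tf_a, tf_b = Counter(doc_a), Counter(doc_b)
    # Only terms present in both documents contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def wu_palmer(depth_lcs, depth_a, depth_b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b)).

    depth_lcs is the depth of the deepest common ancestor of the two
    concepts in a taxonomy; all depths are assumed to be positive.
    """
    return 2.0 * depth_lcs / (depth_a + depth_b)


# Hypothetical tokenized documents, used only to exercise the functions.
doc1 = ["kitap", "okul", "kitap"]
doc2 = ["kitap", "okul", "ders"]
single_term_score = cosine_similarity(doc1, doc2)

# Hypothetical taxonomy depths for two concepts and their common ancestor.
semantic_score = wu_palmer(depth_lcs=2, depth_a=3, depth_b=4)
```

Cosine similarity only rewards terms that appear verbatim in both documents, while Wu-Palmer scores two different terms as related when they share a deep ancestor in a taxonomy. That difference is the single-term versus semantic contrast the paper evaluates.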