• DocumentCode
    449837
  • Title
    Document Clustering with Semantic Analysis
  • Author
    Wang, Yong ; Hodges, Julia
  • Author_Institution
    Mississippi State University
  • Volume
    3
  • fYear
    2006
  • fDate
    04-07 Jan. 2006
  • Abstract
    Document clustering automatically generates clusters from an entire document collection and is used in many fields, including data mining and information retrieval. In the traditional vector space model, the unique words occurring in the document set are used as the features. Because of synonymy and polysemy, however, such a bag of original words cannot represent the content of a document precisely. In this paper, we investigate using a word sense disambiguation method to identify the senses of words and to construct the feature vector for document representation from those senses. Our experimental results demonstrate that, in most conditions, using senses can improve the performance of our document clustering system. However, a comprehensive statistical analysis indicates that the differences between using the original single words and using word senses are not statistically significant. We also provide an evaluation of several basic clustering algorithms to support algorithm selection.
  • Keywords
    Clustering algorithms; Computer science; Data engineering; Data mining; Databases; Frequency; Information retrieval; Partitioning algorithms; Statistical analysis; Thesauri
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    System Sciences, 2006. HICSS '06. Proceedings of the 39th Annual Hawaii International Conference on
  • ISSN
    1530-1605
  • Print_ISBN
    0-7695-2507-5
  • Type
    conf
  • DOI
    10.1109/HICSS.2006.129
  • Filename
    1579400