• DocumentCode
    3336537
  • Title
    Clustering method using hypergraph models based on Set Pair Analysis
  • Author
    Lin, Guo-Ping ; Li, Shao-Zi
  • Author_Institution
    Dept. of Math. & Inf. Sci., Zhangzhou Normal Univ., Zhangzhou, China
  • Volume
    1
  • fYear
    2009
  • fDate
    14-16 Aug. 2009
  • Firstpage
    1194
  • Lastpage
    1197
  • Abstract
    Text clustering methods can be used to structure large sets of text or hypertext documents. However, many well-known methods do not address two problems specific to text clustering: the very high dimensionality of the data and the understandability of the cluster description. In this paper, we introduce a novel approach based on a hypergraph model of text clustering that uses Set Pair Analysis (SPA), a new methodology for describing and processing system uncertainty. In this method, we define a new measure of text similarity from the identity, difference, and contrariety of a set pair. After the hypergraph model is set up, a hypergraph partitioning algorithm is used to find clusters. The new method eliminates disadvantageous factors, reduces the dimensionality of the text, and improves the speed and accuracy of text clustering. The experiment demonstrates that our approach is applicable and effective on high-dimensional textual datasets.
  • Keywords
    graph theory; hypermedia; text analysis; cluster description; clustering method; high dimensional textual datasets; hypergraph models; hypergraph partitioning algorithm; hypertext documents; process system uncertainty; set pair analysis; text clustering methods; Clustering algorithms; Clustering methods; Cognitive science; Information analysis; Information science; Information systems; Partitioning algorithms; Spatial databases; Text mining; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    IT in Medicine & Education, 2009. ITIME '09. IEEE International Symposium on
  • Conference_Location
    Jinan
  • Print_ISBN
    978-1-4244-3928-7
  • Electronic_ISBN
    978-1-4244-3930-0
  • Type
    conf
  • DOI
    10.1109/ITIME.2009.5236279
  • Filename
    5236279
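
The abstract describes a text similarity measure built from the identical, different, and contrary parts of a set pair, but this record does not give the formula. The following is a minimal sketch of how such a measure might look, using the general Set Pair Analysis connection degree mu = a + b*i + c*j with a + b + c = 1. The function names (`spa_connection`, `spa_score`), the `high` frequency threshold, and the rule for sorting terms into the three classes are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter


def spa_connection(doc_a, doc_b, high=2):
    """Return (a, b, c): identity, difference, and contrary degrees.

    Assumed classification rule (hypothetical, not from the paper):
      - identical: the term occurs in both documents
      - contrary:  the term occurs at least `high` times in one document
                   and never in the other
      - different: the term occurs fewer than `high` times in one document
                   and never in the other (the uncertain case)
    """
    counts_a, counts_b = Counter(doc_a), Counter(doc_b)
    vocab = set(counts_a) | set(counts_b)
    if not vocab:
        return (0.0, 0.0, 0.0)

    identical = different = contrary = 0
    for term in vocab:
        n_a, n_b = counts_a[term], counts_b[term]
        if n_a > 0 and n_b > 0:
            identical += 1
        elif max(n_a, n_b) >= high:
            contrary += 1
        else:
            different += 1

    total = len(vocab)
    return (identical / total, different / total, contrary / total)


def spa_score(doc_a, doc_b, i=0.0, j=-1.0, high=2):
    """Collapse the connection degree a + b*i + c*j into one number.

    j = -1 is the usual SPA convention for the contrary coefficient;
    i = 0 (ignore the uncertain part) is an assumption of this sketch.
    """
    a, b, c = spa_connection(doc_a, doc_b, high=high)
    return a + b * i + c * j


if __name__ == "__main__":
    d1 = ["graph", "graph", "cluster", "text"]
    d2 = ["graph", "text", "text", "web"]
    print(spa_connection(d1, d2))
    print(spa_score(d1, d2))
```

Under these assumptions, documents whose pairwise score exceeds a chosen threshold could be grouped into a hyperedge before the hypergraph partitioning step mentioned in the abstract; how the paper actually builds its hyperedges is not stated in this record.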