Title :
Clustering method using hypergraph models based on Set Pair Analysis
Author :
Lin, Guo-Ping ; Li, Shao-Zi
Author_Institution :
Dept. of Math. & Inf. Sci., Zhangzhou Normal Univ., Zhangzhou, China
Abstract :
Text clustering methods can be used to structure large sets of text or hypertext documents. However, many well-known methods do not address the special problems of text clustering: the very high dimensionality of the data and the understandability of the cluster description. In this paper, we introduce a novel approach based on a hypergraph model of text clustering that uses Set Pair Analysis (SPA), a new methodology for describing and processing system uncertainty. In this method, we define a new measure of text similarity from the identical, different, and contrary components of a set pair. After the hypergraph model is set up, a hypergraph partitioning algorithm is used to find clusters. The new method can eliminate disadvantageous factors, reduce the dimensionality of the text representation, and improve the speed and accuracy of text clustering. The experiment demonstrates that our approach is applicable and effective on high-dimensional textual datasets.
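The abstract does not give the similarity formula, so the following is only a minimal sketch of how a set-pair similarity could be computed. It uses the standard SPA connection degree mu = a + b*i + c*j, where a, b, and c are the identical, different, and contrary fractions (a + b + c = 1), i is an uncertainty coefficient in [-1, 1], and j = -1. The functions `level` and `text_similarity`, the frequency threshold `high`, and the rule for sorting terms into the three categories are hypothetical choices made for illustration and are not taken from the paper.

```python
from collections import Counter


def connection_degree(identical, different, contrary, i=0.0, j=-1.0):
    """SPA connection degree mu = a + b*i + c*j, with a + b + c = 1."""
    total = identical + different + contrary
    if total == 0:
        raise ValueError("the set pair has no characteristics to compare")
    a = identical / total   # identity degree
    b = different / total   # discrepancy (uncertain) degree
    c = contrary / total    # contrary degree
    return a + b * i + c * j


def level(count, high=2):
    # Hypothetical binning: 0 = absent, 1 = low frequency, 2 = high frequency.
    if count == 0:
        return 0
    return 2 if count >= high else 1


def text_similarity(doc_a, doc_b, i=0.0):
    # Assumed rule (not from the paper): over the union vocabulary, a term is
    # identical if both documents give it the same level, contrary if the
    # levels are two apart (absent vs. high), and different otherwise.
    counts_a, counts_b = Counter(doc_a.split()), Counter(doc_b.split())
    identical = different = contrary = 0
    for term in set(counts_a) | set(counts_b):
        gap = abs(level(counts_a[term]) - level(counts_b[term]))
        if gap == 0:
            identical += 1
        elif gap == 2:
            contrary += 1
        else:
            different += 1
    return connection_degree(identical, different, contrary, i=i)
```

With the default i = 0 the uncertain part is ignored and the score runs from -1 (all terms contrary) to 1 (all terms identical). In a hypergraph setting, such pairwise scores could serve to decide which documents share a hyperedge, though how the paper builds its hyperedges is not stated in the abstract.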
Keywords :
graph theory; hypermedia; text analysis; cluster description; clustering method; high dimensional textual datasets; hypergraph models; hypergraph partitioning algorithm; hypertext documents; process system uncertainty; set pair analysis; text clustering methods; Clustering algorithms; Clustering methods; Cognitive science; Information analysis; Information science; Information systems; Partitioning algorithms; Spatial databases; Text mining; Web sites;
Conference_Title :
IT in Medicine & Education, 2009. ITIME '09. IEEE International Symposium on
Conference_Location :
Jinan
Print_ISBN :
978-1-4244-3928-7
Electronic_ISBN :
978-1-4244-3930-0
DOI :
10.1109/ITIME.2009.5236279