Title :
Fuzzy co-clustering of documents and keywords
Author :
Kummamuru, Krishna ; Dhawale, Ajay ; Krishnapuram, Raghu
Author_Institution :
IBM India Res. Lab., IIT, New Delhi, India
Abstract :
Conventional clustering algorithms such as K-means and SAHN (also known as AHC) have been well studied and used in the information retrieval community for clustering text documents. More recently, efforts have been made to cluster documents and words simultaneously. The FCCM algorithm due to Oh et al. is a fuzzy clustering algorithm that maximizes the co-occurrence of categorical attributes (keywords) and the individual patterns (documents) in clusters. However, this algorithm poses certain problems when the number of documents or the number of words is very large. In this paper, we modify the FCCM algorithm so that it can be used to cluster large text corpora. Our experiments show that the modified algorithm is scalable and produces meaningful clusters. We also show the relation between FCCM and the Spherical K-Means (SKM) algorithm and introduce the Spherical Fuzzy c-Means (SFCM) algorithm.
Keywords :
fuzzy set theory; information retrieval; information retrieval systems; pattern clustering; text analysis; categorical attributes; categorical multivariate data; clustering text documents; conventional clustering algorithms; document co-clustering; fuzzy clustering algorithms; individual patterns; information retrieval community; keyword co-clustering; sequential agglomerative hierarchical nonoverlapping; spherical fuzzy c-means algorithm; spherical k-means algorithm; Bipartite graph; Clustering algorithms; Frequency shift keying; Information retrieval; Partitioning algorithms; Text mining;
Conference_Title :
Fuzzy Systems, 2003. FUZZ '03. The 12th IEEE International Conference on
Print_ISBN :
0-7803-7810-5
DOI :
10.1109/FUZZ.2003.1206527