Title :
Incremental fuzzy clustering for document categorization
Author :
Jian-Ping Mei ; Yangtao Wang ; Lihui Chen ; Chunyan Miao
Author_Institution :
Coll. of Comput. Sci. & Technol., Zhejiang Univ. of Technol., Hangzhou, China
Abstract :
Incremental clustering has been proposed to handle large datasets that cannot fit entirely into memory. Single-pass fuzzy c-means (SpFCM) and online fuzzy c-means (OFCM) are two representative incremental fuzzy clustering methods. Both extend the scalability of fuzzy c-means (FCM) by processing the dataset chunk by chunk. However, because document data are sparse and high-dimensional, SpFCM and OFCM fail to produce reasonable results on them. In this study, we work on clustering approaches that address both the large-scale and the high-dimensionality issues. Specifically, we propose two methods for incremental clustering of document data. The first modifies the existing FCM-based incremental clustering by adding a step that normalizes the centroids in each iteration; the second performs incremental clustering, in either Single-Pass or Online mode, with weighted fuzzy co-clustering. We evaluate both on several benchmark document datasets. The experimental results show that the proposed approaches achieve significant improvements over the existing SpFCM and OFCM in document clustering.
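The first idea in the abstract (chunk-by-chunk weighted FCM with a centroid-normalization step in every iteration) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the cosine dissimilarity on unit-length vectors, the rule for carrying centroid weights between chunks, the fuzzifier value m=1.5, and all function names (`normalize`, `memberships`, `weighted_fcm`, `single_pass`) are assumptions made for illustration only.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def memberships(points, centroids, m):
    """Fuzzy memberships from an assumed cosine dissimilarity d = 1 - x.v."""
    eps = 1e-12  # avoids division by zero when a point coincides with a centroid
    U = []
    for x in points:
        d = [max(1.0 - sum(a * b for a, b in zip(x, v)), eps) for v in centroids]
        inv = [dk ** (-1.0 / (m - 1.0)) for dk in d]
        total = sum(inv)
        U.append([t / total for t in inv])
    return U

def weighted_fcm(points, weights, centroids, m, iters):
    """Weighted FCM on one chunk; centroids are re-normalized in each iteration."""
    dim = len(points[0])
    for _ in range(iters):
        U = memberships(points, centroids, m)
        updated = []
        for i in range(len(centroids)):
            acc = [0.0] * dim
            for x, w, u in zip(points, weights, U):
                coef = w * (u[i] ** m)
                for k, xk in enumerate(x):
                    acc[k] += coef * xk
            # Normalization step: project the centroid back onto the unit sphere.
            updated.append(normalize(acc))
        centroids = updated
    U = memberships(points, centroids, m)
    # Assumed carry-over rule: a centroid's weight is its total weighted membership.
    carried = [sum(w * u[i] for w, u in zip(weights, U))
               for i in range(len(centroids))]
    return centroids, carried

def single_pass(chunks, init_centroids, m=1.5, iters=30):
    """Single-pass style loop: each chunk is clustered together with the
    weighted centroids carried over from the previous chunk."""
    centroids = [normalize(v) for v in init_centroids]
    carried_points, carried_weights = [], []
    for chunk in chunks:
        points = carried_points + [normalize(x) for x in chunk]
        weights = carried_weights + [1.0] * len(chunk)
        centroids, carried_weights = weighted_fcm(points, weights, centroids, m, iters)
        carried_points = centroids
    return centroids
```

A toy call such as `single_pass([[[1.0, 0.1, 0.0], [0.1, 1.0, 0.0]], [[0.9, 0.0, 0.1], [0.0, 0.9, 0.1]]], [[1.0, 0.2, 0.1], [0.2, 1.0, 0.1]])` returns two unit-length centroids, one per group of similar vectors. Real document vectors would be sparse term-weight vectors, for which a dense list representation like this one would not scale.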
Keywords :
document handling; fuzzy set theory; pattern clustering; OFCM; SpFCM; benchmark document datasets; data high-dimensionality; data sparsity; document categorization; document data; fuzzy c-means; incremental fuzzy clustering method; online fuzzy c-means; weighted fuzzy co-clustering; Atmospheric measurements; Clustering algorithms; Computers; Educational institutions; Electronic mail; Scalability; Vectors
Conference_Title :
Fuzzy Systems (FUZZ-IEEE), 2014 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-2073-0
DOI :
10.1109/FUZZ-IEEE.2014.6891554