DocumentCode
3031364
Title
Creating Streaming Iterative Soft Clustering Algorithms
Author
Hore, Prodip ; Hall, Lawrence O. ; Goldgof, Dmitry B.
Author_Institution
Univ. of South Florida, Tampa
fYear
2007
fDate
24-27 June 2007
Firstpage
484
Lastpage
488
Abstract
An increasing number of large labeled and unlabeled data sets are available. Clustering algorithms are best suited to helping one make sense of unlabeled data. However, scaling iterative clustering algorithms to large amounts of data has been a challenge. The computation time can be very long, and for data sets that will not fit in even the largest memory, only carefully chosen subsets of the data can practically be clustered. We present a general approach that enables iterative fuzzy/possibilistic clustering algorithms to be turned into algorithms that can handle arbitrary amounts of streaming data. The computation time is also reduced for very large data sets, while the clustering results remain very similar to those of clustering with all the data, if that were possible. We introduce transformed equations for fuzzy C-means, possibilistic C-means, and the Gustafson-Kessel algorithm, and show excellent performance with a streaming fuzzy C-means implementation. The resulting clusters are both sensible and, for comparable data sets (those that fit in memory), almost identical to those obtained with the original clustering algorithm.
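The abstract does not reproduce the transformed equations, so the following is only a minimal sketch of one plausible reading of the idea, not the authors' method. It assumes that each chunk of the stream is clustered with the standard fuzzy C-means updates extended by per-point weights, and that the previous chunk is summarized by its cluster centers, each weighted by the membership mass it absorbed. The function names (`memberships`, `weighted_fcm`, `stream_fcm`), the choice of summary weight, and all parameters are assumptions made for illustration.

```python
import random


def memberships(point, centers, m):
    """Standard fuzzy C-means membership of one point in each cluster."""
    d2 = [sum((p - c) ** 2 for p, c in zip(point, ctr)) for ctr in centers]
    zeros = [k for k, d in enumerate(d2) if d == 0.0]
    if zeros:
        # A point lying exactly on a center belongs fully to that center.
        return [1.0 / len(zeros) if k in zeros else 0.0
                for k in range(len(centers))]
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def weighted_fcm(points, weights, centers, m=2.0, iters=50, tol=1e-6):
    """Fuzzy C-means where point i counts weights[i] times (assumed variant)."""
    dim = len(points[0])
    for _ in range(iters):
        num = [[0.0] * dim for _ in centers]
        den = [0.0] * len(centers)
        for x, w in zip(points, weights):
            for k, uk in enumerate(memberships(x, centers, m)):
                g = w * uk ** m
                den[k] += g
                for j in range(dim):
                    num[k][j] += g * x[j]
        new = [[num[k][j] / den[k] for j in range(dim)] if den[k] > 0
               else list(centers[k]) for k in range(len(centers))]
        shift = max(abs(a - b) for n, c in zip(new, centers)
                    for a, b in zip(n, c))
        centers = new
        if shift < tol:
            break
    # Membership mass per cluster: the weight its center carries forward.
    mass = [0.0] * len(centers)
    for x, w in zip(points, weights):
        for k, uk in enumerate(memberships(x, centers, m)):
            mass[k] += w * uk
    return centers, mass


def stream_fcm(chunks, c, m=2.0, init=None, seed=0):
    """Cluster chunks one at a time, keeping only weighted centers between them."""
    rng = random.Random(seed)
    centers, mass = init, None
    for chunk in chunks:
        chunk = [list(p) for p in chunk]
        if centers is None:
            # Assumes the first chunk holds at least c distinct points.
            centers = rng.sample(chunk, c)
        if mass is None:
            pts, wts = chunk, [1.0] * len(chunk)
        else:
            # Previous centers re-enter as weighted points (the summary).
            pts = [list(v) for v in centers] + chunk
            wts = mass + [1.0] * len(chunk)
        centers, mass = weighted_fcm(pts, wts, centers, m)
    return centers, mass
```

Under these assumptions only the current chunk and `c` weighted centers are held in memory at any time, which is what would let the stream be arbitrarily long. Because memberships sum to one per point, the carried weights always add up to the number of points seen so far.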
Keywords
fuzzy logic; iterative methods; pattern clustering; possibility theory; Gustafson-Kessel algorithm; fuzzy-C-means; iterative fuzzy-possibilistic clustering algorithms; possibilistic C-means; streaming iterative soft clustering algorithm; unlabeled data set; Clustering algorithms; Computer science; Equations; Fuzzy sets; Iterative algorithms; Iterative methods; Labeling; Partitioning algorithms; Sampling methods; Wrapping; clustering; fuzzy; possibilistic; scalable; streaming;
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Information Processing Society, 2007. NAFIPS '07. Annual Meeting of the North American
Conference_Location
San Diego, CA
Print_ISBN
1-4244-1213-7
Electronic_ISBN
1-4244-1214-5
Type
conf
DOI
10.1109/NAFIPS.2007.383888
Filename
4271111