DocumentCode :
659600
Title :
Scalable approximation of kernel fuzzy c-means
Author :
Zijian Zhang ; Havens, Timothy C.
Author_Institution :
Dept. of Electr. & Comput. Eng., Michigan Technol. Univ., Houghton, MI, USA
fYear :
2013
fDate :
6-9 Oct. 2013
Firstpage :
161
Lastpage :
168
Abstract :
Virtually every sector of business and industry that uses computing, including financial analysis, search engines, and electronic commerce, incorporates Big Data analysis into its business model. Sophisticated clustering algorithms are highly desired to deduce the nature of data by assigning labels to unlabeled data. We address two main challenges in Big Data. First, by definition, the volume of Big Data is too large to be loaded into a computer's memory (this volume changes based on the computer used or available). Second, in real-time applications, the velocity of new incoming data prevents historical data from being stored and future data from being accessed. Therefore, we propose our Streaming Kernel Fuzzy c-Means (stKFCM) algorithm, which reduces both computational complexity and space complexity significantly. The proposed stKFCM only requires O(n^2) memory, where n is the (predetermined) size of a data subset (or data chunk) at each time step, which makes this algorithm truly scalable (as n can be chosen based on the available memory). Furthermore, only 2n^2 elements of the full N × N (where N >> n) kernel matrix need to be calculated at each time step, thus reducing both the computation time in producing the kernel elements and the complexity of the FCM algorithm. Empirical results show that stKFCM, even with very small n, can provide clustering performance as accurate as kernel fuzzy c-means run on the entire data set while achieving a significant speedup.
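To make the scaling claim in the abstract concrete, the following is a minimal arithmetic sketch, not the authors' implementation. It only counts kernel entries: the full N × N matrix versus 2n^2 entries per time step, as the abstract states. The function name `kernel_element_counts` and the example values of N and n are hypothetical, and the sketch assumes the stream is cut into ceil(N / n) chunks with every step charged the full 2n^2 (an upper bound when the last chunk is partial).

```python
def kernel_element_counts(N, n):
    """Count kernel entries for the full matrix vs. chunked streaming.

    Assumptions (not from the paper beyond the 2n^2 figure):
    - the N points arrive in ceil(N / n) chunks of at most n points;
    - every time step computes 2 * n**2 kernel entries.
    """
    steps = -(-N // n)            # ceiling division: number of time steps
    full = N * N                  # entries in the full N x N kernel matrix
    per_step = 2 * n * n          # entries computed at each time step
    streamed_total = steps * per_step
    return {
        "steps": steps,
        "full": full,
        "per_step": per_step,
        "streamed_total": streamed_total,
    }


# Hypothetical sizes chosen only for illustration.
counts = kernel_element_counts(N=1_000_000, n=1_000)
print(counts["full"] // counts["streamed_total"])  # ratio of full to streamed entries
```

Under these assumed sizes the per-step working set depends only on n, which is the sense in which the abstract calls the method scalable: n can be chosen to fit the available memory regardless of N.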
Keywords :
Big Data; approximation theory; computational complexity; pattern clustering; big data analysis; clustering performance; computation time reduction; computational complexity reduction; kernel fuzzy c-means scalable approximation; kernel matrix; space complexity reduction; stKFCM algorithm; streaming kernel fuzzy c-means algorithm; Approximation algorithms; Clustering algorithms; Data handling; Information management; Kernel; Partitioning algorithms; Vectors; fuzzy c-means; kernel clustering; projection; scalable algorithms; streaming data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
Type :
conf
DOI :
10.1109/BigData.2013.6691749
Filename :
6691749