Title :
Big Data Clustering Based on Summary Statistics
Author :
Junsong Fu;Yun Liu;Zhenjiang Zhang;Fei Xiong
Author_Institution :
Beijing Key Lab. of Commun. &
Abstract :
Big Data is expanding fast and is widely researched and used in many domains. One of the largest challenges in data mining is how to cluster Big Data efficiently. The CF-tree is the origin of many big data clustering algorithms; however, it has some shortcomings. This paper proposes an algorithm named Clustering based On the Summary Statistics (COSS). We first analyze in detail the shortcomings of the traditional approaches that construct the CF-tree with a constant radius threshold T for the micro clusters, and propose a dynamic adaptive threshold setting mechanism. Once all the micro clusters have been obtained, a proper clustering algorithm is used to get the final clustering result. We include a performance study demonstrating that the improved CF-tree is more space efficient and that the clustering results are more refined.
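For orientation, the constant-threshold baseline that the abstract criticizes can be sketched as follows. This is a minimal illustration under stated assumptions, not the COSS algorithm: it uses the standard BIRCH-style clustering feature (N, LS, SS), keeps the micro clusters in a flat list instead of a tree, and applies one fixed radius threshold T to every micro cluster. The names `ClusteringFeature`, `radius_if_added`, and `build_micro_clusters` are hypothetical, and the dynamic adaptive threshold is not shown because the abstract does not specify it.

```python
import math


class ClusteringFeature:
    """Summary statistics (N, LS, SS) of one micro cluster (hypothetical helper)."""

    def __init__(self, dim):
        self.n = 0                # number of points absorbed so far
        self.ls = [0.0] * dim     # per-dimension linear sum of the points
        self.ss = 0.0             # sum of squared norms of the points

    def add(self, point):
        # The statistics are additive, so absorbing a point is a constant-size update.
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
        self.ss += sum(x * x for x in point)

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # R^2 = SS/N - ||LS/N||^2, clamped at 0 to guard against rounding error.
        c2 = sum(c * c for c in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))

    def radius_if_added(self, point):
        # Radius the micro cluster would have after absorbing `point`,
        # computed from the statistics alone (no stored points needed).
        n = self.n + 1
        ls = [s + x for s, x in zip(self.ls, point)]
        ss = self.ss + sum(x * x for x in point)
        c2 = sum((s / n) ** 2 for s in ls)
        return math.sqrt(max(ss / n - c2, 0.0))


def build_micro_clusters(points, threshold):
    """Flat, tree-less sketch with a constant radius threshold (assumption)."""
    clusters = []
    for p in points:
        # Find the micro cluster whose centroid is closest to the new point.
        best, best_d = None, float("inf")
        for cf in clusters:
            d = math.dist(cf.centroid(), p)
            if d < best_d:
                best, best_d = cf, d
        # Absorb the point only if the radius stays within the threshold;
        # otherwise start a new micro cluster.
        if best is not None and best.radius_if_added(p) <= threshold:
            best.add(p)
        else:
            cf = ClusteringFeature(len(p))
            cf.add(p)
            clusters.append(cf)
    return clusters


micro = build_micro_clusters([(0, 0), (0, 2), (10, 10), (10, 12)], threshold=1.5)
print([(cf.n, cf.centroid()) for cf in micro])
```

Because only the three statistics are stored per micro cluster, memory grows with the number of micro clusters rather than the number of points, which is why the choice of T directly affects how compact the summary is.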
Keywords :
"Clustering algorithms","Big data","Heuristic algorithms","Algorithm design and analysis","Partitioning algorithms","Time complexity","Scalability"
Conference_Title :
Computational Intelligence Theory, Systems and Applications (CCITSA), 2015 First International Conference on
DOI :
10.1109/CCITSA.2015.23