DocumentCode :
1781411
Title :
Incomplete Big Data Clustering Algorithm Using Feature Selection and Partial Distance
Author :
Fanyu Bu ; Zhikui Chen ; Qingchen Zhang ; Xin Wang
Author_Institution :
Sch. of Software Technol., Dalian Univ. of Technol., Dalian, China
fYear :
2014
fDate :
28-30 Nov. 2014
Firstpage :
263
Lastpage :
266
Abstract :
Incomplete data clustering plays an important role in big data analysis and processing. Existing algorithms for clustering incomplete high-dimensional big data perform poorly in terms of both efficiency and effectiveness. This paper proposes an incomplete high-dimensional big data clustering algorithm based on feature selection and a partial distance strategy. First, a hierarchical clustering-based feature subset selection algorithm is designed to reduce the dimensionality of the data set. Next, a parallel k-means algorithm based on partial distance is derived to cluster the data subset selected in the first step. Experimental results demonstrate that the proposed algorithm achieves higher clustering accuracy than existing algorithms and takes significantly less time than other algorithms when clustering high-dimensional big data.
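The abstract does not give the paper's formulas, so the following is only a minimal, serial sketch of k-means with a partial distance strategy, assuming the common formulation: the squared distance is summed over the features a record actually observes and rescaled by p / (number of observed features). Missing values are assumed to be encoded as None; the function names (partial_distance, observed_mean, pds_kmeans), the random initialisation, and the parameters are hypothetical. The hierarchical feature subset selection step and the parallel execution described in the paper are not shown.

```python
import random
from typing import List, Optional, Sequence, Tuple

Record = Sequence[Optional[float]]  # assumption: None marks a missing feature value


def partial_distance(x: Record, c: Sequence[float]) -> float:
    """Squared Euclidean distance over the observed features of x, rescaled
    by p / (number of observed features) so that records with different
    amounts of missing data are comparable."""
    pairs = [(xj, cj) for xj, cj in zip(x, c) if xj is not None]
    if not pairs:
        raise ValueError("record has no observed features")
    squared = sum((xj - cj) ** 2 for xj, cj in pairs)
    return squared * len(x) / len(pairs)


def observed_mean(rows: Sequence[Record], j: int, default: float) -> float:
    """Mean of feature j over the rows that observe it, else `default`."""
    vals = [r[j] for r in rows if r[j] is not None]
    return sum(vals) / len(vals) if vals else default


def pds_kmeans(
    data: Sequence[Record],
    k: int,
    init: Optional[List[List[float]]] = None,
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical k-means on incomplete data using partial distances."""
    p = len(data[0])
    if init is not None:
        centroids = [list(c) for c in init]
    else:
        # Hypothetical initialisation: k random records whose missing
        # entries are filled with the observed per-feature means.
        means = [observed_mean(data, j, 0.0) for j in range(p)]
        picks = random.Random(seed).sample(range(len(data)), k)
        centroids = [
            [v if v is not None else means[j] for j, v in enumerate(data[i])]
            for i in picks
        ]

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each record is handled independently, which is
        # the part a parallel version could map over data partitions.
        new_labels = [
            min(range(k), key=lambda c: partial_distance(x, centroids[c]))
            for x in data
        ]
        if new_labels == labels:
            break  # converged: no record changed cluster
        labels = new_labels
        # Update step: per-feature mean over the members that observe that
        # feature; an empty cluster or unobserved feature keeps its old value.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            centroids[c] = [
                observed_mean(members, j, centroids[c][j]) for j in range(p)
            ]
    return labels, centroids
```

Usage, on a toy data set with missing entries: `labels, centroids = pds_kmeans(rows, k=2)`. Like ordinary k-means, the result depends on the initial centroids, so passing an explicit `init` (or running several seeds) is advisable.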
Keywords :
Big Data; data analysis; pattern clustering; Big Data analysis; Big Data processing; clustering accuracy; dimension reduction; feature selection; hierarchical clustering-based feature subset selection algorithm; incomplete Big Data clustering algorithm; parallel k-means algorithm; partial distance strategy; Accuracy; Algorithm design and analysis; Big data; Clustering algorithms; Educational institutions; Partitioning algorithms; Software algorithms; big data; cluster analysis; feature subset selection; incomplete data clustering;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Digital Home (ICDH), 2014 5th International Conference on
Conference_Location :
Guangzhou
Print_ISBN :
978-1-4799-4285-5
Type :
conf
DOI :
10.1109/ICDH.2014.57
Filename :
6996772