DocumentCode
257691
Title
Clustering high-dimensional data via random sampling and consensus
Author
Traganitis, Panagiotis A. ; Slavakis, Konstantinos ; Giannakis, Georgios B.
Author_Institution
Dept. of ECE & Digital Technol. Center, Univ. of Minnesota, Minneapolis, MN, USA
fYear
2014
fDate
3-5 Dec. 2014
Firstpage
307
Lastpage
311
Abstract
In response to the urgent need for learning tools tuned to big data analytics, the present paper introduces a feature selection approach to efficient clustering of high-dimensional vectors. The resultant method leverages random sampling and consensus (RANSAC) arguments, originally developed for robust regression tasks in computer vision, to yield novel dimensionality reduction schemes. The advocated random sampling and consensus K-means (RSC-Kmeans) algorithm can operate in either batch or sequential mode, with the latter affording a lower computational footprint than the former. Extensive numerical tests on synthetic and real datasets highlight the potential of the proposed algorithms and demonstrate their competitive performance relative to state-of-the-art random projection alternatives.
Keywords
feature selection; pattern clustering; random processes; sampling methods; Big Data analytics; RANSAC arguments; RSC-Kmeans algorithm; batch modes; computational footprint; dimensionality reduction schemes; feature selection approach; high-dimensional data clustering; high-dimensional vector clustering; learning tools; numerical tests; random sampling-and-consensus K-means algorithm; real datasets; sequential modes; synthetic datasets; Accuracy; Big data; Clustering algorithms; Information processing; Pattern recognition; Robustness; Vectors; Clustering; K-means; feature selection; high-dimensional data; random sampling and consensus
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on
Conference_Location
Atlanta, GA
Type
conf
DOI
10.1109/GlobalSIP.2014.7032128
Filename
7032128