Clustering high-dimensional data via random sampling and consensus

Author

Traganitis, Panagiotis A. ; Slavakis, Konstantinos ; Giannakis, Georgios B.

Author_Institution

Dept. of ECE & Digital Technol. Center, Univ. of Minnesota, Minneapolis, MN, USA

fYear

2014

fDate

3-5 Dec. 2014

Firstpage

307

Lastpage

311

Abstract

In response to the urgent need for learning tools tuned to big data analytics, the present paper introduces a feature selection approach to efficient clustering of high-dimensional vectors. The resultant method leverages random sampling and consensus (RANSAC) arguments, originally developed for robust regression tasks in computer vision, to yield novel dimensionality reduction schemes. The advocated random sampling and consensus K-means (RSC-Kmeans) algorithm can operate in either batch or sequential mode, with the sequential mode incurring a lower computational footprint than the batch mode. Extensive numerical tests on synthetic and real datasets highlight the potential of the proposed algorithms and demonstrate their competitive performance relative to state-of-the-art random projection alternatives.
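
The general idea in the abstract (draw a small random subset of features, cluster on it, and keep the draw that the data agree with best) can be illustrated with a minimal sketch. This is not the authors' RSC-Kmeans algorithm, whose details the abstract does not give. The function names, the parameters n_features and n_trials, and the scoring rule (within-cluster cost measured on all features) are assumptions made only for illustration.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means on a list of equal-length lists; returns labels."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid in squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: mean of each cluster; an empty cluster keeps its centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def within_cluster_cost(points, labels, k):
    """Sum of squared distances to cluster means, using all given features."""
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        mean = [sum(col) / len(members) for col in zip(*members)]
        cost += sum((x - m) ** 2 for p in members for x, m in zip(p, mean))
    return cost


def sampled_feature_kmeans(points, k, n_features, n_trials, seed=0):
    """Hypothetical helper: cluster on random feature subsets, keep the best draw."""
    rng = random.Random(seed)
    dim = len(points[0])
    best_labels, best_cost = None, float("inf")
    for _ in range(n_trials):
        # Random sampling: keep only n_features of the dim coordinates.
        feats = rng.sample(range(dim), n_features)
        reduced = [[p[j] for j in feats] for p in points]
        labels = kmeans(reduced, k, rng)
        # Consensus proxy (assumption): score the partition on all features.
        cost = within_cluster_cost(points, labels, k)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost
```

A call such as sampled_feature_kmeans(data, k=2, n_features=5, n_trials=5) returns the labels and cost of the best draw. Scoring on all features keeps the sketch short but costs a full pass over the data per trial; a cheaper consensus rule would be needed for the savings the abstract refers to.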

Keywords

feature selection; pattern clustering; random processes; sampling methods; Big Data analytics; RANSAC arguments; RSC-Kmeans algorithm; batch modes; computational footprint; dimensionality reduction schemes; feature selection approach; high-dimensional data clustering; high-dimensional vector clustering; learning tools; numerical tests; random sampling-and-consensus K-means algorithm; real datasets; sequential modes; synthetic datasets; Accuracy; Big data; Clustering algorithms; Information processing; Pattern recognition; Robustness; Vectors; Clustering; K-means; high-dimensional data; random sampling and consensus

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on

Conference_Location

Atlanta, GA

Type

conf

DOI

10.1109/GlobalSIP.2014.7032128

Filename

7032128