Title :
clusiVAT: A mixed visual/numerical clustering algorithm for big data
Author :
Kumar, Dinesh ; Palaniswami, Marimuthu ; Rajasegarar, Sutharshan ; Leckie, Christopher ; Bezdek, James C. ; Havens, Timothy C.
Author_Institution :
EEE, U. of Melbourne, Melbourne, VIC, Australia
Abstract :
Recent algorithmic and computational improvements have reduced the time it takes to build a minimal spanning tree (MST) for big data sets. In this paper we compare single linkage clustering based on MSTs built with the Filter-Kruskal method to the proposed clusiVAT algorithm, which samples the data, images the sample to estimate the number of clusters, and then extends the labels non-iteratively to the rest of the big data with the nearest prototype rule. Numerical experiments with both synthetic and real data confirm the theory that clusiVAT produces true single linkage clusters in compact, separated data. We also show cases in which single linkage fails while clusiVAT finds high-quality partitions that match ground truth labels very well. clusiVAT is also fast: it recovers the preferred c = 3 Gaussian clusters in a mixture of 1 million two-dimensional data points with 100% accuracy in 3.1 seconds.
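The sample-then-extend idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions that the abstract does not state: the sample is drawn by simple random sampling, the number of clusters c is passed in by the caller rather than estimated from an image of the sample, the sample is clustered by cutting the c - 1 longest edges of its MST (one standard way to obtain single linkage clusters), and the MST is built with Prim's algorithm rather than Filter-Kruskal. All function names are hypothetical.

```python
import math
import random


def prim_mst(points):
    """Return the MST edges (weight, i, j) of the complete Euclidean graph."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def single_linkage_labels(points, c):
    """Cut the c - 1 longest MST edges and label the resulting components."""
    edges = sorted(prim_mst(points))
    kept = edges[: len(edges) - (c - 1)]
    root = list(range(len(points)))  # union-find forest

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, i, j in kept:
        root[find(i)] = find(j)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


def sample_then_extend(data, c, sample_size, seed=0):
    """Cluster a random sample, then label every point by its nearest sample point.

    Assumption: simple random sampling and a caller-supplied c stand in for
    the sampling and visual estimation steps named in the abstract.
    """
    rng = random.Random(seed)
    sample = [data[i] for i in rng.sample(range(len(data)), sample_size)]
    sample_labels = single_linkage_labels(sample, c)
    labels = []
    for x in data:
        # Nearest prototype rule: copy the label of the closest sample point.
        nearest = min(range(sample_size), key=lambda k: math.dist(x, sample[k]))
        labels.append(sample_labels[nearest])
    return labels
```

Calling `sample_then_extend(data, c=3, sample_size=60)` on a list of coordinate tuples returns one integer label per point. The extension step is a single pass over the data, which is what makes it non-iterative; the quadratic MST cost is paid only on the sample.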
Keywords :
Big Data; Gaussian processes; pattern clustering; sampling methods; trees (mathematics); Filter-Kruskal method; Gaussian clusters; MST; clusiVAT algorithm; clustering with scalable visual assessment of tendency algorithm; data sampling; ground truth labels; high quality partitions; labels noniterative extension; minimal spanning tree; mixed visual-numerical clustering algorithm; prototype rule; sample imaging; single linkage clustering; true single linkage clusters; Clustering algorithms; Couplings; Data handling; Data storage systems; Information management; Partitioning algorithms; Visualization; Cluster Analysis; Filter-Kruskal MST; Pattern Recognition; Single Linkage
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691561