DocumentCode
679532
Title
Power to the Points: Validating Data Memberships in Clusterings
Author
Raman, Pavithra ; Venkatasubramanian, Suresh
Author_Institution
Sch. of Comput., Univ. of Utah, Salt Lake City, UT, USA
fYear
2013
fDate
7-10 Dec. 2013
Firstpage
617
Lastpage
626
Abstract
In this paper, we present a method to attach affinity scores to the implicit labels of individual points in a clustering. The affinity scores capture the confidence level of the cluster that claims to "own" the point. We demonstrate that these scores accurately capture the quality of the label assigned to the point. We also show further applications of these scores: estimating global measures of clustering quality, and accelerating clustering algorithms by orders of magnitude through active selection based on affinity. The method is very general and applies to clusterings derived from any geometric source. It lends itself to easy visualization and can prove useful as part of an interactive visual analytics framework. It is also efficient: the cost of assigning an affinity score to a point depends only polynomially on the number of clusters and is independent of both the size and the dimensionality of the data. It is based on techniques from the theory of interpolation, coupled with sampling and estimation algorithms from high-dimensional computational geometry.
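The abstract combines interpolation theory (the keywords name natural neighbor interpolation and power diagrams) with sampling-based estimation. The following is a minimal sketch of that general idea and not the authors' algorithm. It makes several assumptions. It uses plain (unweighted) Voronoi cells of the cluster centers instead of power diagrams. It estimates Sibson natural-neighbor coordinates of a point by Monte Carlo sampling inside a bounding box, which truncates unbounded cells. It treats the coordinate of the point's own (nearest) center as its affinity. The function name `sibson_affinity` and its parameters are hypothetical.

```python
import numpy as np


def sibson_affinity(q, centers, n_samples=100_000, margin=1.0, seed=0):
    """Hypothetical sketch: Monte Carlo estimate of Sibson (natural-neighbor)
    coordinates of point q with respect to cluster centers.

    Assumptions: unweighted Voronoi cells (not power diagrams), and all
    volumes are measured inside an axis-aligned bounding box around the
    centers and q, enlarged by `margin`.

    Returns (owner, coords): owner is the index of the center nearest to q;
    coords[i] is the estimated fraction of q's newly inserted Voronoi cell
    that is taken from center i's cell. coords[owner] is used as the affinity.
    """
    q = np.asarray(q, dtype=float)
    centers = np.asarray(centers, dtype=float)
    rng = np.random.default_rng(seed)

    # Bounding box that contains all centers and q, plus a margin.
    pts = np.vstack([centers, q])
    lo = pts.min(axis=0) - margin
    hi = pts.max(axis=0) + margin
    y = rng.uniform(lo, hi, size=(n_samples, q.shape[0]))

    # Distance from every sample to every center, and the nearest center.
    d_centers = np.linalg.norm(y[:, None, :] - centers[None, :, :], axis=2)
    nearest = d_centers.argmin(axis=1)

    # A sample is "stolen" by q if q would become its nearest site.
    d_q = np.linalg.norm(y - q, axis=1)
    stolen = d_q < d_centers.min(axis=1)

    # Count stolen samples per original owner and normalize to coordinates.
    counts = np.bincount(nearest[stolen], minlength=len(centers)).astype(float)
    total = counts.sum()
    coords = counts / total if total > 0 else counts

    owner = int(np.linalg.norm(centers - q, axis=1).argmin())
    return owner, coords


owner, coords = sibson_affinity([-0.5, 0.0], [[-1.0, 0.0], [1.0, 0.0]])
print(owner, coords)
```

In the example, the point sits between two centers but closer to the first. The exact coordinates inside the box are 0.75 and 0.25, so the estimate should be close to those values. A point near a cluster boundary would instead get a coordinate near 0.5 for its owner, which signals low confidence in its label. Note that the sampling cost of this naive sketch grows with the box volume. It therefore does not reproduce the dimension-independent bounds claimed in the abstract.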
Keywords
computational geometry; data analysis; data visualisation; estimation theory; interpolation; pattern clustering; sampling methods; active selection; affinity scores; cluster confidence level; clustering algorithm; clustering quality global measure estimation; data membership validation; estimation algorithm; high dimensional computational geometry; interactive visual analytics framework; interpolation theory; label quality; sampling algorithm; visualization; Clustering algorithms; Data models; Data visualization; Educational institutions; Probabilistic logic; Stability analysis; Standards; Natural Neighbor Interpolation; Power Diagrams; Validating Clusterings;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining (ICDM), 2013 IEEE 13th International Conference on
Conference_Location
Dallas, TX
ISSN
1550-4786
Type
conf
DOI
10.1109/ICDM.2013.147
Filename
6729546