DocumentCode
2734715
Title
Testing of clustering
Author
Alon, Noga ; Dar, Seannie ; Parnas, Michal ; Ron, Dana
Author_Institution
Dept. of Math., Tel Aviv Univ., Israel
fYear
2000
fDate
2000
Firstpage
240
Lastpage
250
Abstract
A set X of points in ℜ^d is (k,b)-clusterable if X can be partitioned into k subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most b. We present algorithms that, by sampling from a set X, distinguish between the case that X is (k,b)-clusterable and the case that X is ε-far from being (k,b′)-clusterable, for any given 0<ε⩽1 and for b′⩾b. By ε-far from being (k,b′)-clusterable we mean that more than ε·|X| points must be removed from X so that it becomes (k,b′)-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of |X| and polynomial in k and 1/ε. Our algorithms can also be used to find approximately good clusterings, namely clusterings of all but an ε-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independent of |X|. That is, without actually having to partition all points in X, the implicit representation can be used to answer queries concerning the cluster to which any given point belongs.
Keywords
computational complexity; pattern clustering; statistical analysis; clustering testing; cost measures; lower bounds; optimal cost; sampling; Clustering algorithms; Cost function; Educational institutions; Mathematics; Partitioning algorithms; Performance evaluation; Sampling methods; Size measurement; Testing; USA Councils
fLanguage
English
Publisher
IEEE
Conference_Titel
Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on
Conference_Location
Redondo Beach, CA
ISSN
0272-5428
Print_ISBN
0-7695-0850-2
Type
conf
DOI
10.1109/SFCS.2000.892111
Filename
892111