DocumentCode
2941469
Title
How many clusters to report: A recursive heuristic
Author
Carlis, John ; Bruso, Kelsey
Author_Institution
Comput. Sci. & Eng. Dept., Univ. of Minnesota, Minneapolis, MN, USA
fYear
2010
fDate
Aug. 31 2010-Sept. 4 2010
Firstpage
1069
Lastpage
1072
Abstract
Clustering can be a valuable tool for analyzing large amounts of data, but anyone who clusters must choose how many item clusters, K, to report. Unfortunately, one must guess at K or some related parameter when working within each of the three available frameworks where one thinks of clustering: as a Euclidean distance problem; as a statistical model problem; or as a complexity theory problem. We report here a novel recursive square root heuristic, RSQRT, which accurately predicts K_reported as a function of the attribute or item count, depending on attribute scales. We tested the heuristic on 226 widely-varying, but mostly scientific, studies, and found that the heuristic's K_best-predicted rounded to exactly K_reported in over half of the studies and was close in almost all of them. We claim that this strongly-supported heuristic makes sense and that, although it is not prescriptive, using it prospectively is much better than guessing.
Keywords
bioinformatics; data analysis; data clustering; item count; recursive square root heuristic; Bayesian methods; Clustering algorithms; Complexity theory; Computational modeling; Presses; Shape; Spirals; Algorithms; Cluster Analysis; Data Interpretation, Statistical; Humans; Incidence; Proportional Hazards Models; Risk Assessment; Risk Factors; Schizophrenia
fLanguage
English
Publisher
IEEE
Conference_Titel
Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE
Conference_Location
Buenos Aires
ISSN
1557-170X
Print_ISBN
978-1-4244-4123-5
Type
conf
DOI
10.1109/IEMBS.2010.5627287
Filename
5627287