Learning the Threshold in Hierarchical Agglomerative Clustering

Author

Daniels, Kristine; Giraud-Carrier, Christophe

Author_Institution

Dept. of Comput. Sci., Brigham Young Univ., Provo, UT

fYear

2006

fDate

Dec. 2006

Firstpage

270

Lastpage

278

Abstract

Most partitional clustering algorithms require the number of desired clusters to be set a priori. Not only is this somewhat counter-intuitive, it is also difficult except in the simplest of situations. By contrast, hierarchical clustering may create partitions with varying numbers of clusters. The actual final partition depends on a threshold placed on the similarity measure used. Given a cluster quality metric, one can efficiently discover an appropriate threshold through a form of semi-supervised learning. This paper shows one such solution for complete-link hierarchical agglomerative clustering using the F-measure and a small subset of labeled examples. Empirical evaluation demonstrates promise.
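
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not say how the F-measure is computed or how the threshold search is made efficient. The sketch assumes Euclidean distance, a pairwise F-measure over the labeled points, and a plain scan over every pairwise distance as a candidate threshold. All function names (`complete_link_clusters`, `pairwise_f_measure`, `learn_threshold`) are hypothetical.

```python
import math
from itertools import combinations


def complete_link_clusters(points, threshold):
    """Merge clusters while the closest pair's complete-link distance <= threshold."""
    clusters = [[i] for i in range(len(points))]

    def link(a, b):
        # Complete link: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        d, x, y = min(
            (link(clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break
        clusters[x] += clusters[y]  # x < y, so deleting y keeps x valid
        del clusters[y]

    labels = [0] * len(points)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels


def pairwise_f_measure(predicted, known):
    """Pairwise F-measure (an assumption); `known` maps point index -> class."""
    tp = fp = fn = 0
    for i, j in combinations(sorted(known), 2):
        same_cluster = predicted[i] == predicted[j]
        same_class = known[i] == known[j]
        tp += same_cluster and same_class
        fp += same_cluster and not same_class
        fn += same_class and not same_cluster
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def learn_threshold(points, known):
    """Scan all pairwise distances as candidates (assumed search strategy)."""
    candidates = sorted({math.dist(p, q) for p, q in combinations(points, 2)})
    return max(
        candidates,
        key=lambda t: pairwise_f_measure(complete_link_clusters(points, t), known),
    )


# Toy data: two well-separated groups, four of six points labeled.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
known = {0: "a", 1: "a", 3: "b", 4: "b"}
threshold = learn_threshold(points, known)
labels = complete_link_clusters(points, threshold)
```

The scan reruns the clustering for every candidate, which is acceptable only for toy inputs. A practical version would build the dendrogram once and score each cut against the labeled subset.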

Keywords

learning (artificial intelligence); pattern clustering; hierarchical agglomerative clustering algorithm; semisupervised learning algorithm; Clustering algorithms; Computer science; Data mining; Data visualization; Euclidean distance; Iterative algorithms; Merging; Partitioning algorithms; Semisupervised learning; Taxonomy

fLanguage

English

Publisher

IEEE

Conference_Title

Machine Learning and Applications, 2006. ICMLA '06. 5th International Conference on

Conference_Location

Orlando, FL

Print_ISBN

0-7695-2735-3

Type

conf

DOI

10.1109/ICMLA.2006.33

Filename

4041503