DocumentCode
3756817
Title
MDL-based Hierarchical Clustering
Author
Zdravko Markov
Author_Institution
Comput. Sci. Dept., Central Connecticut State Univ., New Britain, CT, USA
fYear
2015
Firstpage
471
Lastpage
474
Abstract
This paper presents a new hierarchical clustering algorithm based on the Minimum Description Length (MDL) principle. Clusters are created by recursively splitting the data on the values of an attribute (as in decision tree learning), so that each cluster contains the instances that share the same value for that attribute. Attributes are chosen to minimize the MDL evaluation measure of the clustering they create. The algorithm's computational complexity is linear in the number of data instances and quadratic in the total number of distinct attribute values in the data; it can be substantially reduced by an efficient implementation using bit-level parallelism. We empirically evaluate the algorithm on 20 datasets from the UCI ML repository and show that it compares favorably to k-means and EM.
Keywords
"Clustering algorithms","Classification algorithms","Decision trees","Algorithm design and analysis","Encoding","Computational complexity","Entropy"
Publisher
IEEE
Conference_Titel
Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on
Type
conf
DOI
10.1109/ICMLA.2015.95
Filename
7424360