• DocumentCode
    3756817
  • Title
    MDL-based Hierarchical Clustering
  • Author
    Zdravko Markov
  • Author_Institution
    Comput. Sci. Dept., Central Connecticut State Univ., New Britain, CT, USA
  • fYear
    2015
  • Firstpage
    471
  • Lastpage
    474
  • Abstract
    This paper presents a new hierarchical clustering algorithm based on the Minimum Description Length (MDL) principle. Clusters are created by recursively splitting the data on the values of an attribute (as in decision tree learning), so that each cluster contains the instances that share the same value for that attribute. Attributes are chosen to minimize the MDL evaluation measure of the clustering they create. The algorithm's computational complexity is linear in the number of data instances and quadratic in the total number of distinct attribute-values in the data, and it can be substantially reduced by an efficient implementation using bit-level parallelism. We empirically evaluate the algorithm on 20 datasets from the UCI ML repository and show that it compares favorably to k-means and EM.
  • Keywords
    "Clustering algorithms","Classification algorithms","Decision trees","Algorithm design and analysis","Encoding","Computational complexity","Entropy"
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on
  • Type
    conf
  • DOI
    10.1109/ICMLA.2015.95
  • Filename
    7424360