• DocumentCode
    3067616
  • Title
    Information theoretic model validation for clustering
  • Author
    Buhmann, Joachim M.
  • Author_Institution
    Dept. of Comput. Sci., ETH Zurich, Zurich, Switzerland
  • fYear
    2010
  • fDate
    13-18 June 2010
  • Firstpage
    1398
  • Lastpage
    1402
  • Abstract
    Model selection in clustering requires (i) specifying a suitable clustering principle and (ii) controlling the model order complexity by choosing an appropriate number of clusters depending on the noise level in the data. We advocate an information theoretic perspective where the uncertainty in the measurements quantizes the set of data partitionings and, thereby, induces uncertainty in the solution space of clusterings. A clustering model that can tolerate a higher level of fluctuations in the measurements than alternative models is considered superior, provided that the clustering solution is equally informative. This tradeoff between informativeness and robustness is used as a model selection criterion. The requirement that data partitionings should generalize from one data set to an equally probable second data set gives rise to a new notion of structure induced information.
  • Keywords
    information theory; pattern clustering; clustering principle; information theoretic model validation; model order complexity; model selection criterion; structure induced information; Appropriate technology; Clustering algorithms; Clustering methods; Computer science; Couplings; Data analysis; Noise level; Noise robustness; Partitioning algorithms; Stability
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on
  • Conference_Location
    Austin, TX
  • Print_ISBN
    978-1-4244-7890-3
  • Electronic_ISBN
    978-1-4244-7891-0
  • Type
    conf
  • DOI
    10.1109/ISIT.2010.5513616
  • Filename
    5513616