• DocumentCode
    1472393
  • Title
    Information Distance in Multiples
  • Author
    Vitányi, Paul M B
  • Author_Institution
    Nat. Res. Center for Math. & Comput. Sci. in the Netherlands, Netherlands
  • Volume
    57
  • Issue
    4
  • fYear
    2011
  • fDate
    4/1/2011 12:00:00 AM
  • Firstpage
    2451
  • Lastpage
    2456
  • Abstract
    Information distance is a parameter-free similarity measure based on compression, used in pattern recognition, data mining, phylogeny, clustering and classification. The notion of information distance is extended from pairs to multiples (finite lists). We study maximal overlap, metricity, universality, minimal overlap, additivity and normalized information distance in multiples. We use the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program.
  • Keywords
    communication complexity; data mining; information theory; pattern classification; pattern clustering; Kolmogorov complexity; information distance; parameter-free similarity measure; pattern recognition; phylogeny; Additives; Color; Complexity theory; Measurement; Proposals; Turing machines; multiples; similarity
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9448
  • Type
    jour
  • DOI
    10.1109/TIT.2011.2110130
  • Filename
    5730590
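
The abstract notes that Kolmogorov complexity is approximated in practice by the length of a file's compressed version. The following is a minimal sketch of that idea for a finite list of byte strings. It is not taken from the paper: the choice of zlib as the compressor, the helper names `compressed_len` and `list_distance`, and the heuristic of estimating the conditional complexity of the list given one member as C(concatenation) minus C(member) are all assumptions made for illustration.

```python
import zlib
from typing import Sequence


def compressed_len(data: bytes) -> int:
    """Length in bytes of zlib-compressed data (a crude stand-in for Kolmogorov complexity)."""
    return len(zlib.compress(data, 9))


def list_distance(items: Sequence[bytes]) -> int:
    """Hypothetical compression-based distance for a finite list.

    Assumption: the information needed to get from one member x to the whole
    list is estimated as C(concatenation of all items) - C(x), and the largest
    such estimate over all members is returned.
    """
    if not items:
        raise ValueError("items must be non-empty")
    whole = compressed_len(b"".join(items))
    return max(whole - compressed_len(x) for x in items)
```

Under these assumptions, a list of near-identical files should score lower than a list that mixes unrelated content, because the concatenation of similar files compresses almost as well as a single member. The result depends on the compressor and on the order of concatenation, so it should be read as a rough estimate only.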