  • DocumentCode
    818291
  • Title
    The Normalized Compression Distance Is Resistant to Noise
  • Author
    Cebrián, Manuel; Alfonseca, Manuel; Ortega, Alfonso
  • Author_Institution
    Escuela Politecnica Superior, Univ. Autonoma de Madrid
  • Volume
    53
  • Issue
    5
  • fYear
    2007
  • fDate
    5/1/2007
  • Firstpage
    1895
  • Lastpage
    1900
  • Abstract
    This correspondence studies the influence of noise on the normalized compression distance (NCD), a measure that uses compressors to compute the degree of similarity between two files. The influence is approximated by a first-order differential equation, which gives rise to a complex effect that explains why the NCD may take values greater than 1, as observed by other authors. The model is tested experimentally and fits the data well. Finally, the influence of noise on the clustering of files of different types is explored; the NCD is found to perform well even in the presence of quite high noise levels.
  • Keywords
    data compression; differential equations; NCD; file clustering; first order differential equation; normalized compression distance; Associate members; Compression algorithms; Compressors; Data analysis; Noise level; Noise measurement; Noise reduction; Signal to noise ratio; Testing; Clustering and noise resistance; Kolmogorov complexity; datafile corruption; heterogeneous data analysis; noisy channel; universal similarity distance
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Information Theory
  • Publisher
    IEEE
  • ISSN
    0018-9448
  • Type
    jour
  • DOI
    10.1109/TIT.2007.894669
  • Filename
    4167725
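
The abstract describes the NCD as a compressor-based similarity measure and studies how it behaves when a file is corrupted by noise. A minimal sketch of that idea follows, under stated assumptions: it uses the commonly cited definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is compressed length, which is not quoted in this record; `zlib` stands in for whichever compressors the paper used; and `add_noise` is a hypothetical noise model (random byte substitution) chosen for illustration, not necessarily the one in the paper.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def add_noise(data: bytes, rate: float, seed: int = 0) -> bytes:
    """Hypothetical noise model: replace each byte with a random byte
    with probability `rate`. The seed makes the corruption repeatable."""
    rng = random.Random(seed)
    out = bytearray(data)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] = rng.randrange(256)
    return bytes(out)
```

Calling `ncd(x, add_noise(x, rate))` for increasing values of `rate` gives a rough picture of how the distance between a file and its corrupted copy grows with the noise level. Because real compressors are imperfect, the value is not guaranteed to stay within [0, 1].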