  • DocumentCode
    1657013
  • Title
    Algorithms for estimating information distance with application to bioinformatics and linguistics
  • Author
    Kaltchenko, A.
  • Author_Institution
    Dept. of Phys. & Comput., Wilfrid Laurier Univ., Waterloo, Ont., Canada
  • Volume
    4
  • fYear
    2004
  • Firstpage
    2255
  • Abstract
    We review unnormalized and normalized information distances based on incomputable notions of Kolmogorov complexity and discuss how Kolmogorov complexity can be approximated by data compression algorithms. We argue that optimal algorithms for data compression with side information can be successfully used to approximate the normalized distance. Next, we discuss an alternative information distance, which is based on the relative entropy rate (also known as the Kullback-Leibler divergence rate), and compression-based algorithms for its estimation. We conjecture that in bioinformatics and computational linguistics this alternative distance is more relevant and important than the ones based on Kolmogorov complexity.
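    As a hedged illustration of the approximation the abstract describes, here is a minimal Python sketch of the normalized compression distance (NCD), the standard computable stand-in for the normalized information distance. The sketch assumes zlib as the compressor and uses its preset-dictionary mode as a rough proxy for compression with side information; the names c, ncd, and c_given are illustrative, not from the paper.

        import zlib

        def c(data: bytes) -> int:
            # Compressed length under a real compressor: a computable upper
            # bound standing in for the incomputable Kolmogorov complexity.
            return len(zlib.compress(data, 9))

        def ncd(x: bytes, y: bytes) -> float:
            # Normalized compression distance: approximates the normalized
            # information distance when the compressor is close to optimal.
            cx, cy, cxy = c(x), c(y), c(x + y)
            return (cxy - min(cx, cy)) / max(cx, cy)

        def c_given(y: bytes, x: bytes) -> int:
            # Compressed length of y with x supplied as a preset dictionary:
            # a crude, assumed stand-in for compression with side information
            # (conditional complexity K(y|x)); not the paper's algorithm.
            comp = zlib.compressobj(level=9, zdict=x)
            return len(comp.compress(y) + comp.flush())

        a = b"the quick brown fox jumps over the lazy dog"
        b = b"the quick brown fox leaps over the lazy dog"
        print(ncd(a, b))            # small value: the strings share most structure
        print(c_given(b, a), c(b))  # side information makes y cheaper to encode

    For long DNA or text sequences a compressor with a larger window (bz2, LZMA, PPM variants) is usually substituted, since zlib's 32 KB window bounds how much shared structure it can exploit; a compression-based estimate of the relative entropy rate can be built from such cross-compression lengths, though the paper's specific estimator is not reproduced here.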
  • Keywords
    approximation theory; computational complexity; computational linguistics; data compression; entropy; parameter estimation; Kolmogorov complexity; Kullback-Leibler divergence; bioinformatics; data compression algorithms; information distance estimation algorithms; normalized information distance; relative entropy rate; unnormalized information distance; DNA; genetic communication; information theory; sequences
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Canadian Conference on Electrical and Computer Engineering, 2004
  • ISSN
    0840-7789
  • Print_ISBN
    0-7803-8253-6
  • Type
    conf
  • DOI
    10.1109/CCECE.2004.1347695
  • Filename
    1347695