DocumentCode
1657013
Title
Algorithms for estimating information distance with application to bioinformatics and linguistics
Author
Kaltchenko, A.
Author_Institution
Dept. of Phys. & Comput., Wilfrid Laurier Univ., Waterloo, Ont., Canada
Volume
4
fYear
2004
Firstpage
2255
Abstract
We review unnormalized and normalized information distances based on incomputable notions of Kolmogorov complexity and discuss how Kolmogorov complexity can be approximated by data compression algorithms. We argue that optimal algorithms for data compression with side information can be successfully used to approximate the normalized distance. Next, we discuss an alternative information distance based on the relative entropy rate (the per-symbol limit of the Kullback-Leibler divergence) and compression-based algorithms for its estimation. We conjecture that in bioinformatics and computational linguistics this alternative distance is more relevant and important than those based on Kolmogorov complexity.
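A minimal sketch of the two compression-based ideas mentioned in the abstract, using Python's standard zlib module. This is an editor's illustration, not the paper's algorithms: ncd is the standard normalized compression distance (compressed size C(.) standing in for Kolmogorov complexity), and kl_rate_estimate uses zlib's preset-dictionary feature as a crude form of compression with side information to approximate a relative entropy rate; all function names, constants, and toy data are assumptions.

import zlib

def compressed_size(data: bytes, zdict: bytes = b"") -> int:
    """Length in bytes of the DEFLATE encoding of data.
    A nonempty zdict is passed to zlib as a preset dictionary,
    zlib's built-in form of side information (last 32 KiB used)."""
    comp = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return len(comp.compress(data) + comp.flush())

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, the usual computable proxy for
    the normalized information distance based on Kolmogorov complexity:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def kl_rate_estimate(y: bytes, x: bytes) -> float:
    """Crude relative-entropy-rate estimate D(P || Q) in bits per byte,
    with y ~ P and x ~ Q: code length of y under a coder primed with x
    (cross-entropy proxy) minus code length of y under a coder primed
    with y itself (entropy proxy)."""
    cross = 8.0 * compressed_size(y, zdict=x) / len(y)
    self_rate = 8.0 * compressed_size(y, zdict=y) / len(y)
    return cross - self_rate

if __name__ == "__main__":
    p = b"ACGGTACGGTACGCT" * 200  # toy sequence from one source
    q = b"ACTTGACTTGACTAG" * 200  # toy sequence from another source
    print(f"NCD(p, q)        = {ncd(p, q):.3f}")
    print(f"KL-rate estimate = {kl_rate_estimate(p, q):.3f} bits/byte")

In practice the NCD literature uses stronger compressors (bzip2, PPM, or domain-specific DNA compressors) than DEFLATE, and principled relative-entropy-rate estimators (e.g. Ziv-Merhav cross parsing) rather than the preset-dictionary shortcut sketched here.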
Keywords
approximation theory; computational complexity; computational linguistics; data compression; entropy; parameter estimation; Kolmogorov complexity; Kullback-Leibler divergence; bioinformatics; data compression algorithms; information distance estimation algorithms; normalized information distance; relative entropy rate; unnormalized information distance; DNA; genetic communication; information theory; sequences
fLanguage
English
Publisher
ieee
Conference_Title
Canadian Conference on Electrical and Computer Engineering, 2004
ISSN
0840-7789
Print_ISBN
0-7803-8253-6
Type
conf
DOI
10.1109/CCECE.2004.1347695
Filename
1347695