Title :
An extended similarity distance for use with computable information estimators
Author_Institution :
Dept. of Comput. Sci., Univ. of Auckland, Auckland, New Zealand
Abstract :
Computable complexity and information estimators such as those from the Lempel-Ziv family or Titchener's T-complexity and T-information may be used in similarity comparison in conjunction with the Normalized Compression Distance (NCD). The NCD is (almost) a metric and computes a similarity distance between two digitally encoded objects x and y based exclusively on their estimated individual and joint information content. In some similarity comparison applications, however, objects may also be distinguished by entropy rate rather than by information content alone, and the NCD is not sensitive to entropy rate. This paper proposes an entropy-rate-sensitive extended version of the NCD, called ENCD, for use in such applications. It also shows that the T-information performs well in the context of both NCD and ENCD. Finally, the paper discusses the problem of added noise and scaling in input data to the NCD and ENCD, and demonstrates how appropriate encoding of the input data may mitigate the impact of these effects.
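As an illustration of the distance the abstract refers to, the following is a minimal sketch of the NCD in its commonly stated form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is an information estimate. The estimator here is an assumption: zlib's DEFLATE (a Lempel-Ziv-based compressor) stands in for C, not the T-information used in the paper, and the helper names `compressed_size` and `ncd` are hypothetical. The ENCD is not sketched because the abstract does not define it.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Estimate information content as the zlib-compressed length in bytes.

    Assumption: zlib is only a stand-in for a computable information estimator.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx = compressed_size(x)        # estimated individual information of x
    cy = compressed_size(y)        # estimated individual information of y
    cxy = compressed_size(x + y)   # estimated joint information of x and y
    # zlib output is never empty, so the denominator is always positive.
    return (cxy - min(cx, cy)) / max(cx, cy)
```

With a real compressor the result is only approximately a metric: the distance of an object to itself is small but usually not exactly zero, and swapping the arguments may change the value slightly.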
Keywords :
computational complexity; data compression; entropy; ENCD; Lempel-Ziv family; NCD metric; T-complexity; T-information; added noise problem; computable complexity; computable information estimators; digitally encoded objects; entropy rate sensitive extended NCD; extended similarity distance; individual information content; input data encoding; joint information content; normalized compression distance; scaling problem; similarity comparison applications; Complexity theory; Compressors; Entropy; Noise measurement; Signal to noise ratio;
Conference_Title :
Information Theory and its Applications (ISITA), 2014 International Symposium on
Conference_Location :
Melbourne, VIC