Title :
Compression for similarity identification: Fundamental limits
Author :
Ingber, Amir ; Weissman, Tsachy
Author_Institution :
Yahoo! Labs, Sunnyvale, CA, USA
fDate :
June 29 2014-July 4 2014
Abstract :
We study the problem of compressing a source with the goal of answering similarity queries from the compressed data. Unlike classical compression, there is no requirement that the source be reproducible from the compressed form. For discrete memoryless sources and an arbitrary similarity measure, we fully characterize the minimal compression rate that allows reliable query answers, that is, answers with a vanishing false-positive probability when false negatives are not allowed. The result is partially based on previous work by Ahlswede et al. [1], and the inherently typical subset lemma plays a key role in the converse proof. We then discuss the performance attainable by schemes that use lossy source codes as a building block, and show that such schemes are, in general, suboptimal. Finally, we discuss the problem of computing the fundamental limit, and present numerical results.
Keywords :
data compression; query processing; classical compression; compressed data; discrete memoryless sources; lossy source codes; query answers; similarity identification; similarity queries; vanishing false-positive probability; Channel capacity; Databases; Distortion measurement; Random variables; Rate-distortion; Reliability
Conference_Titel :
Information Theory (ISIT), 2014 IEEE International Symposium on
Conference_Location :
Honolulu, HI
DOI :
10.1109/ISIT.2014.6874783