DocumentCode :
147051
Title :
Compression Schemes for Similarity Queries
Author :
Ochoa, Idoia ; Ingber, Amir ; Weissman, Tsachy
Author_Institution :
Dept. of Electr. Eng., Stanford Univ., Stanford, CA, USA
fYear :
2014
fDate :
26-28 March 2014
Firstpage :
332
Lastpage :
341
Abstract :
We consider compression of sequences in a database so that similarity queries can be performed efficiently in the compressed domain. The fundamental limits for this problem setting, which characterize the trade-off between compression rate and reliability of the answers to the queries, were established in past work. However, how to approach these limits in practice has remained largely unexplored. Recently, we proposed a scheme for this task that is based on existing lossy compression algorithms, for the general case where the similarity measure satisfies a triangle inequality. Although that scheme was shown to achieve the fundamental limits in some cases, it is suboptimal in general. In this paper we propose a new scheme that also uses lossy compression algorithms as a building block, but with a carefully chosen distortion measure that is different from the one defining the similarity between sequences. The new scheme significantly improves the compression rate compared to the previously proposed scheme in many cases. For example, for binary sources and the Hamming similarity measure, simulation results show a compression rate close to the fundamental limit, and an improvement over the previously proposed scheme of up to 55% (for the same reliability). The results shed light on the fact that compression for similarity identification is inherently different from classical lossy compression.
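The abstract does not spell out either scheme, so the following is only a minimal sketch of how a triangle inequality can support conservative answers in the compressed domain. It assumes binary sequences under Hamming distance, a toy block-majority quantizer standing in for a real lossy compressor, and query answers of "no" or "maybe" with no false negatives. The names `compress`, `reconstruct`, and `query`, the block size, and the answer labels are all hypothetical and are not taken from the paper.

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary sequences."""
    return sum(x != y for x, y in zip(a, b))


def reconstruct(bits, n, block=4):
    """Expand one stored bit per block back to a length-n sequence."""
    return [bits[i // block] for i in range(n)]


def compress(x, block=4):
    """Toy lossy compressor (an assumption, not the paper's): keep the
    majority bit of each block, plus the distortion d(x, x_hat)."""
    bits = []
    for i in range(0, len(x), block):
        chunk = x[i:i + block]
        bits.append(1 if 2 * sum(chunk) > len(chunk) else 0)
    x_hat = reconstruct(bits, len(x), block)
    return bits, hamming(x, x_hat)


def query(signature, y, threshold, block=4):
    """Answer 'is d(x, y) <= threshold?' from the signature alone.

    Triangle inequality: d(x, y) >= d(x_hat, y) - d(x, x_hat).
    If that lower bound exceeds the threshold, 'no' is certain;
    otherwise the only safe answer is 'maybe'.
    """
    bits, distortion = signature
    lower_bound = hamming(reconstruct(bits, len(y), block), y) - distortion
    return "maybe" if lower_bound <= threshold else "no"


x = [1, 1, 1, 0, 0, 0, 0, 1]
signature = compress(x)
print(query(signature, x, threshold=1))
print(query(signature, [0, 0, 0, 0, 1, 1, 1, 1], threshold=1))
```

In this sketch a "no" is always correct, while a "maybe" may still be a false positive; how often that happens at a given compression rate is the kind of trade-off the abstract refers to.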
Keywords :
data compression; query processing; compression rate; compression schemes; distortion measure; fundamental limits; lossy compression algorithm; reliability; similarity identification; similarity measure; similarity queries; triangle inequality; Compression algorithms; Databases; Distortion measurement; Loss measurement; Random variables; Reliability; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference (DCC), 2014
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Type :
conf
DOI :
10.1109/DCC.2014.37
Filename :
6824441