Title :
Quadratic Similarity Queries on Compressed Data
Author :
Ingber, Amir ; Courtade, Thomas ; Weissman, Tsachy
Author_Institution :
Dept. of Electr. Eng., Stanford Univ., Stanford, CA, USA
Abstract :
The problem of performing similarity queries on compressed data is considered. We study the fundamental tradeoff between compression rate, sequence length, and reliability of queries performed on compressed data. For a Gaussian source and quadratic similarity criterion, we show that queries can be answered reliably if and only if the compression rate exceeds a given threshold, the identification rate, which we explicitly characterize. When compression is performed at a rate greater than the identification rate, responses to queries on the compressed data can be made exponentially reliable. We give a complete characterization of this exponent, which is analogous to the error and excess-distortion exponents in channel and source coding, respectively. For a general source, we prove that the identification rate is at most that of a Gaussian source with the same variance. Therefore, as with classical compression, the Gaussian source requires the largest compression rate. Moreover, a scheme is described that attains this maximal rate for any source distribution.
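As a rough illustration of the setting described in the abstract, the sketch below shows what a quadratic similarity query on a compressed sequence might look like. It is not the scheme from the paper: the function names, the uniform scalar quantizer, and the "no"/"maybe" answer convention are all assumptions made for this example, and the stored error norm is kept as an unquantized float, so no compression rate is accounted for. The only property the sketch relies on is the triangle inequality, which guarantees that a truly similar pair is never rejected.

```python
import math
import random


def quadratic_distortion(x, y):
    """Normalized squared Euclidean distance: (1/n) * sum((x_i - y_i)^2)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def is_similar(x, y, D):
    """Ground-truth query on the uncompressed data: is the distortion at most D?"""
    return quadratic_distortion(x, y) <= D


def compress(x, step):
    """Hypothetical signature: quantized sequence plus the quantization-error norm."""
    x_hat = [step * round(a / step) for a in x]
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
    return x_hat, err


def query(signature, y, D):
    """Answer 'no' only when the pair is provably dissimilar, else 'maybe'."""
    x_hat, err = signature
    dist_hat = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_hat, y)))
    # Triangle inequality: ||x - y|| >= ||x_hat - y|| - ||x - x_hat||.
    lower_bound = dist_hat - err
    # Small slack guards against floating-point rounding at the boundary.
    threshold = math.sqrt(len(y) * D) * (1 + 1e-9)
    return "no" if lower_bound > threshold else "maybe"


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(64)]
    y = [rng.gauss(0.0, 1.0) for _ in range(64)]
    sig = compress(x, step=0.5)
    print(is_similar(x, y, D=0.5), query(sig, y, D=0.5))
```

A coarser `step` makes the signature cheaper to describe but increases the stored error norm, so more dissimilar pairs receive the uninformative answer "maybe". This loosely mirrors the tradeoff between compression rate and query reliability that the abstract refers to.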
Keywords :
channel coding; data compression; query processing; source coding; Gaussian source; compressed data; compression rate; excess-distortion exponents; identification rate; quadratic similarity criterion; quadratic similarity queries; query reliability; sequence length; source distribution; Compression; Fundamental limits; Hash; Search; similarity query;
Conference_Title :
Data Compression Conference (DCC), 2013
Conference_Location :
Snowbird, UT
Print_ISBN :
978-1-4673-6037-1
DOI :
10.1109/DCC.2013.52