DocumentCode :
62422
Title :
Compression for Quadratic Similarity Queries
Author :
Ingber, Amir ; Courtade, Thomas ; Weissman, Tsachy
Author_Institution :
Dept. of Electr. Eng., Stanford Univ., Stanford, CA, USA
Volume :
61
Issue :
5
fYear :
2015
fDate :
May 2015
Firstpage :
2729
Lastpage :
2747
Abstract :
The problem of performing similarity queries on compressed data is considered. We focus on the quadratic similarity measure, and study the fundamental tradeoff between compression rate, sequence length, and reliability of queries performed on the compressed data. For a Gaussian source, we show that the queries can be answered reliably if and only if the compression rate exceeds a given threshold (the identification rate), which we explicitly characterize. Moreover, when compression is performed at a rate greater than the identification rate, responses to queries on the compressed data can be made exponentially reliable. We give a complete characterization of this exponent, which is analogous to the error and excess-distortion exponents in channel and source coding, respectively. For a general source, we prove that, as with classical compression, the Gaussian source requires the largest compression rate among sources with a given variance. Moreover, a robust scheme is described that attains this maximal rate for any source distribution.
Keywords :
Gaussian processes; data compression; query processing; source coding; Gaussian source; data compression rate; error excess-distortion exponents; excess-distortion exponents; identification rate; quadratic similarity queries; robust scheme; sequence length; Accuracy; Databases; Electrical engineering; Electronic mail; Materials; Random variables; Reliability; Compression; error exponent; search
fLanguage :
English
Journal_Title :
Information Theory, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9448
Type :
jour
DOI :
10.1109/TIT.2015.2402972
Filename :
7039228