DocumentCode :
3123092
Title :
Recognition capacity versus search speed in noisy databases
Author :
Tuncel, Ertem
Author_Institution :
Dept. of Electr. Eng., Univ. of California, Riverside, CA, USA
fYear :
2012
fDate :
1-6 July 2012
Firstpage :
2566
Lastpage :
2570
Abstract :
The tradeoff between the number of distinguishable objects and search speed in a data management system is investigated in an information-theoretic framework. In the discussed scenario, incoming high-dimensional (and noisy) data vectors are enrolled in (possibly multiple) clusters to be accessed later. Upon receiving a random query, which is the noisy version of an enrolled vector, the search engine retrieves only a subset of the clusters to compare against the query. This creates tension between the search speed (determined by the expected number of retrieved entries) and recognition capacity (maximum possible number of entries that can be reliably recognized). A single-letter achievable rate region is characterized and it is shown with examples that search can be performed much faster in the discussed scenario than in non-clustered linear scan without compromising maximum recognition capacity.
Keywords :
database management systems; information theory; vectors; data management system; high-dimensional data vector; information-theoretic framework; noisy database; nonclustered linear scan; random query; recognition capacity; search speed; Databases; Markov processes; Noise measurement; Reliability; Tin; Vectors; Zinc;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on
Conference_Location :
Cambridge, MA
ISSN :
2157-8095
Print_ISBN :
978-1-4673-2580-6
Electronic_ISBN :
2157-8095
Type :
conf
DOI :
10.1109/ISIT.2012.6283981
Filename :
6283981