Title :
Recognition capacity versus search speed in noisy databases
Author_Institution :
Dept. of Electr. Eng., Univ. of California, Riverside, CA, USA
Abstract :
The tradeoff between the number of distinguishable objects and search speed in a data management system is investigated in an information-theoretic framework. In the scenario considered, incoming high-dimensional, noisy data vectors are enrolled in (possibly multiple) clusters to be accessed later. Upon receiving a random query, which is a noisy version of an enrolled vector, the search engine retrieves only a subset of the clusters to compare against the query. This creates tension between search speed (determined by the expected number of retrieved entries) and recognition capacity (the maximum number of entries that can be reliably recognized). A single-letter achievable rate region is characterized, and examples show that search can be performed much faster in this scenario than in a non-clustered linear scan without compromising the maximum recognition capacity.
Keywords :
database management systems; information theory; vectors; data management system; high-dimensional data vector; information-theoretic framework; noisy database; nonclustered linear scan; random query; recognition capacity; search speed; Databases; Markov processes; Noise measurement; Reliability; Tin; Vectors; Zinc
Conference_Title :
Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on
Conference_Location :
Cambridge, MA
Print_ISBN :
978-1-4673-2580-6
Electronic_ISBN :
2157-8095
DOI :
10.1109/ISIT.2012.6283981