Author :
Panigrahy, Rina ; Talwar, Kunal ; Wieder, Udi
Abstract :
This work investigates a geometric approach to proving cell probe lower bounds for data structure problems. We consider the {\em approximate nearest neighbor search problem} on the Boolean hypercube $(\{0,1\}^d, \|\cdot\|_1)$ with $d=\Theta(\log n)$. We show that any (randomized) data structure for the problem that answers $c$-approximate nearest neighbor search queries using $t$ probes must use space at least $n^{1+\Omega(1/ct)}$. In particular, our bound implies that any data structure that uses space $\tilde{O}(n)$ with polylogarithmic word size, and with constant probability gives a constant approximation to nearest neighbor search queries, must be probed $\Omega(\log n/\log\log n)$ times. This improves on the lower bound of $\Omega(\log\log d/\log\log\log d)$ probes shown by Chakrabarti and Regev~\cite{ChakrabartiR04} for any polynomial space data structure, and the $\Omega(\log\log d)$ lower bound in Patrascu and Thorup~\cite{PatrascuT07} for linear space data structures. Our lower bound holds for the {\em near neighbor problem}, where the algorithm knows in advance a good approximation to the distance to the nearest neighbor. Additionally, it is an {\em average case} lower bound for the natural distribution for the problem. Our approach also gives the same bound for $(2-\frac{1}{c})$-approximation to the farthest neighbor problem. For the case of non-adaptive algorithms we can improve the bound slightly and show an $\Omega(\log n)$ lower bound on the time complexity of data structures with $O(n)$ space and logarithmic word size. We also show similar lower bounds for the partial match problem: any randomized $t$-probe data structure that solves the partial match problem on $\{0,1,\star\}^d$ for $d=\Theta(\log n)$ must use space $n^{1+\Omega(1/t)}$. This implies an $\Omega(\log n/\log\log n)$ lower bound for the time complexity of near linear space data structures, slightly improving the $\Omega(\log n/(\log\log n)^2)$ lower bound from~\cite{PatrascuT06a,JayramKKR03} for this range of $d$.
Recently and independently, Patrascu achieved similar bounds~\cite{patrascu08}. Our results also generalize to approximate partial match, improving on the bounds of~\cite{BarkolR02,PatrascuT06a}.
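The step from the space bound to the probe bound quoted in the abstract can be sketched as follows. This is a reader's reconstruction, not the paper's proof. The symbols $S$ (space), $\epsilon$ (the constant hidden in the $\Omega$ of the exponent) and $k$ (the exponent of the polylogarithmic factor hidden in $\tilde{O}(n)$) are assumptions introduced here for illustration, and $c$ is treated as a constant.

```latex
% Assumed notation: S = space, \epsilon > 0 and k > 0 are constants (not from the source).
\begin{align*}
  n^{1+\epsilon/(ct)} \;\le\; S \;\le\; n(\log n)^{k}
    &\;\Longrightarrow\; n^{\epsilon/(ct)} \le (\log n)^{k} \\
    &\;\Longrightarrow\; \frac{\epsilon}{ct}\,\log n \le k\,\log\log n \\
    &\;\Longrightarrow\; t \ge \frac{\epsilon}{ck}\cdot\frac{\log n}{\log\log n}
      = \Omega\!\left(\frac{\log n}{\log\log n}\right).
\end{align*}
```

The same calculation with the partial match bound $n^{1+\Omega(1/t)}$ (that is, without the factor $c$) gives the $\Omega(\log n/\log\log n)$ bound stated for near linear space.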
Keywords :
computational complexity; data structures; search problems; Boolean hypercube; cell probe lower bounds; data structure; geometric approach; nearest neighbor search problem; nonadaptive algorithm; partial match problem; time complexity; Approximation algorithms; Computational biology; Computer science; Data structures; Hypercubes; Information retrieval; Machine learning algorithms; Nearest neighbor searches; Polynomials; Probes; Cell Probe Lower Bounds; Geometry; Near Neighbor Search; Partial Match;