DocumentCode :
1087
Title :
λ-diverse nearest neighbors browsing for multidimensional data
Author :
Kucuktunc, Onur ; Ferhatosmanoglu, Hakan
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Volume :
25
Issue :
3
fYear :
2013
fDate :
Mar-13
Firstpage :
481
Lastpage :
493
Abstract :
Traditional search methods try to obtain the most relevant information and rank it according to the degree of similarity to the queries. Diversity in query results is also preferred by a variety of applications since results very similar to each other cannot capture all aspects of the queried topic. In this paper, we focus on the lambda-diverse k-nearest neighbor search problem on spatial and multidimensional data. Unlike the approach of diversifying query results in a postprocessing step, we naturally obtain diverse results with the proposed geometric and index-based methods. We first make an analogy with the concept of Natural Neighbors (NatN) and propose a natural neighbor-based method for 2D and 3D data and an incremental browsing algorithm based on Gabriel graphs for higher dimensional spaces. We then introduce a diverse browsing method based on the distance browsing feature of spatial index structures, such as R-trees. The algorithm maintains a Priority Queue with mindivdist of the objects depending on both relevancy and angular diversity and efficiently prunes nondiverse items and nodes. We experiment with a number of spatial and high-dimensional data sets, including Factual´s (http://www.factual.com/) US points-of-interest data set of 13M entries. On the experimental setup, the diverse browsing method is shown to be more efficient (regarding disk accesses) than k-NN search on R-trees, and more effective (regarding Maximal Marginal Relevance (MMR)) than the diverse nearest neighbor search techniques found in the literature.
Keywords :
computational geometry; query processing; queueing theory; relevance feedback; search problems; set theory; tree data structures; trees (mathematics); λ-diverse k-nearest neighbor search problem; λ-diverse nearest neighbors browsing; 2D data; 3D data; Factual US points-of-interest data set; Gabriel graph; MMR; R-trees; angular diversity; distance browsing feature; diverse browsing method; geometric methods; high-dimensional data sets; higher dimensional spaces; incremental browsing algorithm; index-based methods; information ranking; k-NN search; maximal marginal relevance; multidimensional data; natural neighbor-based method; object mindivdist; priority queue; spatial data sets; spatial index structures; topic query; Diversity methods; Information retrieval; Nearest neighbor searches; Query processing; Search methods; Search problems; Spatial databases; Diversity; Gabriel graph; angular similarity; diverse nearest neighbor search; natural neighbors;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
ieee
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2011.251
Filename :
6095558
Link To Document :
بازگشت