DocumentCode :
51054
Title :
Indexing Earth Mover’s Distance over Network Metrics
Author :
Ting Wang ; Shicong Meng ; Jiang Bian
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Volume :
27
Issue :
6
fYear :
2015
fDate :
June 1 2015
Firstpage :
1588
Lastpage :
1601
Abstract :
The Earth Mover’s Distance (EMD) is a well-known distance metric for data represented as probability distributions over a predefined feature space. Supporting EMD-based similarity search has attracted intensive research effort. Despite the plethora of literature, most existing solutions are optimized for Lp feature spaces (e.g., Euclidean space), whereas in a spectrum of applications the relationships between features are better captured using networks. In this paper, we study the problem of answering k-nearest neighbor (k-NN) queries under network-based EMD metrics (NEMD). We propose OASIS, a new access method that leverages the network structure of the feature space and enables efficient NEMD-based similarity search. Specifically, OASIS employs three novel techniques: (i) Range Oracle, a scalable model that estimates the range of the k-th nearest neighbor under NEMD; (ii) Boundary Index, a structure that efficiently fetches candidates within a given range; and (iii) Network Compression Hierarchy, an incremental filtering mechanism that effectively prunes false-positive candidates to avoid unnecessary computation. Through extensive experiments on both synthetic and real data sets, we confirm that OASIS significantly outperforms state-of-the-art methods in query processing cost.
Keywords :
database indexing; learning (artificial intelligence); pattern classification; query processing; statistical distributions; NEMD-based similarity search; OASIS; access method; boundary index; distance metric; earth mover distance; feature space; incremental filtering mechanism; indexing; k-NN queries; k-nearest neighbor queries; network compression hierarchy; network structure; network-based EMD metrics; probability distributions; query processing cost; range oracle; Artificial neural networks; Earth; Extraterrestrial measurements; Indexing; Query processing; Delimit and Filter; Earth Mover’s Distance; Network Metrics; Similarity Search
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2014.2373359
Filename :
6963483