DocumentCode
249516
Title
Towards Efficient KNN Joins on Data Streams
Author
Chong Yang ; Xiaohui Yu ; Yang Liu
Author_Institution
Sch. of Comput. Sci. & Technol., Shandong Univ., Jinan, China
fYear
2014
fDate
June 27 2014-July 2 2014
Firstpage
782
Lastpage
783
Abstract
We study the problem of efficiently processing kNN joins over high-dimensional data streams, an operation required by many big data applications. Specifically, we are concerned with the continuous evaluation of a set Q of k nearest neighbor queries on streams of high-dimensional items at consecutive snapshots of those streams. While one possible solution is to evaluate the kNN joins from scratch at each snapshot, this is too expensive for the large volumes of data encountered in big data applications. We consider the data stream over a time window and maintain the join results for Q at every snapshot in main memory. Our approach is to build indexes on Q and to update only the results of the queries affected by the changes in the streams at each snapshot. We propose a main-memory structure called the High-dimensional R-tree (HDR-tree) to index the queries, which finds affected queries efficiently at a reasonable maintenance cost. The HDR-tree takes advantage of clustering and the principal component analysis (PCA) technique. Preliminary experimental results show that our index structures significantly outperform baseline methods.
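As an illustration of the "update only the affected queries" idea in the abstract, here is a minimal Python sketch. It is not the HDR-tree: it finds affected queries with a plain linear scan over Q, which is exactly the step the paper's index is meant to speed up. The class name `WindowKnnJoin`, the count-based window, and the Euclidean distance are assumptions made for this example and are not taken from the paper.

```python
import heapq
import math
from collections import deque


def dist(a, b):
    """Euclidean distance between two equal-length points (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class WindowKnnJoin:
    """Hypothetical helper: keeps the kNN result of every query over a
    count-based sliding window and touches only the affected queries."""

    def __init__(self, queries, k, window_size):
        self.queries = list(queries)
        self.k = k
        self.window_size = window_size
        self.window = deque()                       # (item_id, point), oldest first
        self.results = [[] for _ in self.queries]   # sorted (distance, item_id) lists
        self.next_id = 0

    def _recompute(self, qi):
        # Fallback: rebuild one query's result from the whole window.
        q = self.queries[qi]
        self.results[qi] = heapq.nsmallest(
            self.k, ((dist(q, p), pid) for pid, p in self.window)
        )

    def insert(self, point):
        """Add one stream item; return the indexes of the affected queries."""
        pid = self.next_id
        self.next_id += 1

        # Expiry: a query is affected if the evicted item was one of its neighbors.
        expired = set()
        if len(self.window) == self.window_size:
            old_id, _ = self.window.popleft()
            for qi, res in enumerate(self.results):
                if any(rid == old_id for _, rid in res):
                    expired.add(qi)

        self.window.append((pid, point))
        for qi in expired:
            self._recompute(qi)  # the window already contains the new item

        # Arrival: a query is affected if the new item beats its current
        # k-th neighbor, or if it has fewer than k neighbors so far.
        affected = set(expired)
        for qi, q in enumerate(self.queries):
            if qi in expired:
                continue
            d = dist(q, point)
            res = self.results[qi]
            if len(res) < self.k or d < res[-1][0]:
                res.append((d, pid))
                res.sort()
                del res[self.k:]
                affected.add(qi)
        return sorted(affected)
```

With two queries at (0, 0) and (10, 10), k = 2, and a window of three items, an arriving item near the origin that also evicts a neighbor of the first query changes only that query's result; the second query is left untouched.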
Keywords
Big Data; indexing; pattern clustering; principal component analysis; query processing; tree data structures; HDR-tree; PCA technique; big data applications; clustering; high-dimensional R-tree; high-dimensional data streams; index structures; k nearest neighbor queries; kNN joins processing; main-memory structure; maintenance cost; time window; Algorithm design and analysis; Big data; Clustering algorithms; Educational institutions; Indexes; Maintenance engineering; Principal component analysis; data stream; high dimensional data; k nearest neighbor join
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data (BigData Congress), 2014 IEEE International Congress on
Conference_Location
Anchorage, AK
Print_ISBN
978-1-4799-5056-0
Type
conf
DOI
10.1109/BigData.Congress.2014.121
Filename
6906865
Link To Document