• DocumentCode
    249516
  • Title

    Towards Efficient KNN Joins on Data Streams

  • Author

    Chong Yang ; Xiaohui Yu ; Yang Liu

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Shandong Univ., Jinan, China
  • fYear
    2014
  • fDate
    June 27 2014-July 2 2014
  • Firstpage
    782
  • Lastpage
    783
  • Abstract
    We study the problem of efficiently processing kNN joins over high-dimensional data streams, an operation required by many big data applications. Specifically, we are concerned with the continuous evaluation of a set Q of k nearest neighbor queries on streams of high-dimensional items at consecutive snapshots of those streams. While one possible solution is to evaluate the kNN joins from scratch at each snapshot, this is too expensive for the large volumes of data encountered in big data applications. We consider the data stream over a time window and maintain the join results for Q at every snapshot in main memory. Our approach is to build indexes on Q and update only the results of the queries affected by the changes in the streams at each snapshot. We propose a main-memory structure called the High-dimensional R-tree (HDR-tree) to index the queries, which is efficient in finding affected queries with reasonable maintenance cost. The HDR-tree takes advantage of clustering and the principal component analysis (PCA) technique. Preliminary experimental results show that our index structures significantly outperform baseline methods.
  • Keywords
    Big Data; indexing; pattern clustering; principal component analysis; query processing; tree data structures; HDR-tree; PCA technique; big data applications; clustering; high-dimensional R-tree; high-dimensional data streams; index structures; k nearest neighbor queries; kNN joins processing; main-memory structure; maintenance cost; principal component analysis; time window; Algorithm design and analysis; Big data; Clustering algorithms; Educational institutions; Indexes; Maintenance engineering; Principal component analysis; data stream; high dimensional data; k nearest neighbor join;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2014 IEEE International Congress on
  • Conference_Location
    Anchorage, AK
  • Print_ISBN
    978-1-4799-5056-0
  • Type

    conf

  • DOI
    10.1109/BigData.Congress.2014.121
  • Filename
    6906865
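
The incremental idea described in the abstract (keep each query's kNN result in main memory and update only the queries affected by a stream change) can be illustrated with a minimal Python sketch. This is not the HDR-tree: affected queries are found here by a plain linear scan over Q, with no clustering or PCA, and the class name `WindowedKnnJoin`, the count-based window, and the `recomputed` counter are assumptions introduced only for illustration.

```python
import heapq
import math
from collections import deque


class WindowedKnnJoin:
    """Illustrative stand-in (not the HDR-tree): maintains the kNN result of
    every query over a count-based sliding window, recomputing only the
    queries affected by each arrival or expiry."""

    def __init__(self, queries, k, window_size):
        self.queries = [tuple(q) for q in queries]
        self.k = k
        self.window_size = window_size
        self.window = deque()
        # One sorted list of (distance, item) pairs per query.
        self.results = [[] for _ in self.queries]
        self.recomputed = 0  # counts per-query recomputations

    def _recompute(self, qi):
        # Rebuild one query's kNN from the current window contents.
        q = self.queries[qi]
        candidates = ((math.dist(q, x), x) for x in self.window)
        self.results[qi] = heapq.nsmallest(self.k, candidates)
        self.recomputed += 1

    def _kth_distance(self, qi):
        # Distance to the current k-th neighbor, or infinity if fewer than k.
        res = self.results[qi]
        return res[-1][0] if len(res) == self.k else math.inf

    def insert(self, item):
        item = tuple(item)
        expired = None
        if len(self.window) == self.window_size:
            expired = self.window.popleft()  # oldest item leaves the window
        self.window.append(item)
        for qi, q in enumerate(self.queries):
            # A query is affected if it loses a current neighbor or if the
            # new item is closer than its current k-th neighbor.
            lost = expired is not None and any(
                x == expired for _, x in self.results[qi]
            )
            gained = math.dist(q, item) < self._kth_distance(qi)
            if lost or gained:
                self._recompute(qi)
```

In this sketch the cost of finding affected queries still grows linearly with the number of queries; per the abstract, reducing that cost is the role of the index built on Q.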