• DocumentCode
    244999
  • Title

    Continuous KNN Join Processing for Real-Time Recommendation

  • Author

    Chong Yang ; Xiaohui Yu ; Yang Liu

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Shandong Univ., Jinan, China
  • fYear
    2014
  • fDate
    14-17 Dec. 2014
  • Firstpage
    640
  • Lastpage
    649
  • Abstract
    The explosive growth of user-generated content on social networking websites calls for recommendation functionality that pushes to each user the content that he or she is most likely to be interested in. Such recommendation should happen in real time as new content becomes available, because "freshness" is an important consideration in people's content-consumption behavior. Representing users and content items as feature vectors in a high-dimensional space, we can cast the problem of real-time recommendation as the problem of computing the list of k nearest neighbors of each user, which we call the kNN join. Given the vast volume of content and users, the biggest challenge is how to continuously update the kNN join results as new content arrives. Existing methods for incremental kNN join on data streams suffer from the "curse of dimensionality" and high in-memory search cost. In this paper, we present a solution that first identifies the users whose kNN results might be affected by the newly arrived content, and then updates their kNN results accordingly. We propose a new index structure named HDR-tree to support the efficient search of affected users. HDR-tree performs dimensionality reduction through clustering and principal component analysis (PCA) in order to improve the search effectiveness. To further reduce response time, we propose a variant of HDR-tree, called HDR*-tree, that supports more efficient but approximate solutions. The results of extensive experiments show that our methods significantly outperform baseline methods.
  • Keywords
    data reduction; pattern clustering; principal component analysis; recommender systems; social networking (online); tree data structures; HDR-tree index structure; PCA; Web sites; clustering analysis; continuous KNN join processing; curse of dimensionality reduction; data streams; high in-memory search cost; k nearest neighbors; people content-consumption behavior; real-time recommendation; social networking; user-generated contents; Clustering algorithms; Collaboration; Educational institutions; Indexes; Principal component analysis; Real-time systems; Vegetation; high-dimensional data; k nearest neighbor join
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4799-4303-6
  • Type

    conf

  • DOI
    10.1109/ICDM.2014.20
  • Filename
    7023381
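
The two-step update described in the abstract (find the users whose kNN results a new item could change, then update only those users) can be illustrated with a minimal brute-force sketch. This is not the HDR-tree method: it scans every user linearly instead of using an index, assumes Euclidean distance, and the names `ContinuousKnnJoin`, `insert_item`, and `neighbors` are hypothetical, chosen only for this illustration.

```python
import heapq
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ContinuousKnnJoin:
    """Brute-force baseline for a continuous kNN join (hypothetical helper).

    Each user keeps a bounded max-heap of its k nearest items, stored as
    (-distance, item_id) so the current farthest neighbor sits at heap[0].
    """

    def __init__(self, users, k):
        self.users = users                  # dict: user_id -> feature vector
        self.k = k
        self.knn = {u: [] for u in users}   # dict: user_id -> heap

    def insert_item(self, item_id, vec):
        """Process a newly arrived item; return the list of affected users."""
        affected = []
        for u, uvec in self.users.items():  # linear scan; an index would prune this
            d = dist(uvec, vec)
            heap = self.knn[u]
            if len(heap) < self.k:
                # The user has fewer than k neighbors, so the item always enters.
                heapq.heappush(heap, (-d, item_id))
                affected.append(u)
            elif d < -heap[0][0]:
                # Closer than the current k-th neighbor: evict the farthest one.
                heapq.heapreplace(heap, (-d, item_id))
                affected.append(u)
        return affected

    def neighbors(self, u):
        """Return the user's current neighbors, nearest first."""
        return [item for _, item in sorted(self.knn[u], key=lambda t: -t[0])]


if __name__ == "__main__":
    join = ContinuousKnnJoin({"u1": (0.0, 0.0), "u2": (10.0, 10.0)}, k=2)
    join.insert_item("a", (1.0, 1.0))
    join.insert_item("b", (9.0, 9.0))
    print(join.insert_item("c", (0.0, 1.0)))   # only u1's kNN list changes
    print(join.neighbors("u1"), join.neighbors("u2"))
```

The affected-user test here is the comparison against each user's current k-th neighbor distance; an index such as the one the abstract proposes would aim to avoid evaluating that test for every user.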