DocumentCode :
244999
Title :
Continuous KNN Join Processing for Real-Time Recommendation
Author :
Chong Yang ; Xiaohui Yu ; Yang Liu
Author_Institution :
Sch. of Comput. Sci. & Technol., Shandong Univ., Jinan, China
fYear :
2014
fDate :
14-17 Dec. 2014
Firstpage :
640
Lastpage :
649
Abstract :
The explosive growth of user-generated content on social networking websites necessitates recommendation functionality that can push to the user the content that he/she is most likely to be interested in. Such recommendation should happen in real time as new content becomes available, because "freshness" is an important consideration in people's content-consumption behavior. Representing users and content as feature vectors in a high-dimensional space, we can essentially cast the problem of real-time recommendation as the problem of computing the list of k nearest neighbors of each user, which we call kNN join. Given the vast volume of content and users, the biggest challenge is how to continuously update the kNN join results as new content arrives. Existing methods for incremental kNN join on data streams suffer from the "curse of dimensionality" and high in-memory search cost. In this paper, we present a solution that first identifies the users whose kNNs might be affected by the newly arrived content, and then updates their kNNs accordingly. We propose a new index structure named HDR-tree to support the efficient search of affected users. HDR-tree performs dimensionality reduction through clustering and principal component analysis (PCA) in order to improve search effectiveness. To further reduce response time, we propose a variant of HDR-tree, called HDR*-tree, that supports more efficient but approximate solutions. The results of extensive experiments show that our methods significantly outperform baseline methods.
Keywords :
data reduction; pattern clustering; principal component analysis; recommender systems; social networking (online); tree data structures; HDR-tree index structure; PCA; Web sites; clustering analysis; continuous KNN join processing; curse of dimensionality reduction; data streams; high in-memory search cost; k nearest neighbors; people content-consumption behavior; principle component analysis; real-time recommendation; social networking; user-generated contents; Clustering algorithms; Collaboration; Educational institutions; Indexes; Principal component analysis; Real-time systems; Vegetation; high-dimensional data; k nearest neighbor join; real-time recommendation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining (ICDM), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
ISSN :
1550-4786
Print_ISBN :
978-1-4799-4303-6
Type :
conf
DOI :
10.1109/ICDM.2014.20
Filename :
7023381