Title of article :
TST: Threshold Based Similarity Transitivity Method in Collaborative Filtering with Cloud Computing
Author/Authors :
Xie, Feng Tsinghua University - Research Institute of Information Technology - Department of Automation, Tsinghua National Laboratory for Information Science and Technology (TNList), China , Chen, Zhen Tsinghua University - Research Institute of Information Technology - Tsinghua National Laboratory for Information Science and Technology (TNList), China , Xu, Hongfeng Tsinghua University - Department of Computer Science and Technologies and Tsinghua National Laboratory for Information Science and Technology (TNList), China , Feng, Xiwei Tsinghua University - Research Institute of Information Technology - Department of Automation, Tsinghua National Laboratory for Information Science and Technology (TNList), China , Hou, Qi Tsinghua University - Department of Electronic Engineering and Tsinghua National Laboratory for Information Science and Technology (TNList), China
Abstract :
Collaborative filtering addresses the information overload problem by presenting personalized content to individual users based on their interests, and it has been extensively applied in real-world recommender systems. As a class of simple but efficient collaborative filtering methods, similarity-based approaches make predictions by finding users with similar tastes or items that have been similarly chosen. However, as the number of users or items grows rapidly, the traditional approach suffers from the data sparsity problem. Inaccurate similarities derived from sparse user-item associations generate an inaccurate neighborhood for each user or item. The resulting poor recommendations motivate the Threshold based Similarity Transitivity (TST) method proposed in this paper. TST first filters out inaccurate similarities by setting an intersection threshold and then replaces them with transitivity similarities. In addition, the TST method is designed to scale using the MapReduce framework on a cloud computing platform. We evaluate our algorithm on the public MovieLens data set and a real-world data set from AppChina (an Android application market) with several well-known metrics, including precision, recall, coverage, and popularity. The experimental results demonstrate that TST copes well with the tradeoff between the quality and quantity of similarities when an appropriate threshold is set. Moreover, the optimal threshold can be found experimentally, and it becomes smaller as the data set becomes sparser. The experimental results also show that TST significantly outperforms the traditional approach even when the data becomes sparser.
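The two steps named in the abstract (threshold filtering, then transitivity replacement) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the abstract does not state which similarity measure is used or how a transitivity similarity is computed, so the sketch assumes Jaccard similarity over item sets and assumes the transitive value is the best product of two trusted similarities through one intermediate user. The names `jaccard`, `tst_similarities`, `user_items`, and `threshold` are hypothetical, and the MapReduce parallelization is omitted.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed measure; the abstract names none)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tst_similarities(user_items, threshold):
    """Return {(u, v): similarity} for user pairs with u < v.

    user_items: dict mapping a user id to the set of item ids that user chose.
    threshold:  minimum number of co-chosen items for a direct similarity
                to be trusted (the "intersection threshold").
    """
    users = sorted(user_items)

    # Step 1: keep a direct similarity only when the intersection is large enough.
    direct = {}
    for u, v in combinations(users, 2):
        if len(user_items[u] & user_items[v]) >= threshold:
            direct[(u, v)] = jaccard(user_items[u], user_items[v])

    def trusted(u, v):
        """Look up a trusted similarity regardless of argument order."""
        return direct.get((u, v) if u < v else (v, u))

    # Step 2: for filtered-out pairs, substitute a transitive similarity.
    # Assumption: take the best product sim(u, k) * sim(k, v) over users k.
    result = dict(direct)
    for u, v in combinations(users, 2):
        if (u, v) in direct:
            continue
        best = 0.0
        for k in users:
            if k == u or k == v:
                continue
            s_uk, s_kv = trusted(u, k), trusted(k, v)
            if s_uk is not None and s_kv is not None:
                best = max(best, s_uk * s_kv)
        if best > 0.0:
            result[(u, v)] = best
    return result


if __name__ == "__main__":
    toy = {"a": {1, 2, 3}, "b": {1, 2, 3, 4}, "c": {2, 3, 4, 5}}
    # With threshold=3 the pair (a, c) shares only two items, so its direct
    # similarity is dropped and replaced by a value routed through user b.
    print(tst_similarities(toy, threshold=3))
```

In this toy data, raising `threshold` discards more direct similarities (higher quality, lower quantity), while the transitive step restores some of the dropped pairs; that is the tradeoff the abstract refers to.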
Keywords :
cloud computing , recommender systems , big data , collaborative filtering , data mining , similarity transitivity , machine learning , MapReduce , android applications
Journal title :
Tsinghua Science and Technology