Author :
Zaier, Zied ; Godin, Robert ; Faucher, Luc
Abstract :
Recommender systems are regarded as an answer to information overload on the Web. Such systems recommend items (movies, music, books, news, web pages, etc.) that are likely to interest the user. Collaborative filtering recommender systems have been highly successful in commercial applications, where sales follow a power law distribution. However, despite the growing number of recommendation techniques and algorithms in the literature, there is no indication that the datasets used to evaluate them follow a real-world distribution. This paper introduces the long tail theory and its impact on recommender systems. It also provides a comprehensive review of the datasets used to evaluate collaborative filtering recommender system techniques and algorithms (EachMovie, MovieLens, Jester, BookCrossing, and Netflix). Finally, it investigates which of these datasets follow a power law distribution and which distribution would be the most relevant.
Keywords :
Internet; information filtering; collaborative filtering; power law distribution; recommendation techniques; recommender systems; Books; Collaboration; Filtering; Frequency; Information resources; Marketing and sales; Motion pictures; Probability distribution; Web pages; Neighbors Discrimination; dataset; evaluation