Title :
An improved referrer-based session identification algorithm using MapReduce
Author :
Peng Huang ; Dehua Chen ; Jiajin Le
Author_Institution :
Sch. of Comput. Sci. & Technol., Donghua Univ., Shanghai, China
Abstract :
Session identification is an important step in web log mining that supports predictive prefetching of users' next requests based on their navigation behavior. It poses two main challenges: handling very large datasets efficiently and identifying users' session boundaries accurately. To address them, we propose a session identification algorithm that combines the time-based algorithm with the referrer-based algorithm, implemented in the MapReduce framework on the Hadoop platform to achieve higher performance. Experiments on real-world data show that, compared with traditional session identification methods, the proposed algorithm is more effective and identifies more long sessions, which gives it higher accuracy.
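The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible way to combine a time-based rule with a referrer-based rule in a map/shuffle/reduce shape. The `Hit` record, the 30-minute `TIMEOUT`, the use of a single `user` key, and the rule "start a new session when the gap exceeds the timeout or the referrer was not seen in the current session" are all assumptions made for illustration, not the authors' method. The phases run in-process in plain Python rather than on Hadoop.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One web log entry (hypothetical record layout)."""
    user: str      # assumed user key, e.g. client IP plus user agent
    time: float    # request time in seconds
    url: str       # requested page
    referrer: str  # referring page, "-" when absent


TIMEOUT = 30 * 60  # assumed inactivity threshold in seconds


def map_phase(hits: Iterable[Hit]) -> Iterator[Tuple[str, Hit]]:
    """Map: emit each hit keyed by user so one reducer sees one user's hits."""
    for hit in hits:
        yield hit.user, hit


def shuffle(pairs: Iterable[Tuple[str, Hit]]) -> Dict[str, List[Hit]]:
    """Stand-in for the framework's shuffle: group values by key."""
    groups: Dict[str, List[Hit]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(hits: List[Hit]) -> List[List[Hit]]:
    """Reduce: split one user's hits into sessions.

    A hit stays in the current session only if it arrives within TIMEOUT
    of the previous hit AND its referrer is a page already seen in the
    current session; otherwise a new session starts.
    """
    sessions: List[List[Hit]] = []
    current: List[Hit] = []
    seen_urls = set()
    for hit in sorted(hits, key=lambda h: h.time):
        if current:
            within_timeout = hit.time - current[-1].time <= TIMEOUT
            referrer_known = hit.referrer in seen_urls
            if not (within_timeout and referrer_known):
                sessions.append(current)
                current, seen_urls = [], set()
        current.append(hit)
        seen_urls.add(hit.url)
    if current:
        sessions.append(current)
    return sessions


def identify_sessions(hits: Iterable[Hit]) -> Dict[str, List[List[Hit]]]:
    """Run map, shuffle, and reduce in-process."""
    grouped = shuffle(map_phase(hits))
    return {user: reduce_phase(group) for user, group in grouped.items()}


sample_hits = [
    Hit("u1", 0, "/a", "-"),
    Hit("u1", 60, "/b", "/a"),    # referrer seen, small gap: same session
    Hit("u1", 120, "/c", "-"),    # unknown referrer: new session
    Hit("u1", 4120, "/d", "/c"),  # referrer seen, but gap exceeds TIMEOUT
    Hit("u2", 10, "/a", "-"),
]
result = identify_sessions(sample_hits)
session_urls = {u: [[h.url for h in s] for s in ss] for u, ss in result.items()}
```

With the sample hits, `session_urls` groups `/a` and `/b` into one session for `u1` and places `/c` and `/d` in separate sessions. On a real cluster the grouping by user would be done by the framework's shuffle, and the per-user sort could be delegated to a secondary sort rather than done in the reducer.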
Keywords :
Internet; data mining; distributed processing; Hadoop platform; MapReduce; Web log mining; improved referrer-based session identification algorithm; navigation behavior; referrer based algorithm; time based algorithm; Accuracy; Algorithm design and analysis; Clustering algorithms; Computers; Data mining; Data preprocessing; Educational institutions; Data preprocessing; Hadoop; MapReduce; Session identification; Web log mining;
Conference_Title :
Natural Computation (ICNC), 2013 Ninth International Conference on
Conference_Location :
Shenyang
DOI :
10.1109/ICNC.2013.6818136