Title :
An efficient Web traversal pattern mining algorithm based on suffix array
Author :
Jing, Tao ; Zuo, Wan-Li ; Zhang, Bang-Zuo
Author_Institution :
Coll. of Comput. Sci. & Technol., Jilin Univ., Changchun, China
Abstract :
Traversal patterns reflect the regularities with which Web users browse and select Web pages by following URL hyperlinks. Discovering traversal patterns is useful for improving Web site design, offering clients personalized services, supporting e-commerce activities, building intelligent Web sites, and so on. The discovery of user traversal patterns comprises three steps: 1) extracting maximal forward reference paths from the server log; 2) discovering frequent reference paths from the output of the first step; and 3) filtering the output of the second step to obtain the maximal frequent reference paths. The second step is the core of the whole mining process. Essentially, a Web access pattern is a sequential pattern, found in a large collection of Web log records, that users follow frequently. Although some attempts have been made to mine traversal patterns from Web logs, most of this research employs sequential pattern mining techniques, which follow a generate-and-test paradigm and require multiple scans of the entire dataset. This paper presents a novel suffix-array-based approach to generating frequent reference paths. Experimental results on both synthetic and real-life data sets show the effectiveness of the new algorithm.
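The abstract names the three mining steps but does not give the suffix-array algorithm itself. The Python sketch below is one plausible reading of the pipeline, not the authors' method. It makes several assumptions: a revisit of a page already on the current path counts as a backward move; the support of a reference path is the number of maximal forward references that contain it as a contiguous sub-path; `min_support` is an absolute count; and a naive generalized suffix sort (roughly O(n² log n)) stands in for a real suffix-array construction. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Path = Tuple[str, ...]


def maximal_forward_references(session: Sequence[str]) -> List[Path]:
    """Step 1 (assumed rule): split one session into maximal forward paths.

    Revisiting a page already on the current path is treated as a backward
    move; the path built so far is emitted if the previous move was forward.
    """
    refs: List[Path] = []
    path: List[str] = []
    moving_forward = True
    for page in session:
        if page in path:
            if moving_forward:
                refs.append(tuple(path))
            path = path[: path.index(page) + 1]  # back up to the revisited page
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        refs.append(tuple(path))
    return refs


def frequent_paths(paths: List[Path], min_support: int) -> Dict[Path, int]:
    """Step 2: contiguous sub-paths contained in >= min_support paths.

    Naive generalized suffix array: every (path, offset) pair sorted by its
    suffix. Suffixes sharing a prefix are adjacent, so each candidate's support
    is the number of distinct path ids in its run of the sorted array.
    """
    sa = sorted(((i, j) for i, p in enumerate(paths) for j in range(len(p))),
                key=lambda s: paths[s[0]][s[1]:])
    result: Dict[Path, int] = {}
    longest = max((len(p) for p in paths), default=0)
    for length in range(1, longest + 1):
        found = False
        run_key: Optional[Path] = None
        run_ids: Set[int] = set()
        for i, j in sa + [(-1, 0)]:  # (-1, 0) is a sentinel that flushes the last run
            prefix = paths[i][j:j + length] if i >= 0 else ()
            key = prefix if len(prefix) == length else None
            if key != run_key or i < 0:
                if run_key is not None and len(run_ids) >= min_support:
                    result[run_key] = len(run_ids)
                    found = True
                run_key, run_ids = key, set()
            if key is not None:
                run_ids.add(i)
        if not found:
            break  # support is anti-monotone: no longer path can be frequent
    return result


def is_subpath(short: Path, long: Path) -> bool:
    """True if `short` occurs as a contiguous run inside `long`."""
    n = len(short)
    return any(long[k:k + n] == short for k in range(len(long) - n + 1))


def maximal_frequent_paths(freq: Dict[Path, int]) -> Dict[Path, int]:
    """Step 3: drop frequent paths contained in a longer frequent path."""
    return {p: s for p, s in freq.items()
            if not any(len(q) > len(p) and is_subpath(p, q) for q in freq)}


if __name__ == "__main__":
    refs = maximal_forward_references(list("ABCDCBEGHGWAOUOV"))
    print(refs)
    print(maximal_frequent_paths(frequent_paths(refs, min_support=2)))
```

With the single illustrative session above, step 1 yields the paths ABCD, ABEGH, ABEGW, AOU and AOV, and with a support threshold of 2 the maximal frequent paths are AO and ABEG. Because every suffix that begins with a given path sits in one contiguous block of the sorted array, a single pass per path length counts all candidates of that length, instead of generating candidates and rescanning the data for each one.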
Keywords :
Internet; Web design; data mining; data structures; information filtering; online front-ends; URL hyperlinks; Web logs; Web pages selection; Web site design; Web traversal pattern mining algorithm; Web users browsing; e-commercial activities; intelligent Web sites construction; maximal forward reference paths; maximal frequent reference path generation; real life data sets; sequential pattern mining; server log; suffix array; synthetic data sets; Computer science; Data mining; Data structures; Educational institutions; Lattices; Terminology; Uniform resource locators; Web mining; Web pages;
Conference_Title :
Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
Print_ISBN :
0-7803-8403-2
DOI :
10.1109/ICMLC.2004.1382017