DocumentCode
424093
Title
An efficient Web traversal pattern mining algorithm based on suffix array
Author
Jing, Tao ; Zuo, Wan-Li ; Zhang, Bang-Zuo
Author_Institution
Coll. of Comput. Sci. & Technol., Jilin Univ., Changchun, China
Volume
3
fYear
2004
fDate
26-29 Aug. 2004
Firstpage
1535
Abstract
Traversal patterns reflect regularities in how Web users browse and select Web pages along URL hyperlinks. Discovering traversal patterns is useful for improving Web site design, offering clients personalized service, supporting e-commerce activities, constructing intelligent Web sites, and so on. The discovery of user traversal patterns comprises three steps: 1) extracting maximal forward reference paths from the server log; 2) discovering frequent reference paths from the result of the first step; and 3) filtering the output of the second step to obtain maximal frequent reference paths. The second step constitutes the core of the whole mining process. Essentially, a Web access pattern is a sequential pattern, within a large set of Web log fragments, that users follow frequently. Although some attempts have been made to mine traversal patterns from Web logs, most research efforts employ sequential pattern mining techniques, which are based on a generate-and-test paradigm and require multiple scans of the entire dataset. This paper presents a novel approach based on suffix arrays for frequent reference path generation. Experimental results on both synthetic and real-life data sets show the effectiveness of the novel algorithm.
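The three steps named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm, whose details the abstract does not give. It assumes that a session is a sequence of page identifiers, that a frequent reference path is a contiguous run of pages occurring in at least `min_support` maximal forward reference paths, and that a naively sorted list of suffix positions stands in for an efficiently built suffix array. The function names and the sample session are hypothetical.

```python
from itertools import groupby


def maximal_forward_references(session):
    """Step 1: split one session into maximal forward reference paths.

    A revisit of a page already on the current path is treated as a
    backward move: the path built so far is emitted (if the previous
    move was forward) and then truncated back to the revisited page.
    """
    path, result, moving_forward = [], [], False
    for page in session:
        if page in path:
            if moving_forward:
                result.append(tuple(path))
            path = path[:path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        result.append(tuple(path))
    return result


def frequent_paths(paths, min_support):
    """Step 2: count contiguous sub-paths using sorted suffix positions.

    Assumption: a naive sort of (path index, offset) pairs stands in for
    a real suffix-array construction. Equal prefixes of length k are
    adjacent in the sorted order, so one pass per k counts them.
    """
    suffix_array = sorted(
        ((pi, i) for pi, p in enumerate(paths) for i in range(len(p))),
        key=lambda e: paths[e[0]][e[1]:],
    )
    frequent = {}
    max_len = max((len(p) for p in paths), default=0)
    for k in range(1, max_len + 1):
        prefixes = (paths[pi][i:i + k] for pi, i in suffix_array
                    if len(paths[pi]) - i >= k)
        for prefix, run in groupby(prefixes):
            count = sum(1 for _ in run)
            if count >= min_support:
                frequent[prefix] = count
    return frequent


def maximal_paths(frequent):
    """Step 3: keep frequent paths not contained in a longer frequent path."""
    def contains(long, short):
        n = len(short)
        return any(long[i:i + n] == short
                   for i in range(len(long) - n + 1))
    return {p: c for p, c in frequent.items()
            if not any(p != q and contains(q, p) for q in frequent)}


# Hypothetical session: each letter is one page request.
session = "ABCDCBEGHGWAOUOV"
paths = maximal_forward_references(session)
frequent = frequent_paths(paths, min_support=2)
print(maximal_paths(frequent))
```

Because a maximal forward reference path never repeats a page, each contiguous sub-path occurs at most once per path, so counting suffix prefixes gives the number of supporting paths directly. The sort is done once, in contrast to the generate-and-test approach the abstract describes as rescanning the dataset, though this naive version still makes one pass over the sorted suffixes per path length.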
Keywords
Internet; Web design; data mining; data structures; information filtering; online front-ends; URL hyperlinks; Web logs; Web pages selection; Web site design; Web traversal pattern mining algorithm; Web users browsing; e-commercial activities; intelligent Web sites construction; maximal forward reference paths; maximal frequent reference path generation; real life data sets; sequential pattern mining; server log; suffix array; synthetic data sets; Computer science; Data mining; Data structures; Educational institutions; Lattices; Terminology; Uniform resource locators; Web mining; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
Print_ISBN
0-7803-8403-2
Type
conf
DOI
10.1109/ICMLC.2004.1382017
Filename
1382017