DocumentCode
1819982
Title
Using probabilistic latent semantic analysis for Web page grouping
Author
Xu, Guandong ; Zhang, Yanchun ; Zhou, Xiaofang
fYear
2005
fDate
3-4 April 2005
Firstpage
29
Lastpage
36
Abstract
The locality of Web pages within a Web site is initially determined by the designer's expectations. Web usage mining can discover patterns in the navigational behaviour of Web visitors and, in turn, improve Web site functionality and service design by taking users' actual opinions into account. Conventional Web page clustering techniques are often used to reveal the functional similarity of Web pages. However, treating each user transaction as a dimension leads to a high-dimensional computation problem. In this paper, we propose a new Web page grouping approach based on a probabilistic latent semantic analysis (PLSA) model. An iterative algorithm based on the maximum likelihood principle is employed to overcome this computational shortcoming. The Web pages are classified into groups according to user access patterns. Meanwhile, the latent semantic factors, or tasks, are characterized by extracting the content of the "dominant" pages related to each factor. We demonstrate the effectiveness of our approach through experiments on real-world data sets.
Keywords
Internet; data mining; iterative methods; maximum likelihood estimation; pattern clustering; probability; Web page clustering; Web page grouping; Web site functionality; Web usage mining; iterative algorithm; maximum likelihood principle; navigational behaviour; probabilistic latent semantic analysis; real world data sets; user access patterns; Australia; Computer science; Data mining; Design engineering; Information technology; Mathematics; Navigation; Web mining; Web page design; Web pages;
fLanguage
English
Publisher
IEEE
Conference_Titel
Research Issues in Data Engineering: Stream Data Mining and Applications, 2005. RIDE-SDMA 2005. 15th International Workshop on
ISSN
1097-8585
Print_ISBN
0-7695-2390-0
Type
conf
DOI
10.1109/RIDE.2005.16
Filename
1498228
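Illustration
The abstract describes fitting a PLSA model to usage data with an iterative maximum likelihood algorithm and then grouping pages by latent factor. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the expectation-maximization (EM) procedure commonly used for PLSA, the symmetric parameterization P(s,p) = sum over z of P(z) P(s|z) P(p|z) for sessions s and pages p, and a simple rule that assigns each page to its most probable factor. The function names plsa and group_pages, the toy count matrix, and the grouping rule are all hypothetical and chosen only for illustration.

```python
import math
import random


def plsa(counts, n_factors, n_iter=50, seed=0):
    """Fit P(z), P(s|z), P(p|z) to a session-by-page count matrix with EM.

    counts[s][p] is how often session s visited page p (toy assumption).
    Returns the three distributions plus the log-likelihood per iteration.
    """
    rng = random.Random(seed)
    n_s, n_p = len(counts), len(counts[0])

    def random_dist(n):
        # Random strictly positive weights, normalized to sum to 1.
        w = [rng.random() + 0.1 for _ in range(n)]
        t = sum(w)
        return [x / t for x in w]

    p_z = [1.0 / n_factors] * n_factors
    p_s_z = [random_dist(n_s) for _ in range(n_factors)]  # p_s_z[z][s]
    p_p_z = [random_dist(n_p) for _ in range(n_factors)]  # p_p_z[z][p]
    history = []

    for _ in range(n_iter):
        new_z = [0.0] * n_factors
        new_s = [[0.0] * n_s for _ in range(n_factors)]
        new_p = [[0.0] * n_p for _ in range(n_factors)]
        loglik = 0.0
        for s in range(n_s):
            for p in range(n_p):
                n = counts[s][p]
                if n == 0:
                    continue
                # E-step: P(z | s, p) is proportional to P(z) P(s|z) P(p|z).
                joint = [p_z[z] * p_s_z[z][s] * p_p_z[z][p]
                         for z in range(n_factors)]
                total = sum(joint)
                loglik += n * math.log(total)
                for z in range(n_factors):
                    r = n * joint[z] / total  # expected count for factor z
                    new_z[z] += r
                    new_s[z][s] += r
                    new_p[z][p] += r
        history.append(loglik)  # log-likelihood of the parameters used above
        # M-step: renormalize the expected counts into probabilities.
        grand = sum(new_z)
        for z in range(n_factors):
            p_s_z[z] = [v / new_z[z] for v in new_s[z]]
            p_p_z[z] = [v / new_z[z] for v in new_p[z]]
            p_z[z] = new_z[z] / grand
    return p_z, p_s_z, p_p_z, history


def group_pages(p_z, p_p_z):
    """Assign each page to the factor z maximizing P(z) P(p|z) (assumed rule)."""
    n_p = len(p_p_z[0])
    return [max(range(len(p_z)), key=lambda z: p_z[z] * p_p_z[z][p])
            for p in range(n_p)]


# Toy usage data: 4 sessions (rows) by 6 pages (columns), invented for the demo.
counts = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 2, 3, 1],
    [0, 0, 1, 1, 2, 3],
]
p_z, p_s_z, p_p_z, history = plsa(counts, n_factors=2)
groups = group_pages(p_z, p_p_z)
print(groups)  # one factor index per page
```

Each EM pass costs time proportional to the number of non-zero session-page counts times the number of factors, which is one way to read the abstract's point about avoiding clustering directly in the high-dimensional transaction space. The "dominant" pages of a factor z could then be read off as the pages with the largest P(p|z), although the paper's exact selection criterion is not stated in this record.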