DocumentCode :
2226220
Title :
Mining Web site's clusters from link topology and site hierarchy
Author :
Cheung, Kwok-Wai ; Sun, Yuxiang
Author_Institution :
Dept. of Comput. Sci., Hong Kong Baptist Univ., Kowloon, Hong Kong
fYear :
2003
fDate :
13-17 Oct. 2003
Firstpage :
271
Lastpage :
277
Abstract :
Foraging for information in large and complex Web sites using keyword search alone usually results in an unpleasant experience because of the overloaded search results. To support more effective information search, descriptive abstractions of the Web sites (e.g., sitemaps) are needed. However, their creation and maintenance normally require recurrent manual effort because Web contents change quickly. We extend the HITS algorithm and integrate hyperlink topology and Web site hierarchy to identify a hierarchy of Web page clusters as the abstraction of a Web site. As the algorithm is based on HITS, each identified cluster follows the bipartite graph structure, with an authority and hub pair as the cluster summary. The effectiveness of the algorithm has been evaluated using three different Web sites (containing ∼6000-14000 Web pages) with promising results. A detailed interpretation of the experimental results and a qualitative comparison with related work are also included.
Keywords :
Web sites; data mining; graph theory; hypermedia; information retrieval; statistical analysis; HITS algorithm; Web content; Web page; Web site analysis; bipartite graph structure; cluster mining; hyperlink topology; information retrieval; keyword search; Algorithm design and analysis; Bipartite graph; Clustering algorithms; Computer science; Iterative algorithms; Keyword search; Search engines; Sun; Topology; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Intelligence, 2003. WI 2003. Proceedings. IEEE/WIC International Conference on
Print_ISBN :
0-7695-1932-6
Type :
conf
DOI :
10.1109/WI.2003.1241204
Filename :
1241204