DocumentCode :
2861896
Title :
Focused Crawling by Learning HMM from User's Topic-specific Browsing
Author :
Liu, Hongyu ; Milios, Evangelos ; Janssen, Jeannette
Author_Institution :
Dalhousie University, Canada
fYear :
2004
fDate :
20-24 Sept. 2004
Firstpage :
732
Lastpage :
732
Abstract :
A focused crawler is designed to traverse the Web to gather documents on a specific topic. It is not an easy task to predict which links lead to good pages. In this paper, we present a new approach for prediction of the important links to relevant pages based on a learned user model. In particular, we first collect pages that a user visits during a learning session, where the user browses the Web and specifically marks which pages she is interested in. We then examine the semantic content of these pages to construct a concept graph, which is used to learn the dominant content and link structure leading to target pages using a Hidden Markov Model (HMM). Experiments show that with learned HMM from a user's browsing, the crawling performs better than Best-First strategy.
Keywords :
Computer science; Crawlers; Hidden Markov models; Information retrieval; Mathematics; Predictive models; Search engines; Statistics; Web pages; World Wide Web;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2100-2
Type :
conf
DOI :
10.1109/WI.2004.10057
Filename :
1410908