• DocumentCode
    2861896
  • Title

    Focused Crawling by Learning HMM from User's Topic-specific Browsing

  • Author

    Liu, Hongyu ; Milios, Evangelos ; Janssen, Jeannette

  • Author_Institution
    Dalhousie University, Canada
  • fYear
    2004
  • fDate
    20-24 Sept. 2004
  • Firstpage
    732
  • Lastpage
    732
  • Abstract
    A focused crawler is designed to traverse the Web to gather documents on a specific topic. It is not an easy task to predict which links lead to good pages. In this paper, we present a new approach for predicting the important links that lead to relevant pages, based on a learned user model. In particular, we first collect the pages that a user visits during a learning session, in which the user browses the Web and explicitly marks the pages she is interested in. We then examine the semantic content of these pages to construct a concept graph, which is used to learn the dominant content and link structure leading to target pages using a Hidden Markov Model (HMM). Experiments show that, with an HMM learned from a user's browsing, crawling performs better than the Best-First strategy.
  • Keywords
    Computer science; Crawlers; Hidden Markov models; Information retrieval; Mathematics; Predictive models; Search engines; Statistics; Web pages; World Wide Web
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2100-2
  • Type

    conf

  • DOI
    10.1109/WI.2004.10057
  • Filename
    1410908