• DocumentCode
    2726026
  • Title
    The Design and Implementation of a Topic-Driven Crawler
  • Author
    Li, Qiong; Jin, Tao; Fu, Yuchen; Liu, Quan; Cui, Zhiming
  • fYear
    2007
  • fDate
    2-3 Dec. 2007
  • Firstpage
    153
  • Lastpage
    156
  • Abstract
    Users surfing the Internet need web pages to be classified into a given topic as accurately as possible. As a result, topic-driven crawlers are becoming important tools for supporting applications such as specialized web portals, online searching, and competitive intelligence. This paper presents a topic-driven crawler that computes the degree of relevance and refines the preliminary set of related web pages using term frequency/document frequency, entropy, and compiled rules. It also gives a comparatively ideal system architecture, shows how the modules of a topic-driven crawler relate to one another, and describes several modules in detail.
  • Keywords
    Application software; Competitive intelligence; Crawlers; Entropy; Frequency; Internet; Search engines; Sorting; Uniform resource locators; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Information Technology Application, Workshop on
  • Conference_Location
    Zhangjiajie
  • Print_ISBN
    978-0-7695-3063-5
  • Type
    conf
  • DOI
    10.1109/IITA.2007.33
  • Filename
    4426987