DocumentCode
2726026
Title
The Design and Implementation of a Topic-Driven Crawler
Author
Li, Qiong ; Jin, Tao ; Fu, Yuchen ; Liu, Quan ; Cui, Zhiming
fYear
2007
fDate
2-3 Dec. 2007
Firstpage
153
Lastpage
156
Abstract
Users browsing the Internet need web pages classified into a given topic as accurately as possible. As a result, topic-driven crawlers are becoming important tools for supporting applications such as specialized web portals, online searching, and competitive intelligence. This paper presents a topic-driven crawler that computes the degree of relevance and refines the preliminary set of related web pages using term frequency/document frequency, entropy, and compiled rules. It also presents a comparatively ideal system architecture, explains the relationships among the modules of a topic-driven crawler, and describes several modules in detail.
Keywords
Application software; Competitive intelligence; Crawlers; Entropy; Frequency; Internet; Search engines; Sorting; Uniform resource locators; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Information Technology Application, Workshop on
Conference_Location
Zhang Jiajie
Print_ISBN
978-0-7695-3063-5
Type
conf
DOI
10.1109/IITA.2007.33
Filename
4426987