DocumentCode :
2291838
Title :
A probabilistic model for intelligent Web crawlers
Author :
Hu, Ke ; Wong, Wing Shing
Author_Institution :
Dept. of Inf. Eng., Chinese Univ. of Hong Kong, Shatin, China
fYear :
2003
fDate :
3-6 Nov. 2003
Firstpage :
278
Lastpage :
282
Abstract :
With the enormous growth of the World Wide Web in recent years, discovering Web pages efficiently has become an important challenge for Web crawler designers. In this paper, we outline a simple model that predicts the distribution of the search depth at which a breadth-first search reaches the first Web pages relevant to a user query. We define this probability as the crawler confidence. Recent studies by Y. Deshpande and S. Hansen (2001) indicate that, at a large scale, the Web structure follows a power-law distribution in several respects. Our work, by contrast, models the microscopic linkage structure of the Web from an intelligent crawler's point of view. With the information provided by the crawler confidence, an intelligent crawler can adjust its crawling behavior to achieve a higher harvest rate.
Keywords :
Internet; Web design; data mining; probability; search engines; Web crawler design; Web crawlers; Web pages; Web structure; World Wide Web; breadth-first search; intelligent crawler; microscopic linkage structure; power law distribution; probabilistic model; user querying; Couplings; Crawlers; Databases; Design engineering; Indexing; Intelligent robots; Intelligent structures
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Software and Applications Conference, 2003. COMPSAC 2003. Proceedings. 27th Annual International
ISSN :
0730-3157
Print_ISBN :
0-7695-2020-0
Type :
conf
DOI :
10.1109/CMPSAC.2003.1245354
Filename :
1245354