Title :
A probabilistic model for intelligent Web crawlers
Author :
Hu, Ke ; Wong, Wing Shing
Author_Institution :
Dept. of Inf. Eng., Chinese Univ. of Hong Kong, Shatin, China
Abstract :
With the enormous growth of the World Wide Web in recent years, discovering Web pages efficiently has become an important challenge for Web crawler designers. In this paper, we outline a simple model that predicts the distribution of the search depth at which a breadth-first search reaches the first Web pages relevant to a user query. We define this probability as the crawler confidence. Recent studies by Y. Deshpande and S. Hansen (2001) indicate that, at a large scale, the structure of the Web follows power-law distributions in several respects. Our work, by contrast, models the microscopic linkage structure of the Web from an intelligent crawler's point of view. With the information provided by the crawler confidence, an intelligent crawler can adjust its crawling behavior to achieve a higher harvest rate.
Keywords :
Internet; Web design; data mining; probability; search engines; Web crawler design; Web crawlers; Web pages; Web structure; World Wide Web; breadth-first search; intelligent crawler; microscopic linkage structure; power law distribution; probabilistic model; user querying; Couplings; Crawlers; Databases; Design engineering; Indexing; Intelligent robots; Intelligent structures
Conference_Title :
Proceedings of the 27th Annual International Computer Software and Applications Conference (COMPSAC 2003)
Print_ISBN :
0-7695-2020-0
DOI :
10.1109/CMPSAC.2003.1245354