Title :
Adaptive focused crawler based on tunneling and link analysis
Author :
Zhang, Xiaoming ; Li, Zhoujun ; Hu, Chaojian
Author_Institution :
Sch. of Comput. Sci. & Eng., Beihang Univ., Beijing
Abstract :
Focused crawlers have become a common way to find needed information on the Web. The main characteristic of a focused web crawler is that it selects and retrieves only relevant web pages during each crawl. In this paper, we propose a learnable algorithm that combines link analysis with web content in order to retrieve topic-specific web documents, and that learns to predict the next URL to visit. The algorithm also uses adaptive tunneling to overcome some of the limitations of conventional focused crawlers. We apply three metrics to compare its efficiency with that of other well-known Web crawling techniques.
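The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it names: a best-first crawl whose link priority blends content relevance with a simple link-based score, and which "tunnels" through a limited number of consecutive irrelevant pages. Everything here is an assumption made for illustration: the in-memory `TOY_WEB` graph, the keyword-overlap `relevance` function, the in-link-count score, and the parameters `threshold`, `max_tunnel`, and `alpha`. The tunnel depth is fixed rather than adaptive, and no learning step is included, so this is not the authors' method.

```python
import heapq

# Hypothetical toy web used instead of real HTTP fetches:
# url -> (page text, outgoing links)
TOY_WEB = {
    "seed": ("web crawler survey", ["a", "b"]),
    "a": ("cooking recipes", ["c"]),
    "b": ("focused crawler tutorial", ["d"]),
    "c": ("focused crawler link analysis", []),
    "d": ("sports news", ["e"]),
    "e": ("weather report", ["f"]),
    "f": ("crawler benchmark", []),
}
TOPIC = {"focused", "crawler", "link"}  # assumed topic keywords


def relevance(text, topic=TOPIC):
    """Content score: fraction of topic keywords that occur in the page."""
    words = set(text.lower().split())
    return len(words & topic) / len(topic)


def crawl(web, seed, threshold=0.3, max_tunnel=1, alpha=0.7):
    """Best-first crawl with a fixed tunneling depth (illustrative only).

    A page scoring below `threshold` is irrelevant. Its links are still
    followed as long as no more than `max_tunnel` consecutive irrelevant
    pages lie on the path that reached it.
    """
    inlinks = {}                   # crude link analysis: in-links seen so far
    frontier = [(-1.0, seed, 0)]   # (negated priority, url, irrelevant hops)
    seen = {seed}
    visited, relevant = [], []
    while frontier:
        _, url, hops = heapq.heappop(frontier)
        text, links = web[url]
        score = relevance(text)
        visited.append(url)
        if score >= threshold:
            relevant.append(url)
            hops = 0               # a relevant page resets the tunnel counter
        else:
            hops += 1
            if hops > max_tunnel:  # tunnel exhausted: do not expand this page
                continue
        for link in links:
            inlinks[link] = inlinks.get(link, 0) + 1
            if link in seen or link not in web:
                continue
            seen.add(link)
            link_score = inlinks[link] / (1 + inlinks[link])
            # Blend the parent's content score with the link-based score.
            priority = alpha * score + (1 - alpha) * link_score
            heapq.heappush(frontier, (-priority, link, hops))
    return visited, relevant


if __name__ == "__main__":
    for depth in (0, 1, 2):
        visited, relevant = crawl(TOY_WEB, "seed", max_tunnel=depth)
        print(depth, visited, relevant)
```

In this toy graph the relevant page `c` is reachable only through the irrelevant page `a`, so a crawl with `max_tunnel=0` never reaches it, while `max_tunnel=1` does. Reaching `f` requires passing through two consecutive irrelevant pages, which is the kind of trade-off an adaptive tunneling depth would have to manage.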
Keywords :
Internet; information retrieval; information retrieval systems; Web content; Web document retrieval; adaptive focused Web crawler; learnable algorithm; link analysis; tunneling analysis; Algorithm design and analysis; Chaos; Computer science; Content based retrieval; Crawlers; Information analysis; Testing; Tunneling; Uniform resource locators; Web pages;
Conference_Title :
Advanced Communication Technology, 2009. ICACT 2009. 11th International Conference on
Print_ISBN :
978-89-5519-138-7
Electronic_ISBN :
1738-9445