Title :
Offensive and defensive strategy of web crawler
Author :
Jiang Yuanshu ; Tang Wenzhong ; Guo Liyong
Author_Institution :
Key Lab. of Beijing Network Technol., Beihang Univ., Beijing, China
Abstract :
The crawling strategy of a web crawler affects not only the quality of a search engine but also the working status of the web servers it visits. Many web servers restrict access by unknown crawlers or by crawlers that visit too frequently. This paper analyzes these restrictions, proposes a proxy-based strategy with automatic login through recognition of verification codes, and gives some guidance on the design of web crawlers.
Keywords :
Internet; online front-ends; query processing; search engines; Web crawler; Web server; crawling strategy; defensive strategy; offensive strategy; search engine; verification code; Browsers; Crawlers; IP networks; Search engines; Time frequency analysis; Web servers; proxy server; recognition of verification code; web crawler;
Conference_Title :
Intelligent Control and Automation (WCICA), 2012 10th World Congress on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-1397-1
DOI :
10.1109/WCICA.2012.6357898