DocumentCode :
590178
Title :
Defense response of search engine websites to non cooperating crawlers
Author :
Dev Chandna, Rishabh ; Chaubey, P. ; Gupta, S.C.
Author_Institution :
Indian Inst. of Technol. (BHU), Varanasi, India
fYear :
2012
fDate :
Oct. 30 2012-Nov. 2 2012
Firstpage :
219
Lastpage :
223
Abstract :
Web crawlers that do not cooperate with robots.txt are unwanted by any website, as they can have a serious negative impact in terms of denial of service, privacy, and cost. Defense mechanisms such as the Automated Content Access Protocol, CAPTCHA, web crawler traps, and real-time bot detection have been proposed to protect websites from unwanted crawler access. However, the extent to which these mechanisms are applied in practice against such crawlers is not clearly known. In this paper we present an investigation into the defense mechanisms that websites use against robots.txt non-cooperating web crawlers. The investigation is limited to the search engine class of websites. MBot, a self-developed non-cooperating web crawler, is the primary tool used. We find that search engine websites do have defense mechanisms to prevent access by non-cooperating crawlers. However, on some of the investigated websites no defense of any kind against MBot's access is observed. The observed defense mechanisms are also found to be robust with respect to basic network and application parameters such as proxy, port number, user agent, and IP address.
Keywords :
Web sites; data privacy; information retrieval; search engines; MBot; Robots.txt noncooperating Web crawlers; Web crawler trap; automated content access protocol; captcha; defense mechanisms; defense response; real time bot detection; search engine Websites; self-developed non cooperating Web crawler; Communications technology; Decision support systems; Helium; defense mechanism; robots exclusion protocol; robots.txt; web crawler; website;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information and Communication Technologies (WICT), 2012 World Congress on
Conference_Location :
Trivandrum
Print_ISBN :
978-1-4673-4806-5
Type :
conf
DOI :
10.1109/WICT.2012.6409078
Filename :
6409078