DocumentCode
1801141
Title
Real-time web crawler detection
Author
Balla, Andoena ; Stassopoulou, Athena ; Dikaiakos, Marios D.
Author_Institution
Dept. of Comput. Sci., Univ. of Cyprus, Nicosia, Cyprus
fYear
2011
fDate
8-11 May 2011
Firstpage
428
Lastpage
432
Abstract
In this paper we present a methodology for detecting web crawlers in real time. We use decision trees to classify requests as originating from a crawler or a human while the session is still ongoing. To this end, we used machine learning techniques to identify the most important features that differentiate humans from crawlers. The method was tested in real time with the help of an emulator, using only a small number of requests. Our results demonstrate the effectiveness and applicability of our approach.
Keywords
Internet; decision trees; learning (artificial intelligence); search engines; Web crawler detection; decision tree; machine learning technique; Crawlers; Feature extraction; Humans; IP networks; Measurement; Real time systems; Robots
fLanguage
English
Publisher
IEEE
Conference_Titel
Telecommunications (ICT), 2011 18th International Conference on
Conference_Location
Ayia Napa
Print_ISBN
978-1-4577-0025-5
Type
conf
DOI
10.1109/CTS.2011.5898963
Filename
5898963