  • DocumentCode
    1801141
  • Title
    Real-time web crawler detection
  • Author
    Balla, Andoena; Stassopoulou, Athena; Dikaiakos, Marios D.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Cyprus, Nicosia, Cyprus
  • fYear
    2011
  • fDate
    8-11 May 2011
  • Firstpage
    428
  • Lastpage
    432
  • Abstract
    In this paper we present a methodology for detecting web crawlers in real time. We use decision trees to classify requests, while the session is still ongoing, as originating from a crawler or a human. For this purpose we used machine learning techniques to identify the most important features that differentiate humans from crawlers. The method was tested in real time with the help of an emulator, using only a small number of requests. Our results demonstrate the effectiveness and applicability of our approach.
  • Keywords
    Internet; decision trees; learning (artificial intelligence); search engines; Web crawler detection; decision tree; machine learning technique; Crawlers; Feature extraction; Humans; IP networks; Measurement; Real time systems; Robots
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Telecommunications (ICT), 2011 18th International Conference on
  • Conference_Location
    Ayia Napa
  • Print_ISBN
    978-1-4577-0025-5
  • Type
    conf
  • DOI
    10.1109/CTS.2011.5898963
  • Filename
    5898963