• DocumentCode
    3226871
  • Title
    Detecting Impolite Crawler by Using Time Series Analysis
  • Author
    Zhiqian Chen ; Wenya Feng
  • Author_Institution
    Dept. of Software Eng., Peking Univ., Beijing, China
  • fYear
    2013
  • fDate
    4-6 Nov. 2013
  • Firstpage
    123
  • Lastpage
    126
  • Abstract
    Numerous web crawlers, especially impolite ones, visit websites every day to fetch content, producing a higher access frequency than the websites can sustain. The heavy traffic from impolite crawlers seriously interferes with the analysis of normal users and threatens advertisement income. In this paper, we present a method for detecting impolite crawlers by using time series analysis. The method is applied to real web server log data. Compared with earlier methods that use only common log attributes as features, the method using time series features improves detection accuracy by at least 20%.
  • Keywords
    Web sites; advertising; indexing; time series; Web crawlers; Web server logs; Websites; advertisement income; common log attributes; detection accuracy; impolite crawler detection; time series analysis; Accuracy; Crawlers; Data mining; Feature extraction; Machine learning algorithms; Predictive models; Time series analysis; data mining; impolite crawlers; time series; web analysis; web server log;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on
  • Conference_Location
    Herndon, VA
  • ISSN
    1082-3409
  • Print_ISBN
    978-1-4799-2971-9
  • Type
    conf
  • DOI
    10.1109/ICTAI.2013.28
  • Filename
    6735239