Title :
Detecting Impolite Crawler by Using Time Series Analysis
Author :
Zhiqian Chen ; Wenya Feng
Author_Institution :
Dept. of Software Eng., Peking Univ., Beijing, China
Abstract :
Numerous web crawlers, especially impolite crawlers, visit websites every day to fetch content, producing a higher access frequency than the websites can sustain. The heavy traffic from impolite crawlers seriously distorts the analysis of normal users and harms advertisement income. In this paper, we present a method to detect impolite crawlers using time series analysis. The method is applied to real web server log data. Compared with earlier methods that use only common log attributes as features, the method using time series features improves detection accuracy by at least 20%.
Keywords :
Web sites; advertising; indexing; time series; Web crawlers; Web server logs; Websites; advertisement income; common log attributes; detection accuracy; impolite crawler detection; time series analysis; Accuracy; Crawlers; Data mining; Feature extraction; Machine learning algorithms; Predictive models; Time series analysis; data mining; impolite crawlers; web analysis; web server log
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on
Conference_Location :
Herndon, VA
Print_ISBN :
978-1-4799-2971-9
DOI :
10.1109/ICTAI.2013.28