DocumentCode :
1776923
Title :
A density based clustering approach for web robot detection
Author :
Zabihi, Mahdieh ; Jahan, Majid Vafaei ; Hamidzadeh, Javad
Author_Institution :
Comput. Eng., Imam Reza Int. Univ., Mashhad, Iran
fYear :
2014
fDate :
29-30 Oct. 2014
Firstpage :
23
Lastpage :
28
Abstract :
The need to distinguish humans from Web robots, a matter of computer network security, has given rise to the robot detection problem. An accurate solution can protect Web sites from intrusion by malicious robots and improve Web server performance by prioritizing human users. In this article, we propose a density-based method called DBC_WRD (Density Based Clustering for Web Robot Detection) to discover Web robot traffic in two large real data sets. We treat visitors as spatial instances and introduce two new features to describe and distinguish them. These attributes are based on the behavioral patterns of Web visitors and remain invariant over time. To address one of the disadvantages of DBSCAN, the density-based clustering algorithm used in this paper, we use only 4 features so as to reduce the dimensionality. According to the supervised evaluations, DBC_WRD achieves a Jaccard metric of 96% and produces two clusters with entropy and purity rates of 0.0215 and 0.97, respectively. Furthermore, comparisons show that DBC_WRD outperforms state-of-the-art algorithms in terms of clustering quality and accuracy. Finally, we conclude that some popular non-malicious Web robots are difficult to identify because they imitate human behavior.
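The supervised evaluation measures named in the abstract (purity, entropy, Jaccard) can be illustrated with a minimal Python sketch. It assumes the common textbook definitions: purity as the share of points in the majority class of their cluster, entropy as the size-weighted base-2 class entropy per cluster, and Jaccard as a pair-counting coefficient. The abstract does not state which exact formulas the authors used, and the function names and toy data below are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def _class_counts(clusters, labels):
    """Map each cluster id to a Counter of the true labels it contains."""
    counts = {}
    for c, y in zip(clusters, labels):
        counts.setdefault(c, Counter())[y] += 1
    return counts


def purity(clusters, labels):
    """Share of points that fall in the majority class of their cluster."""
    counts = _class_counts(clusters, labels)
    return sum(max(cnt.values()) for cnt in counts.values()) / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of the base-2 class entropy of each cluster."""
    n = len(labels)
    total = 0.0
    for cnt in _class_counts(clusters, labels).values():
        size = sum(cnt.values())
        h = -sum((k / size) * log2(k / size) for k in cnt.values())
        total += (size / n) * h
    return total


def jaccard(clusters, labels):
    """Pair-counting Jaccard: pairs grouped together in both partitions,
    divided by pairs grouped together in at least one of them."""
    both = either = 0
    for i, j in combinations(range(len(labels)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_label = labels[i] == labels[j]
        both += same_cluster and same_label
        either += same_cluster or same_label
    return both / either if either else 1.0


# Toy data (assumed, not from the paper): one robot lands in the "human" cluster.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["human", "human", "human", "robot", "robot", "robot", "robot", "robot"]
print(purity(clusters, labels), entropy(clusters, labels), jaccard(clusters, labels))
```

Under these definitions, lower entropy and higher purity both indicate clusters that align more closely with the human/robot labels, which is the direction of the figures reported in the abstract.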
Keywords :
computer network security; data mining; entropy; pattern clustering; telecommunication traffic; DBC_WRD; DBSCAN; Jaccard metric; Web robot traffic; Web server performance enhancement; Web sites; Web visitor behavioral patterns; clustering accuracy; clustering quality; computer network security; density-based clustering-for-Web robot detection; dimension reduction; entropy rate; human behavior imitation; human user prioritization; large-real data sets; malicious robot intrusion; nonmalicious Web robots; purity rate; spatial instances; supervised evaluations; Browsers; Clustering algorithms; Feature extraction; Labeling; Measurement; Robots; Web servers; DBSCAN; behavioral patterns of web visitors; data mining; density based Clustering; web robot detection;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Knowledge Engineering (ICCKE), 2014 4th International eConference on
Conference_Location :
Mashhad
Print_ISBN :
978-1-4799-5486-5
Type :
conf
DOI :
10.1109/ICCKE.2014.6993362
Filename :
6993362