• DocumentCode
    1776923
  • Title

    A density based clustering approach for web robot detection

  • Author

    Zabihi, Mahdieh ; Jahan, Majid Vafaei ; Hamidzadeh, Javad

  • Author_Institution
    Comput. Eng., Imam Reza Int. Univ., Mashhad, Iran
  • fYear
    2014
  • fDate
    29-30 Oct. 2014
  • Firstpage
    23
  • Lastpage
    28
  • Abstract
    The need to distinguish humans from Web robots, a concern for computer network security, gives rise to the robot detection problem. An accurate solution can protect Web sites from intrusion by malicious robots and improve Web server performance by prioritizing human users. In this article, we propose a density-based method called DBC_WRD (Density Based Clustering for Web Robot Detection) to discover Web robot traffic in two large real data sets. We treat visitors as spatial instances and introduce two new features to describe and distinguish them. These features are based on the behavioral patterns of Web visitors and remain invariant over time. To address one of the disadvantages of DBSCAN, the density-based clustering algorithm used in this paper, we use only 4 features to keep the dimensionality low. According to supervised evaluations, DBC_WRD achieves a Jaccard metric of 96% and produces two clusters with entropy and purity of 0.0215 and 0.97, respectively. Furthermore, comparisons show that DBC_WRD outperforms state-of-the-art algorithms in clustering quality and accuracy. Finally, we conclude that some popular non-malicious Web robots imitate human behavior, which makes them difficult to identify.
  • Keywords
    computer network security; data mining; entropy; pattern clustering; telecommunication traffic; DBC_WRD; DBSCAN; Jaccard metric; Web robot traffic; Web server performance enhancement; Web sites; Web visitor behavioral patterns; clustering accuracy; clustering quality; density-based clustering-for-Web robot detection; dimension reduction; entropy rate; human behavior imitation; human user prioritization; large-real data sets; malicious robot intrusion; nonmalicious Web robots; purity rate; spatial instances; supervised evaluations; Browsers; Clustering algorithms; Feature extraction; Labeling; Measurement; Robots; Web servers; behavioral patterns of web visitors; density based Clustering; web robot detection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Knowledge Engineering (ICCKE), 2014 4th International eConference on
  • Conference_Location
    Mashhad
  • Print_ISBN
    978-1-4799-5486-5
  • Type

    conf

  • DOI
    10.1109/ICCKE.2014.6993362
  • Filename
    6993362
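
The abstract reports three supervised evaluation measures for the clustering: the Jaccard metric, entropy, and purity. The record does not give their exact definitions, so the following is only a minimal sketch under common textbook assumptions: purity as the share of points in the majority class of their cluster, entropy as the size-weighted base-2 class entropy of each cluster, and Jaccard as the pair-counting coefficient. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import log2


def _class_counts(clusters, classes):
    """Count the true classes inside each cluster (helper, hypothetical name)."""
    counts = {}
    for cluster_id, label in zip(clusters, classes):
        counts.setdefault(cluster_id, Counter())[label] += 1
    return counts


def purity(clusters, classes):
    """Share of points that fall in the majority class of their cluster."""
    counts = _class_counts(clusters, classes)
    return sum(max(c.values()) for c in counts.values()) / len(clusters)


def entropy(clusters, classes):
    """Size-weighted class entropy per cluster (assumption: log base 2)."""
    total = len(clusters)
    result = 0.0
    for c in _class_counts(clusters, classes).values():
        size = sum(c.values())
        h = -sum((n / size) * log2(n / size) for n in c.values())
        result += (size / total) * h
    return result


def jaccard(clusters, classes):
    """Pair-counting Jaccard: pairs together in both / pairs together in either."""
    both = either = 0
    for i, j in combinations(range(len(clusters)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = classes[i] == classes[j]
        both += same_cluster and same_class
        either += same_cluster or same_class
    return both / either if either else 1.0


# Toy data (invented for illustration): six visitors, two clusters.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_classes = ["human", "human", "robot", "robot", "robot", "robot"]
print(purity(toy_clusters, toy_classes))
print(entropy(toy_clusters, toy_classes))
print(jaccard(toy_clusters, toy_classes))
```

Under these definitions, lower entropy and higher purity and Jaccard values indicate clusters that agree more closely with the ground-truth labels, which is the direction of the figures quoted in the abstract.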