• DocumentCode
    3362573
  • Title
    A Web Crawler Detection Algorithm Based on Web Page Member List
  • Author
    Guo, Weigang; Zhong, Yong; Xie, Jianqin
  • Author_Institution
    Sch. of Electron. & Inf. Eng., Foshan Univ. Foshan, Foshan, China
  • Volume
    1
  • fYear
    2012
  • fDate
    26-27 Aug. 2012
  • Firstpage
    189
  • Lastpage
    192
  • Abstract
    With the widespread use of search engines, the impact of Web crawlers on Web sites can no longer be ignored. After analyzing the navigational patterns of Web crawlers in Web logs, a new algorithm based on a Web page member list is proposed. The algorithm constructs a member list for every Web page and a show table for every visitor. Experiments show that the new algorithm can detect unknown crawlers as well as unfriendly crawlers that do not obey the Standard for Robot Exclusion.
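The record does not define the "member list" or the "show table", so the following is only a minimal sketch of how such a scheme might work, not the authors' algorithm. It assumes that a page's member list is the set of embedded resources (images, stylesheets, scripts) a browser requests automatically with that page, and that a visitor's show table records, for each page the visitor requested, which of that page's members were also fetched. A visitor who requests pages but skips their members is labeled a crawler. All function names, the sample data, and the 0.2 threshold are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable

# ASSUMPTION: a page's "member list" is taken to be the embedded resources
# (images, stylesheets, scripts) a browser requests automatically with it.
MemberLists = dict[str, set[str]]
# Hypothetical show-table layout: page requested -> set of its members fetched.
ShowTable = dict[str, set[str]]


def build_show_tables(log: Iterable[tuple[str, str]],
                      member_lists: MemberLists) -> dict[str, ShowTable]:
    """Build one show table per visitor from (visitor, url) log entries."""
    # Reverse index: member URL -> pages whose member list contains it.
    owners: dict[str, set[str]] = {}
    for page, members in member_lists.items():
        for member in members:
            owners.setdefault(member, set()).add(page)

    show_tables: dict[str, ShowTable] = {}
    for visitor, url in log:
        table = show_tables.setdefault(visitor, {})
        if url in member_lists:
            # A page request opens an (initially empty) entry for that page.
            table.setdefault(url, set())
        # Credit a member fetch only to pages this visitor already requested
        # (assumes the log entries are in chronological order).
        for page in owners.get(url, ()):
            if page in table:
                table[page].add(url)
    return show_tables


def member_fetch_ratio(table: ShowTable, member_lists: MemberLists) -> float | None:
    """Fraction of the requested pages' members that the visitor fetched."""
    expected = sum(len(member_lists[page]) for page in table)
    fetched = sum(len(members) for members in table.values())
    return fetched / expected if expected else None


def classify(show_tables: dict[str, ShowTable], member_lists: MemberLists,
             threshold: float = 0.2) -> dict[str, str]:
    """Label each visitor; the 0.2 threshold is a hypothetical tuning value."""
    labels = {}
    for visitor, table in show_tables.items():
        ratio = member_fetch_ratio(table, member_lists)
        if ratio is None:
            labels[visitor] = "unknown"  # no pages with members were requested
        elif ratio < threshold:
            labels[visitor] = "crawler"  # pages fetched, embedded members skipped
        else:
            labels[visitor] = "browser"
    return labels


# Hypothetical site structure and access log.
MEMBER_LISTS: MemberLists = {
    "/index.html": {"/logo.png", "/style.css"},
    "/about.html": {"/logo.png", "/team.jpg"},
}
LOG = [
    ("10.0.0.1", "/index.html"), ("10.0.0.1", "/logo.png"),
    ("10.0.0.1", "/style.css"), ("10.0.0.1", "/about.html"),
    ("10.0.0.1", "/logo.png"), ("10.0.0.1", "/team.jpg"),
    ("10.0.0.2", "/index.html"), ("10.0.0.2", "/about.html"),
]

print(classify(build_show_tables(LOG, MEMBER_LISTS), MEMBER_LISTS))
# {'10.0.0.1': 'browser', '10.0.0.2': 'crawler'}
```

Because a heuristic of this kind looks only at which embedded resources a visitor fetches, it does not depend on the User-Agent string or on robots.txt, which is why, in principle, it could flag crawlers that misidentify themselves or ignore the Standard for Robot Exclusion. Real logs would also need handling for browser caching, which can suppress member requests from genuine users.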
  • Keywords
    Web sites; information retrieval; online front-ends; search engines; Web crawler detection algorithm; Web crawler navigational patterns; Web logs; Web page member list; Web search engines; unfriendly crawler detection; unknown crawler detection; Browsers; Crawlers; HTML; IP networks; Servers; Web pages; Search engine; Web crawler detection; Web log
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2012 4th International Conference on
  • Conference_Location
    Nanchang, Jiangxi
  • Print_ISBN
    978-1-4673-1902-7
  • Type
    conf
  • DOI
    10.1109/IHMSC.2012.54
  • Filename
    6305658