• DocumentCode
    3230566
  • Title
    Board Forum Crawling: A Web Crawling Method for Web Forum
  • Author
    Guo, Yan ; Li, Kui ; Zhang, Kai ; Zhang, Gang
  • Author_Institution
    Software Div., Chinese Acad. of Sci., Beijing
  • fYear
    2006
  • fDate
    18-22 Dec. 2006
  • Firstpage
    745
  • Lastpage
    748
  • Abstract
    We present board forum crawling, a new method for crawling Web forums. The method exploits the organized structure of Web forum sites and simulates how a human visits a Web forum: it starts crawling from the homepage, enters each board of the site, and then crawls all the posts of the site directly. Board forum crawling can crawl most of the meaningful information on a Web forum site efficiently and simply. We experimentally evaluated the effectiveness of the method on real Web forum sites by comparing it with traditional breadth-first crawling. We also used the method in a real project, in which 12,000 Web forum sites were crawled successfully. These results show the effectiveness of our method.
  • Keywords
    Web sites; search engines; Web crawling method; Web forum sites; board forum crawling; human behavior simulation; Content addressable storage; Crawlers; Databases; Humans; Noise level; Robots; Uniform resource locators; Web page design; Web pages; Web server;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    0-7695-2747-7
  • Type
    conf
  • DOI
    10.1109/WI.2006.52
  • Filename
    4061464
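
The crawl order described in the abstract (homepage, then each board, then the posts) can be illustrated with a minimal sketch. This is not the authors' implementation: the in-memory `TOY_SITE` dictionary, the page-kind labels (`"home"`, `"board"`, `"post"`, `"other"`), and the function name `board_forum_crawl` are all hypothetical, and the sketch assumes page kinds are already known and ignores details such as board pagination and network fetching.

```python
from typing import Dict, List, Tuple

# Hypothetical toy site: URL -> (page kind, outgoing links).
# A real crawler would fetch pages and classify them; here both are assumed given.
Site = Dict[str, Tuple[str, List[str]]]


def board_forum_crawl(site: Site, home: str) -> List[str]:
    """Return post URLs visited in the order homepage -> boards -> posts."""
    seen = set()
    posts: List[str] = []
    _, home_links = site[home]
    # Step 1: from the homepage, keep only the links that lead to boards.
    boards = [u for u in home_links if u in site and site[u][0] == "board"]
    for board in boards:
        # Step 2: enter each board. Step 3: collect its posts directly,
        # skipping non-post links (profiles, help pages) and duplicates.
        for url in site[board][1]:
            if url in site and site[url][0] == "post" and url not in seen:
                seen.add(url)
                posts.append(url)
    return posts


TOY_SITE: Site = {
    "/": ("home", ["/board/1", "/board/2", "/help"]),
    "/help": ("other", ["/"]),
    "/board/1": ("board", ["/post/a", "/post/b", "/profile/x"]),
    "/board/2": ("board", ["/post/b", "/post/c"]),
    "/post/a": ("post", ["/profile/x"]),
    "/post/b": ("post", []),
    "/post/c": ("post", []),
    "/profile/x": ("other", []),
}

print(board_forum_crawl(TOY_SITE, "/"))
# → ['/post/a', '/post/b', '/post/c']
```

Unlike a breadth-first crawl, which would also queue `/help` and `/profile/x`, this sketch only follows homepage-to-board and board-to-post links, which is one plausible reading of how the method avoids pages that carry little forum content.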