• DocumentCode
    2900813
  • Title

    The Design and Implementation of the Crawler-Inar

  • Author

    Ding, Yu-xin ; Wang, Xiao-long ; Lin, Le-bin ; Zhang, Qi ; Wu, Yong-hui

  • Author_Institution
    Dept. of Comput. Sci. & Technol., Harbin Inst. of Technol., Shenzhen
  • fYear
    2006
  • fDate
    13-16 Aug. 2006
  • Firstpage
    4527
  • Lastpage
    4530
  • Abstract
    This paper discusses the design and implementation of Inar, a Web crawler written in C++ that runs on Linux. Inar is a single-threaded crawler based on asynchronous I/O and is still under development. The paper describes the crawler's architecture and discusses the design and function of each of its components in detail. For design problems encountered in practice, such as URL queue design and hash algorithm design, we propose our solutions.
  • Keywords
    C++ language; Internet; Linux; search engines; C++; Linux; URL queue design; Web crawler-Inar; asynchronous I/O technology; hash algorithm design; search engine; single-threaded crawler; Algorithm design and analysis; Computer science; Crawlers; Cybernetics; HTML; Machine learning; Paper technology; Search engines; Service oriented architecture; Uniform resource locators; Web pages; Web server; Crawler; asynchronous I/O; single thread; web
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2006 International Conference on
  • Conference_Location
    Dalian, China
  • Print_ISBN
    1-4244-0061-9
  • Type

    conf

  • DOI
    10.1109/ICMLC.2006.259171
  • Filename
    4028869