DocumentCode
2900813
Title
The Design and Implementation of the Crawler-Inar
Author
Ding, Yu-xin ; Wang, Xiao-long ; Lin, Le-bin ; Zhang, Qi ; Wu, Yong-hui
Author_Institution
Dept. of Comput. Sci. & Technol., Harbin Inst. of Technol., Shenzhen
fYear
2006
fDate
13-16 Aug. 2006
Firstpage
4527
Lastpage
4530
Abstract
This paper discusses the design and implementation of Inar, a Web crawler written in C++ that runs on Linux. Inar is a single-threaded crawler based on asynchronous I/O and is still under development. The paper describes the architecture of the crawler and discusses the design and function of each of its components in detail. For design problems encountered in practice, such as URL queue design and hash algorithm design, we propose our solutions.
Keywords
C++ language; Internet; Linux; search engines; C++; Linux; URL queue design; Web crawler-Inar; asynchronous I/O technology; hash algorithm design; search engine; single-threaded crawler; Algorithm design and analysis; Computer science; Crawlers; Cybernetics; HTML; Machine learning; Paper technology; Search engines; Service oriented architecture; Uniform resource locators; Web pages; Web server; Crawler; asynchronous I/O; single thread; web;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2006 International Conference on
Conference_Location
Dalian, China
Print_ISBN
1-4244-0061-9
Type
conf
DOI
10.1109/ICMLC.2006.259171
Filename
4028869