Title :
A Large Scale Domains Resolving System
Author_Institution :
Sch. of Software Eng., Huazhong Univ. of Sci. & Technol., Wuhan, China
Abstract :
Efficient domain resolving is essential for large-scale Web crawling. Based on batch processing and caching, data structures and algorithms are presented for maintaining domains and their addresses during crawling, and their performance is analyzed mathematically. A large-scale domain resolving system is designed with the proposed data structures. Theoretical analysis and experiments show that a speed of several thousand links per second, for billions of links or hundreds of millions of hosts, can be achieved on a single common personal computer.
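The abstract names batch processing and caching but does not describe the actual data structures. The following is therefore only a hypothetical sketch of how the two ideas can combine, and it is not the paper's design. It assumes a cache kept as a list of (host, address) pairs sorted by host name. A batch of links is reduced to a sorted set of distinct hosts, which is merge-joined against the cache in one sequential pass. Only cache misses go to a resolver, and the new entries are merged back into the sorted cache. The name `resolve_batch` and the injected `resolver` callback are illustrative assumptions; in a real crawler the callback would issue the DNS queries.

```python
from heapq import merge
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical cache layout: (host, address) pairs sorted by host, one entry per host.
Cache = List[Tuple[str, str]]
# Hypothetical resolver: takes a batch of host names, returns host -> address (None on failure).
Resolver = Callable[[List[str]], Dict[str, Optional[str]]]


def resolve_batch(links: List[str], cache: Cache,
                  resolver: Resolver) -> Tuple[Dict[str, Optional[str]], Cache]:
    # 1. Reduce the batch to distinct host names, sorted so they can be
    #    merge-joined against the sorted cache with one sequential scan.
    hosts = sorted({h for h in (urlsplit(u).hostname for u in links) if h})

    result: Dict[str, Optional[str]] = {}
    misses: List[str] = []
    i = 0
    for host in hosts:
        # Advance the cache cursor; both sequences are sorted, so it never moves back.
        while i < len(cache) and cache[i][0] < host:
            i += 1
        if i < len(cache) and cache[i][0] == host:
            result[host] = cache[i][1]   # cache hit
        else:
            misses.append(host)          # must be resolved

    # 2. Resolve only the hosts not already cached, as one batch.
    resolved = resolver(misses) if misses else {}
    for host in misses:
        result[host] = resolved.get(host)

    # 3. Merge successfully resolved hosts into the cache, keeping it sorted.
    #    Misses are by construction absent from the cache, so no duplicates arise.
    new_entries = sorted((h, a) for h, a in resolved.items() if a is not None)
    return result, list(merge(cache, new_entries))
```

Because the cache is accessed only by sequential scans and bulk merges, this layout suits data that is too large for random access in memory, which fits the batch-oriented setting the abstract describes. The sketch makes no claim about the performance figures reported in the paper.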
Keywords :
information retrieval; mathematical analysis; Web crawl; data structures; large scale domain resolving system; personal computer; Algorithm design and analysis; Crawlers; Data structures; Maintenance engineering; Merging; Random access memory; Web sites;
Conference_Title :
Wireless Communications, Networking and Mobile Computing (WiCOM), 2011 7th International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-6250-6
DOI :
10.1109/wicom.2011.6040466