Title of article :
A Novel Architecture for Domain Specific Parallel Crawler
Author/Authors :
Nidhi Tyagi & Deepti Gupta
Issue Information :
Journal with serial number, year 2010
Abstract :
The World Wide Web is an interlinked collection of billions of documents formatted using HTML. Due to the growing and dynamic nature of the web, it has become a challenge to traverse and handle all the URLs in web documents, so it has become imperative to parallelize the crawling process. The crawling process is parallelized in the form of an ecology of crawler workers that download information from the web in parallel. This paper proposes a novel architecture for a parallel crawler based on domain-specific crawling, which makes the crawling task more effective and scalable and shares the load among the different crawlers, each downloading, in parallel, web pages related to different domain-specific URLs.
Keywords :
WWW , URLs , crawling process , parallel crawlers
Journal title :
Indian Journal of Computer Science and Engineering