Title of article :
AN ENHANCED PERFORMANCE DISTRIBUTED ARABIC WEB CRAWLER
Author/Authors :
Abdeen, M. A. Ain Shams University - Faculty of Computer and Information Sciences, Egypt , Ezzat, D. Ain Shams University - Faculty of Computer and Information Sciences, Egypt , Tolba, M. F. Ain Shams University - Faculty of Computer and Information Sciences, Egypt
From page :
1
To page :
10
Abstract :
Web crawlers are a significant component of web search engines. They are responsible for making a local copy of web pages and keeping this local copy up to date by periodically refreshing these pages. Different policies are used to determine when this refresh is performed. A major factor that determines the refresh policy is the change rate of a web page's contents. For a language-specific web crawler, this change rate depends only on the content written in that language. In previous work on the Arabic web, we showed that applying morphological analysis techniques reduced the memory requirements of the web crawler by 90%. In this paper we present a performance enhancement of the web crawler developed in that previous work. This enhancement is achieved by a parallel crawler scheme. We also present a performance study of this parallel crawler and show the optimal number of processors for a given crawl configuration. The Arabic web is presented as an example, but the techniques are applicable to other language-specific webs.
Keywords :
Information Retrieval , Search Engines , Web Crawlers
Journal title :
International Journal of Intelligent Computing and Information Sciences
Record number :
2570574