DocumentCode :
2124378
Title :
A Memory Efficient Approach for Crawling Language Specific Web: The Arabic Web as a Case Study
Author :
Ezzat, D. ; Abdeen, M. ; Tolba, M.F.
Author_Institution :
Fac. of Comput. & Inf. & Sci., Ain-Shams Univ., Cairo
fYear :
2009
fDate :
3-5 April 2009
Firstpage :
584
Lastpage :
587
Abstract :
Web crawlers are a significant component of Web search engines. They are responsible for making a local copy of Web pages and keeping that copy up to date by periodically refreshing the pages. The decision to refresh a Web page is a tradeoff between resource utilization and the freshness of the page content. Various policies govern when to perform a page refresh, and a major factor in choosing a policy is the change rate of a Web page. In this paper we address the problem of page refresh for the Arabic Web. We present a novel approach that improves re-crawl scheduling. The proposed technique modifies the information longevity approach to make it more suitable for Arabic Web pages. This is done by extracting the Arabic content and excluding stop-list words and redundancies that might not contribute significantly to the meaning. This technique saves scarce memory space in a semantic Arabic Web search engine.
Keywords :
natural languages; resource allocation; scheduling; search engines; semantic Web; storage management; Arabic Web crawling language; Web page; memory space; resource utilization; semantic Web search engine; Crawlers; Curve fitting; Data mining; History; Information management; Memory management; Resource management; Search engines; Web pages; Web search; Arabic search engines; Refreshing web pages; Web crawling
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Management and Engineering, 2009. ICIME '09. International Conference on
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-0-7695-3595-1
Type :
conf
DOI :
10.1109/ICIME.2009.105
Filename :
5077102