Title :
A Hierarchical Cache Scheme for the Large-scale Web Search Engine
Author :
Lim, Sungchae ; Ahn, Joonseon
Author_Institution :
Dongduk Women's Univ., Seoul
Abstract :
Over the past decade, much research has addressed the technical challenges of Web search engines, such as crawling Web documents, building high-performance indexes, and ranking results using hyperlink analysis. However, the implementation details of the query processing system are rarely dealt with in the literature. In this paper we present a distributed architecture for the query processing system and its hierarchical cache scheme. The paper is based on our experience developing a commercial Web search engine designed to answer 5 million user queries per day against over 6.5 million Web pages. Using the hierarchical cache scheme, we keep a portion of query results in multi-level caches so that repeated queries do not consume excessive I/O or CPU time. With this scheme, it is possible to reduce server costs by around 70%.
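The multi-level lookup described in the abstract can be illustrated with a minimal sketch. The abstract does not say how many cache levels the system uses, how entries are evicted, or where each level runs, so everything below is an assumption made for illustration: two levels, LRU eviction, and the hypothetical names `LRUCache`, `TwoLevelQueryCache`, and `backend`. It is not the authors' implementation.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry


class TwoLevelQueryCache:
    """Hypothetical two-level result cache in front of a search backend.

    Assumption: level 1 is small and fast, level 2 is larger; the paper's
    actual levels and policies are not given in the abstract.
    """

    def __init__(self, backend, l1_capacity, l2_capacity):
        self.backend = backend      # callable: query string -> result list
        self.l1 = LRUCache(l1_capacity)
        self.l2 = LRUCache(l2_capacity)
        self.backend_calls = 0      # counts the expensive index lookups

    def search(self, query):
        result = self.l1.get(query)
        if result is not None:
            return result           # level-1 hit: no further work
        result = self.l2.get(query)
        if result is None:
            # Miss at both levels: pay the I/O and CPU cost once.
            self.backend_calls += 1
            result = self.backend(query)
            self.l2.put(query, result)
        self.l1.put(query, result)  # promote the result into level 1
        return result
```

In this sketch a query that misses the small first level can still be answered from the larger second level, so the backend is consulted only when both levels miss; `backend_calls` makes that saving observable.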
Keywords :
Internet; cache storage; query processing; search engines; Web document crawling; distributed architecture; hierarchical cache scheme; hyperlink analysis; large-scale Web search engine; query processing system; ranking system; Costs; Internet; Large-scale systems; Performance analysis; Performance evaluation; Query processing; Search engines; Uniform resource locators; Web search; Web server; large-scale cache; search engine;
Conference_Title :
Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2008. SNPD '08. Ninth ACIS International Conference on
Conference_Location :
Phuket
Print_ISBN :
978-0-7695-3263-9
DOI :
10.1109/SNPD.2008.107