DocumentCode
475874
Title
A Hierarchical Cache Scheme for the Large-scale Web Search Engine
Author
Lim, Sungchae ; Ahn, Joonseon
Author_Institution
Dongduk Women's Univ., Seoul
fYear
2008
fDate
6-8 Aug. 2008
Firstpage
925
Lastpage
930
Abstract
Over the past decade, much research has addressed the technical challenges of Web search engines, such as crawling Web documents, building high-performance indexes, and ranking results using hyperlink analysis. However, the implementation details of the query processing system are rarely covered in the literature. In this paper, we present a distributed architecture for the query processing system and its hierarchical cache scheme. The paper is based on our experience developing a commercial Web search engine designed to answer 5 million user queries per day against over 6.5 million Web pages. Using the hierarchical cache scheme, we keep a portion of the query results in multi-level caches so that query processing does not consume excessive I/O or CPU time. With this scheme, server costs can be reduced by around 70%.
Keywords
Internet; cache storage; query processing; search engines; Web document crawling; distributed architecture; hierarchical cache scheme; hyperlink analysis; large-scale Web search engine; query processing system; ranking system; Costs; Internet; Large-scale systems; Performance analysis; Performance evaluation; Query processing; Search engines; Uniform resource locators; Web search; Web server; large-scale cache; search engine
fLanguage
English
Publisher
IEEE
Conference_Titel
Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2008. SNPD '08. Ninth ACIS International Conference on
Conference_Location
Phuket
Print_ISBN
978-0-7695-3263-9
Type
conf
DOI
10.1109/SNPD.2008.107
Filename
4617487