• DocumentCode
    475874
  • Title

    A Hierarchical Cache Scheme for the Large-scale Web Search Engine

  • Author

    Lim, Sungchae ; Ahn, Joonseon

  • Author_Institution
    Dongduk Women's Univ., Seoul
  • fYear
    2008
  • fDate
    6-8 Aug. 2008
  • Firstpage
    925
  • Lastpage
    930
  • Abstract
    Over the past decade, much research has been done to solve technical challenges regarding the Web search engine, such as crawling Web documents, high-performance indexes, and ranking systems using hyperlink analysis. However, implementation details of its query processing system are rarely dealt with in the literature. In this paper, we present a distributed architecture for the query processing system and its hierarchical cache scheme. Our paper is based on the development experience of a commercial Web search engine designed to answer 5 million user queries against over 6.5 million Web pages per day. Using the hierarchical cache scheme, we keep a portion of query results in multi-level caches so that excessive I/O or CPU time is not used for query processing. With that scheme, it is possible to reduce server costs by around 70%.
  • Keywords
    Internet; cache storage; query processing; search engines; Web document crawling; distributed architecture; hierarchical cache scheme; hyperlink analysis; large-scale Web search engine; query processing system; ranking system; Costs; Internet; Large-scale systems; Performance analysis; Performance evaluation; Query processing; Search engines; Uniform resource locators; Web search; Web server; large-scale cache; search engine;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2008. SNPD '08. Ninth ACIS International Conference on
  • Conference_Location
    Phuket
  • Print_ISBN
    978-0-7695-3263-9
  • Type

    conf

  • DOI
    10.1109/SNPD.2008.107
  • Filename
    4617487