  • DocumentCode
    2719498
  • Title
    Diversified caching for replicated web search engines
  • Author
    Chuanfei Xu; Bo Tang; Man Lung Yiu
  • Author_Institution
    Dept. of Comput., Hong Kong Polytech. Univ., Hong Kong, China
  • fYear
    2015
  • fDate
    13-17 April 2015
  • Firstpage
    207
  • Lastpage
    218
  • Abstract
    Commercial web search engines adopt parallel and replicated architectures to support high query throughput. In this paper, we investigate the effect of caching on throughput in such a setting. A simple scheme, called uniform caching, replicates the cache content on all servers. Unfortunately, it does not exploit the variations among queries, so it wastes memory by caching the same content redundantly on multiple servers. To tackle this limitation, we propose the diversified caching problem, which aims to diversify the types of queries served by different servers and to maximize the sharing of terms among queries assigned to the same server. We show that finding the optimal diversified caching scheme is NP-hard, and we identify intuitive properties that guide the search for good solutions. We then present a framework with a suite of techniques and heuristics for diversified caching. Finally, we evaluate the proposed solution against competitors using a real dataset and a real query log.
  • Keywords
    cache storage; query processing; search engines; NP-hard; optimal diversified caching scheme; parallel architecture; real query log; replicated Web search engines; replicated architecture; Computer architecture; Indexes; Search engines; Servers; Silicon; Throughput; Training;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering (ICDE), 2015 IEEE 31st International Conference on
  • Conference_Location
    Seoul
  • Type
    conf
  • DOI
    10.1109/ICDE.2015.7113285
  • Filename
    7113285
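The Abstract describes the diversified caching objective only at a high level: route queries so that each server sees queries that share terms, instead of replicating one cache everywhere. The Python sketch below is a rough illustration under stated assumptions, not the authors' framework (whose heuristics are not given in this record). It assumes each query is a set of terms and that a server must cache the posting list of every term in the queries routed to it. It compares the total number of cached terms under uniform caching with a hypothetical greedy assignment that sends each query to the server whose cached terms overlap it most, breaking ties toward the less loaded server. The function names are illustrative only.

```python
from typing import FrozenSet, List, Set

# Assumption: a query is modeled as a set of terms.
Query = FrozenSet[str]


def uniform_footprint(queries: List[Query], num_servers: int) -> int:
    """Total cached terms when every server caches the terms of all queries."""
    all_terms: Set[str] = set().union(*queries)
    return len(all_terms) * num_servers


def greedy_diversified_assignment(queries: List[Query], num_servers: int) -> List[List[Query]]:
    """Hypothetical greedy heuristic (not the paper's method): place each query
    on the server whose cached terms overlap it most; ties go to the server
    holding fewer queries, then to the lowest server index."""
    servers: List[List[Query]] = [[] for _ in range(num_servers)]
    cached: List[Set[str]] = [set() for _ in range(num_servers)]
    # Place longer queries first so they seed each server's term set.
    for q in sorted(queries, key=len, reverse=True):
        best = max(
            range(num_servers),
            key=lambda s: (len(q & cached[s]), -len(servers[s])),
        )
        servers[best].append(q)
        cached[best] |= q
    return servers


def diversified_footprint(assignment: List[List[Query]]) -> int:
    """Total cached terms when each server caches only its own queries' terms."""
    return sum(len(set().union(*srv)) for srv in assignment)
```

With four queries on two servers, for example {hong, kong, weather}, {hong, kong, hotel}, {python, tutorial} and {python, list, tutorial}, uniform caching stores all 7 terms on each server (14 in total), while the greedy split groups the two "hong kong" queries on one server and the two "python" queries on the other, storing 7 terms in total. A practical scheme must also respect per-server cache capacity and query load, which this sketch handles only through its tie-break.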