• DocumentCode
    2729545
  • Title

    Challenges on Distributed Web Retrieval

  • Author

    Baeza-Yates, R. ; Castillo, Carlos ; Junqueira, Flavio ; Plachouras, V. ; Silvestri, F.

  • Author_Institution
    Yahoo! Res. Barcelona, Spain
  • fYear
    2007
  • fDate
    15-20 April 2007
  • Firstpage
    6
  • Lastpage
    20
  • Abstract
    In the ocean of Web data, Web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters. Web data, however, is always evolving. The number of Web sites continues to grow rapidly and there are currently more than 20 billion indexed pages. In the near future, centralized systems are likely to become ineffective against such a load, thus suggesting the need for fully distributed search engines. Such engines need to achieve the following goals: high-quality answers, fast response time, high query throughput, and scalability. In this paper we survey and organize recent research results, outlining the main challenges of designing a distributed Web retrieval system.
  • Keywords
    Internet; Web sites; information retrieval; search engines; Web data; Web search engine; centralized system; distributed Web retrieval system; distributed search engine; query throughput; replicated cluster; Costs; Crawlers; Delay; Hardware; Oceans; Scalability; Search engines; Throughput; Web pages; Web search
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on
  • Conference_Location
    Istanbul
  • Print_ISBN
    1-4244-0802-4
  • Type

    conf

  • DOI
    10.1109/ICDE.2007.367846
  • Filename
    4221649