• DocumentCode
    633704
  • Title
    Invited Abstract: Ricardo Baeza-Yates
  • Author
    Baeza-Yates, R.
  • Author_Institution
    Yahoo! Labs, Barcelona, Spain
  • fYear
    2013
  • fDate
    8-10 July 2013
  • Abstract
    In the dynamic ocean of web data, which spans more than 200 million websites, web search engines are the primary means of accessing content. Because the data amounts to petabytes, current search engines are very large centralized systems built on replicated clusters that easily index more than 100 billion web pages. Meanwhile, there are more than two billion Internet users, who issue hundreds of millions of queries each day. In the near future, centralized systems are likely to become less effective at handling such a data and query load, which suggests the need for fully distributed search engines. Such engines must maintain high-quality answers, fast response times, high query throughput, high availability, and scalability, despite network latency and scattered data. In this talk we present the main challenges behind the design of a distributed web retrieval system and our research on all the components of a search engine: crawling, indexing, and query processing.
  • Keywords
    indexing; information retrieval systems; query processing; search engines; Web data; Web search engine; crawling component; data-query load; distributed search engine; distributed web retrieval system; indexing component; query processing component; replicated clusters;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Application of Concurrency to System Design (ACSD), 2013 13th International Conference on
  • Conference_Location
    Barcelona
  • Type
    conf
  • DOI
    10.1109/ACSD.2013.38
  • Filename
    6598332