Invited Abstract: Ricardo Baeza-Yates

Author

Baeza-Yates, R.

Author_Institution

Yahoo! Labs., Barcelona, Spain

fYear

2013

fDate

8-10 July 2013

Abstract

In the dynamic ocean of web data, with over 200 million websites, web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters, which easily index more than 100 billion web pages. Meanwhile, there are more than two billion Internet users, and hundreds of millions of queries are issued each day. In the near future, centralized systems are likely to become less effective against such a data-query load, suggesting the need for fully distributed search engines. Such engines need to maintain high-quality answers, fast response times, high query throughput, high availability, and scalability, in spite of network latency and scattered data. In this talk we present the main challenges behind the design of a distributed web retrieval system and our research on all the components of a search engine: crawling, indexing, and query processing.

Keywords

indexing; information retrieval systems; query processing; search engines; Web data; Web search engine; crawling component; data-query load; distributed search engine; distributed web retrieval system; indexing component; query processing component; replicated clusters;

fLanguage

English

Publisher

IEEE

Conference_Titel

Application of Concurrency to System Design (ACSD), 2013 13th International Conference on

Conference_Location

Barcelona

Type

conf

DOI

10.1109/ACSD.2013.38

Filename

6598332