DocumentCode :
633704
Title :
Invited Abstract: Ricardo Baeza-Yates
Author :
Baeza-Yates, R.
Author_Institution :
Yahoo! Labs, Barcelona, Spain
fYear :
2013
fDate :
8-10 July 2013
Abstract :
In the dynamic ocean of web data, where we have over 200 million websites, web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters, where easily more than 100 billion web pages are indexed. On the other hand, there are more than two billion Internet users, and hundreds of millions of queries are issued each day. In the near future, centralized systems are likely to become less effective against such a data-query load, thus suggesting the need for fully distributed search engines. Such engines need to maintain high-quality answers, fast response times, high query throughput, high availability, and scalability, in spite of network latency and scattered data. In this talk we present the main challenges behind the design of a distributed web retrieval system and our research in all the components of a search engine: crawling, indexing, and query processing.
Keywords :
indexing; information retrieval systems; query processing; search engines; Web data; Web search engine; crawling component; data-query load; distributed search engine; distributed web retrieval system; indexing component; query processing component; replicated clusters;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Application of Concurrency to System Design (ACSD), 2013 13th International Conference on
Conference_Location :
Barcelona
Type :
conf
DOI :
10.1109/ACSD.2013.38
Filename :
6598332