DocumentCode
2536042
Title
Distributing a Metric-Space Search Index onto Processors
Author
Marin, Mauricio ; Ferrarotti, Flavio ; Gil-Costa, Veronica
Author_Institution
Yahoo! Res. Latin America, Santiago, Chile
fYear
2010
fDate
13-16 Sept. 2010
Firstpage
433
Lastpage
442
Abstract
This paper studies the problem of distributing a metric-space search index based on compact clustering onto a set of distributed-memory processors. The aim is to enable efficient similarity search in large-scale Web search engines. The index data structure is composed of a set of clusters enclosing the database objects, and we propose distribution methods based on two different solution approaches. The first makes use of specific knowledge about the workload generated by user queries. Here the challenge is how to represent such knowledge and use it in a method capable of producing a cluster distribution that leads to high performance. The second follows a novel direction: it completely disregards user behavior and looks instead at the relationships among the index clusters themselves to decide their placement onto processors. Both methods perform efficiently depending on the context, and they are generic enough to be applied to different distributed index data structures for metric-space databases.
Keywords
data structures; distributed memory systems; indexing; pattern clustering; query formulation; search engines; cluster distribution; compact clustering; distributed index data structures; distributed memory processor; large scale Web search engine; metric space search index; Clustering algorithms; Data structures; Indexing; Program processors; Search problems; Distributed Search; Metric Space Indexing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Processing (ICPP), 2010 39th International Conference on
Conference_Location
San Diego, CA
ISSN
0190-3918
Print_ISBN
978-1-4244-7913-9
Electronic_ISBN
0190-3918
Type
conf
DOI
10.1109/ICPP.2010.51
Filename
5599189