• DocumentCode
    2536042
  • Title

    Distributing a Metric-Space Search Index onto Processors

  • Author

    Marin, Mauricio ; Ferrarotti, Flavio ; Gil-Costa, Veronica

  • Author_Institution
    Yahoo! Res. Latin America, Santiago, Chile
  • fYear
    2010
  • fDate
    13-16 Sept. 2010
  • Firstpage
    433
  • Lastpage
    442
  • Abstract
    This paper studies the problem of distributing a metric-space search index based on compact clustering onto a set of distributed-memory processors. The aim is to enable efficient similarity search in large-scale Web search engines. The index data structure is composed of a set of clusters enclosing the database objects, and we propose distribution methods based on two different solution approaches. The first makes use of specific knowledge about the workload generated by user queries. Here the challenge is how to represent such knowledge and use it in a method capable of producing a cluster distribution that leads to high performance. The second follows a novel direction: it completely disregards user behavior and looks instead at the relationships among the index clusters themselves to decide their placement onto processors. Both methods perform efficiently, depending on the context, and they are generic enough to be applied to different distributed index data structures for metric-space databases.
  • Keywords
    data structures; distributed memory systems; indexing; pattern clustering; query formulation; search engines; cluster distribution; compact clustering; distributed index data structures; distributed memory processor; large scale Web search engine; metric space search index; Clustering algorithms; Data structures; Indexing; Program processors; Search problems; Distributed Search; Metric Space Indexing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing (ICPP), 2010 39th International Conference on
  • Conference_Location
    San Diego, CA
  • ISSN
    0190-3918
  • Print_ISBN
    978-1-4244-7913-9
  • Electronic_ISBN
    0190-3918
  • Type

    conf

  • DOI
    10.1109/ICPP.2010.51
  • Filename
    5599189