• DocumentCode
    473308
  • Title

    COSTA: Adaptive Indexing for Terms in a Large-scale Distributed System

  • Author

    Zhou, Aoying ; Zhang, Rong ; Vu, Quang Hieu ; Qian, Weining

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Fudan Univ., Shanghai
  • fYear
    2008
  • fDate
    7-12 April 2008
  • Firstpage
    1544
  • Lastpage
    1547
  • Abstract
    We introduce COSTA, a system for content-based search using term aggregation. Besides the advantages it shares with other P2P-based information retrieval systems, COSTA has several characteristics that distinguish it from them. First, it uses an adaptive indexing scheme that dynamically identifies important terms. Important terms are indexed in a Chord-like ring, while other terms are aggregated in a balanced tree. We argue that this architecture is more flexible and better suited to term indexing than DHT-based methods. Furthermore, this structure eliminates the need for global knowledge, and hence avoids the difficulty of maintaining it. Term aggregation is useful not only for performance enhancement but also for improving search quality, by using the term statistics obtained via the aggregation. Traditional IR techniques such as query expansion can be applied based on this information. COSTA thus tightly integrates distributed indexing with information retrieval. Advanced techniques, such as node clustering, caching, and workload balancing, are employed. We show that more existing optimization techniques can be adopted to further improve the system's performance.
  • Keywords
    content-based retrieval; indexing; pattern clustering; peer-to-peer computing; P2P; adaptive indexing; caching; content-based search; distributed indexing; information retrieval system; large-scale distributed system; node clustering; query expansion; term aggregation; term statistics information; workload balance; Delta modulation; Graphical user interfaces; Indexing; Information retrieval; Large-scale systems; Peer to peer computing; Quadratic programming; Samarium; Statistics; Tree data structures;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4244-1836-7
  • Electronic_ISBN
    978-1-4244-1837-4
  • Type

    conf

  • DOI
    10.1109/ICDE.2008.4497617
  • Filename
    4497617