• DocumentCode
    2731547
  • Title
    Propagating Updates in SPIDER
  • Author
    Koudas, N.; Marathe, A.; Srivastava, Divesh
  • Author_Institution
    Toronto Univ., Ont., Canada
  • fYear
    2007
  • fDate
    15-20 April 2007
  • Firstpage
    1146
  • Lastpage
    1153
  • Abstract
    SPIDER, developed at AT&T Labs-Research, is a system that efficiently supports flexible string matching against attribute values in large databases, and is extensively used in AT&T. The scoring methodology is based on tf.idf weighting and cosine similarity, and SPIDER maintains indexes containing string tokens and their weights, for fast matching at query time. Given the "global" nature of the weights maintained in the indexes, even a few updates to the underlying database tables would necessitate a (near-)complete recomputation of the indexes, which can be prohibitively expensive. In this paper, we explore novel techniques to considerably reduce the cost of propagating updates in SPIDER, without a significant degradation of answer accuracy or query performance. We present experimental evidence using real data sets to demonstrate the practical benefits of our techniques.
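The following minimal Python sketch illustrates why tf.idf weights are "global". It is not SPIDER's implementation, and it does not show the paper's update-propagation techniques. Several details are assumptions made for illustration: whitespace word tokens (the abstract does not say which tokens SPIDER uses), the idf formula log(N / df), and the hypothetical helper names `tokenize`, `idf_table`, `build_index`, and `query`. Because idf depends on collection-wide document frequencies, inserting a single row can change the stored weights of rows that were never modified.

```python
import math
from collections import Counter


def tokenize(s):
    # Assumption: lowercase whitespace tokens; SPIDER's real tokenization
    # (e.g. words vs. q-grams) is not specified in the abstract.
    return s.lower().split()


def idf_table(records):
    """Global idf = log(N / df(t)), computed over the whole table."""
    n = len(records)
    df = Counter()
    for r in records:
        df.update(set(tokenize(r)))
    return {t: math.log(n / d) for t, d in df.items()}


def weight_vector(s, idf):
    """Unit-length tf.idf vector for one string (unknown tokens weigh 0)."""
    tf = Counter(tokenize(s))
    vec = {t: c * idf.get(t, 0.0) for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def build_index(records):
    """Hypothetical index: record id -> normalized token weights."""
    idf = idf_table(records)
    return idf, {i: weight_vector(r, idf) for i, r in enumerate(records)}


def query(q, idf, index):
    """Score every indexed record against q, best match first."""
    qv = weight_vector(q, idf)
    return sorted(((cosine(qv, v), i) for i, v in index.items()), reverse=True)


if __name__ == "__main__":
    table = ["AT&T Labs Research", "AT&T Labs", "Bell Labs"]
    idf, index = build_index(table)
    print(idf["labs"])  # 0.0, because "labs" occurs in every record

    # Insert one new row and rebuild: idf of "labs" becomes log(4/3).
    new_idf, new_index = build_index(table + ["Research Triangle"])
    print(index[1] == new_index[1])  # False: an untouched row's weights moved
```

Rebuilding the whole index after every insert is the naive baseline whose cost the abstract calls prohibitive. The example only demonstrates why that baseline is needed when weights are kept exact.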
  • Keywords
    indexing; query processing; string matching; very large databases; SPIDER; answer accuracy; database tables; large databases; query performance; string tokens; Costs; Customer relationship management; Databases; Degradation; Delay; Density estimation robust algorithm; Indexes; Information processing; Pressing; Prototypes
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on
  • Conference_Location
    Istanbul
  • Print_ISBN
    1-4244-0802-4
  • Type
    conf
  • DOI
    10.1109/ICDE.2007.368973
  • Filename
    4221763