Propagating Updates in SPIDER

Author

Koudas, N.; Marathe, A.; Srivastava, Divesh

Author_Institution

Toronto Univ., Ont., Canada

fYear

2007

fDate

15-20 April 2007

Firstpage

1146

Lastpage

1153

Abstract

SPIDER, developed at AT&T Labs-Research, is a system that efficiently supports flexible string matching against attribute values in large databases, and is used extensively within AT&T. The scoring methodology is based on tf.idf weighting and cosine similarity, and SPIDER maintains indexes containing string tokens and their weights, for fast matching at query time. Because the weights maintained in the indexes are "global" (each depends on statistics over the entire table), even a few updates to the underlying database tables would necessitate a (near-)complete recomputation of the indexes, which can be prohibitively expensive. In this paper, we explore novel techniques to considerably reduce the cost of propagating updates in SPIDER, without a significant degradation of answer accuracy or query performance. We present experimental evidence using real data sets to demonstrate the practical benefits of our techniques.
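To illustrate why the weights are "global", the following is a minimal sketch of tf.idf weighting with cosine similarity. It assumes whitespace word tokens, raw term-frequency counts, and idf = log(N/df), where N is the number of records and df the number of records containing a token; SPIDER's actual tokenization and weighting variant are not specified here, and all function names are hypothetical. The final step shows that inserting a single record changes N, and therefore the idf of every existing token, so every stored normalized weight becomes stale.

```python
import math
from collections import Counter


def tokenize(s):
    # Assumption: lowercase whitespace tokens; SPIDER's real tokenizer may differ.
    return s.lower().split()


def idf_table(records):
    # idf(t) = log(N / df(t)); N is a table-wide statistic, so every weight depends on it.
    n = len(records)
    df = Counter()
    for r in records:
        df.update(set(tokenize(r)))
    return {t: math.log(n / d) for t, d in df.items()}


def weight_vector(s, idf):
    # tf * idf weights, L2-normalized so that a dot product equals cosine similarity.
    tf = Counter(tokenize(s))
    vec = {t: c * idf.get(t, 0.0) for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


def cosine(u, v):
    # Both vectors are already normalized, so cosine similarity is just the dot product.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def build_index(records):
    # Hypothetical "index": the idf table plus one precomputed weight vector per record.
    idf = idf_table(records)
    return idf, [weight_vector(r, idf) for r in records]


def query(q, idf, vectors, k=3):
    # Score every record against the query and return the top-k (score, record_id) pairs.
    qv = weight_vector(q, idf)
    scores = sorted(((cosine(qv, v), i) for i, v in enumerate(vectors)), reverse=True)
    return scores[:k]


records = ["AT&T Labs Research", "AT&T Wireless", "Bell Labs", "Labs Research Center"]
idf, vectors = build_index(records)
print(query("AT&T research", idf, vectors, k=2))

# Insert one record: N changes from 4 to 5, so the idf of every old token shifts,
# including tokens such as "bell" that do not occur in the new record.
new_idf, _ = build_index(records + ["AT&T Mobility"])
stale = [t for t in idf if not math.isclose(idf[t], new_idf[t])]
print(sorted(stale))
```

In this toy table every previously indexed token has a different idf after the insertion, which is the effect that makes naive update propagation amount to rebuilding the whole index. The only exception in general would be a token that occurs in every record both before and after the update, whose idf stays at zero.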

Keywords

indexing; query processing; string matching; very large databases; SPIDER; answer accuracy; database tables; large databases; query performance; string tokens; Costs; Customer relationship management; Databases; Degradation; Delay; Density estimation robust algorithm; Indexes; Information processing; Pressing; Prototypes

fLanguage

English

Publisher

IEEE

Conference_Titel

Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on

Conference_Location

Istanbul

Print_ISBN

1-4244-0802-4

Type

conf

DOI

10.1109/ICDE.2007.368973

Filename

4221763