DocumentCode
2731547
Title
Propagating Updates in SPIDER
Author
Koudas, N. ; Marathe, A. ; Srivastava, Divesh
Author_Institution
Toronto Univ., Ont., Canada
fYear
2007
fDate
15-20 April 2007
Firstpage
1146
Lastpage
1153
Abstract
SPIDER, developed at AT&T Labs-Research, is a system that efficiently supports flexible string matching against attribute values in large databases, and is extensively used in AT&T. The scoring methodology is based on tf.idf weighting and cosine similarity, and SPIDER maintains indexes containing string tokens and their weights for fast matching at query time. Given the "global" nature of the weights maintained in the indexes, even a few updates to the underlying database tables would necessitate a (near-)complete recomputation of the indexes, which can be prohibitively expensive. In this paper, we explore novel techniques to considerably reduce the cost of propagating updates in SPIDER, without a significant degradation of answer accuracy or query performance. We present experimental evidence using real data sets to demonstrate the practical benefits of our techniques.
Keywords
indexing; query processing; string matching; very large databases; SPIDER; answer accuracy; database tables; large databases; query performance; string tokens; Costs; Customer relationship management; Databases; Degradation; Delay; Density estimation robust algorithm; Indexes; Information processing; Pressing; Prototypes
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on
Conference_Location
Istanbul
Print_ISBN
1-4244-0802-4
Type
conf
DOI
10.1109/ICDE.2007.368973
Filename
4221763