Regional Information Center for Science and Technology - Weighted Set Similarity: Queries and Updates

DocumentCode :

3125061

Title :

Weighted Set Similarity: Queries and Updates

Author :

Srivastava, Divesh

Author_Institution :

AT&T Labs-Research, Florham Park, NJ

fYear :

2009

fDate :

March 29, 2009 to April 2, 2009

Firstpage :

1559

Lastpage :

1559

Abstract :

Summary form only given. Consider a universe of items, each of which is associated with a weight, and a database consisting of subsets of these items. Given a query set, a weighted set similarity query identifies either (i) all sets in the database whose normalized similarity to the query set is above a pre-specified threshold, or (ii) the sets in the database with the k highest similarity values to the query set. Weighted set similarity queries are useful in applications like data cleaning and integration for finding approximate matches in the presence of typographical mistakes, multiple formatting conventions, transformation errors, etc. We show that this problem has semantic properties that can be exploited to design index structures that support efficient algorithms for answering queries; these algorithms can achieve arbitrarily stronger pruning than the family of Threshold Algorithms. We describe how these index structures can be efficiently updated using lazy propagation in a way that gives strict guarantees on the quality of subsequent query answers. Finally, we illustrate that our proposed ideas work well in practice for real datasets.
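
The two query types defined in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the abstract: weighted Jaccard is used as the normalized similarity (the abstract does not name a measure), a score equal to the threshold counts as a match, the item weights and records are invented for illustration, and every function name is hypothetical. The sketch scans the whole database by brute force; it does not implement the index structures, pruning, or lazy update propagation that the abstract refers to.

```python
import heapq
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical item weights (for example, rarer tokens could weigh more).
WEIGHTS: Dict[str, float] = {
    "main": 1.0, "st": 0.5, "street": 0.5, "florham": 3.0, "park": 1.5,
}

# Hypothetical database: each record is a subset of the item universe.
DATABASE: Dict[str, FrozenSet[str]] = {
    "r1": frozenset({"main", "st"}),
    "r2": frozenset({"main", "street"}),
    "r3": frozenset({"florham", "park"}),
}


def weighted_jaccard(a: FrozenSet[str], b: FrozenSet[str],
                     weights: Dict[str, float]) -> float:
    """Assumed similarity: weight of the intersection over weight of the union."""
    union_weight = sum(weights[item] for item in a | b)
    if union_weight == 0.0:
        return 0.0
    return sum(weights[item] for item in a & b) / union_weight


def threshold_query(query: FrozenSet[str], database: Dict[str, FrozenSet[str]],
                    weights: Dict[str, float], tau: float) -> List[Tuple[str, float]]:
    """Query type (i): every set whose similarity to the query is at least tau."""
    scored = ((sid, weighted_jaccard(query, items, weights))
              for sid, items in database.items())
    return [(sid, score) for sid, score in scored if score >= tau]


def top_k_query(query: FrozenSet[str], database: Dict[str, FrozenSet[str]],
                weights: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Query type (ii): the k sets with the highest similarity to the query."""
    scored = ((sid, weighted_jaccard(query, items, weights))
              for sid, items in database.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


QUERY = frozenset({"main", "st"})
print(threshold_query(QUERY, DATABASE, WEIGHTS, tau=0.5))
print(top_k_query(QUERY, DATABASE, WEIGHTS, k=2))
```

With these invented records, the query {"main", "st"} matches r1 exactly and shares only the item "main" with r2, which is the kind of approximate match (a formatting variant such as "st" versus "street") that the abstract mentions for data cleaning.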

Keywords :

database indexing; query processing; set theory; database system; index structure updation; lazy propagation; query answering; query set; semantic property; weighted set similarity; Algorithm design and analysis; Cleaning; Data engineering; Databases; USA Councils;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on

Conference_Location :

Shanghai

ISSN :

1084-4627

Print_ISBN :

978-1-4244-3422-0

Electronic_ISBN :

1084-4627

Type :

conf

DOI :

10.1109/ICDE.2009.179

Filename :

4812572

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3125061