DocumentCode
140779
Title
Mapping and cleaning
Author
Geerts, F. ; Mecca, Giansalvatore ; Papotti, P. ; Santoro, Diego
Author_Institution
Univ. of Antwerp, Antwerp, Belgium
fYear
2014
fDate
March 31 2014-April 4 2014
Firstpage
232
Lastpage
243
Abstract
We address the challenging and open problem of bringing together two crucial activities in data integration and data quality: transforming data using schema mappings, and fixing conflicts and inconsistencies using data repairing. This problem is made complex by several factors. First, schema mappings and data repairing have traditionally been considered separate activities, and research in the two fields has progressed largely independently. Second, the elegant formalizations and algorithms that have been proposed for both tasks have had mixed fortune in scaling to large databases. In this paper, we introduce a very general notion of a mapping and cleaning scenario that incorporates a wide variety of features, such as user interventions. We develop a new semantics for these scenarios that represents a conservative extension of previous semantics for schema mappings and data repairing. Based on the semantics, we introduce a chase-based algorithm to compute solutions. Appropriate care is devoted to developing a scalable implementation of the chase algorithm. To the best of our knowledge, this is the first general and scalable proposal in this direction.
Keywords
data handling; chase-based algorithm; cleaning scenario; data cleaning; data integration; data quality; data repairing; data transformation; mapping notion; schema mappings; user interventions; Cleaning; Databases; Hospitals; Maintenance engineering; Semantics; Standards;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering (ICDE), 2014 IEEE 30th International Conference on
Conference_Location
Chicago, IL
Type
conf
DOI
10.1109/ICDE.2014.6816654
Filename
6816654