DocumentCode
2846822
Title
Schema matching using duplicates
Author
Bilke, Alexander ; Naumann, Felix
Author_Institution
Technische Univ. Berlin, Germany
fYear
2005
fDate
5-8 April 2005
Firstpage
69
Lastpage
80
Abstract
Most data integration applications require a matching between the schemas of the respective data sets. We show how the existence of duplicates within these data sets can be exploited to automatically identify matching attributes. We describe an algorithm that first discovers duplicates among data sets with unaligned schemas and then uses these duplicates to perform schema matching between schemas with opaque column names. Discovering duplicates among data sets with unaligned schemas is more difficult than in the usual setting, because it is not clear which fields in one object should be compared with which fields in the other. We have developed a new algorithm that efficiently finds the most likely duplicates in such a setting. Our schema matching algorithm then identifies corresponding attributes by comparing data values within those duplicate records. An experimental study on real-world data shows the effectiveness of this approach.
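The two-step idea in the abstract (find likely duplicates without knowing which fields correspond, then compare values inside those duplicates to align columns) can be illustrated with a minimal sketch. This is not the authors' algorithm: the similarity measure (`difflib.SequenceMatcher`), the naive all-pairs scan, the greedy one-to-one assignment, and all function names (`value_sim`, `record_sim`, `find_duplicates`, `match_attributes`) are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def value_sim(a, b):
    """String similarity in [0, 1]; a stand-in for any field-level measure."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def record_sim(r, s):
    """Schema-agnostic record similarity: pair each field of r with its
    best-matching field of s and average the best scores."""
    return sum(max(value_sim(a, b) for b in s) for a in r) / len(r)


def find_duplicates(left, right, k):
    """Return the k most similar (i, j) record index pairs (naive all-pairs scan)."""
    scored = sorted(
        ((record_sim(r, s), i, j)
         for (i, r), (j, s) in product(enumerate(left), enumerate(right))),
        reverse=True,
    )
    return [(i, j) for _, i, j in scored[:k]]


def match_attributes(left, right, duplicates):
    """Average field-wise similarity over the duplicate pairs, then greedily
    build a one-to-one column mapping from the highest scores down."""
    n_left, n_right = len(left[0]), len(right[0])
    scores = []
    for a, b in product(range(n_left), range(n_right)):
        avg = sum(value_sim(left[i][a], right[j][b])
                  for i, j in duplicates) / len(duplicates)
        scores.append((avg, a, b))
    mapping, used = {}, set()
    for _, a, b in sorted(scores, reverse=True):
        if a not in mapping and b not in used:
            mapping[a] = b
            used.add(b)
    return mapping
```

Given two lists of tuples with differently ordered columns, `find_duplicates(left, right, k)` proposes `k` candidate duplicate pairs, and `match_attributes(left, right, duplicates)` returns a dictionary from column indices of `left` to column indices of `right`. The all-pairs scan is quadratic in the number of records, so it is suitable only for small illustrations.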
Keywords
data mining; database management systems; data integration application; duplicate discovery; real-world data sets; schema matching algorithm; unaligned schema; Access protocols; Algorithm design and analysis; Couplings; Data engineering; Data models; Database languages; Fuzzy sets; Humans; Object detection
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on
ISSN
1084-4627
Print_ISBN
0-7695-2285-8
Type
conf
DOI
10.1109/ICDE.2005.126
Filename
1410107