DocumentCode :
2053807
Title :
Semantic Schema Matching without Shared Instances
Author :
Partyka, Jeffrey ; Khan, Latifur ; Thuraisingham, Bhavani
Author_Institution :
Dept. of Comput. Sci., Univ. of Texas at Dallas, Richardson, TX, USA
fYear :
2009
fDate :
14-16 Sept. 2009
Firstpage :
297
Lastpage :
302
Abstract :
Semantic heterogeneity across data sources remains a widespread problem that calls for new solutions. Our approach to resolving semantic disparities among distinct data sources aligns their constituent tables by first choosing attributes for comparison. We then examine the attributes' instances and calculate a similarity value between them known as the entropy-based distribution (EBD). One method of calculating EBD applies a state-of-the-art instance matching strategy based on N-grams in the data. However, this method often fails because it relies on shared instance data to determine similarity, so it overestimates the semantic similarity between unrelated attributes and underestimates the semantic similarity between related attributes. Our method resolves this problem using clustering and a measure known as the Normalized Google Distance (NGD). The EBD is then calculated among all clusters by treating each cluster as a type. Experiments on multi-jurisdictional datasets show that our approach is more effective than the traditional N-gram approach.
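The two measures named in the abstract can be illustrated with a minimal Python sketch, written under stated assumptions rather than taken from the paper. `ngd` uses the standard Normalized Google Distance formula over page-hit counts; the caller must supply the counts and the assumed index size `n`, and no search API is queried. `ebd` assumes the definition EBD = H(C|T) / H(C) from the entropy-based distribution literature, where C is the cluster (type) of each value and T records which attribute the value came from; the paper's exact formulation may differ. The cluster labels are assumed to come from a prior clustering step (the keywords mention K-medoid clustering), which is not shown.

```python
import math
from collections import Counter


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance from hit counts.

    fx, fy: hits for each term alone; fxy: hits for both terms together;
    n: assumed total number of indexed pages. 0 means the terms always co-occur.
    """
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


def entropy(counts) -> float:
    """Shannon entropy (bits) of a list of nonnegative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def ebd(clusters_a: list, clusters_b: list) -> float:
    """Assumed EBD = H(C|T) / H(C) for two attributes' cluster labels.

    Returns 1.0 when both attributes spread over the clusters in the same
    proportions, and 0.0 when they fall into disjoint clusters.
    """
    n_a, n_b = len(clusters_a), len(clusters_b)
    n = n_a + n_b
    h_c = entropy(list(Counter(clusters_a + clusters_b).values()))
    if h_c == 0:
        # Hypothetical convention: every value is in one type, so treat as identical.
        return 1.0
    # Conditional entropy: each attribute's own entropy weighted by its share of values.
    h_c_given_t = (n_a / n) * entropy(list(Counter(clusters_a).values())) + (
        n_b / n
    ) * entropy(list(Counter(clusters_b).values()))
    return h_c_given_t / h_c
```

Because `ebd` compares only cluster memberships, two attributes with no literal values in common can still score high if their values land in the same clusters. This is the property the abstract contrasts with N-gram matching, which depends on shared instances.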
Keywords :
distributed processing; semantic Web; N-gram; Normalized Google Distance; data sources; entropy-based distribution; instance data; instance matching; multijurisdictional datasets; semantic heterogeneity; semantic schema matching; semantic similarity; similarity value; Clustering algorithms; Computer science; Data mining; Displays; Entropy; Relational databases; Testing; Transportation; K-medoid clustering; schema matching
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Semantic Computing, 2009. ICSC '09. IEEE International Conference on
Conference_Location :
Berkeley, CA
Print_ISBN :
978-1-4244-4962-0
Electronic_ISBN :
978-0-7695-3800-6
Type :
conf
DOI :
10.1109/ICSC.2009.64
Filename :
5298637