Title :
Accuracy vs. Speed: Scalable Entity Coreference on the Semantic Web with On-the-Fly Pruning
Author :
Song, Dezhao; Heflin, Jeff
Author_Institution :
Dept. of Comput. Sci. & Eng., Lehigh Univ., Bethlehem, PA, USA
Abstract :
One challenge for the Semantic Web is to scalably establish high-quality owl:sameAs links between coreferent ontology instances in different data sources; traditional approaches that exhaustively compare every pair of instances do not scale well to large datasets. In this paper, we propose a pruning-based algorithm for reducing the complexity of entity coreference. First, we discard candidate pairs of instances that are not sufficiently similar to the same pool of other instances. A sigmoid-function-based thresholding method is proposed to automatically adjust the threshold for such commonality on the fly. In our prior work, each instance is associated with a context graph consisting of neighboring RDF nodes. In this paper, we speed up the comparison for a single pair of instances by pruning insignificant context in the graph; this is accomplished by evaluating its potential contribution to the final similarity measure. We evaluate our system on three Semantic Web instance categories. We verify the effectiveness of our thresholding and context pruning methods by comparing against nine state-of-the-art systems. We show that our algorithm frequently outperforms those systems with a runtime speedup factor of 18 to 24 while maintaining competitive F1-scores. For datasets of up to 1 million instances, this translates to as much as 370 hours of improvement in runtime.
Keywords :
graph theory; ontologies (artificial intelligence); semantic Web; RDF nodes; context graph; context pruning method; pruning based algorithm; referent ontology; scalable entity coreference; semantic Web instance category; sigmoid function based thresholding method; similarity measure; Entity Coreference; Linked Data; Pruning; Scalability; Semantic Web;
Conference_Title :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2012 IEEE/WIC/ACM International Conferences on
Conference_Location :
Macau
Print_ISBN :
978-1-4673-6057-9
DOI :
10.1109/WI-IAT.2012.24