• DocumentCode
    2117946
  • Title

    Accuracy vs. Speed: Scalable Entity Coreference on the Semantic Web with On-the-Fly Pruning

  • Author

    Song, Dezhao ; Heflin, Jeff

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Lehigh Univ., Bethlehem, PA, USA
  • Volume
    1
  • fYear
    2012
  • fDate
    4-7 Dec. 2012
  • Firstpage
    66
  • Lastpage
    73
  • Abstract
    One challenge for the Semantic Web is to scalably establish high-quality owl:sameAs links between coreferent ontology instances in different data sources; traditional approaches that exhaustively compare every pair of instances do not scale well to large datasets. In this paper, we propose a pruning-based algorithm for reducing the complexity of entity coreference. First, we discard candidate pairs of instances that are not sufficiently similar to the same pool of other instances. A sigmoid-function-based thresholding method is proposed to automatically adjust the threshold for such commonality on-the-fly. In our prior work, each instance is associated with a context graph consisting of neighboring RDF nodes. In this paper, we speed up the comparison for a single pair of instances by pruning insignificant context in the graph; this is accomplished by evaluating its potential contribution to the final similarity measure. We evaluate our system on three Semantic Web instance categories. We verify the effectiveness of our thresholding and context pruning methods by comparing to nine state-of-the-art systems. We show that our algorithm frequently outperforms those systems with a runtime speedup factor of 18 to 24 while maintaining competitive F1-scores. For datasets of up to 1 million instances, this translates to an improvement of as much as 370 hours in runtime.
  • Keywords
    graph theory; ontologies (artificial intelligence); semantic Web; RDF nodes; context graph; context pruning method; pruning based algorithm; referent ontology; scalable entity coreference; semantic Web instance category; sigmoid function based thresholding method; similarity measure; Entity Coreference; Linked Data; Pruning; Scalability; Semantic Web;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technology (WI-IAT), 2012 IEEE/WIC/ACM International Conferences on
  • Conference_Location
    Macau
  • Print_ISBN
    978-1-4673-6057-9
  • Type

    conf

  • DOI
    10.1109/WI-IAT.2012.24
  • Filename
    6511867