• DocumentCode
    1879479
  • Title
    Detecting abnormal data for ontology based information integration
  • Author
    Yu, Yang ; Heflin, Jeff
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Lehigh Univ., Bethlehem, PA, USA
  • fYear
    2011
  • fDate
    23-27 May 2011
  • Firstpage
    431
  • Lastpage
    438
  • Abstract
    To better support information integration on Semantic Web data with varying degrees of quality, this paper proposes an approach to detect triples that reflect some sort of error. In particular, erroneous triples may arise from factual errors in the original data source, misuse of the ontology by the original data source, or errors in the integration process. Although diagnosing such errors is a difficult problem, we propose that the degree to which a triple deviates from similar triples can be an important heuristic for identifying errors. We detect such “abnormal triples” by learning probabilistic rules from the reference data and checking to what extent these rules agree with the triples. The system consists of two components, one for each of the two types of abnormal relational description that a Semantic Web statement could contain, whether accidentally or maliciously: a statement could relate two resources that are unlikely to have anything in common, or an inappropriate predicate could be used to describe the relation between the two resources. A classification technique is adopted to learn statistical characteristics for detecting a suspect resource pair, i.e., a pair in which there is no significant relation between the subject and the object of the statement. For suspect usages of a predicate, the system learns semantic patterns for each predicate from indirect semantic connections between the subject/object pairs.
  • Keywords
    ontologies (artificial intelligence); semantic Web; abnormal data detection; classification technique; indirect semantic connections; ontology based information integration; reference data; semantic Web data; statistical characteristics; subject-object pairs; triples detection; Context; Joining processes; Neodymium; Ontologies; Probabilistic logic; Semantic Web; Semantics; Detecting abnormal data; Ontology based information integration;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Collaboration Technologies and Systems (CTS), 2011 International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    978-1-61284-638-5
  • Type
    conf
  • DOI
    10.1109/CTS.2011.5928721
  • Filename
    5928721