• DocumentCode
    3334357
  • Title
    Two-phase schema matching in real world relational databases
  • Author
    Bozovic, Nikolaos ; Vassalos, Vasilis
  • Author_Institution
    Dept. of Inf., Athens Univ. of Econ. & Bus., Athens
  • fYear
    2008
  • fDate
    7-12 April 2008
  • Firstpage
    290
  • Lastpage
    296
  • Abstract
    We propose a new approach to the problem of schema matching in relational databases that merges the hybrid and composite approaches to combining multiple individual matching techniques. In particular, we propose assigning individual matchers to two categories: "strong" matchers, which provide a priori higher-quality matches, and "weak" matchers, which may be more sensitive to the inputs and are less reliable but can still help generate some matches. Matching is correspondingly done in two phases: the matches produced by strong matchers are combined using a simple voting combiner, and weak matchers then provide additional evidence for attributes left unmatched (again using a voting combiner). We observe that, while many recent advances in schema matching (Madhavan et al., 2005) use composite schema matching and rely on the existence of training schemas to train combiners, in many real-world situations it is not feasible to employ learning techniques because training data (i.e., schemas or instance data) is unavailable. We hypothesize that "weak" matchers can often hurt overall accuracy if used in a "single-phase" composite matcher that does not employ learning techniques. We implement our two-phase approach in the ASID system and evaluate it using real-life schemas. The experiments validate our hypothesis regarding the negative effect of "weak" matchers and also show that ASID performs comparably to state-of-the-art systems while requiring no training schemas. We also demonstrate the benefits of a simple documentation-based matcher. Our experimental data included schemas ranging from 20 to 120 attributes. Note that schemas with 120 attributes are as large as or larger than those used in other published evaluations of relational schema matching.
  • Keywords
    learning (artificial intelligence); pattern matching; relational databases; documentation-based matcher; machine learning technique; relational database; training schema; two-phase relational schema matching; voting combiner; Availability; Database systems; Informatics; Internet; Machine learning; Neural networks; Ontologies; Relational databases; Training data; Voting;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering Workshop, 2008. ICDEW 2008. IEEE 24th International Conference on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4244-2161-9
  • Electronic_ISBN
    978-1-4244-2162-6
  • Type
    conf
  • DOI
    10.1109/ICDEW.2008.4498334
  • Filename
    4498334
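
The two-phase procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the ASID implementation: the three matchers (`exact_name`, `normalized_name`, `shared_prefix`), the `min_votes` threshold, the one-to-one restriction on targets, and the example attribute names are all hypothetical choices made for illustration. Only the overall shape is taken from the abstract: strong matchers vote first, and weak matchers vote afterwards on the attributes left unmatched.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# A matcher proposes at most one target attribute for a source attribute.
Matcher = Callable[[str, List[str]], Optional[str]]

def exact_name(src: str, targets: List[str]) -> Optional[str]:
    """Hypothetical 'strong' matcher: case-insensitive name equality."""
    for t in targets:
        if t.lower() == src.lower():
            return t
    return None

def normalized_name(src: str, targets: List[str]) -> Optional[str]:
    """Hypothetical 'strong' matcher: equality after dropping underscores."""
    key = src.lower().replace("_", "")
    for t in targets:
        if t.lower().replace("_", "") == key:
            return t
    return None

def shared_prefix(src: str, targets: List[str]) -> Optional[str]:
    """Hypothetical 'weak' matcher: first target sharing a 3-letter prefix."""
    for t in targets:
        if t[:3].lower() == src[:3].lower():
            return t
    return None

def vote(src: str, targets: List[str], matchers: List[Matcher],
         min_votes: int = 1) -> Optional[str]:
    """Simple voting combiner: each matcher casts one vote, most votes wins."""
    votes = Counter()
    for matcher in matchers:
        choice = matcher(src, targets)
        if choice is not None:
            votes[choice] += 1
    if not votes:
        return None
    winner, count = votes.most_common(1)[0]
    return winner if count >= min_votes else None

def two_phase_match(source: List[str], target: List[str],
                    strong: List[Matcher], weak: List[Matcher]) -> Dict[str, str]:
    """Phase 1 uses strong matchers; phase 2 uses weak ones on the leftovers."""
    matches: Dict[str, str] = {}
    for matchers in (strong, weak):
        for src in source:
            if src in matches:
                continue  # already matched in an earlier phase
            # Assumption: matches are one-to-one, so used targets are excluded.
            free = [t for t in target if t not in matches.values()]
            choice = vote(src, free, matchers)
            if choice is not None:
                matches[src] = choice
    return matches

result = two_phase_match(
    ["cust_id", "CustName", "addr"],
    ["CustID", "custname", "address"],
    strong=[exact_name, normalized_name],
    weak=[shared_prefix],
)
print(result)
```

In this toy input the strong matchers settle `cust_id` and `CustName`, and the weak prefix matcher is consulted only for `addr`, so it cannot override a decision made in the first phase.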