• DocumentCode
    2866526
  • Title

    Merging interface schemas on the deep Web via clustering aggregation

  • Author

    Wu, Wensheng ; Doan, AnHai ; Yu, Clement

  • Author_Institution
    Illinois Univ., Urbana, IL, USA
  • fYear
    2005
  • fDate
    27-30 Nov. 2005
  • Abstract
    We consider the problem of integrating a large number of interface schemas over the deep Web. The scale of the problem and the diversity of the sources present serious challenges to the conventional manual or rule-based approaches to schema integration. To address these challenges, we propose a novel formulation of schema integration as an optimization problem, with the objective of maximally satisfying the constraints given by individual schemas. Since the optimization problem can be shown to be NP-complete, we develop a novel approximation algorithm LMax, which builds the unified schema via recursive applications of clustering aggregation. We further extend LMax to handle the irregularities frequently occurring among the interface schemas. Extensive evaluation on real-world data sets shows the effectiveness of our approach.
  • Keywords
    Internet; approximation theory; computational complexity; optimisation; LMax algorithm; NP-complete problem; approximation algorithm; clustering aggregation; deep Web; interface schema; optimization problem; schema integration; Approximation algorithms; Clustering algorithms; Constraint optimization; Databases; Merging; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, Fifth IEEE International Conference on
  • ISSN
    1550-4786
  • Print_ISBN
    0-7695-2278-5
  • Type

    conf

  • DOI
    10.1109/ICDM.2005.92
  • Filename
    1565786