• DocumentCode
    2507673
  • Title
    Index-based approximate XML joins
  • Author
    Guha, Sudipto; Koudas, Nick; Srivastava, Divesh; Yu, Ting
  • Author_Institution
    Pennsylvania Univ., Philadelphia, PA, USA
  • fYear
    2003
  • fDate
    5-8 March 2003
  • Firstpage
    708
  • Lastpage
    710
  • Abstract
    XML data integration tools face a variety of challenges to efficient and effective operation. Among these is the need to handle the inconsistencies and mistakes present in the data sets. We study the problem of integrating XML data sources through index-assisted join operations, using notions of approximate match in the structure and content of XML documents as the join predicate. We show how a well-known and widely deployed index structure, the R-tree, can be adapted to improve the performance of such operations. We propose novel search and join algorithms for R-trees adapted to index XML document collections. We also propose novel optimization objectives for R-tree construction, making R-trees better suited to this application.
  • Keywords
    XML; data integrity; database indexing; document handling; query processing; tree searching; R-tree index structure; XML data integration tools; XML documents; index-based approximate XML joins; optimization; Algorithm design and analysis; Construction industry; Design optimization; Flexible structures; Indexing; Multidimensional systems; Optimized production technology; Pressing; Relational databases
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2003. Proceedings. 19th International Conference on
  • Print_ISBN
    0-7803-7665-X
  • Type
    conf
  • DOI
    10.1109/ICDE.2003.1260843
  • Filename
    1260843