Title :
On mining XML integrity constraints
Author :
Fajt, Stanislav ; Mlýnková, Irena ; Nečaský, Martin
Author_Institution :
Dept. of Software Eng., Charles Univ. in Prague, Prague, Czech Republic
Abstract :
Since XML documents can appear in any semi-structured form, structural and integrity constraints are often imposed on the data that are to be modified or processed. These constraints are formally defined in a schema. However, despite the obvious advantages, the presence of a schema is not mandatory, and many XML documents are not associated with one. Consequently, no integrity constraints are specified for them either. In this paper we focus on extending approaches for inferring an XML schema from a sample set of XML documents with the mining of primary and foreign keys. In particular, we consider keys in the context of XSD, i.e., absolute and relative as well as simple and composite keys. We propose a novel approach called KeyMiner and demonstrate its efficiency experimentally using real-world and synthetic data.
Keywords :
XML; constraint handling; data integrity; data mining; document handling; KeyMiner; XML documents; XML integrity constraints mining; XML schema; XSD; Context; Data mining; Data structures; Inference algorithms; Merging; Unified modeling language; XML;
Conference_Title :
Digital Information Management (ICDIM), 2011 Sixth International Conference on
Conference_Location :
Melbourne, QLD
Print_ISBN :
978-1-4577-1538-9
DOI :
10.1109/ICDIM.2011.6093314