  • DocumentCode
    2773263
  • Title
    Similarity of XML Schema Fragments Based on XML Data Statistics
  • Author
    Mlynkova, Irena; Pokorny, Jaroslav
  • Author_Institution
    Charles Univ., Prague
  • fYear
    2007
  • fDate
    18-20 Nov. 2007
  • Firstpage
    243
  • Lastpage
    247
  • Abstract
    XML has become a standard for data representation and is now used across many information technologies. XML-based approaches can be optimized by exploiting the similarity of XML data. In this paper we propose a technique for evaluating the similarity of XML schema fragments that focuses on two often omitted aspects: the structural level of similarity and the tuning of the similarity measure's parameters. For the former, we exploit the results of a statistical analysis of real-world XML data. For the latter, we show that the tuning problem is a kind of constraint optimization problem and can be solved using the corresponding approaches. We analyze the advantages and disadvantages of two of them, genetic algorithms and simulated annealing, and show in further experiments that appropriate tuning produces a more precise similarity measure.
  • Keywords
    XML; genetic algorithms; simulated annealing; statistical analysis; XML data statistics; XML schema; constraints optimization problem; data representation; tuning problem; Algorithm design and analysis; Analytical models; Information analysis; Information technology; Mathematics; Statistics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    4th International Conference on Innovations in Information Technology (IIT '07), 2007
  • Conference_Location
    Dubai
  • Print_ISBN
    978-1-4244-1840-4
  • Electronic_ISBN
    978-1-4244-1841-1
  • Type
    conf
  • DOI
    10.1109/IIT.2007.4430402
  • Filename
    4430402