• DocumentCode
    3227970
  • Title
    Developments in Partitioning XML Documents by Content and Structure Based on Combining Multiple Clusterings
  • Author
    Costa, Gianni; Ortale, Riccardo
  • Author_Institution
    ICAR, Rende, Italy
  • fYear
    2013
  • fDate
    4-6 Nov. 2013
  • Firstpage
    477
  • Lastpage
    482
  • Abstract
    The combination of multiple clusterings for partitioning XML documents is proposed as a promising method, aimed at decomposing the inherently difficult problem of capturing structural and content relationships within an XML corpus into a number of simpler subproblems. To verify the validity of this intuition, a new technique for partitioning XML documents is presented, in which conventional clustering techniques operating on flattened representations of individual aspects of the XML documents (which also include some rare patterns) are used to partition the available XML corpus. The effectiveness of the devised technique is demonstrated by a comparative empirical evaluation on benchmark XML corpora.
  • Keywords
    XML; document handling; pattern clustering; XML corpus; XML document partitioning; clustering technique; document content; document structure; multiple clustering combination; Electronic publishing; Encyclopedias; Internet; Vectors; Vegetation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on
  • Conference_Location
    Herndon, VA
  • ISSN
    1082-3409
  • Print_ISBN
    978-1-4799-2971-9
  • Type
    conf
  • DOI
    10.1109/ICTAI.2013.77
  • Filename
    6735288