• DocumentCode
    2054697
  • Title
    Retrieving Representative Structures from XML Documents Using Clustering Techniques
  • Author
    Huang, Yin-Fu ; Liou, Po-Lun
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Yunlin Univ. of Sci. & Technol., Douliou, Taiwan
  • fYear
    2011
  • fDate
    12-14 Sept. 2011
  • Firstpage
    332
  • Lastpage
    339
  • Abstract
    In this paper, we addressed the problem of finding the common structures in a collection of XML documents. Since an XML document can be represented as a tree, the problem of clustering a collection of XML documents can be treated as the problem of clustering a collection of tree-structured documents. First, we used a SOM (Self-Organizing Map) with the Jaccard coefficient to cluster the XML documents. Then, an efficient sequential mining method called GST was applied to find the maximum frequent sequences. Finally, we merged the maximum frequent sequences to produce the common structures in a cluster.
  • Keywords
    XML; information retrieval; pattern clustering; self-organising feature maps; tree data structures; Jaccard coefficient; SOM; XML documents; clustering techniques; representative structure retrieval; self-organizing map; tree structured documents; Clustering algorithms; Clustering methods; Data mining; Electronic mail; Filtering; Merging; XML; XML document; clustering; common structure; sequential pattern mining; tree-structured
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligence and Security Informatics Conference (EISIC), 2011 European
  • Conference_Location
    Athens
  • Print_ISBN
    978-1-4577-1464-1
  • Electronic_ISBN
    978-0-7695-4406-9
  • Type
    conf
  • DOI
    10.1109/EISIC.2011.16
  • Filename
    6061227
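
The abstract mentions clustering XML documents with the Jaccard coefficient. As a rough illustration only, the sketch below computes a Jaccard coefficient between two XML documents. It assumes each document is represented by its set of root-to-node tag paths; the abstract does not state the paper's actual feature representation, so that choice, the helper names `tag_paths` and `jaccard`, and the sample documents are all hypothetical. The SOM and GST steps are not covered.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths in an XML document.

    Assumption: path sets stand in for the document's tree structure;
    the paper may use a different representation.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B|; taken as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two made-up documents with overlapping structure.
doc_a = "<book><title/><author><name/></author></book>"
doc_b = "<book><title/><author><name/><email/></author><year/></book>"

paths_a = tag_paths(doc_a)
paths_b = tag_paths(doc_b)
similarity = jaccard(paths_a, paths_b)
print(similarity)
```

Here the first document has 4 distinct paths and the second has 6, all 4 of the first being shared, so the coefficient is 4/6. A similarity of this kind could serve as the comparison measure inside a clustering method, though how the paper combines it with the SOM is not described in this record.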